28 May 2012

167. ECCE/NWChem on an Australian university computational cluster using qsub with g09/nwchem

I've just learned the First Rule of Remote Computing:
always start by checking the number of concurrent processes you're allowed on the head node, or you can lock yourself out faster than you can say "IT support".

ulimit -u
If it's anywhere under 1000, then you need to be careful.
Default ulimit on ROCKS: 73728
Default ulimit on Debian/Wheezy:  63431
Ulimit on the Oz uni cluster: 32
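As a defensive habit (this is my own sketch, not something from the cluster documentation), a script can check how close it is to the cap before forking anything:

```shell
# My own defensive sketch: abort early when a script is running close to
# the per-user process cap, leaving ~20% headroom so you can still log in.
limit=$(ulimit -u)
if [ "$limit" = "unlimited" ]; then
    echo "no per-user process limit here"
else
    used=$(ps ux | wc -l)
    if [ "$used" -ge $((limit * 8 / 10)) ]; then
        echo "too close to the limit: $used of $limit procs" >&2
        exit 1
    fi
    echo "ok: $used of $limit procs in use"
fi
```

With a limit of 32, the 80% threshold means the script bails out once you're at 25 processes or so.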

ECCE launches FIVE processes per job.
Each pipe you add to a command launches another process. Logging in launches a process -- if you've reached your quota, you can't log in until a process finishes.

cat test.text|sed 's/\,/\t/g'|gawk '{print $2,$3,$4}' 
yields three processes -- ten percent of my entire quota.
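For comparison, the same extraction can be done in a single process by letting gawk split on commas itself (this rewrite is mine, not from the original pipeline):

```shell
# stand-in for the real test.text: comma-separated fields
printf 'a,b,c,d\n' > test.text
# one process instead of three: no cat, no sed -- gawk splits on commas itself
gawk -F',' '{print $2, $3, $4}' test.text
```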

Running something on a cluster where you have limited access is very different from running on a cluster you manage yourself: apart from knowing the physical layout, you normally have sudo powers on a local cluster.

One potential issue is excessive disk usage -- both in terms of storage space and in terms of raw I/O (writing to an NFS-mounted disk is not efficient anyway).
So in order to cut down on that:
1. Define a scratch directory in your input file, e.g. (use the correct path for your cluster)
scratch_dir /scratch
The point being that /scratch is a directory on the local disk of the execution node.

2. Make sure that you specify semidirect filesize 0
or even
direct
in your scf block, to do as little disk caching as possible.
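Putting both points together, the top of an input file might look like this. This is my own sketch: the permanent_dir path is purely illustrative, and the semidirect/direct keywords should be checked against the NWChem documentation for your version:

```
scratch_dir /scratch
permanent_dir /home/myuser/nwchem/perm

scf
  semidirect filesize 0
end
```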

I accidentally ended up storing 52 GB of aoints files from a single job. It may have been what locked me out of the submit node for three hours...

A good way to check your disk usage is
ls -d * |xargs du -hs
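Incidentally, given the process quota, the same check can be done with fewer forks -- du takes the glob directly, and sort -h orders the result by size (my variant, not from the original post):

```shell
# two processes total (du + sort) instead of ls + xargs + du;
# the biggest entries end up at the bottom
du -hs -- * 2>/dev/null | sort -h
```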


Setting everything up the first time:
First figure out where the mpi libs are:

locate libmpi.so
Assuming that the location is /usr/lib/openmpi/1.3.2-gcc/lib/, put 
export LD_LIBRARY_PATH=/usr/lib/openmpi/1.3.2-gcc/lib/
in your ~/.bashrc
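To verify that the path actually works, you can run ldd on the binary and look for unresolved libraries (a generic check, not from the original post; the nwchem path is the one assumed throughout):

```shell
bin=/opt/sw/nwchem-6.1/bin/nwchem    # adjust to your cluster's path
if [ -x "$bin" ]; then
    # with LD_LIBRARY_PATH set correctly, no line should say "not found"
    ldd "$bin" | grep "not found" || echo "all libraries resolved"
else
    echo "no nwchem binary at $bin"
fi
```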

Next, look at ls /opt/sw/nwchem-6.1/data -- if there's a default.nwchemrc file, then
ln -s /opt/sw/nwchem-6.1/data/default.nwchemrc ~/.nwchemrc

If not, create ~/.nwchemrc with the locations of the different basis sets, amber files and plane-wave sets listed as follows:

nwchem_basis_library /opt/sw/nwchem-6.1/data/libraries/
nwchem_nwpw_library /opt/sw/nwchem-6.1/data/libraryps/
ffield amber
amber_1 /opt/sw/nwchem-6.1/data/amber_s/
amber_2 /opt/sw/nwchem-6.1/data/amber_q/
amber_3 /opt/sw/nwchem-6.1/data/amber_x/
amber_4 /opt/sw/nwchem-6.1/data/amber_u/
spce /opt/sw/nwchem-6.1/data/solvents/spce.rst
charmm_s /opt/sw/nwchem-6.1/data/charmm_s/
charmm_x /opt/sw/nwchem-6.1/data/charmm_x/
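To check that the basis-set libraries resolve before involving the queue, a tiny single-point job is enough. This is just a generic water input I'd use for testing -- any small molecule will do:

```
start h2o_test
geometry units angstrom
  O  0.000  0.000  0.000
  H  0.000  0.757  0.587
  H  0.000 -0.757  0.587
end
basis
  * library 3-21g
end
task scf energy
```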

Using nwchem:
A simple qsub file would be:

#$ -S /bin/sh
#$ -cwd
#$ -l h_rt=00:14:00
#$ -l h_vmem=4G
#$ -j y
#$ -pe orte 4
module load nwchem/6.1
time mpirun -n 4 nwchem  test.nw > nwchem.out

with test.nw being the actual nwchem input file which is present in your cwd (current working directory).

Using nwchem with ecce:
This is the proper way of using nwchem. If you haven't already, look here: http://verahill.blogspot.com.au/2012/05/setting-up-ecce-with-qsub-on-australian.html

Then edit your  ecce-6.3/apps/siteconfig/CONFIG.msgln4  file:

NWChem: /opt/sw/nwchem-6.1/bin/nwchem
Gaussian-03: /usr/local/bin/G09
perlPath: /usr/bin/perl
qmgrPath: /usr/bin/qsub

SGE {
#$ -S /bin/csh
#$ -cwd
#$ -l h_rt=$wallTime
#$ -l h_vmem=4G
#$ -j y
}

NWChemFilesToDelete{ core *.aoints.* }

NWChemEnvironment{
    LD_LIBRARY_PATH /usr/lib/openmpi/1.3.2-gcc/lib/
}

NWChemCommand {
#$ -pe mpi_smp4  4
module load nwchem/6.1

mpirun -n $totalprocs $nwchem $infile > $outfile
}

Gaussian-03Command {
#$ -pe g03_smp4 4
module load gaussian/g09

time G09< $infile > $outfile }

Gaussian-03FilesToDelete{ core *.rwf }

Wrapup{
find /scratch/* -name "*" -user $USER |xargs -I {} rm {} -rf
}

And you should be good to go. IMPORTANT: don't copy the settings blindly -- what works at your uni might be different from what works at mine. But use the above as inspiration and as validation of your own thought process. The most important thing to look out for in terms of performance is probably your -pe switch.
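To see which -pe values your cluster actually offers, you can ask SGE itself (guarded here so the sketch runs anywhere; orte is just the example name from the qsub file above):

```shell
# on the cluster this lists the parallel environments you can pass to -pe
if command -v qconf >/dev/null 2>&1; then
    qconf -spl                          # names of all parallel environments
    qconf -sp orte 2>/dev/null || true  # slots/allocation_rule, if it exists
else
    echo "qconf not available on this machine"
fi
```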

Since I'm having problems with the low ulimit, I wrote a small bash script which I've set to run every ten minutes as a cron job. Of course, if you've used up your 32 procs you can't run the script... Also, instead of piping stuff right and left (each pipe creates another fork/proc) I've written it so that it dumps the intermediate results to disk. That way you also have a list of procs in case you need to kill something manually:

 The script: ~/clean_ps.sh
ps ux>~/.job.list
ps ux|gawk 'END {print NR}'

cat ~/.job.list|grep "\-sh \-i">~/.job2.list
cat ~/.job2.list|gawk '{print$2}'>~/.job3.list
cat ~/.job3.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "echo">~/.job4.list
cat ~/.job4.list|gawk '{print$2}'>~/.job5.list
cat ~/.job5.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "notty">~/.job6.list
cat ~/.job6.list|gawk '{print$2}'>~/.job7.list
cat ~/.job7.list|xargs -I {} kill -15 {}

cat ~/.job.list|grep "perl">~/.job8.list
cat ~/.job8.list|gawk '{print$2}'>~/.job9.list
cat ~/.job9.list|xargs -I {} kill -15 {}

qstat -u ${USER} 
ps ux |gawk 'END {print NR}' 
echo "***" 

and the cron job is set up using
crontab -e
 */10 * * * * sh ~/clean_ps.sh>> ~/.cronout

Obviously this kills any job monitoring from the point of view of ecce. However, it keeps you from being locked out. You can manually check the job status using qstat -u ${USER}, then reconnect when a job is ready. Not that convenient, but liveable.
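For what it's worth, the four near-identical grep/kill stanzas in clean_ps.sh can be collapsed into a loop. Note the trade-off: this version brings the pipes back (each one is another fork), so near the limit the disk-dumping original is actually the safer variant. The kill is disarmed with a leading echo here so the sketch is safe to run as-is:

```shell
# compact variant of clean_ps.sh: one loop over the grep patterns
ps ux > ~/.job.list
for pattern in '\-sh \-i' 'echo' 'notty' 'perl'; do
    grep "$pattern" ~/.job.list | gawk '{print $2}' | \
        xargs -r -n 1 echo kill -15    # drop the leading echo to actually kill
done
```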


  1. Thank you very much for your contributions on your blog. I have a question regarding the use of the qsub command on ROCKS clusters with NWChem:
    how can I use it?
    I have already compiled NWChem 6.1 in parallel.

    Greetings from Mexico

    1. Jorge, I'm not completely sure that I understand your question. ROCKS does come with SGE (which includes qsub) installed, and it /should/ work out of the box. So yes, you should be able to use qsub on ROCKS -- we certainly do.
      You might need to tinker with your qsub script though -- afaik ROCKS doesn't come with module installed, and so you should account for that in your script.
      Have a look at this post for ECCE + ROCKS with the qsub parameters we use: http://verahill.blogspot.com.au/2012/06/ecce-and-rocks-cluster-step-by-step.html

    2. Thanks for responding

      I am new to the subject. I'm trying to compile NWChem on a ROCKS cluster; with the help of your blog I got NWChem compiled in parallel, but I can only run it on the frontend. My earlier question was about getting NWChem running on the nodes, and sending jobs to the nodes using SGE.

      Do I need to install ECCE to run NWChem on the nodes?

    3. You don't need ECCE at any point for anything when it comes to nwchem -- it's purely a GUI to make life easier for you.

      I presume that you have compiled nwchem in an exported directory and that everything nwchem is linked to (e.g. mpi, blas) is also in an exported directory?

      What happens if you log on to a node and try to run nwchem manually? Any informative errors? Again, ecce and SGE are not needed for basic operation -- they are just tools that make using a cluster easier.

      To submit with ECCE you don't have to install ECCE on the cluster either -- you can run it on a different machine, on your personal laptop or wherever you want. ECCE generates an SGE script and a server script in perl. The perl script monitors the job and communicates back to whatever computer is running ECCE and lets ECCE know about the status of the job. Once the SGE job finishes, the perl server detects it, copies everything back to the ECCE server and quits.

      So first try to get nwchem running directly on the nodes, then work on getting SGE running, and only after that look at using ECCE.

      Finally, you will want to consider using NWChem 6.3 or higher -- there are some real improvements, in particular in relation to using COSMO.

    4. Thank you very much for your recommendations

      You made the job easier

      I'll be watching your blog :D