If you need finer control over the execution of your MPI program than the
submission commands in the user guide provide, you may wish to create your
own PBS scripts and submit jobs directly to PBS.
This is best explained with a tutorial. Please follow the steps below
to run an example program, then modify the steps as needed to reflect
the needs of your own job.
- Create a directory to store the tutorial files: "mkdir tutorial; cd tutorial".
- Download this example program and example submission script and place them in the directory. The example is a trivial MPI program that computes the value of pi.
- Compile the example program: "mpicc cpi.c -o cpi".
- Submit the job to the batch scheduler: "qsub cpi.pbs". It will display the job number and name for your submission.
- When the job completes, it will create a file called cpi.pbs.o* containing the standard output from the program, and a file called cpi.pbs.e* containing the standard error messages from the program. The * stands for the job number.
- If the job does not complete fairly quickly, you can check its status using the qstat command: "qstat". This shows all jobs that are currently in the queue, including those that must wait until more resources are available. Currently running jobs have an "R" in the status field; queued jobs have a "Q".
- If you change your mind and decide not to run the submitted job, you can remove it at any time with the "qdel" command, which takes as its argument the job ID shown in the first column of the qstat output.
To use this example as the basis for submission of a different program, you
must do the following:
- Rename the cpi.pbs script to reflect the name of the program that you wish to run.
- Edit the following lines of the script (most of these are explained in more depth by comments in the file itself):
  - "#PBS -l nodes=4:ppn=2" should be modified to reflect the number of nodes that you wish to use, and the number of processors per node.
  - "#PBS -l walltime=5:00" (five minutes) should be modified to reflect the estimated maximum run time of your program. Shorter jobs are likely to be scheduled sooner, but your job will be terminated if it does not complete in the allotted time.
  - "mpirun -v -machinefile $PBS_NODEFILE -np $NPROCS cpi" should be modified to run the new program instead of "cpi".
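As a rough illustration of the edits described above, the following sketch writes a submission script for a hypothetical program called "myprog" and submits it if PBS is available. The program name, the node count, the walltime, the "cd $PBS_O_WORKDIR" line and the way NPROCS is computed are all assumptions made for this example; the real cpi.pbs may set things up differently, so treat its own comments as authoritative.

```shell
#!/bin/sh
# Write a submission script for a hypothetical program named "myprog".
# The quoted 'EOF' keeps the $PBS_... variables unexpanded until the job runs.
cat > myprog.pbs <<'EOF'
#!/bin/sh
#PBS -l nodes=2:ppn=2
#PBS -l walltime=10:00

# Assumption: run from the directory in which qsub was invoked.
cd "$PBS_O_WORKDIR"

# Assumption: NPROCS is the number of lines in the node file.
NPROCS=$(wc -l < "$PBS_NODEFILE")

mpirun -v -machinefile "$PBS_NODEFILE" -np $NPROCS myprog
EOF

# Submit only where PBS is actually installed.
if command -v qsub >/dev/null 2>&1; then
    jobid=$(qsub myprog.pbs) && qstat "$jobid"   # "Q" = queued, "R" = running
    # qdel "$jobid"                              # uncomment to cancel the job
else
    echo "qsub not found; wrote myprog.pbs without submitting it"
fi
```

Here the request is for 2 nodes with 2 processors each and a ten-minute limit; adjust both "#PBS -l" lines to match what your program actually needs.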
Advanced batch scheduler usage
PBS includes many advanced features beyond those used in this tutorial.
Check the qsub man page for information on how to:
- send email when the job starts or stops
- send stdout and stderr to alternative locations
You can also run arbitrary shell commands within the batch script, in addition to or in place of the MPI program. For example, some people find it useful to print the date and a list of the nodes being used, to help in cataloging output (use the "date" and "cat $PBS_NODEFILE" commands).
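The options mentioned above might look like this at the top of a batch script. This is only a sketch: the job name, the email address and the output file names are placeholders invented for the example, and the qsub man page remains the reference for what each option does on this system.

```shell
#!/bin/sh
#PBS -N cpi_test
#PBS -m abe
#PBS -M user@example.com
#PBS -o cpi_test.out
#PBS -e cpi_test.err
# -N names the job; -m abe mails on abort, begin and end; -M gives the
# address; -o and -e redirect standard output and standard error.

# Record when the job started, to help match output files to runs.
started=$(date)
echo "Job started: $started"

# List the allocated nodes; the check lets the script also run outside PBS.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "${PBS_NODEFILE:-}" ]; then
    echo "Processors allocated per node:"
    sort "$PBS_NODEFILE" | uniq -c
else
    echo "PBS_NODEFILE is not set (not running under PBS)"
fi
```

The "#PBS" lines are ordinary comments to the shell, so the script can be run by hand for a quick check before it is submitted with qsub.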
The qstat command also has many more options (see the qstat man page). For example, "qstat -a" and "qstat -f" provide different views of the queue, with more detail than the default listing.
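A few qstat invocations, sketched as a small script. The "-u" option is not mentioned above; it is included on the assumption that this installation supports it, as PBS implementations commonly do. The have_qstat variable is just a helper for this example.

```shell
# Query the queue only where PBS is installed.
if command -v qstat >/dev/null 2>&1; then
    qstat -a              # alternative summary of all jobs
    qstat -u "$(id -un)"  # only jobs belonging to the current user
    # qstat -f <job id>   # full attribute listing for a single job
    have_qstat=yes
else
    echo "qstat not found; run this on a machine with PBS installed"
    have_qstat=no
fi
```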
Using alternative versions of MPI (including MPICH2)
The default MPI installation is sufficient for most users. However, if you
have special needs (such as support for MPI-IO, or features only included
in MPICH2), then you may need to select from one of the other MPI packages available on adenine.
To see a list of available MPI packages, run this command: "use -l". There should be multiple versions of both mpich and mpich2 listed, among other software packages. To select one, simply run "use [package name]" (for example: "use mpich2"). This will update your environment (including the path and man page locations) to point to this version of MPI. If you intend to always use this same version of MPI, then you may want to add the appropriate use command(s) to your .cshrc or .bashrc file.
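After selecting a package it can be worth confirming which compiler wrapper your shell now finds. The sketch below assumes a Bourne-style shell and uses only the "use" commands described above; the mpicc_path variable is a helper introduced for this example.

```shell
# "use" is the site-specific command described above; skip it if absent.
if command -v use >/dev/null 2>&1; then
    use -l          # list the available packages
    use mpich2      # select one by name
fi

# Show which mpicc is now first on the path, if any.
mpicc_path=$(command -v mpicc || true)
echo "mpicc resolves to: ${mpicc_path:-<not found>}"
```

If the reported path is not inside the package you selected, check that the use command ran in the same shell session in which you compile.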
Important notes for MPICH2 users!
MPICH2 uses a significantly different startup procedure than
MPICH1. If you intend to use MPICH2, then you must take the following additional steps:
- Use the mpich2 example pbs script as a starting point for submitting jobs, rather than the one from the earlier tutorial. This script will only work with programs compiled using MPICH2.
- Create a file in your home directory called ".mpd.conf" like this: "echo secretword=foo > ~/.mpd.conf".
- Change the permissions of that file as follows: "chmod 600 ~/.mpd.conf".
Aside from those three steps, MPICH2 usage should not differ from the default MPI usage.
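The two ".mpd.conf" steps can be combined into a short script that avoids overwriting a file you already have. This is a sketch rather than a required procedure; "foo" is the placeholder secret word from the step above and is better replaced with a word of your own.

```shell
conf="$HOME/.mpd.conf"

# Create the file only if it does not exist yet.
if [ ! -e "$conf" ]; then
    echo "secretword=foo" > "$conf"
fi

# Restrict the file so that only its owner can read or write it.
chmod 600 "$conf"
ls -l "$conf"
```

The listing should show permissions of "-rw-------", meaning that other users cannot read the secret word.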