-- AndrewCudd - 19 Mar 2015

HPCC Job Submission

To run a job on the computing nodes, a job script must be submitted to the job queue. Once in the queue, it will eventually be scheduled to run based on its priority and on the requested and available resources. The job script contains everything that is to be run, along with special comments for the scheduler that describe resource requirements and other options. Much of the information below is paraphrased from the HPCC wiki page.

The first thing that goes into a job script is the set of comments for the scheduler. These comments tell the scheduler how much computing time you need, how much memory you need, etc. They must go at the beginning of the script, and some of them are required for any job script that gets submitted to the queue. The format of a job script is very simple: write a script that executes the programs and other executables you need, and then add the scheduler comments at the beginning. The most common and important scheduler comments are summarized below; check the HPCC wiki page for more commands and information.

The format of each scheduler (PBS) comment is as follows:
#PBS -flag argument

Flag: -o / -e
Description: Location of the STDOUT (-o) and STDERR (-e) files. If not supplied, the default location is normally the directory from which you submitted the job.
Example: #PBS -e /path/to/outputfile

Flag: -j
Description: Combine STDOUT and STDERR into a single file. Use oe to combine into the STDOUT file, or eo to combine into the STDERR file.
Example: #PBS -j oe

Flag: -l
Description: Defines the resources requested for the job.
  • nodes: number of nodes to be reserved for exclusive use by the job
  • ppn: number of processors per node requested
  • walltime: total requested run time in the form HH:MM:SS or DD:HH:MM:SS
  • mem: maximum amount of memory requested in the form #gb
Examples:
#PBS -l nodes=1:ppn=4
#PBS -l walltime=12:00:00
#PBS -l mem=4gb

Flag: -M
Description: Defines the email account to which updates about the job state are sent.
Example: #PBS -M <email address>

Flag: -m
Description: Sets the conditions for an email to be sent. One or more may be set in any combination.
  • a: sends mail when the job is aborted
  • b: sends mail when the job begins execution
  • e: sends mail when the job ends
Example: #PBS -m be

Flag: -N
Description: Sets the name of the job, which is used for the STDOUT and STDERR files and is the name that appears in the queue.
Example: #PBS -N SuperCoolName

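
As a worked example of the two walltime formats (HH:MM:SS and DD:HH:MM:SS), the sketch below converts a walltime string into seconds. The function name walltime_to_seconds is made up for illustration and is not part of PBS; the sketch assumes every field is a plain non-negative integer.

```bash
#!/bin/bash
# Hypothetical helper (not a PBS command): convert a walltime string
# of the form HH:MM:SS or DD:HH:MM:SS into a number of seconds.
walltime_to_seconds() {
    local IFS=':'              # split the argument on colons
    local -a parts
    read -r -a parts <<< "$1"

    local days=0 hours minutes seconds
    if [ "${#parts[@]}" -eq 4 ]; then
        days=${parts[0]}; hours=${parts[1]}; minutes=${parts[2]}; seconds=${parts[3]}
    elif [ "${#parts[@]}" -eq 3 ]; then
        hours=${parts[0]}; minutes=${parts[1]}; seconds=${parts[2]}
    else
        echo "unrecognized walltime: $1" >&2
        return 1
    fi

    # The 10# prefix forces base 10 so fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$hours * 3600 + 10#$minutes * 60 + 10#$seconds ))
}
```

With this sketch, walltime_to_seconds 12:00:00 prints 43200 and walltime_to_seconds 01:12:00:00 prints 129600.
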
The only required scheduler comments for a job are the resource options: walltime, nodes, and memory. The rest of the scheduler comments are optional, but still useful. As stated before, these comments must go at the beginning of your job script; specifically, they must appear above the first non-commented line of code. This means you may mix normal comments with the scheduler comments at the beginning of your script.
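
The placement rule can be checked before submitting with a small helper like the one below. This is only a sketch: list_pbs_directives is a hypothetical name, and the parsing is a simplification of what the scheduler does (it treats any line starting with #PBS followed by whitespace as a scheduler comment and stops at the first line that is neither blank nor a comment).

```bash
#!/bin/bash
# Hypothetical helper: print the #PBS lines that appear above the first
# command in a job script. Anything printed here is placed early enough;
# a #PBS line that is missing from the output sits below a command.
list_pbs_directives() {
    local line trimmed
    while IFS= read -r line; do
        trimmed="${line#"${line%%[![:space:]]*}"}"   # drop leading whitespace
        case "$trimmed" in
            '#PBS'[[:space:]]*) printf '%s\n' "$trimmed" ;;  # scheduler comment
            '#'*|'')            ;;                           # normal comment, shebang, or blank line
            *)                  break ;;                     # first command: stop scanning
        esac
    done < "$1"
}
```

Running list_pbs_directives myjobscript.qsub prints one scheduler comment per line, in the order they appear in the script.
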

The other peculiar thing about a job script is that when the job is executed from the queue, it has no knowledge of your environment variables. The job starts from a completely clean state, so any variables you may have set while testing, debugging, etc. need to be set explicitly; this includes your .bashrc and anything else that normally happens automatically at login. You must also change directories or give full path names for commands, as the scheduler does not start your job in your working directory. The computing nodes have access to all your directories and files; you just have to tell them where to find everything. Basically, treat the job script as if you had logged in without anything being set for you. For example, here is a job script that I ran:
#!/bin/bash -login

#Processing Time
#PBS -l walltime=18:00:00
#Nodes and Processors per Node
#PBS -l nodes=1:ppn=8
#Memory Needed
#PBS -l mem=8gb
#Email job status
#PBS -m abe
#Output and Error same file
#PBS -j oe
#Name of job
#PBS -N createWeightsNEUT

#Set up environment variables, run setup scripts, etc.
source /mnt/home/cuddandr/.bashrc
source /mnt/research/T2K/nd280Software/
source /mnt/research/T2K/nd280Software/v10r11n21/xsTool/v0r7/cmt/
#Change directories to my work directory
cd /mnt/research/T2K/nd280Software/v10r11n21/xsTool/v0r7/work
#Run the program
./ fgd2CCBANFF_R3c_NEUT.root postfit_banff_v7_all_params.old.root ./fgd2CCBANFF_NEUT oa.parameters.neut.txt
#Output detailed job stats
qstat -f ${PBS_JOBID} > jobstats.dat
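
One way to catch missing setup before submitting is to run commands in an emptied environment, which roughly imitates the clean state described above. The sketch below uses env -i for this; run_clean and MY_SETTING are hypothetical names, and the minimal PATH is an assumption about where bash lives on the system. It also shows PBS_O_WORKDIR, which the scheduler sets to the directory qsub was run from, as one option for changing to the right directory.

```bash
#!/bin/bash
# Sketch: run a command with an emptied environment, roughly like a batch job sees it.
# run_clean is a hypothetical helper, not an HPCC or PBS command.
run_clean() {
    env -i HOME="$HOME" PATH="/usr/bin:/bin" bash --noprofile --norc -c "$1"
}

# A variable exported in the login shell is not visible inside the clean run.
export MY_SETTING="from login shell"
run_clean 'echo "MY_SETTING is: ${MY_SETTING:-unset}"'

# Inside a job script, PBS_O_WORKDIR can be used to return to the submission
# directory. The fallback to $PWD is only there so this line also works
# when the script is run outside of a job.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1
```

If a command works in the login shell but fails under run_clean, the job script most likely needs an extra source or export line.
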

Finally, to submit the job, run the qsub command with the job script you wish to submit to the scheduler, like so:
user@host $ qsub myjobscript.qsub

Note that the extension of the file does not matter. The .qsub is just a convention to differentiate a job script from other scripts.

Once the job is submitted to the queue, it is assigned a job ID, which is displayed when the qsub command completes. The job ID is a string of numbers that uniquely identifies a given job and is used as an input to many commands. The output from qsub or other commands will usually look something like 23335211.mgr-04; the string of numbers preceding .mgr-04 is the job ID for that job.
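
If the numeric part is needed on its own, for example to build file names, it can be split off with shell parameter expansion. A minimal sketch, assuming the identifier has the form shown above and that qsub prints it on standard output:

```bash
#!/bin/bash
# Example identifier in the form shown above. In practice it would be
# captured at submission time, e.g.: full_id=$(qsub myjobscript.qsub)
full_id="23335211.mgr-04"

# Remove everything from the first dot onward, keeping only the number.
job_id="${full_id%%.*}"
echo "$job_id"
```
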

Queue Commands

There are a number of commands for interacting with the job queue itself, such as commands to check the status of a job, the number of jobs in the queue, etc. These commands are summarized below.
Name                         Description
qsub <filename>              Submit a job to the queue
qstat                        Shows the entire queue of jobs submitted with job number, name, and user
qstat -u <username>          Shows status of all jobs of a single user
qstat -f <jobID>             Shows detailed information about a single job
showq -u <username>          Similar to qstat, but more verbose information
qdel <jobID>                 Removes single job from queue (or cancels an active job)
qdel $(qselect -u <user>)    Removes all jobs belonging to user
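
When removing all jobs of a user, qdel is called with no arguments if qselect finds nothing. The sketch below wraps the two commands so that qdel only runs when there is something to remove. cancel_all_jobs is a hypothetical helper name, and the sketch assumes qselect prints the matching job IDs separated by whitespace.

```bash
#!/bin/bash
# Hypothetical helper: remove every job belonging to a user,
# but only call qdel if qselect actually returned some job IDs.
cancel_all_jobs() {
    local user="$1"
    local jobs
    jobs="$(qselect -u "$user")"
    if [ -z "$jobs" ]; then
        echo "No jobs found for $user"
        return 0
    fi
    # Left unquoted on purpose so each job ID becomes a separate argument.
    qdel $jobs
}
```

It would be called as cancel_all_jobs <user>.
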
Topic revision: r3 - 07 Apr 2015, AndrewCudd