Batch computing

In general, jobs are submitted to CCR resources using a batch queuing system. This is the best way to ensure an equitable distribution of CCR's computing resources among all users. Each submitted job is assigned a queue priority, which essentially determines when the job will run. Several factors determine the priority assigned to a user's job. Details of the priority scheme are available at CCR Job Priority. Here we describe the basics of how to submit a job using CCR's batch queuing system.

Running Jobs

CCR uses a batch scheduler to run jobs on the U2 cluster. The user submits a job to the scheduler, specifying the required resources. These resources include the number of processors, the amount of memory, and the amount of time needed by the job.

PBS Execution Model Schematic

Testing on the Front-end (Login Machine)

  • The front-end machine (bono) can be used for tests that run for a few minutes and do not use an extensive amount of memory.

Batch System

The batch scheduler used in CCR is the PBS/Torque scheduler.

  • Torque is the Scalable Open Resource Manager based on OpenPBS.
    • Torque is also known as Scalable PBS (SPBS).
    • More extensive documentation can be found at NASA Ames, the original developers of PBS.
  • The PBS/Torque scheduler is extended with the Maui Scheduler.
    • Acts as a plug-in scheduler for PBS, replacing the default PBS scheduling component.
    • Supports a large array of fairness policies, dynamic priorities, extensive reservations, and fairshare.
    • Performs "backfilling," allowing small/short jobs to run during intervals while resources are being drained to run large jobs.
    • Maui User's Manual.

Examples of Using the Batch Scheduler:

  • Running an interactive job:
    • The job will be submitted to the scheduler and wait for nodes. Once nodes are assigned, the user is logged into the first node in the list.
    • In this example, one node with two processors per node (ppn=2) is requested for 30 minutes, and the debug queue is specified.
[bono:~]$ qsub -I -q debug -lnodes=1:ppn=2 -lwalltime=00:30:00
qsub: waiting for job 624107.bono.ccr.buffalo.edu to start
qsub: job 624107.bono.ccr.buffalo.edu ready
#############################PBS Prologue##############################
PBS prologue script run on host c15n29 at Mon Oct 29 13:42:51 EDT 2007
PBSTMPDIR is /scratch/624107.bono.ccr.buffalo.edu
[c15n29:~]$
  • Running a job with a PBS script:
    • The job is submitted with a PBS script that specifies the required resources and the command to be executed.
    • In this example, pbsCPI is the PBS script.
[bono:~]$ qsub pbsCPI
624117.bono.ccr.buffalo.edu
[bono:~]$
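
The contents of pbsCPI are not shown on this page. As a minimal sketch only, a PBS script is an ordinary shell script whose #PBS comment lines carry the same options that could otherwise be given to qsub on the command line. The job name example_job and the echo line are illustrative placeholders, not part of any CCR-provided script, and the fallbacks exist only so the sketch can also be tried outside a batch job.

```shell
#!/bin/sh
#PBS -N example_job
#PBS -q debug
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:30:00
#PBS -j oe

# PBS starts the script in the home directory; move to the directory
# from which qsub was run. Outside PBS, stay in the current directory.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

# Under PBS, $PBS_NODEFILE holds one line per allocated processor.
# Outside PBS (assumption for this sketch), fall back to 1.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    NPROCS=$(wc -l < "$PBS_NODEFILE" | tr -d ' ')
else
    NPROCS=1
fi

echo "Job ${PBS_JOBID:-not-under-PBS} on $(uname -n) with $NPROCS processor(s)"

# The real command to be executed would go here.
```

A script like this would be submitted with "qsub script_name", as in the pbsCPI example above.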

U2 queues

Sample PBS Scripts

Job Limits and Priorities

There is no limit to the number of jobs that can be submitted by an individual user. The U2 cluster has over 1000 compute nodes.

The priority of jobs is determined by the CCR Job Priority Policy.

Node Properties

Available node properties for the U2 cluster:

  • GM, (e.g. "-lnodes=2:GM:ppn=2") for Myrinet connected nodes
  • MEM4GB, (e.g. "-lnodes=2:MEM4GB:ppn=2") for nodes with 4GB of RAM
  • MEM8GB, (e.g. "-lnodes=2:MEM8GB:ppn=2") for nodes with 8GB of RAM
    Note that the 4GB and 8GB nodes all automatically have the GM property, so it is not necessary to combine them.
  • Most of the compute nodes have a CPU clock speed of 3.2 GHz; the newest nodes, however, run at 3.0 GHz. The node properties for the CPU clock speed are:
    • CPU32, (e.g. "-lnodes=2:CPU32:ppn=2")
    • CPU30, (e.g. "-lnodes=2:CPU30:ppn=2")
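
The property examples above all follow the pattern nodes=N:PROPERTY:ppn=P. As a small sketch, a shell helper can assemble that string so the property is easy to switch. The function name nodes_spec is hypothetical and is not a PBS or CCR command.

```shell
# nodes_spec NODES PPN [PROPERTY]
# Prints a resource string suitable for "qsub -l".
# (Hypothetical helper, for illustration only.)
nodes_spec() {
    if [ -n "${3:-}" ]; then
        printf 'nodes=%s:%s:ppn=%s\n' "$1" "$3" "$2"
    else
        printf 'nodes=%s:ppn=%s\n' "$1" "$2"
    fi
}

nodes_spec 2 2 MEM8GB   # prints nodes=2:MEM8GB:ppn=2
nodes_spec 1 2          # prints nodes=1:ppn=2
```

On the front-end it could be used as: qsub -l "$(nodes_spec 2 2 CPU30)" pbs_script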

Common PBS Commands

  • qsub [flags] [pbs_script]
    • Submits a PBS batch script called pbs_script
    • See examples above.
  • qstat [flags]
    • Check status of PBS queues.
[bono:~]$ qstat -an -u user1
bono.ccr.buffalo.edu:
                                                             Req'd  Req'd    Elap
Job ID               Username Queue  Jobname  SessID NDS TSK Memory Time  S  Time
-------------------- -------- -----  -------  ------ --- --- ------ ----- -  ----
624165.bono.ccr.buff user1    ccr      STDIN   --      1  --    --  04:00 Q  ----
624169.bono.ccr.buff user1    debug    test    --      2  --    --  00:15 Q  ----
[bono:~]$
  • qdel [flags] jobid
    • Delete a job.
  • pbsnodes [hostname]
    • Displays the properties of the compute nodes in the cluster.
[bono:~]$ pbsnodes c15n01
c15n01
    state = free
    np = 2
    properties = MEM2GB,GM,CPU32
    ntype = cluster
    status = opsys=linux,uname=Linux c15n01.ccr.buffalo.edu    
2.6.9-55.EL.perfctr.2.6.27smp #1 SMP Thu Jun 28 11:15:45 EDT 2007 
x86_64,sessions=2238,nsessions=1,nusers=1,idletime=1050869,totmem=6152664kb,
availmem=5852668kb,physmem=2056100kb,ncpus=2,loadave=0.00,
netload=278309857700,state=free,jobs=? 0,rectime=1193685254
[bono:~]$
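
The properties line in the pbsnodes output above can be checked from a script, for example to confirm that a node carries the GM property before requesting it. The following is a sketch that assumes the output layout shown above; the function names node_properties and has_property are hypothetical, and the sample text is an abbreviated copy of the output above so that the example runs without access to the cluster.

```shell
# Read pbsnodes output on stdin and print the comma-separated properties.
node_properties() {
    awk -F' = ' '$1 ~ /^[ \t]*properties$/ { print $2 }'
}

# has_property NAME: succeed if NAME is among the node's properties.
has_property() {
    node_properties | tr ',' '\n' | grep -qx "$1"
}

# Abbreviated sample of the pbsnodes output shown above.
sample='c15n01
    state = free
    np = 2
    properties = MEM2GB,GM,CPU32
    ntype = cluster'

printf '%s\n' "$sample" | node_properties   # prints MEM2GB,GM,CPU32

if printf '%s\n' "$sample" | has_property GM; then
    echo "c15n01 is a Myrinet node"
fi
```

On the front-end, the sample text would be replaced by live output, for example: pbsnodes c15n01 | has_property GM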

Maui Commands

  • showq
    • Display a listing of queued jobs.
    • Also displays number of active jobs and compute nodes:
[bono:~]$ showq | more
ACTIVE JOBS--------------------
JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME
...
  389 Active Jobs    1948 of 2070 Processors Active (94.11%)
                      974 of 1035 Nodes Active      (94.11%)
IDLE JOBS----------------------
JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME
...
  • showbf [flags]
    • Shows exactly which resources are available for immediate use, so that a job can be configured to start immediately.
  • showstart jobid
    • Provides an estimate of job start time.
[bono:~]$ showstart 624187
job 624187 requires 32 procs for 4:00:00
Earliest start in          2:11:33 on Mon Oct 29 17:00:00
Earliest completion in     6:11:33 on Mon Oct 29 21:00:00
Best Partition: DEFAULT
[bono:~]$
  • checkjob jobid
    • View detailed status of each job.
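
The showstart output above is consistent with simple arithmetic: the earliest completion time is the earliest start time plus the requested walltime (2:11:33 + 4:00:00 = 6:11:33). As a sketch, the same calculation can be done in a script; the helper names hms_to_seconds and seconds_to_hms are hypothetical and are not Maui commands.

```shell
# Convert H:MM:SS to seconds.
hms_to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Convert seconds back to H:MM:SS.
seconds_to_hms() {
    awk -v s="$1" 'BEGIN {
        printf "%d:%02d:%02d\n", int(s / 3600), int((s % 3600) / 60), s % 60
    }'
}

start=$(hms_to_seconds 2:11:33)   # earliest start, from showstart
wall=$(hms_to_seconds 4:00:00)    # requested walltime
seconds_to_hms $((start + wall))  # prints 6:11:33
```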

NB: Maui augments the usual PBS commands with the above; the standard PBS commands (with the minor exception of the -s flag to qstat) will continue to function normally.

Monitoring a Running Job

  • jobvis jobid
    • Graphical display of resource utilization.
    • Must be run from the front-end machine (bono).
    • Requires X-Display.

Using Scratch Space in a Job

  • The local scratch space, that is, the scratch disk on a compute node, is available while a job is running. A directory is created in /scratch on each compute node in the job, and the variable $PBSTMPDIR is set to /scratch/$PBS_JOBID.
    • Users can copy data to and from this scratch space in the PBS script.
    • All files are removed from the $PBSTMPDIR directory at the end of the job.
  • The global scratch spaces, /san/scratch and /ibrix/scratch, may also be used during a job.
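
Because $PBSTMPDIR is emptied at the end of the job, a PBS script that uses local scratch typically copies its input in, runs there, and copies results back before exiting. The following is a minimal single-node sketch of that pattern. The file names input.dat and output.dat and the tr command standing in for a real computation are assumptions for illustration, and the mktemp fallback exists only so the sketch can be tried outside a batch job. A multi-node job would need to stage files on each node separately, which is not shown here.

```shell
#!/bin/sh
#PBS -l nodes=1:ppn=2
#PBS -l walltime=00:30:00

# Directory the job was submitted from, and the per-job scratch directory.
SUBMIT_DIR="${PBS_O_WORKDIR:-$PWD}"
SCRATCH="${PBSTMPDIR:-$(mktemp -d)}"

# Hypothetical input file, created here only to keep the sketch self-contained.
printf 'example input\n' > "$SUBMIT_DIR/input.dat"

# Stage in: copy input data to the local scratch disk.
cp "$SUBMIT_DIR/input.dat" "$SCRATCH/"
cd "$SCRATCH" || exit 1

# Stand-in for the real computation.
tr 'a-z' 'A-Z' < input.dat > output.dat

# Stage out: copy results back before the scratch directory is cleaned up.
cp output.dat "$SUBMIT_DIR/"
cd "$SUBMIT_DIR" || exit 1
```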

Tutorial on Running Batch Jobs on U2

Center for Computational Research - University at Buffalo - State University of New York