Job Submission Strategies

The simplest use of the cluster is to submit a single job and let the scheduler/queue launch it without any user intervention. This is not much different from running a program on your workstation. Beyond that, the following job submission strategies can be used. Not every program supports all of them, and for multi-threaded or parallel jobs the software must have been developed specifically for that mode of execution.

Job Dependencies

This feature allows you to specify that a job must wait for another job to complete before it runs. This is useful when you need to run multiple jobs in succession and each job depends on the output of a previous job. Use the following form when submitting a job:
# qsub -hold_jid <jobID|jobName> jobScript

You can use either a job ID number or a job name when specifying the job dependency. If you use a job name, several jobs can share that name, and a job submitted with -hold_jid jobName will wait for all jobs with that name to complete.
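
As an illustration of a name-based dependency, the sketch below submits two jobs so that the second is held until the first finishes. The script names step1.sh and step2.sh and the job names prep and analyze are hypothetical. The fallback to printing the commands when qsub is not installed is only an assumption made so the example can be tried on a workstation.

```shell
#!/bin/sh
# Use the real qsub when it is available; otherwise only print the
# commands (dry run), so the sketch can be tried without a scheduler.
if command -v qsub >/dev/null 2>&1; then
    QSUB="qsub"
else
    QSUB="echo qsub"
fi

submit_pipeline() {
    # First job, submitted under the (hypothetical) name "prep".
    $QSUB -N prep step1.sh
    # Second job is held until every job named "prep" has completed.
    $QSUB -N analyze -hold_jid prep step2.sh
}

submit_pipeline
```

Because the dependency is given by name, submitting several jobs with -N prep before the second command would make the analyze job wait for all of them.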

Job Array

The job array is a feature of the grid engine that allows one to programmatically submit many identical jobs from a single job script. The primary example is a user who wants to run the same simulation many times with variations in the starting parameters. Without the job array, one would have to create an individual job script for each run.

The following is a very basic example. More information can be found here.

Take this simple job script:

array code

If you had 1000 input files, then they must be numbered (e.g. input.0001 to input.1000) and the job script with the job array feature would look like this:

array code

Here the grid engine sets the environment variable $SGE_TASK_ID for each task, and the shell substitutes its value when the script runs. This schedules 1000 tasks under a single job ID. Using qdel on that job ID kills all tasks, whereas qdel jobid.taskid kills only that particular task.
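
The job array described above might be sketched as follows. This is not the original example script: the job name array_example, the program my_simulation, and the output file naming are hypothetical. Because $SGE_TASK_ID is a plain number (1, 2, ...), the sketch pads it with printf to match zero-padded names such as input.0001.

```shell
#!/bin/sh
#$ -N array_example
#$ -cwd
#$ -t 1-1000

# Map a task number to its zero-padded input file name (7 -> input.0007).
input_for_task() {
    printf 'input.%04d' "$1"
}

# SGE_TASK_ID is set by the grid engine for each task; the fallback to 1
# exists only so the script can be tried outside the scheduler.
task_id="${SGE_TASK_ID:-1}"
input="$(input_for_task "$task_id")"

echo "Task $task_id processes $input"
# Hypothetical program invocation; replace with the real command:
# ./my_simulation "$input" > "output.$task_id"
```

The -t 1-1000 line is what turns the submission into an array of 1000 tasks; everything else is an ordinary job script.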

Multi-Threaded Programs

Some programs come with the feature to specify the number of threads to use. To use the cluster fairly and not overload a node, here is an example job script:
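
A minimal sketch of such a job script is shown here. The parallel environment name threaded is an assumption (the names available on a given cluster can be listed with qconf -spl), and my_threaded_program with its --threads option is hypothetical. The point is that the thread count is taken from $NSLOTS, which the grid engine sets to the number of slots granted, so the program never uses more threads than were requested.

```shell
#!/bin/sh
#$ -cwd
# Request 8 slots in a parallel environment. The name "threaded" is an
# assumption; use a name reported by "qconf -spl" on your cluster.
#$ -pe threaded 8

# Number of threads to use: NSLOTS under the grid engine, 1 otherwise.
thread_count() {
    echo "${NSLOTS:-1}"
}

threads="$(thread_count)"
# Many OpenMP-based programs read their thread count from this variable.
export OMP_NUM_THREADS="$threads"

echo "Running with $threads thread(s)"
# Hypothetical program invocation; replace with the real command:
# ./my_threaded_program --threads "$threads"
```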


Note: No more than 8 slots may be requested, but DO NOT submit jobs with a smaller slot count than the number of threads the program will use.

Job Limitations

Currently there are two limitations placed on the grid engine: no user can have more than 50 running jobs, and the maximum number of slots per job is 8. However, there is a way around the latter limitation. The number of slots per job is limited to protect parallel jobs from the large overhead that network communication incurs when using MPI. To accommodate all users, we provide a way around this limitation, with the warning that you are on your own.

Example job script: