
Special Queue: Infiniband Nodes

The Infiniband nodes are not part of the general queue and require special steps to access. These nodes are only for MPI workloads; any other jobs found running on these nodes will be terminated.

Infiniband is a networking fabric with much lower latency than standard Ethernet-based networks. It enables larger-scale MPI jobs that spread over several nodes. A user can request at most 60 slots, and there are three parallel environments to choose from.

Parallel Environments

This environment is the standard parallel environment. It restricts a job to the size of a single node in the queue; in the ib.q queue, that is 4 slots. Because all processes run on one node, OpenMPI uses shared memory for interprocess communication, not Infiniband.

This environment allows you to request up to the per-user maximum of 60 slots. It distributes the slots by filling one node completely before moving to the next available node. OpenMPI uses both Infiniband and shared memory for interprocess communication.

This environment allows up to 60 slots. It distributes the slots in round-robin fashion, placing a single process on each node until it has gone all the way around, then starting again. OpenMPI uses Infiniband and, if more than one process is started on a node, shared memory.
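The fill-up and round-robin patterns described above can be compared with a small sketch. It is only an illustration, not the scheduler's actual logic: it assumes identical, empty nodes, that the request fits on the given nodes, and the function names `fill_up` and `round_robin` are made up for this example.

```shell
#!/bin/bash
# Illustrative only: mimics the two slot distribution patterns.
# Assumes identical, empty nodes and a request that fits on them;
# the real scheduler also accounts for slots used by other jobs.

# fill_up SLOTS NODES SLOTS_PER_NODE
# Fill one node completely before moving to the next.
fill_up() {
    local remaining=$1 nodes=$2 per_node=$3 out="" i take
    for ((i = 0; i < nodes; i++)); do
        take=$(( remaining < per_node ? remaining : per_node ))
        out+="${out:+ }$take"
        remaining=$(( remaining - take ))
    done
    echo "$out"
}

# round_robin SLOTS NODES
# Place one process per node per pass until all slots are assigned.
round_robin() {
    local slots=$1 nodes=$2 out="" i count
    for ((i = 0; i < nodes; i++)); do
        # Every node gets slots/nodes; the first (slots % nodes) get one more.
        count=$(( slots / nodes + (i < slots % nodes) ))
        out+="${out:+ }$count"
    done
    echo "$out"
}

# 6 slots over four 4-slot nodes (4 slots per node matches ib.q).
echo "fill-up:     $(fill_up 6 4 4)"
echo "round-robin: $(round_robin 6 4)"
```

With fill-up, most processes share a node and communicate through shared memory; with round-robin, the same job touches every node and relies more on Infiniband.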

Job Script

Use the following to enable a job script to run on the Infiniband queue:
#$ -N ibJob
#$ -pe orte3 4-16
#$ -q ib.q
#$ -l ib_only
mpirun -np $NSLOTS /path/to/executable inputFile
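Because the script requests a slot range (4-16), the number of slots actually granted can vary from run to run. The following sketch shows one way to log what was granted. It assumes the scheduler sets `PE_HOSTFILE` in the usual Grid Engine layout (host name in the first column, slot count in the second); the helper name `summarize_hostfile` is made up for this example.

```shell
#!/bin/bash
# summarize_hostfile FILE
# Print "host slots" for each line of a Grid Engine style host file,
# followed by the total number of slots granted.
# Assumes column 1 is the host name and column 2 is the slot count.
summarize_hostfile() {
    awk '{ print $1, $2; total += $2 } END { print "total", total }' "$1"
}

# Inside a running job, PE_HOSTFILE points at the allocation for that job.
# Outside a job the variable is unset and nothing is printed.
if [ -n "${PE_HOSTFILE:-}" ] && [ -r "$PE_HOSTFILE" ]; then
    summarize_hostfile "$PE_HOSTFILE"
fi
```

Placing the call before the mpirun line records the allocation in the job's output file, which helps when comparing runtimes across runs.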

Bigger is not always better. Requesting more slots does not guarantee a faster run; the result depends on the algorithm in use and how the program is structured. Check the software's documentation and run a few test jobs to see how well your workload performs at higher slot counts.
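One way to judge such tests is to compare a run at a higher slot count against a baseline run. The sketch below computes speedup and parallel efficiency from two measured wall-clock times. The function name `scaling` and the timings in the example call are hypothetical, not measurements from this cluster.

```shell
#!/bin/bash
# scaling BASE_SLOTS BASE_SECONDS SLOTS SECONDS
# Prints "speedup efficiency" for a run relative to a baseline run.
#   speedup    = base_seconds / seconds
#   efficiency = speedup / (slots / base_slots)
# An efficiency near 1.00 means the extra slots are being used well.
scaling() {
    awk -v n0="$1" -v t0="$2" -v n="$3" -v t="$4" 'BEGIN {
        speedup = t0 / t
        printf "%.2f %.2f\n", speedup, speedup / (n / n0)
    }'
}

# Hypothetical timings: 1000 s on 4 slots, 500 s on 16 slots.
scaling 4 1000 16 500
```

In this made-up case the job runs twice as fast on four times the slots, an efficiency of 0.50, which suggests that a smaller request may be the better choice.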

Also note that some software packages have a separate version built for Infiniband. Please check the corresponding software page for details.