Compute Cluster
Last modified: 2023-03-21
This is a short introduction to using our compute cluster. If you have suggestions for improving it, or if you think something is missing, please let us know and we will try to incorporate it so that future students (and colleagues) can benefit.
Introduction
The general idea of a computational grid, or compute cluster, is to provide a uniform interface to several, possibly heterogeneous, machines. From the software point of view, the system looks like one large machine with multiple cores/CPUs, all providing the same technical infrastructure with respect to environment settings, e.g., paths to libraries.
The Purpose of a Batch-Queuing System
The purpose of a batch-queuing system like Sun Grid Engine (SGE) is to provide a uniform user interface for efficiently managing a pool of (heterogeneous) hardware. On the one hand, this means that the user needs only a few commands to perform all tasks. On the other hand, the user has to accept some restrictions. These restrictions are, however, minimal.
IMPORTANT
Things you should always remember:
- Do not use more memory than you asked for. Using more memory than you asked for leads to swapping, which will affect the runtime of your jobs and the jobs of other users. See qsub options (mem_free, s_vmem, h_vmem).
- Do not write to and read from /home1 extensively. Doing so leads to high I/O load, which will affect the runtime of your jobs and the jobs of other users. See $TMPDIR for a solution.
- If your job uses multiple threads you have to request that when submitting the job. See Multithreaded Jobs.
Clusternode Hardware
Nodes | Hostnames | Hostgroup | Cores/Node | SMT | Slots/Node | RAM/Node | $TMPDIR/Node | Description/Node |
---|---|---|---|---|---|---|---|---|
14 | bs01-bs14 | bc1 | 8 | OFF | 8 | 24GB | 38GB | 2× Intel Xeon E5540, 2.53 GHz Quad Core |
1 | bs15 | bc2 | 12 | OFF | 12 | 72GB | 105GB | 2× Intel Xeon E5649, 2.53 GHz 6-core |
2 | bs16-bs17 | bc2 | 12 | OFF | 12 | 60GB | 105GB | 2× Intel Xeon E5649, 2.53 GHz 6-core |
2 | bs31-bs32 | bc3 | 12 | OFF | 12 | 48GB | 80GB | 2× Intel Xeon E5-2630 v2, 2.60GHz 6-core |
16 | b401-b416 | bc4 | 20 | OFF | 20 | 160GB | 188GB | 2× Intel Xeon E5-2640 v4, 2.40GHz 10-core |
3 | b501-b503 | bc5 | 48 | ON | 96 | 1024GB | 1TB | 2× AMD EPYC 7402, 2.80GHz 24-core |
Clusternode Overview
To get an overview of the clusternodes and their load you can use the command:
qhost
Running Jobs
This section provides a short introduction on how to submit one or several jobs to the SGE. This is easy as long as you understand a little bit of shell scripting. There are two kinds of jobs you can submit, simple jobs and array jobs:
Simple Jobs
Always keep in mind that a single job should be single threaded, i.e. it should use one available slot and therefore not more than 100% CPU time.
A simple job can be submitted by first creating a shell script and then typing one command on your command line. First create a shell script looking something like the following:
#!/bin/bash
your_program your_parameters
Let us assume you saved your script into the file script.sh. Then you simply type
qsub -N JobName -l h_vmem=2G -l variables_list -r y -e /dev/null -o /dev/null path-to-script/script.sh parameters
on your console, with the following meanings:
- -N JobName -- assigns a name to your job
- -l h_vmem=2G -- request a memory limit of 2GB for the job
- -l variables_list -- see below at Variables to be Requested
- -r y -- ensures that a job is rescheduled (restarted) when the execution host the job was currently running on crashes
- -e /dev/null -- writes error to the given path (i.e. currently to /dev/null)
- -o /dev/null -- writes output to the given path (i.e. currently to /dev/null)
- path-to-script/script.sh parameters -- calls your script with the given command line parameters
Multithreaded Jobs
Submitting a job that uses multiple threads is possible.
For example if you submit a job "job01.sh" that will use 4 threads, you would request 4 slots when submitting a job like the following:
qsub -pe pthreads 4 -l h_vmem=2G job01.sh
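Grid Engine does not check how many threads a job actually starts. The slot count is only a value used by the scheduler to keep the CPU load on a node from rising above 100%, so make sure your program does not start more threads than the slots you requested!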
The requested memory is multiplied by the number of requested slots. If you start a job requesting 4 slots/threads and 2GB of memory the scheduler will reserve 8GB of memory for this job.
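The multiplication described above can be checked with a little shell arithmetic. The following is only a sketch; the helper names `reserved_gb` and `per_slot_gb` are made up for illustration and are not part of Grid Engine.

```shell
#!/bin/bash
# Memory reserved by the scheduler = requested slots * h_vmem per slot (in GB).
reserved_gb() {
  local slots="$1" per_slot="$2"
  echo $(( slots * per_slot ))
}

# Per-slot h_vmem value needed to reach a given total, rounded up to whole GB.
per_slot_gb() {
  local total="$1" slots="$2"
  echo $(( (total + slots - 1) / slots ))
}

echo "4 slots with h_vmem=2G reserve $(reserved_gb 4 2)G in total"
echo "for 8G in total on 4 slots, request h_vmem=$(per_slot_gb 8 4)G"
```

In other words, if your program needs 8GB overall and runs with 4 threads, the value to pass as h_vmem is 2G, not 8G.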
Array Jobs
Array jobs are useful when performing hundreds of runs of the same parameter setting. In fact, this is just a shortcut for iteratively submitting simple jobs. The job itself is then split into tasks numbered according to the parameters given (see below). Anyhow, you again need your script.sh specified above. Submitting an array job is then done using the following syntax:
qsub -N JobName -l h_vmem=2G -l variables_list -r y -e /dev/null -o /dev/null -t 1-10:1 path-to-script/script.sh parameters
with the same meanings as above. The only changes are:
- -t 1-10:1 -- specifies that your job is split into tasks numbered 1 to 10 using step size 1; e.g. 5-20:5 would generate tasks with the ids 5, 10, 15 and 20
- The variable $SGE_TASK_ID is known and set in script.sh, i.e. the current task id can be accessed by this variable. This might be useful for writing log files etc.
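Besides naming log files, the task id can also be used to select a different input for each task. The following is a minimal sketch of that idea; the parameter values and the function name `param_for_task` are hypothetical, and the fallback to task id 1 exists only so that the script can be tried outside the cluster.

```shell
#!/bin/bash
# Hypothetical parameter values, one per array task (submit with -t 1-4:1).
params=(0.1 0.2 0.5 1.0)

# Map a 1-based task id to its parameter; bash arrays are 0-based.
param_for_task() {
  local task_id="$1"
  echo "${params[$((task_id - 1))]}"
}

# SGE sets SGE_TASK_ID inside an array job; the default of 1 is only a
# fallback for running this sketch locally.
task_id="${SGE_TASK_ID:-1}"
param="$(param_for_task "$task_id")"
echo "task ${task_id}: your_program would be called with parameter ${param}"
```

With this pattern a single qsub call with -t 1-4:1 covers all four parameter values.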
Example Job
Assume you want to test two different algorithms on 5 instances of graphs of 4 different sizes.
First a script script.sh looking something like
#!/bin/bash
logFile=`printf "%s_r%02d.log" $1 $SGE_TASK_ID`
your_program -log $logFile
is defined (of course your parameter for logging might be named differently). Then a second script callingLoop.sh is specified looking like
#!/bin/bash
for alg in alg1 alg2
do
  for size in 005 010 020 040
  do
    for (( i=1; i < 6; ++i ))
    do
      instName=`printf "inst_%s_%02d.graph" $size $i`
      logName=`printf "log_%s_%s_%02d" $alg $size $i`
      qsub -N jobName -l h_vmem=2G -l variables_list -t 1-10:1 -r y -e /dev/null -o /dev/null path-to-script/script.sh $logName $instName $alg
    done
  done
done
Then you need to type
./callingLoop.sh
into your console.
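Before submitting, it can help to see how many jobs such a loop generates. The sketch below is a dry-run variant of the loop from this example: it only collects and prints the qsub command lines and never calls qsub. The shortened option list is an assumption made for readability.

```shell
#!/bin/bash
# Dry run: collect the qsub command lines instead of submitting them.
commands=()
for alg in alg1 alg2; do
  for size in 005 010 020 040; do
    for (( i = 1; i < 6; ++i )); do
      instName=$(printf "inst_%s_%02d.graph" "$size" "$i")
      logName=$(printf "log_%s_%s_%02d" "$alg" "$size" "$i")
      commands+=("qsub -N jobName -t 1-10:1 path-to-script/script.sh $logName $instName $alg")
    done
  done
done

# Show what would be submitted and how much work that is.
printf '%s\n' "${commands[@]}"
echo "${#commands[@]} array jobs, $(( ${#commands[@]} * 10 )) tasks in total"
```

Two algorithms, four sizes and five instances give 40 array jobs, and with -t 1-10:1 each of them consists of 10 tasks, so 400 tasks are queued in total.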
$TMPDIR
If you want to use local disk space on the compute nodes you can use the variable $TMPDIR as a directory name in your job script. For each job, a temporary directory is created on the node when the job script is started, and it is automatically deleted when the job script finishes. Please do not use /tmp directly!
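A typical pattern is to do all intermediate reading and writing in $TMPDIR and to copy only the final result back before the job ends. The following is a minimal sketch of that pattern; the target directory `results` and the file names are hypothetical, the two commands in the middle merely stand in for your_program, and the `mktemp` fallback exists only so that the script can be tried outside the cluster.

```shell
#!/bin/bash
# Inside an SGE job, TMPDIR points to the per-job directory on the node.
# The mktemp fallback is only for trying this sketch locally.
workdir="${TMPDIR:-$(mktemp -d)}"

# Hypothetical directory for final results.
result_dir="${RESULT_DIR:-$PWD/results}"
mkdir -p "$result_dir"

# Placeholder for your_program: write intermediate data to local disk.
echo "intermediate data" > "$workdir/scratch.txt"
wc -c < "$workdir/scratch.txt" > "$workdir/summary.txt"

# Copy only the final result back; the job directory is deleted afterwards.
cp "$workdir/summary.txt" "$result_dir/"
```

This keeps the heavy I/O on the node's local disk and touches the shared file system only once per job.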
Job Status
To gather information on the status of your jobs simply type
qstat
on the console.
man qstat
might be useful if you are interested in more details.
Deleting Jobs
Sometimes you might want to delete (possibly) wrongly submitted jobs. This can easily be done by typing
qdel <job_id>
The parameter <job_id> is the id of the job, which is displayed both during submission and by the qstat command.
Variables to be Requested
As indicated above, it is possible to request variables, also called attributes. This is useful to ensure that the requested features are provided by the executing host, i.e., the machine used to compute the submitted job. In addition to the standard variables implemented by every SGE installation, the following variables can be requested.
Variable | Meaning |
---|---|
noX | requests that jobs are executed on members of the noX cluster. i.e. no1-5 = 5x (1× Intel Core 2 Quad, 2.83GHz; 8GB RAM) |
bc1 | requests that jobs are executed on members of the blade center 1 cluster. i.e. bs01-14 = 14x (2× Intel Xeon E5540, 2.53 GHz Quad Core; 24GB RAM) |
bc2 | requests that jobs are executed on members of the blade center 2 cluster. i.e. bs15 = (2× Intel Xeon E5649, 2.53 GHz 6-core; 72GB RAM), bs16-bs17 = 2x (2× Intel Xeon E5649, 2.53 GHz 6-core; 60GB RAM) |
bc3 | requests that jobs are executed on members of the blade center 3 cluster. i.e. bs31-bs32 = 2x (2× Intel Xeon E5-2630 v2, 2.60GHz 6-core; 48GB RAM) |
bc4 | requests that jobs are executed on members of the blade center 4 cluster. i.e. b401-b416 = 16x (2× Intel Xeon E5-2640 v4, 2.40GHz 10-core; 160GB RAM) |
bladeX | requests that jobs are executed on members of the blade center 1, 2, 3 and 4 clusters. |
longrun | If you expect your job to take more than 10 hours to complete please submit with option longrun: -l longrun=1 . It ensures that there are no more than 250 long running jobs running at the same time, i.e. there are nodes free for normal jobs. |
mem_free | (consumable=yes default=1.9G) requests that jobs are only scheduled on nodes that have at least the specified amount of memory free. The specified amount is subtracted from the available memory of the node, but it is not checked whether the job actually uses more memory! Jobs that exceed their requested amount of memory might have a negative influence on other jobs! If you are unsure about the memory consumption of your jobs, it is strongly recommended to use h_vmem and/or s_vmem. |
s_vmem | (consumable=no default=0) requests a soft memory limit, i.e., if a job exceeds the specified memory limit it receives signal SIGXCPU that can be used to terminate gracefully (and write some last logging information). Should be set slightly lower than h_vmem to take effect. (default=0 means no limit) |
h_vmem | (consumable=no default=0) requests a hard memory limit, i.e., if a job exceeds the specified memory limit it is aborted via a SIGKILL signal. (default=0 means no limit) |
Variables are requested as in the following example:
qsub -l noX -l bladeX -l mem_free=1.9G -l s_vmem=1.8G -l h_vmem=1.9G script.sh
Note that jobs submitted with exactly this statement will never be processed, since no host is (or ever will be) a member of both the noX and the bladeX cluster!
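The graceful termination mentioned for s_vmem requires that something in the job actually reacts to SIGXCPU. The following is a minimal sketch of the shell side of this, using bash's `trap` builtin; the handler name `on_soft_limit` and the variable `soft_limit_hit` are hypothetical. It is an assumption that the signal reaches the job script itself. Bash also runs a trap only after the currently running foreground command has returned, so a long-running program would need its own signal handler.

```shell
#!/bin/bash
# Set to 1 once the soft limit signal has been received.
soft_limit_hit=0

# Handler for SIGXCPU: write some last logging information.
on_soft_limit() {
  soft_limit_hit=1
  echo "soft memory limit reached on ${HOSTNAME:-unknown host}, shutting down" >&2
}
trap on_soft_limit XCPU

# ... the actual work of the job would go here ...
```

To try the handler outside the cluster, send the signal to the running script by hand with `kill -XCPU <pid>`.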
Troubleshooting
In some situations a program runs perfectly on your machine but fails when submitted to the grid. In this case it is sometimes helpful to know on which machine the (failing) job was actually executed. For this purpose you can add the following line to your calling script (script.sh in the above examples):
echo "running this job on $HOSTNAME"
The effect of this line is that a line similar to the following is written to standard output:
running this job on eowyn.ac.tuwien.ac.at
will appear which, obviously, indicates the machine used to execute your job. This information might be requested by your advisor (or our technician) when trying to find the error(s).
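The same idea can be extended to record a little more context in one line. The sketch below prints the host together with the job id, the task id and the temporary directory. JOB_ID is a standard Grid Engine environment variable that is not mentioned elsewhere on this page, and the function name `log_job_context` is made up for illustration. The fallbacks are only there so that the function also works outside the cluster.

```shell
#!/bin/bash
# Print one line describing where and as what this job is running.
log_job_context() {
  echo "host=${HOSTNAME:-unknown} job=${JOB_ID:-none} task=${SGE_TASK_ID:-none} tmpdir=${TMPDIR:-unset}"
}

log_job_context
```

Keep in mind that this line only ends up somewhere readable if -o points to a file rather than to /dev/null.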
Possible Mistakes
Make sure that the option "-r y" is provided for your submitted jobs (this can be changed using qalter - even for running jobs). If this option is not set then your jobs will not be (automatically) rescheduled (restarted) if the execution host the job was running on crashes.
Problems and Questions
If you have any questions please contact your supervisor. S/He will try to assist you as far as possible. You can also contact Andreas Müller if there are some technical issues.