Genius quick start guide¶
Genius is the default KU Leuven/UHasselt Tier-2 cluster. It can be used for most workloads, and has nodes with a lot of memory, as well as nodes with GPUs.
Direct login using SSH is possible to all login infrastructure without restrictions.
You can access Genius through:
This will load-balance your connection to one of the 4 Genius login nodes. Two types of login nodes are available:
classic login nodes, i.e., terminal SSH access:
a login node that provides a desktop environment that can be used for, e.g., visualization; see the NX clients section:
This node should not be accessed using terminal SSH; it serves only as a gateway to the actual login nodes on which your NX sessions will run.
The NX login node will start a session on a login node that has a GPU, i.e., either
For example, to log in to any of the login nodes using SSH:
$ ssh [email protected]
Running jobs on Genius¶
There are several types of nodes in the Genius cluster: normal compute nodes, GPU nodes, and big memory nodes. The resource specifications for jobs have to be tuned to make proper use of these nodes. If you are not yet familiar with the system, see the hardware specification for more information on the available node types.
The charge rates for the various node types of Genius are listed below. Information on obtaining credits and credit system basics is available.
GPU nodes: 5.00 per GPU
The maximum walltime for any job on Genius is 7 days (168 hours). Jobs with walltimes between 3 and 7 days may furthermore request at most 10 compute nodes per job; no such limitation is imposed on jobs with walltimes of 3 days or less.
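The walltime and node-count policy above can be sketched as a simple validity check. This is a hypothetical helper for illustration only, not an official tool; the thresholds are taken from the text above:

```python
# Sketch of the Genius walltime/node-count policy described above.
# A job may run for at most 168 hours; jobs longer than 72 hours
# (3 days) may use at most 10 compute nodes.
def is_allowed(walltime_hours, nodes):
    if walltime_hours > 168:    # hard maximum: 7 days
        return False
    if walltime_hours > 72:     # 3-7 day jobs: at most 10 nodes
        return nodes <= 10
    return True                 # 3 days or less: no node limit

print(is_allowed(168, 10))  # True
print(is_allowed(100, 20))  # False
print(is_allowed(72, 50))   # True
```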
There is a limit on the number of jobs you can have in a queue. This number includes idle, running, and blocked jobs. If you try to submit more jobs than the maximum number, these jobs will be deferred and will not start. Therefore you should always respect the following limits on how many jobs you have in a queue at the same time:
q1h: max_user_queueable = 200
q24h: max_user_queueable = 250
q72h: max_user_queueable = 150
q7d: max_user_queueable = 20
qsuperdome: max_user_queueable = 20
These limits can be checked on the cluster by executing:
$ qstat -f -Q
Submit to a compute node¶
Submitting a compute job boils down to specifying the required number of nodes, cores-per-node, memory and walltime. You may e.g. request two full nodes like this:
$ qsub -l nodes=2:ppn=36 -l walltime=2:00:00 -A myproject myjobscript.pbs
You may also request only a part of the resources on a node. For instance, to test a multi-threaded application which performs optimally using 4 cores, you may submit your job like this:
$ qsub -l nodes=1:ppn=4 -l walltime=2:00:00 -A myproject myjobscript.pbs
In the two above examples, the jobs may start on Skylake or Cascadelake nodes.
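The `myjobscript.pbs` file passed to `qsub` is an ordinary shell script; resource requests given on the command line can also be embedded as `#PBS` directives. A minimal sketch follows, in which the application name `my_app` is a placeholder for your own program:

```shell
#!/bin/bash -l
# Resource requests may be given here instead of on the qsub command line:
#PBS -l nodes=1:ppn=4
#PBS -l walltime=2:00:00
#PBS -A myproject

# PBS starts the job in your home directory; move to where the job
# was submitted from.
cd $PBS_O_WORKDIR

# Placeholder application: run a multi-threaded program on the 4
# requested cores.
export OMP_NUM_THREADS=4
./my_app
```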
Please bear in mind not to exceed the maximum allowed resources on the compute nodes of the targeted partition; e.g., you can request at most 36 cores per node (ppn=36). In general, we advise you to only request as many resources as your application needs.
Advanced node usage¶
In certain cases (such as performance tests) you may want to be sure that your job runs on a specific type of node (i.e., only Skylake nodes or only Cascadelake nodes). You can do this by selecting a node feature, e.g., -l nodes=1:ppn=8:skylake or -l feature=skylake (and likewise for cascadelake).
When doing so, you should take into account that all jobs on the Skylake nodes are subject to the SINGLEUSER node access policy. This means that once a Skylake node is allocated to a job, no jobs from other users can land on that node, even if the original job only requested a small part of the node's resources. This is different on the Cascadelake nodes, where small jobs (fewer than 18 cores and with default memory requirements) are given the SHARED node access policy instead, allowing multiple small jobs from different users to run on the same node.
Submit to a GPU node¶
The GPU nodes are located in a separate cluster partition, so you will need to specify it explicitly when submitting your job. We have also configured the GPU nodes as shared resources, meaning that different users can simultaneously use a portion of the same node. However, every user gets exclusive access to the number of GPUs requested. If you want to use only 1 GPU of type P100 (which are on nodes with the Skylake architecture), you can for example submit like this:
$ qsub -l nodes=1:ppn=9:gpus=1:skylake -l partition=gpu -l pmem=5gb -A myproject myscript.pbs
Note that for 1 GPU you have to request 9 cores. If you need more GPUs, multiply the 9 cores by the number of GPUs requested; for example, for 3 GPUs you have to specify this:
$ qsub -l nodes=1:ppn=27:gpus=3:skylake -l partition=gpu -l pmem=5gb -A myproject myscript.pbs
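The 9-cores-per-P100-GPU rule can be captured in a small helper. This is a hypothetical sketch for illustration, not an official tool:

```python
# Hypothetical helper: build the qsub resource string for P100 GPU jobs,
# where each requested GPU must be accompanied by 9 cores (see above).
def p100_resources(gpus):
    return f"nodes=1:ppn={9 * gpus}:gpus={gpus}:skylake"

print(p100_resources(1))  # nodes=1:ppn=9:gpus=1:skylake
print(p100_resources(3))  # nodes=1:ppn=27:gpus=3:skylake
```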
To specifically request V100 GPUs (which are on nodes with the Cascadelake architecture), you can submit for example like this:
$ qsub -l nodes=1:ppn=4:gpus=1:cascadelake -l partition=gpu -l pmem=20gb -A myproject myscript.pbs
For the V100 type of GPU, you are required to request 4 cores per GPU. Also note that these nodes offer much more memory.
Advanced GPU usage¶
There are different GPU compute modes available, which are explained on this documentation page.
exclusive_process: only one compute process is allowed to run on the GPU
default: shared mode available for multiple processes
exclusive_thread: only one compute thread is allowed to run on the GPU
To select the mode of your choice, you can for example submit like this:
$ qsub -l nodes=1:ppn=9:gpus=1:skylake:exclusive_process -l partition=gpu -A myproject myscript.pbs
$ qsub -l nodes=1:ppn=9:gpus=1:skylake:default -l partition=gpu -A myproject myscript.pbs
$ qsub -l nodes=1:ppn=9:gpus=1:skylake:exclusive_thread -l partition=gpu -A myproject myscript.pbs
If no mode is specified, the
exclusive_process mode is selected by default.
Submit to a big memory node¶
The big memory nodes are likewise located in a separate partition. When using them, it is also important to specify your memory requirements, for example:
$ qsub -l nodes=1:ppn=36 -l pmem=20gb -l partition=bigmem -A myproject myscript.pbs
Submit to an AMD node¶
All jobs on AMD nodes are given the SINGLEUSER node access policy (see the “Advanced node usage” section above for more information).
Besides specifying the partition, it is also important to specify the memory per process (pmem): the AMD nodes have 256 GB of RAM shared by 64 cores, which implies that the default pmem value is too high, and a job relying on it will never run.
$ qsub -l nodes=2:ppn=64 -l pmem=3800mb -l partition=amd -A myproject myscript.pbs
This memory specification adds up to a bit less than 256 GB (64 × 3800 MB ≈ 237.5 GB), leaving some room for the operating system to function properly.
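The headroom left by pmem=3800mb can be checked with a quick calculation, assuming the 256 GB of RAM and 64 cores per node stated above:

```python
# Why pmem=3800mb is a safe per-core request on the 256 GB AMD nodes.
total_mb = 256 * 1024            # 262144 MB of RAM per node
cores = 64
even_split_mb = total_mb // cores
print(even_split_mb)             # 4096 MB per core if all RAM were usable

# Requesting 3800 MB per core leaves headroom for the operating system:
headroom_mb = cores * (even_split_mb - 3800)
print(headroom_mb)               # 18944 MB (~18.5 GB) left for the OS
```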
Running debug jobs¶
Debugging on a busy cluster can be taxing due to long queue times. To mitigate this, two Skylake CPU nodes and a Skylake GPU node have been reserved for debugging purposes.
A few restrictions apply to a debug job:
it has to be submitted with -l qos=debugging,
it can use at most two nodes for CPU jobs and a single node for GPU jobs,
its walltime is at most 30 minutes,
you can only have a single debug job in the queue at any time.
To run a debug job for 20 minutes on two CPU nodes, you would use:
$ qsub -A myproject -l nodes=2:ppn=36 -l walltime=00:20:00 \
    -l qos=debugging myscript.pbs
To run a debug job for 15 minutes on a GPU node, you would use:
$ qsub -A myproject -l nodes=1:ppn=9:gpus=1 -l partition=gpu \
    -l walltime=00:15:00 -l qos=debugging myscript.pbs