Using SLURM to access the GPU cluster


Information:

juur$ sinfo
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
short*       up      30:00      8    mix idu[02-03,05-06,38-41]
short*       up      30:00     33   idle idu[01,04,07-37]
long         up   infinite      8    mix idu[02-03,05-06,38-41]
long         up   infinite     32   idle idu[04,07-37]
gpu          up   infinite      8    mix idu[02-03,05-06,38-41]
gpu          up   infinite     12   idle idu[01,04,07-16]

juur$ scontrol show partition
PartitionName=short
   AllocNodes=ALL AllowGroups=ALL Default=YES
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=00:30:00 MinNodes=1
   Nodes=idu[01-41]
   Priority=1 RootOnly=NO ReqResv=NO Shared=YES:4 PreemptMode=OFF
   State=UP TotalCPUs=392 TotalNodes=41 DefMemPerCPU=2048 MaxMemPerCPU=12288

PartitionName=long
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
   Nodes=idu[02-41]
   Priority=1 RootOnly=NO ReqResv=NO Shared=YES:4 PreemptMode=OFF
   State=UP TotalCPUs=384 TotalNodes=40 DefMemPerCPU=2048 MaxMemPerCPU=12288

PartitionName=gpu
   AllocNodes=ALL AllowGroups=ALL Default=NO
   DefaultTime=NONE DisableRootJobs=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=UNLIMITED MinNodes=1
   Nodes=idu[01-16],idu[38-41]
   Priority=1 RootOnly=NO ReqResv=NO Shared=YES:4 PreemptMode=OFF
   State=UP TotalCPUs=176 TotalNodes=20 DefMemPerCPU=2048 MaxMemPerCPU=12288
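
To check quickly how many nodes of a partition are free, the sinfo listing above can be summarized with a small filter. The following is only a sketch: count_nodes is a hypothetical helper defined here, not a SLURM command, and it assumes the default sinfo column layout shown above (partition in column 1, node count in column 4, state in column 5).

```shell
# Hypothetical helper (not part of SLURM): sums the NODES column of
# sinfo-style text read from stdin for one partition and one state.
# Assumes the default sinfo column order shown in the listing above.
count_nodes() {
    awk -v part="$1" -v state="$2" '
        {
            name = $1
            sub(/\*$/, "", name)   # drop the "*" that marks the default partition
            if (name == part && $5 == state) total += $4
        }
        END { print total + 0 }    # print 0 when nothing matched
    '
}

# Usage on the cluster:
#   sinfo | count_nodes gpu idle
```

Fed the sinfo output shown above, this would report 12 idle nodes in the gpu partition.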

A simple single-GPU job:

juur$ srun --gres=gpu:1 gpujob

You can use a maximum of 2 GPUs per node:

juur$ srun --gres=gpu:2 gpujob

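Running several tasks, which may be spread over multiple nodes (-n4 starts 4 tasks; --gres is requested per node, so each allocated node provides 2 GPUs):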

juur$ srun --gres=gpu:2 -n4 gpujob

Requesting a specific type of GPU:

juur$ srun --constraint=K20 --gres=gpu:1 gpujob

srun is useful for testing and for submitting very simple jobs. In most cases you should use salloc or sbatch instead.
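
As a rough illustration of the sbatch route, the single-GPU example can be wrapped in a batch script. This is a minimal sketch, not a site-approved template: the file name gpujob.sbatch, the 10-minute time limit and the output file name are assumptions chosen for illustration, and gpujob is the same placeholder program used in the srun examples.

```shell
# Write a minimal batch script (file name and limits are illustrative).
cat > gpujob.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=gpujob
#SBATCH --partition=gpu          # the gpu partition from the listing above
#SBATCH --gres=gpu:1             # one GPU on the allocated node
#SBATCH --time=00:10:00          # assumed limit; adjust to the real run time
#SBATCH --output=gpujob-%j.out   # %j is replaced by the job ID

srun gpujob
EOF

# Submit the script only where sbatch is actually available.
if command -v sbatch >/dev/null 2>&1; then
    sbatch gpujob.sbatch
fi
```

Unlike srun, sbatch returns immediately after queuing the job; the program's output ends up in the file named by --output.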

Documentation:
http://www.schedmd.com/slurmdocs/quickstart.html
http://www.schedmd.com/slurmdocs/gres.html
http://www.schedmd.com/slurmdocs/documentation.html

Last modified: 30 Jan 2017