Add the Ubuntu universe repository to the apt sources
$ echo "deb http://archive.ubuntu.com/ubuntu bionic universe" | sudo tee -a /etc/apt/sources.list
Update the package list
$ sudo apt update
Install slurm-wlm
$ sudo apt install slurm-wlm -y
Install the Slurm documentation package. It includes the configurator.easy.html page, which is useful for generating slurm.conf
$ sudo apt install slurm-wlm-doc -y
On a machine with a web browser, open /usr/share/doc/slurm-wlm-doc/html/configurator.easy.html to generate slurm.conf.
You can also use the online configurator at https://slurm.schedmd.com/configurator.easy.html, but it targets the latest Slurm release, so the file it generates may not suit the version you installed.
Fill in the form. Some of the values can be retrieved with the following command
$ slurmd -C
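As a rough sketch, the line that slurmd -C reports can be turned into a NodeName entry for slurm.conf. The helper name node_line is hypothetical, and the sketch assumes the hardware line is the one starting with NodeName=. slurmd -C may report more fields than the configurator writes, so review the result before pasting it.

```shell
# Read `slurmd -C` output on stdin and print the NodeName line
# with State=UNKNOWN appended, ready to review and paste into slurm.conf.
# node_line is a hypothetical helper name, not part of Slurm.
node_line() {
    awk '/^NodeName=/ { print $0 " State=UNKNOWN"; exit }'
}

# Usage on a machine where slurmd is installed:
#   slurmd -C | node_line
```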
These are the settings I changed from the defaults
- ControlMachine and NodeName: set both to the hostname of the system
- State Preservation: set StateSaveLocation to /var/spool/slurm-llnl
- Process tracking: use Pgid instead of Cgroup
- Process ID logging: set the slurmctld and slurmd PID files to /var/run/slurm-llnl/slurmctld.pid and /var/run/slurm-llnl/slurmd.pid
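The hostname requirement in the list above is easy to get wrong, so here is a minimal sketch of a check. The function names conf_value and check_control_machine are hypothetical, and the path /etc/slurm-llnl/slurm.conf in the usage line is an assumption about where the file is saved. conf_value returns everything after the first = on the line, so it only suits single-valued settings such as ControlMachine.

```shell
# Print the value of KEY from a slurm.conf-style file.
# Commented-out lines (starting with #) never match, because the
# first field is compared exactly against KEY.
# Usage: conf_value KEY FILE   (hypothetical helper)
conf_value() {
    awk -F= -v key="$1" '$1 == key { print substr($0, length(key) + 2); exit }' "$2"
}

# Succeed only if ControlMachine in FILE equals the short hostname.
# Usage: check_control_machine FILE   (hypothetical helper)
check_control_machine() {
    [ "$(conf_value ControlMachine "$1")" = "$(hostname -s)" ]
}

# Example (the path is an assumption):
#   check_control_machine /etc/slurm-llnl/slurm.conf && echo "hostname matches"
```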
# slurm.conf file generated by configurator easy.html.
# Put this file on all nodes of your cluster.
# See the slurm.conf man page for more information.
#
ControlMachine=myserver
#ControlAddr=
#
#MailProg=/bin/mail
MpiDefault=none
#MpiParams=ports=#-#
ProctrackType=proctrack/pgid
ReturnToService=1
SlurmctldPidFile=/var/run/slurm-llnl/slurmctld.pid
#SlurmctldPort=6817
SlurmdPidFile=/var/run/slurm-llnl/slurmd.pid
#SlurmdPort=6818
SlurmdSpoolDir=/var/spool/slurmd
SlurmUser=slurm
#SlurmdUser=root
StateSaveLocation=/var/spool/slurm-llnl
SwitchType=switch/none
TaskPlugin=task/none
#
#
# TIMERS
#KillWait=30
#MinJobAge=300
#SlurmctldTimeout=120
#SlurmdTimeout=300
#
#
# SCHEDULING
FastSchedule=1
SchedulerType=sched/backfill
SelectType=select/linear
#SelectTypeParameters=
#
#
# LOGGING AND ACCOUNTING
AccountingStorageType=accounting_storage/none
ClusterName=cluster
#JobAcctGatherFrequency=30
JobAcctGatherType=jobacct_gather/none
#SlurmctldDebug=3
#SlurmctldLogFile=
#SlurmdDebug=3
#SlurmdLogFile=
#
#
# COMPUTE NODES
NodeName=myserver CPUs=1 State=UNKNOWN
PartitionName=debug Nodes=myserver Default=YES MaxTime=INFINITE State=UP
DebugFlags=NO_CONF_HASH
Create the Slurm state save directory (the StateSaveLocation set above)
$ sudo mkdir /var/spool/slurm-llnl
$ sudo chown -R slurm:slurm /var/spool/slurm-llnl
Create the Slurm PID directory
$ sudo mkdir /var/run/slurm-llnl/
$ sudo chown -R slurm:slurm /var/run/slurm-llnl
Start slurmctld (the Slurm controller) and enable it on boot
$ sudo systemctl start slurmctld
$ sudo systemctl enable slurmctld
Start slurmd and enable it on boot
$ sudo systemctl start slurmd
$ sudo systemctl enable slurmd
If slurmctld or slurmd fails to start, run the daemon in the foreground (-D) with verbose output (-vvv) to see the errors, then adjust slurm.conf accordingly.
$ sudo -u slurm slurmctld -Dcvvv
$ sudo slurmd -Dcvvv
Check the Slurm nodes using the scontrol command
$ scontrol show node
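To pull just the node state out of that output, a small filter like the following may help. This is a sketch that assumes the state appears as a whitespace-separated State=... token, and node_state is a hypothetical helper name.

```shell
# Read `scontrol show node` output on stdin and print the value of
# every State= token, one per line (one per node).
# node_state is a hypothetical helper name, not part of Slurm.
node_state() {
    awk '{ for (i = 1; i <= NF; i++) if ($i ~ /^State=/) print substr($i, 7) }'
}

# Usage on the cluster:
#   scontrol show node | node_state
```

A node that is ready to accept jobs should normally report IDLE. A state such as DOWN or UNKNOWN suggests going back to the debug step above.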