
Guide to Building your Linux High-performance Cluster

Edmund Ochieng

March 2, 2012

Abstract

In the modern day, where computer simulation forms a critical part of research, high-performance clusters have become a need in almost every educational or research institution. This paper aims to give you the instructions you need to set up your own cluster. So if you are looking forward to setting up a cluster, this is the guide for you. This guide is prepared with climate simulation in mind; however, besides the software required for climate simulation, the steps required to set up the cluster remain more or less the same. The setup aims to grant you the ability to run modelling, simulation and visualisation applications across multiple processors - probably more than you can have in a single server unit.

Contents
Part I   Master node Configuration

1  Network configuration
   1.1  Internal interface configuration
   1.2  External interface configuration
2  MAC address acquisition
   2.1  System Documentation / Manuals
   2.2  Network Traffic Monitoring
   2.3  TFTP Configuration
3  DHCP configuration
4  Local Repository
5  EPEL Repository
6  NFS configuration
7  SSH Key Generation Script

Part II  Software and Compiler installation and configuration

8  Torque configuration
9  Maui configuration
10 Compiler Installation
   10.1 GCC Compilers
   10.2 Intel Compilers
11 OpenMPI installation
   11.1 OpenMPI Compiled with GCC Compilers
   11.2 OpenMPI Compiled with Intel Compilers
12 Environment Modules installation
13 C3 Tools installation
14 Password Syncing
15 NetCDF, HDF5 and GrADS installation
16 NCL and NCO installation
17 R Statistical package installation

Part III Computing Node Installation

18 Node OS installation
19 Name resolution

Part I

Master node Configuration

1 Network configuration

1.1 Internal interface configuration

Set the network interface through which the DHCP service will listen for IP address requests to be static and to start on system boot. The configuration should appear similar to the one below.

1. With a text editor of your choice, edit your master node's network configuration for the interface that will be used to communicate with the other nodes in your cluster.
[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
# Broadcom Corporation NetXtreme BCM5715 Gigabit Ethernet
DEVICE=eth0
#BOOTPROTO=dhcp
BOOTPROTO=static
HWADDR=00:16:36:E7:8B:A3
IPADDR=192.168.10.1
NETMASK=255.255.255.0
ONBOOT=yes
DHCP_HOSTNAME=master.cluster

2. Once the changes have been made, save the file and start the interface.

3. Finally, invoke the ifconfig command to confirm the settings are active, as illustrated below.
[root@master ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:16:36:E7:8B:A3
          inet addr:192.168.10.1  Bcast:192.168.10.127  Mask:255.255.255.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:74 Memory:fdfc0000-fdfd0000

1.2 External interface configuration

The eth1 interface shall be connected to the organizational network and will acquire its network configuration via DHCP. So to have the interface working, all that needs to be done is to set ONBOOT=yes in /etc/sysconfig/network-scripts/ifcfg-eth1 and connect a cable to the interface.
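A minimal sketch of what ifcfg-eth1 could look like under those assumptions; your interface name and any HWADDR line will differ.

[root@master ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=dhcp
ONBOOT=yes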

2 MAC address acquisition

The MAC address acquisition step is important as it allows the master node to uniquely identify the nodes that make up the cluster and, as a result, give them customized configuration.

Each network interface has a unique MAC address, which can be obtained either from the system manuals/documentation or by listening to the network traffic on the master node interface on which the DHCP daemon shall be listening.

2.1 System Documentation / Manuals

The MAC address could either be printed on the hardware, as is the case on Sun servers and a couple of HP servers I've seen, or in the booklets provided alongside the server. However, this could at times be misleading. If that is the case, you could always listen on the network to obtain the desired MAC address.

2.2 Network Traffic Monitoring

Using the tcpdump command, we can acquire the hardware interface's MAC address. For easy identification, only one node should be turned on at any given time during the MAC address collection process. From the tcpdump output below, we can identify the network interface MAC address of the first node as 00:1b:24:3d:f1:a3, since the column just before the second greater-than symbol is 0.0.0.0.68 - which basically means it has no IP address and expects a response on UDP port 68.
[root@master ~]# tcpdump -i eth0 -nn -qtep port bootpc and port bootps \
and ip broadcast
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
00:1b:24:3d:f1:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 590: 0.0.0.0.68 >
    255.255.255.255.67: UDP, length 548
00:16:36:e7:8b:a3 > ff:ff:ff:ff:ff:ff, IPv4, length 342: 192.168.10.1.67 >
    255.255.255.255.68: UDP, length 300

Repeat the above process for all nodes to which you would like to issue static IP addresses.

2.3 TFTP Configuration

The TFTP service is essential for a PXE server to work, as it provides a netinstall kernel and an initial ramdisk to the clients when they attempt to boot from the network. By default, tftp, which is managed by xinetd, is disabled. You can enable it by opening the configuration file and changing the value of the disable option from yes to no. Your completed configuration file should be similar to the one shown below.

1. Enable tftp, which is part of the xinetd stack.
[root@master ~]# vi /etc/xinetd.d/tftp
[root@master ~]# cat /etc/xinetd.d/tftp
# default: off
service tftp
{
        socket_type     = dgram
        protocol        = udp
        wait            = yes
        user            = root
        server          = /usr/sbin/in.tftpd
        server_args     = -s /tftpboot
        disable         = no
        per_source      = 11
        cps             = 100 2
        flags           = IPv4
}

2. Once done, restart the xinetd service so that tftp is started alongside its other services.
[root@master ~]# service xinetd restart
Stopping xinetd:                                           [  OK  ]
Starting xinetd:                                           [  OK  ]

3. Check if a tftpboot directory has been created on the root directory tree as is shown below
[root@master ~]# file /tftpboot/
/tftpboot/: directory

4. Create a directory tree into which the PXE files shall be placed.
[root@master ~]# mkdir -p /tftpboot/pxe/pxelinux.cfg

5. Copy the netboot kernel image and an initial ramdisk.


[root@master ~]# ls /distro/centos/images/pxeboot/
initrd.img  README  TRANS.TBL  vmlinuz
[root@master ~]# cp /distro/centos/images/pxeboot/{vmlinuz,initrd.img} \
/tftpboot/pxe/

6. Locate the pxelinux.0 file and copy it to the /tftpboot/pxe directory, from where it should be accessible via the tftp daemon.
[root@master ~]# locate pxelinux.0
/usr/lib/syslinux/pxelinux.0
[root@master ~]# cp -av /usr/lib/syslinux/pxelinux.0 /tftpboot/pxe/
/usr/lib/syslinux/pxelinux.0 -> /tftpboot/pxe/pxelinux.0

NOTE: Keenly note the location of the pxelinux.0 file, as its relative path (i.e. from the tftp root directory, /tftpboot) will be used in the DHCP daemon configuration section.

7. Create a default boot configuration file for machines that may not have a specific boot file in the pxelinux.cfg directory.

[root@master ~]# vi /tftpboot/pxe/pxelinux.cfg/default
[root@master ~]# cat /tftpboot/pxe/pxelinux.cfg/default
# /tftpboot/pxe/pxelinux.cfg/default
prompt 1
timeout 100
default local

label local
  LOCALBOOT 0

label install
  kernel vmlinuz
  append initrd=initrd.img network ip=dhcp lang=en_US keymap=us \
    ksdevice=eth0 ks=http://192.168.10.1/ks/node-ks.cfg \
    loadramdisk=1 prompt_ramdisk=0 ramdisksize=16384 vga=normal \
    selinux=0

8. Get the hexadecimal equivalent of the node's IP address; it is used to create a per-client PXE configuration.
[root@master pxelinux.cfg]# gethostip node01
node01 192.168.10.2 C0A80A02
[root@master pxelinux.cfg]# cp default C0A80A02

9. Copy the default file to a file named after the hex equivalent obtained above. Open the file and change the line default local to default install. This should commence installation on rebooting node01. The same should be done for all other nodes.
[root@master ~]# cp /tftpboot/pxe/pxelinux.cfg/default \
/tftpboot/pxe/pxelinux.cfg/C0A80A02
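To verify that the TFTP daemon is actually serving the files, you could fetch pxelinux.0 from the master node itself using the tftp client; the yum line is only needed if the client package is not already installed.

[root@master ~]# yum -y install tftp
[root@master ~]# tftp 192.168.10.1 -c get pxe/pxelinux.0
[root@master ~]# ls -l pxelinux.0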

3 DHCP configuration

To issue static IP addresses via the DHCP daemon, the network interface hardware (or MAC) addresses collected in the MAC address acquisition section will be necessary. DHCP daemon configuration for the cluster should be carried out as outlined in the steps below.

1. Enter the name of the interface on which the DHCP daemon will be listening.
[root@master ~]# cat /etc/sysconfig/dhcpd
# Command line options here
DHCPDARGS="eth0"

2. Create your DHCP configuration file from the sample file in the location below.

[root@master ~]# cp /usr/share/doc/dhcp-3.0.5/dhcpd.conf.sample \
/etc/dhcpd.conf
cp: overwrite /etc/dhcpd.conf? y

3. Edit your configuration to look more or less like the one below, issuing addresses to the desired hosts using their MAC addresses as illustrated.
[root@master ~]# cat /etc/dhcpd.conf
ddns-update-style interim;
ignore client-updates;
allow booting;
allow bootp;

subnet 192.168.10.0 netmask 255.255.255.0 {
# --- default gateway
#       option routers                  192.168.0.1;
        option subnet-mask              255.255.255.0;
#       option nis-domain               "domain.org";
        option domain-name              "cluster";
        option domain-name-servers      192.168.10.1;
        option time-offset              10800;  # EAT
#       option ntp-servers              192.168.1.1;
#       option netbios-name-servers     192.168.1.1;

#       range dynamic-bootp 192.168.10.4 192.168.10.20;
        default-lease-time 21600;
        max-lease-time 43200;
        filename "pxe/pxelinux.0";
        next-server 192.168.10.1;

        # we want the nameserver to appear at a fixed address
        host node01 {
                hardware ethernet 00:1b:24:3d:f1:a3;
                fixed-address 192.168.10.2;
                option host-name "node01";
        }
        host node02 {
                hardware ethernet 00:1b:24:3e:05:d1;
                fixed-address 192.168.10.3;
                option host-name "node02";
        }
        host node03 {
                hardware ethernet 00:1b:24:3e:04:f6;
                fixed-address 192.168.10.4;
                option host-name "node03";
        }
}

4. Finally, save the configuration file and start the server.

[root@master ~]# service dhcpd start
Starting dhcpd:                                            [  OK  ]

5. Should starting the DHCP daemon fail, you could look at the logs in /var/log/messages and identify any DHCP daemon related errors. This could be done with a GNU/Linux editor, but for better troubleshooting I'd proceed as below.
[root@master ~]# tail -f /var/log/messages

4 Local Repository

A local repository is very crucial in cases of poor Internet connectivity.

1. Create a directory on the system and copy all the contents of the installation disk into it.

[root@master ~]# mkdir -p /distro/centos
[root@master ~]# cp -ar /media/CentOS_5.6_Final/* /distro/centos

2. Create a new repository file that points to the location created above.
[root@master ~]# cat /etc/yum.repos.d/CentOS-Local.repo
[Local]
name=CentOS- - Local
baseurl=file:///distro/centos
gpgcheck=0
enabled=1

3. Clear the cache and any other repository information saved locally
[root@master ~]# yum clean all

4. Make a cache of the new available repositories.


[root@master ~]# yum makecache

5 EPEL Repository

The addition of the EPEL (Extra Packages for Enterprise Linux) repository is crucial as it facilitates the installation of some of the software needed in the cluster whose installation from source is not quite a simple process. These include:

1. R - R Statistical package http://www.r-project.org/
2. NCO - NetCDF Operator http://nco.sourceforge.net/
3. CDO - Climate Data Operators
4. NCL - NCAR Command Language http://www.ncl.ucar.edu/Applications/rcm.shtml

5. GrADS - Grid Analysis and Display System http://www.iges.org/

The repository is added as illustrated below:
[root@master ~]# rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
Retrieving http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm
warning: /var/tmp/rpm-xfer.Ln8ILG: Header V3 DSA signature: NOKEY, key ID 217521f6
Preparing...                ########################################### [100%]
   1:epel-release           ########################################### [100%]

6 NFS configuration

We shall export some of the master node's filesystems to reduce the need for repetitive configuration.

1. Populate the /etc/exports configuration file with the directories you'd wish to have exported via NFS.
[root@master ~]# vi /etc/exports
/distro         *(ro,root_squash)
/home           *(rw,root_squash)
/distro/centos  *(ro,root_squash)
/distro/ks      *(ro,root_squash)
/opt            *(ro,root_squash)
/usr/local      *(ro,root_squash)
/scratch        *(rw,root_squash)

2. Start the NFS daemon, which should start successfully if your configuration is correct.
[root@master ~]# service nfs start
Starting NFS services:                                     [  OK  ]
Starting NFS quotas:                                       [  OK  ]
Starting NFS daemon:                                       [  OK  ]
Starting NFS mountd:                                       [  OK  ]

3. Make the NFS daemon start automatically on system boot and re-export the filesystems.


[root@master ~]# chkconfig nfs on
[root@master ~]# exportfs -vra
exporting *:/distro/centos
exporting *:/distro/ks
exporting *:/usr/local
exporting *:/scratch
exporting *:/distro
exporting *:/home
exporting *:/opt
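You could also confirm that the exports are visible with showmount; once the nodes are installed, the same command run from a node should return an identical list.

[root@master ~]# showmount -e 192.168.10.1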


7 SSH Key Generation Script

To allow jobs to be successfully submitted to the cluster, passwordless SSH login should be possible for all users on the cluster. The script below will create a key pair and append the public key to the authorized_keys file in the .ssh/ directory of each user's home directory. This shall be automated by placing the script in the system-wide /etc/profile.d directory.
[root@master modulefiles]# cat /etc/profile.d/passwordless-ssh.sh

Listing 1: /etc/profile.d/passwordless-ssh.sh
#!/bin/bash
#
# /etc/profile.d/passwordless-ssh.sh
#
if [ ! -d ${HOME}/.ssh/ -o ! -f ${HOME}/.ssh/id_dsa.pub ]
then
    echo -ne "Generating ssh keys:\t"
    # -N "" sets an empty passphrase so that logins are passwordless
    ssh-keygen -t dsa -N "" -f ${HOME}/.ssh/id_dsa
    if [ $? -eq 0 ]; then
        echo -e "[\033[32;1m done \033[0m]";
        cat ${HOME}/.ssh/id_dsa.pub >> ${HOME}/.ssh/authorized_keys
        chmod -R u+rwX,go= ${HOME}/.ssh/
    else
        echo -e "[\033[35;1m failed \033[0m]"
    fi
fi


Part II

Software and Compiler installation and configuration


8 Torque configuration

1. Untar the source and execute the configure script with the options below.
[root@master src]# tar xvfz torque-2.4.14.tar.gz
[root@master src]# cd torque-2.4.14
[root@master torque-2.4.14]# mkdir build
[root@master torque-2.4.14]# cd build
[root@master build]# ../configure --help
[root@master build]# ../configure --prefix=/opt/torque --enable-server \
    --enable-mom --enable-clients --disable-gui --with-rcp=scp

2. Compile the code to create the binary files by executing make, followed by make install to install the binaries.
[root@master build]# make
[root@master build]# make install

3. Add the path of the sbin directory to the root user's .bashrc file.
[root@master torque-2.4.14]# echo "export PATH=/opt/torque/sbin:\$PATH" >> /root/.bashrc
[root@master torque-2.4.14]# tail -n 1 ~/.bashrc
export PATH=/opt/torque/sbin:$PATH

4. Copy the pbs_mom init script from the contrib/init.d directory of the installation source to /opt/torque/pbs_mom.init. Open the file in an editor of your choice and amend any erroneous paths.
[root@master torque-2.4.14]# cp contrib/init.d/pbs_mom \
/opt/torque/pbs_mom.init
[root@master torque-2.4.14]# vi /opt/torque/pbs_mom.init

5. Copy the node_install.sh script into the torque install directory. It will be used to install pbs_mom on the computing nodes.

Listing 2: node_install.sh
#!/bin/bash
# /opt/torque/node_install.sh
# http://epico.esciencelab.org
# mailto: baro@democritos.it

TORQUEHOME=/opt/torque/
TORQUEBIN=$TORQUEHOME/bin
MAUIBIN=/opt/maui/bin
SPOOL=/var/spool/torque

mkdir -vp $SPOOL
cd $SPOOL || exit

# ===========================================================#
mkdir -vp aux mom_priv/jobs mom_logs checkpoint spool undelivered
chmod -v 1777 spool undelivered

for s in prologue epilogue
do
    test -e $TORQUEHOME/scripts/$s && \
        ln -sv $TORQUEHOME/scripts/$s $SPOOL/mom_priv/
done

# ===========================================================#
cat << EOF > pbs_environment
PATH=/bin:/usr/bin
LANG=C
EOF

# ===========================================================#
echo master > server_name

# ===========================================================#
cat << EOF > mom_priv/config
\$clienthost master
\$logevent 0x7f
\$usecp *:/u /u
\$usecp *:/home /home
\$usecp *:/scratch /scratch
EOF

# ===========================================================#
MOM_INIT=/etc/init.d/pbs_mom
cp -va /opt/torque/pbs_mom.init $MOM_INIT
chmod +x $MOM_INIT
chkconfig --add pbs_mom
chkconfig pbs_mom on

# increase limits for infiniband stuff (pbs_mom is NOT pam_limits aware)
egrep 'ulimit[[:space:]]+.*-l[[:space:]]' $MOM_INIT || \
perl -e 'while (<>) {
    print;
    if (/[ \t]+start\)/) {
        print <<EOF;
#
# increase limits for infiniband stuff (non pam_limits aware)
# max locked memory, soft and hard limits for all PBS children
ulimit -H -l unlimited
ulimit -S -l 4096000
# stack size, soft and hard limits for all PBS children
ulimit -H -s unlimited
ulimit -S -s 1024000
#
EOF
    }
}' -i $MOM_INIT

# ===========================================================#
cat << EOF > /etc/profile.d/pbs.sh
export PATH=$TORQUEBIN:$MAUIBIN:\$PATH
EOF
# EOF

6. In an editor of your choice, enter the fully qualified domain name of your master node in the file below.
[root@master torque-2.4.14]# vi /var/spool/torque/server_name
master.cluster

7. Add your nodes and their properties to the nodes file as shown below.
[root@master torque-2.4.14]# vi /var/spool/torque/server_priv/nodes
node01 np=4
node02 np=4
node03 np=4

8. Initialize the serverdb and start the TORQUE pbs_server as shown below.
[root@master ~]# pbs_server -t create
[root@master ~]# service pbs_server start
Starting TORQUE Server:                                    [  OK  ]

9. Create a queue (or queues) to suit your configuration and make at least one of them the default using the TORQUE qmgr command. An easier way is to create a file as below.
[root@master ~]# vi qmgr.cluster
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True

set server scheduling = True
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default


10. Load the file containing the qmgr configuration as illustrated below.
[root@master ~]# qmgr -c < qmgr.cluster

11. A printout of the pbs_server configuration looks as below.


[root@master ~]# qmgr -c "p s"
#
# Create queues and set their attributes.
#
#
# Create and define queue default
#
create queue default
set queue default queue_type = Execution
set queue default Priority = 60
set queue default max_running = 128
set queue default resources_max.walltime = 168:00:00
set queue default resources_default.walltime = 01:00:00
set queue default max_user_run = 12
set queue default enabled = True
set queue default started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = master.cluster
set server managers = maui@master
set server managers += root@master
set server operators = maui@master
set server operators += root@master
set server default_queue = default
set server log_events = 511
set server mail_from = adm
set server query_other_jobs = True
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 6
set server next_job_number = 26

12. Restart both the pbs_server on the master node and the pbs_mom on the nodes, then execute pbsnodes to see a printout of all free nodes.
[root@master ~]# pbsnodes
node01
     state = free
     np = 2
     ntype = cluster
     status = rectime=1308321567,varattr=,jobs=,state=free,
         netload=1205591,gres=,loadave=0.18,ncpus=4,physmem=4051184kb,
         availmem=5021068kb,totmem=5103400kb,idletime=0,nusers=0,
         nsessions=? 0,sessions=? 0,uname=Linux node01 2.6.18-238.el5
         #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux


node02
     state = free
     np = 2
     ntype = cluster
     status = rectime=1308321569,varattr=,jobs=,state=free,
         netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,
         availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
         nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.el5
         #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux

node03
     state = free
     np = 2
     ntype = cluster
     status = rectime=1308321569,varattr=,jobs=,state=free,
         netload=1209868,gres=,loadave=0.38,ncpus=4,physmem=4051184kb,
         availmem=5020892kb,totmem=5103400kb,idletime=0,nusers=0,
         nsessions=? 0,sessions=? 0,uname=Linux node02 2.6.18-238.el5
         #1 SMP Thu Jan 13 15:51:15 EST 2011 x86_64,opsys=linux
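At this point you could submit a short test job to confirm that the server and the MOMs are communicating. This is a minimal sketch; it assumes an ordinary (non-root) user account exists, since TORQUE rejects jobs submitted by root, and the job will only start running once a scheduler (Maui, configured in the next section) is in place.

[user@master ~]$ echo "sleep 30" | qsub
[user@master ~]$ qstat -a
[user@master ~]$ qstat -q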

9 Maui configuration

1. Untar, configure, make the binaries and install maui from source as shown in the next sequence of steps.
[root@master ~]# tar xvfz maui-3.3.1.tar.gz
[root@master ~]# cd maui-3.3.1
[root@master maui-3.3.1]# ./configure --help
[root@master maui-3.3.1]# ./configure --prefix=/opt/maui \
    --with-spooldir=/var/spool/maui --with-pbs=/opt/torque/
[root@master maui-3.3.1]# make
[root@master maui-3.3.1]# make install

2. Create a system user maui through which maui shall be run


[root@master maui-3.3.1]# useradd -d /var/spool/maui -r -g daemon maui

3. Edit the maui.cfg file, changing the SERVERHOST, ADMIN1, ADMIN3 and resource manager definition (RMCFG) as shown in the snippet below.
[root@master maui-3.3.1]# vi /var/spool/maui/maui.cfg
# maui.cfg 3.3.1

SERVERHOST            master
# primary admin must be first in list
ADMIN1                maui root
ADMIN3                ALL

# Resource Manager Definition
RMCFG[MASTER]         TYPE=PBS


# Allocation Manager Definition
AMCFG[bank]           TYPE=NONE
....

4. Copy the init script from the maui source package to /etc/init.d/ and edit the file, changing MAUI_PREFIX to point to your installation directory.
[root@master maui-3.3.1]# cp contrib/service-scripts/redhat.maui.d \
    /etc/init.d/maui
[root@master maui-3.3.1]# vi /etc/init.d/maui
[root@master maui-3.3.1]# cat /etc/init.d/maui
#!/bin/sh
#
# maui       This script will start and stop the MAUI Scheduler
#
# chkconfig: 345 85 85
# description: maui
#
ulimit -n 32768

# Source the library functions
. /etc/rc.d/init.d/functions

MAUI_PREFIX=/opt/maui

# let see how we were called
case "$1" in
  start)
        echo -n "Starting MAUI Scheduler: "
        daemon --user maui $MAUI_PREFIX/sbin/maui
        echo
        ;;
  stop)
        echo -n "Shutting down MAUI Scheduler: "
        killproc maui
        echo
        ;;
  status)
        status maui
        ;;
  restart)
        $0 stop
        $0 start
        ;;
  *)
        echo "Usage: maui {start|stop|restart|status}"
        exit 1
esac

5. Create a file maui.sh in the /etc/profile.d directory, add to it the environment variables PATH, INCLUDE and LD_LIBRARY_PATH, and make it executable.

[root@master maui]# vi /etc/profile.d/maui.sh
[root@master maui]# chmod +x /etc/profile.d/maui.sh
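A minimal sketch of what /etc/profile.d/maui.sh could contain, assuming the /opt/maui prefix used during configure above:

# /etc/profile.d/maui.sh -- paths assume the /opt/maui prefix used above
export PATH=/opt/maui/bin:/opt/maui/sbin:$PATH
export INCLUDE=/opt/maui/include:$INCLUDE
export LD_LIBRARY_PATH=/opt/maui/lib:$LD_LIBRARY_PATH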

10 Compiler Installation

Compilers are necessary in a cluster as they translate source code into executables the computer can run. Of interest are the C, C++ and Fortran compilers, the most popular of which are the GCC and Intel compilers. Another option is the PGI compilers, which we shall not install.

10.1 GCC Compilers

From the CentOS repositories we shall install the GCC compilers using the yum package management utility.
[root@master src]# yum -y install gcc.x86_64 gcc-gfortran.x86_64 \
    libstdc++.x86_64 libstdc++-devel.x86_64 libgcj.x86_64 \
    compat-libstdc++.x86_64

10.2 Intel Compilers

For the Intel compilers, which may give better results depending on the scenario, we shall proceed with the installation as outlined below:

1. Visit the Intel website in your preferred web browser, register and download the Intel compilers for non-commercial use.

2. Move to the directory into which you downloaded the Intel C compilers and Fortran compilers.

3. Untar the tarballs and change directory into the created directory.
[root@master ~]# tar xvfz l_ccompxe_2011.4.191.tgz
[root@master ~]# cd l_ccompxe_2011.4.191
[root@master l_ccompxe_2011.4.191]# ./install.sh
[root@master ~]# tar xvfz l_fcompxe_2011.4.191.tgz
[root@master ~]# cd l_fcompxe_2011.4.191
[root@master l_fcompxe_2011.4.191]# ./install.sh

4. Execute the install.sh script and proceed as prompted.

11 OpenMPI installation

OpenMPI is an open source library implementation of the Message Passing Interface (MPI-2) and facilitates communication/message interchange between processes in a high-performance computing environment.


11.1 OpenMPI Compiled with GCC Compilers

1. Untar and compile the sources


[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=gcc CXX=g++ FC=gfortran \
    F77=gfortran --prefix=/opt/openmpi/1.4.2/gcc/4.1.2 \
    --with-tm=/opt/torque/

2. Create binaries by running make


[root@master build]# make

3. Finally, install the binaries into the system


[root@master build]# make install
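To confirm that the build picked up TORQUE's TM launch support, you could query ompi_info from the installation prefix used above and look for the tm components:

[root@master ~]# /opt/openmpi/1.4.2/gcc/4.1.2/bin/ompi_info | grep tm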

11.2 OpenMPI Compiled with Intel Compilers

1. Untar and compile the sources as above. However, take keen notice of the value of the variables CC, CXX, FC and F77 as compared to the same step when compiled with the GCC compilers above.
[root@master src]# tar xvfj openmpi-1.4.2.tar.bz2
[root@master src]# cd openmpi-1.4.2
[root@master openmpi-1.4.2]# mkdir build
[root@master openmpi-1.4.2]# cd build/
[root@master build]# ../configure CC=icc CXX=icpc FC=ifort \
    F77=ifort --prefix=/opt/openmpi/1.4.2/intel/12.0.4 \
    --with-tm=/opt/torque/

2. Create binaries by running make


[root@master build]# make

3. Finally, install the binaries into the system


[root@master build]# make install

12 Environment Modules installation

1. Obtain the environment modules source file, uncompress it and change directory into the created directory as below.
[root@master src]# tar xvfz modules-3.2.8a.tar.gz
[root@master src]# cd modules-3.2.8

2. Then compile the sources, specifying a prefix where the software should be installed.

[root@master modules-3.2.8]# ./configure --prefix=/opt

Should you be running a 64-bit system and encounter an error indicating the tcl lib and include directories cannot be found, proceed as below.
[root@master modules-3.2.8]# ./configure --with-tcl-lib=/usr/lib64/ \
    --with-tcl-inc=/usr/include/ --prefix=/opt

3. Then create binaries and install.


[root@master modules-3.2.8]# make
[root@master modules-3.2.8]# make install

4. Finally, copy the init scripts to the /etc/profile.d directory to make the module command available system-wide.
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash \
    /etc/profile.d/modules.sh
[root@master modules-3.2.8]# cp /opt/Modules/3.2.8/init/bash_completion \
    /etc/profile.d/modules_bash_completion.sh
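A new login shell should now have the module command available. As a quick sanity check you could load one of the modulefiles that ship with modules-3.2.8 (null and dot are part of the default install):

[root@master ~]# source /etc/profile.d/modules.sh
[root@master ~]# module avail
[root@master ~]# module load null
[root@master ~]# module list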

13 C3 Tools installation

1. Uncompress the C3 tools source package and execute the install script.

[root@master src]# tar xvfz c3-4.0.1.tar.gz
[root@master src]# cd c3-4.0.1
[root@master c3-4.0.1]# ./Install-c3

2. Create a c3.conf configuration file defining a cluster name, the master node and the nodes in the cluster.
[root@master c3-4.0.1]# vi /etc/c3.conf
[root@master c3-4.0.1]# cat /etc/c3.conf
cluster cluster1 {
        master:master
        node0[1-3]
}

3. Create SSH keys to be used for passwordless login to the nodes of the cluster.
[root@master ~]# ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/root/.ssh/id_dsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
46:6d:e5:e5:e2:5c:b5:72:16:bc:04:6f:59:2c:b5:32 root@master.cluster


4. Copy the ~/.ssh/id_dsa.pub contents to the authorized_keys file of all nodes in the cluster. This is how to do it for a single node.
[root@master ~]# ssh-copy-id -i ~/.ssh/id_dsa.pub root@node01
The authenticity of host 'node01 (192.168.10.2)' can't be established.
DSA key fingerprint is fe:8d:bf:6e:de:f4:94:d3:c4:d7:ee:74:6c:8c:dd:da.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'node01,192.168.10.2' (RSA) to the list of known hosts.
root@node01's password:
Now try logging into the machine, with "ssh root@node01", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

5. Test whether the key was successfully registered by attempting to log in to node01.
[root@master ~]# ssh node01
Last login: Fri Jun 17 12:53:28 2011
[root@node01 ~]# exit
logout
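With the keys in place, the C3 tools can now reach every node in a single command. For example, cexec runs a command cluster-wide; this assumes the c3 binaries, installed under /opt/c3-4 by default, are in root's PATH.

[root@master ~]# cexec uptime
[root@master ~]# cexec date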

14 Password Syncing

User accounts and passwords should be the same on all nodes forming the cluster; however, we can't have each user set their password on every machine that forms the cluster. We shall therefore create a script to effect this. In our case we shall use the cpush command from the C3 tools package installed earlier.

Listing 3: /etc/password-push.sh
#!/bin/bash
#
# Sync /etc/passwd, /etc/shadow and /etc/group
# File : /root/bin
# Cron : min hour dom month dow root /etc/password-push.sh

for f in passwd shadow group; do
    /opt/c3-4/cpush /etc/${f} > /dev/null
done

However, bear in mind that rsync could be used to achieve the same.
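The script is then run periodically from cron on the master node. A sketch of an /etc/crontab entry follows; the 15-minute interval and the script path are assumptions.

# push account files to the nodes every 15 minutes
*/15 * * * * root /etc/password-push.sh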

15 NetCDF, HDF5 and GrADS installation

GrADS requires NetCDF and HDF5 as dependencies for its installation. Therefore, we shall install them all as a pack from the EPEL repositories.

[root@master ~]# yum -y install netcdf hdf5 grads

16 NCL and NCO installation

These too we shall install using the yum package manager as below.
[root@master ~]# yum -y install ncl nco

17 R Statistical package installation

The R statistical package will be installed from the EPEL repositories to save us from the agony of installing a myriad of dependencies and to make updating the packages easy.
[root@master ~]# yum -y install R.x86_64 R-core.x86_64 R-devel.x86_64 \
    libRmath.x86_64 libRmath-devel.x86_64


Part III

Computing Node Installation


18 Node OS installation

With the master node setup complete, installation of the nodes should be just a push of a button. However, a little understanding of the node-ks.cfg kickstart file is essential. It marks the packages tftp, openssh-server, openssh, xorg-x11-xauth, mc and strace for installation, and those with a preceding minus sign for removal. Thereafter, the post-installation section is executed, which removes unwanted services, creates a local repository, and installs the GCC compilers, which are available from the CentOS repositories, on the nodes.

Listing 4: node-ks.cfg
tftp
openssh-server
openssh
xorg-x11-xauth
mc
strace
-cups
-cups-libs
-bluez-utils
-bluez-gnome
-rp-pppoe
-ppp

%post
log=/root/ks-post.log
MASTER=192.168.10.1

# Delete unwanted services
for i in sendmail; do
    chkconfig --del ${i}
done

# Remove default repos
tar cvfz yum.repos.d.tar.gz /etc/yum.repos.d
rm -rf /etc/yum.repos.d/

# Mount /distro from master node
mkdir -p /distro
mount -t nfs $MASTER:/distro /distro

# Add mount to fstab
echo -e "192.168.10.1:/distro\t/distro\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab
# Add master node's /opt to fstab
echo -e "192.168.10.1:/opt\t/opt\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab
# Add master node's /home to fstab
echo -e "192.168.10.1:/home\t/home\t\tnfs\tdefaults\t0 0" | tee -a /etc/fstab

# Execute the node_install.sh script to install pbs_mom
/opt/torque/node_install.sh

# Create local repo
mkdir -p /distro/centos
echo -e "[Local]\nname=CentOS-$releasever - Local\nbaseurl=file:///distro/centos\ngpgcheck=0\nenabled=1" | tee /etc/yum.repos.d/CentOS-Local.repo
yum clean all
yum makecache

# GCC compilers
yum -y install gcc.x86_64 gcc-gfortran.x86_64 libstdc++.x86_64 \
    libstdc++-devel.x86_64 libgcj.x86_64 compat-libstdc++.x86_64

Once the installation is complete, you could have a look at the ks-post.log file in root's home directory for any errors encountered while executing the post section of the kickstart file.

19 Name resolution

Finally, ensure that all the nodes in the cluster can resolve the names of the other nodes in the cluster. You can either set up DNS on the master node or use the /etc/hosts file. Should you need help setting up a DNS server, post your requests in the comments below.
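If you opt for /etc/hosts, a minimal sketch matching the addresses reserved in the DHCP section would look as below; the same file should be pushed to all nodes (cpush works well for this).

192.168.10.1    master.cluster  master
192.168.10.2    node01
192.168.10.3    node02
192.168.10.4    node03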

