Sunday, 10 February 2008

New node plug and pray

Our new cluster has just arrived. Partly. I want the nodes to boot automagically, without any further per-node configuration. Gerolf and I set up the scenario like this: all nodes boot via pxe. For this we need to install a proper dhcp, nfs, tftp and linux-pxe; apt-get will help us.

Notes:
- do not use dhcp3; it fails to serve the pxe files. Dunno why.
- use tftp-hpa.

The problem is that we don't know the hardware addresses of the nodes' network interfaces. The installation would be rather annoying if we had to plug a monitor into each node, one by one, just to read its hardware addresses. No, I don't want to do it like that. This is the trick:

Node side
1) Steal the initrd from Debian and change its init script so that it just detects the network interfaces (eth*).
2) Use tftp to store the addresses in the server's directory, in one of the slots provided there.
3) Wait until the server provides everything needed to boot, by polling for an indicator. As the indicator I chose the string 'ready' in the slot file.
4) Reboot, and fetch the right kernel, initrd and nfsroot.

Server side
1) Configure dhcpd to serve requests from unknown interfaces. To do this I put a range declaration for dynamic IPs from 10.255.0.1 to 10.255.0.255 inside the subnet block.
2) Prepare a request directory containing slot files called req-(number), where (number) is just a sequence number identifying the slot.
3) Check the slot files. If one contains hardware addresses, process it and put the corresponding entries in /etc/dhcpd.conf.
4) Write the indicator (the 'ready' string) into the slot of each configured node.
5) Call the refresh script, written by Gerolf, to refresh the pxe boot files, nfs exports, dhcp and everything else that is needed.

For setting up pxe boot itself, consult the documentation available on the net; putting those details here would only add noise. If nothing goes wrong, the node boots and comes online soon. This is the init script inside the initrd:

-----------------------------------------------------------
#!/bin/sh
echo "Loading, please wait..."
[ -d /dev ] || mkdir -m 0755 /dev
[ -d /root ] || mkdir --mode=0700 /root
[ -d /sys ] || mkdir /sys
[ -d /proc ] || mkdir /proc
[ -d /tmp ] || mkdir /tmp
mkdir -p /var/lock
mount -t sysfs none /sys
mount -t proc none /proc
tmpfs_size="10M"
if [ -e /etc/udev/udev.conf ]; then
  . /etc/udev/udev.conf
fi
mount -t tmpfs -o size=$tmpfs_size,mode=0755 udev /dev
[ -e /dev/console ] || mknod /dev/console c 5 1
[ -e /dev/null ] || mknod /dev/null c 1 3
> /dev/.initramfs-tools
mkdir /dev/.initramfs
export DPKG_ARCH=
. /conf/arch.conf
export ROOT=
. /conf/initramfs.conf
for i in conf/conf.d/*; do
  [ -f ${i} ] && . ${i}
done
. /scripts/functions
export break=
export init=/sbin/init
export quiet=n
export readonly=y
export rootmnt=/root
export debug=
export cryptopts=${CRYPTOPTS}
export ROOTDELAY=
export panic=
SERVER=
DEV=eth0
for x in $(cat /proc/cmdline); do
  case $x in
  server=*)
    SERVER=${x#server=}
    ;;
  dev=*)
    DEV=${x#dev=}
    ;;
  break=*)
    break=${x#break=}
    ;;
  break)
    break=premount
    ;;
  esac
done
depmod -a
maybe_break top
run_scripts /scripts/init-top
load_modules
run_scripts /scripts/init-premount
maybe_break request
[ -z "$SERVER" ] && panic "Server name must be defined in \
kernel option: server="
echo "#### getting interface address of $DEV ####"
ipconfig -c dhcp -d $DEV
ping -c 2 $SERVER || panic "Cannot connect server $SERVER"
#####
# request file in format req-(number) must exist and writable (mode 666)
# in request directory @ server
#
req=1
reqfile=""
while [ ${req} -le 100 ]; do
  tftp -g -l req -r request/req-${req} $SERVER
  if [ -s req ]; then
    req=$(( ${req} + 1 ))
  else
    reqfile=req-${req}
    break
  fi
done
[ -z "${reqfile}" ] && panic "No free request slot..
this is impossible"
# slot available
for i in /sys/class/net/eth*; do
  cat $i/address >> req
done
tftp -p -l req -r request/${reqfile} $SERVER
ready=
while [ -z "$ready" ]; do
  echo "<<<< waiting for valid boot files >>>>"
  sleep 10
  tftp -g -l req -r request/${reqfile} $SERVER
  grep ready req && ready=yes
done
echo "<<<< Booting files are ready.... REBOOT... >>>>"
reboot
--------------------------------
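The init script above depends on two things from the server side: a `server=` option on the kernel command line (plus an optional `dev=`), and empty, world-writable slot files request/req-1 to req-100 under the tftp root. A minimal sketch for preparing both, not part of our actual setup: `prepare_registration` is a hypothetical helper, the tftp root is passed in (the handler script below uses /srv/tftp), and the file names vmlinuz-register and initrd-register.img are placeholders for the Debian kernel and the modified initrd. It writes PXELINUX's fallback file pxelinux.cfg/default, which PXELINUX loads when no per-host config matches; the stock pxelinux.cfg location is an assumption and may need adjusting to wherever the refresh script puts the per-node files. As far as I know, tftp-hpa by default only accepts uploads to files that already exist and are world-writable (unless started with --create or --permissive), which is why the slots are created up front with mode 666.

```shell
#!/bin/sh
# prepare_registration TFTP_ROOT SERVER [DEV] [SLOTS]
# Hypothetical helper (not one of the original scripts). Sets up:
#  - empty, world-writable slot files request/req-1 .. req-SLOTS
#  - a PXELINUX fallback config passing server= and dev= to the kernel
# vmlinuz-register / initrd-register.img are placeholder file names.
prepare_registration() {
  root=$1 server=$2 dev=${3:-eth0} slots=${4:-100}

  mkdir -p "$root/request" "$root/pxelinux.cfg" || return 1
  i=1
  while [ "$i" -le "$slots" ]; do
    f="$root/request/req-$i"
    [ -e "$f" ] || : > "$f"   # never clobber a pending or 'ready' slot
    chmod 666 "$f"            # tftp-hpa only writes to world-writable files
    i=$((i + 1))
  done

  # PXELINUX falls back to "default" when no per-host file matches,
  # so every node that is still unknown boots the registration initrd.
  cat > "$root/pxelinux.cfg/default" <<EOF
DEFAULT register
LABEL register
  KERNEL vmlinuz-register
  APPEND initrd=initrd-register.img server=$server dev=$dev
EOF
}
```

Usage would be something like `prepare_registration /srv/tftp 10.10.0.1`, where 10.10.0.1 stands in for the server's address. An IP is the safer choice for `server=`, since the initrd probably has no name resolution. Re-running it is harmless: slots that already hold addresses or 'ready' keep their contents.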
To handle the requests, I put a _secret_ tag:
#--> insertion point bla bla bla

somewhere in the right place inside dhcpd.conf. Each host entry is written on _one line_. I also put a lease range from 10.255.0.1 to 10.255.0.255 in the subnet block. Here is the handling script:
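----------------------------------
#!/bin/bash
DIR=/srv/tftp

die() {
  echo "$@"
  exit
}

cd "$DIR/request" || die "cannot enter $DIR/request"
echo "searching request"
# new host entries are collected here; the file must exist (possibly empty)
# because the numbering awk below reads it together with /etc/dhcpd.conf
: > /tmp/newnode
handled=
for i in req-*; do
  [[ -s $i ]] || continue               # empty slot: nobody asked yet
  grep -q 'ready' "$i" && continue      # already handled
  mainif=$(awk 'NR==1{print}' "$i")     # first line: main interface
  auxif=$(awk 'NR==2{print}' "$i")      # second line: auxiliary interface
  pxef=$(echo "$mainif" | awk '{gsub(/:/,"-");print}')
  [[ -f ../pxelinux/$pxef ]] && continue
  echo "Handling request for $mainif"
  handled="$handled $i"
  # find node number: highest "host nodeN" in dhcpd.conf AND in the entries
  # generated earlier in this run, plus one (otherwise several requests
  # handled in one run would all get the same number)
  awk -v main="$mainif" -v aux="$auxif" '
    BEGIN { n = 0 }
    /^[[:space:]]*#/ { next }
    /^[[:space:]]*host node/ {
      gsub(/[[:alpha:]_-]/, "", $2)
      if (n < $2 + 0) n = $2 + 0        # numeric, so node10 beats node9
    }
    END {
      n++
      printf "host node%d {option host-name \"node%d\";hardware ethernet %s;fixed-address 10.10.1.%d;}\n", n, n, main, n
      if (aux != "")
        printf "host node%da {hardware ethernet %s;fixed-address 10.10.2.%d;}\n", n, aux, n
    }' /etc/dhcpd.conf /tmp/newnode >> /tmp/newnode
done
[[ -z $handled ]] && die "no new node request"
# insert the new entries, in order, right before the "#--> insertion" tag
awk 'NR == FNR { x[++cnt] = $0; next }
     /^#--> insertion/ { for (k = 1; k <= cnt; k++) print x[k] }
     { print }' /tmp/newnode /etc/dhcpd.conf > /tmp/dhcpd-tmp
mv /tmp/dhcpd-tmp /etc/dhcpd.conf
rm -f /tmp/newnode
/root/admin-scripts/refresh
# to reboot the node: put 'ready' in req-* file in request directory
for i in $handled; do echo ready > "$i"; done
----------------------------------------------------------------------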
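Before letting the handler loose on the real /etc/dhcpd.conf, it can be worth previewing what it would write. A sketch, not part of our scripts: `preview_insert` is a hypothetical name, and it runs the same two steps (find the next node number, insert the new one-line host entries before the tag) for a single request, printing the resulting config to stdout without modifying the file. It assumes the tag starts at column 0, as the handler's /^#--> insertion/ requires. Unlike the handler, it keeps only the digits of the host name (node12 and node12a both count as 12) and compares numerically, and it uses plain `[ \t]` instead of POSIX character classes so that older awks also cope.

```shell
#!/bin/sh
# preview_insert CONF MAC [AUXMAC]
# Hypothetical dry run of the handler for one request: prints CONF as it
# would look afterwards; CONF itself is left untouched.
preview_insert() {
  conf=$1 main=$2 aux=$3
  awk -v main="$main" -v aux="$aux" '
    BEGIN { n = 0 }
    /^[ \t]*#/ { next }                          # skip comments and the tag
    /^[ \t]*host node/ {
      num = $2; gsub(/[^0-9]/, "", num)          # node12 / node12a -> 12
      if (n < num + 0) n = num + 0               # numeric comparison
    }
    END {
      n++
      printf "host node%d {option host-name \"node%d\";hardware ethernet %s;fixed-address 10.10.1.%d;}\n", n, n, main, n
      if (aux != "")
        printf "host node%da {hardware ethernet %s;fixed-address 10.10.2.%d;}\n", n, aux, n
    }' "$conf" |
  # first input (stdin) = new entries, second = the config to copy
  awk 'NR == FNR { x[++cnt] = $0; next }
       /^#--> insertion/ { for (k = 1; k <= cnt; k++) print x[k] }
       { print }' - "$conf"
}
```

For example, `preview_insert /etc/dhcpd.conf 00:11:22:33:44:55 | less` shows where the new node would land and which number it would get.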
