Our new cluster has just arrived. Partly. I want the nodes to boot automagically, without any further per-node configuration. Gerolf and I came up with the following scenario:
All nodes boot via PXE. For this we need to install a proper DHCP server, NFS, TFTP, and linux-pxe.
apt-get will help us.
Notes:
- Do not use dhcp3; it fails to serve the PXE files. Dunno why.
- Use tftp-hpa.
The problem is that we don't know the hardware (MAC) addresses of the nodes' network interfaces. Installation would be rather annoying if we had to plug a monitor into each node, one by one, just to read its hardware addresses. No. I don't want to do it that way.
This is the trick:
Node side
1) Steal the initrd from Debian and change its init script so that it only detects the network interfaces (eth*).
2) Use tftp to store the addresses in the request directory on the server, in one of the slot files provided there.
3) Wait until the server provides everything needed to boot, by polling for an indicator. As the indicator I chose the string 'ready' in the slot file.
4) Reboot, and fetch the right kernel, initrd, and NFS root.
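Seen from both sides, each slot file goes through three states: empty (free), holding a node's MAC addresses (waiting for the server), and holding the string 'ready' (boot files prepared). A minimal sketch of that classification, using the same tests the two scripts below rely on (non-empty file, and grep for 'ready'); the function name slot_state is hypothetical:

```sh
#!/bin/sh
# slot_state FILE: hypothetical helper, prints the state of one slot file.
#   free    - empty or missing: a booting node may claim it
#   pending - contains MAC addresses but no 'ready': the server has work to do
#   ready   - contains 'ready': the node may reboot into its real system
slot_state() {
	if [ ! -s "$1" ]; then
		echo free
	elif grep -q ready "$1"; then
		echo ready
	else
		echo pending
	fi
}

# Example: list the state of every slot (path assumed from the handling script)
# for f in /srv/tftp/request/req-*; do echo "$f: $(slot_state "$f")"; done
```

A MAC address consists only of hex digits and colons, so it can never be mistaken for the 'ready' indicator.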
Server side
1) Configure dhcpd to serve requests from unknown interfaces. To do this I added a range declaration inside the subnet block for dynamic IPs from 10.255.0.1 to 10.255.0.255.
2) Prepare a request directory containing slot files called req-(number), where (number) is just a sequence number identifying the slot.
3) Check the slot files. If one contains hardware addresses, process it and put the corresponding entries in /etc/dhcpd.conf.
4) Call the refresh script, written by Gerolf, to refresh the PXE boot files, NFS exports, DHCP, and everything else that is needed.
5) Write the indicator (the 'ready' string) into the slot file of each configured node, so that the node reboots.
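Step 2 only has to be done once. A minimal sketch, assuming the request directory sits under /srv/tftp as in the handling script below, and a pool of 100 slots to match the limit in the init script; the function name make_slots is hypothetical. The files are created empty and world-writable because tftpd-hpa, by default, only accepts uploads to files that already exist and are publicly writable (unless it is started with --create):

```sh
#!/bin/sh
# make_slots DIR COUNT: hypothetical helper that creates empty slot files
# DIR/req-1 .. DIR/req-COUNT with mode 666 so that tftpd-hpa accepts uploads.
# Existing slots are left as they are, so rerunning it never frees a slot.
make_slots() {
	dir=$1 count=$2
	mkdir -p "$dir" || return 1
	i=1
	while [ "$i" -le "$count" ]; do
		f="$dir/req-$i"
		[ -e "$f" ] || : > "$f"   # create only if missing
		chmod 666 "$f"
		i=$((i + 1))
	done
}

# Usage (as root on the server):
# make_slots /srv/tftp/request 100
```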
For setting up PXE boot itself, consult the documentation available on the net; repeating those details here would only add noise.
If nothing goes wrong, the node will boot and come online soon.
This is the init script inside the initrd:
-----------------------------------------------------------
#!/bin/sh
echo "Loading, please wait..."
[ -d /dev ] || mkdir -m 0755 /dev
[ -d /root ] || mkdir --mode=0700 /root
[ -d /sys ] || mkdir /sys
[ -d /proc ] || mkdir /proc
[ -d /tmp ] || mkdir /tmp
mkdir -p /var/lock
mount -t sysfs none /sys
mount -t proc none /proc
tmpfs_size="10M"
if [ -e /etc/udev/udev.conf ]; then
	. /etc/udev/udev.conf
fi
mount -t tmpfs -o size=$tmpfs_size,mode=0755 udev /dev
[ -e /dev/console ] || mknod /dev/console c 5 1
[ -e /dev/null ] || mknod /dev/null c 1 3
> /dev/.initramfs-tools
mkdir /dev/.initramfs
export DPKG_ARCH=
. /conf/arch.conf
export ROOT=
. /conf/initramfs.conf
for i in conf/conf.d/*; do
	[ -f ${i} ] && . ${i}
done
. /scripts/functions
export break=
export init=/sbin/init
export quiet=n
export readonly=y
export rootmnt=/root
export debug=
export cryptopts=${CRYPTOPTS}
export ROOTDELAY=
export panic=
SERVER=
DEV=eth0
# server= (TFTP/NFS server) and dev= (boot interface) are added for this setup
for x in $(cat /proc/cmdline); do
	case $x in
	server=*)
		SERVER=${x#server=}
		;;
	dev=*)
		DEV=${x#dev=}
		;;
	break=*)
		break=${x#break=}
		;;
	break)
		break=premount
		;;
	esac
done
depmod -a
maybe_break top
run_scripts /scripts/init-top
load_modules
run_scripts /scripts/init-premount
maybe_break request
[ -z "$SERVER" ] &&
	panic "Server name must be given as kernel option: server="
echo "#### getting interface address of $DEV ####"
ipconfig -c dhcp -d $DEV
ping -c 2 $SERVER || panic "Cannot connect to server $SERVER"
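#####
# request file in format req-(number) must exist and be writable (mode 666)
# in the request directory @ server: tftp-hpa only accepts uploads to
# existing, world-writable files unless it is started with --create
#
req=1
reqfile=""
while [ ${req} -le 100 ]; do
	rm -f req
	# a failed download means req-${req} does not exist: no more slots
	tftp -g -l req -r request/req-${req} $SERVER || break
	if [ -s req ]; then
		# slot already taken by another node, try the next one
		req=$(( ${req} + 1 ))
	else
		reqfile=req-${req}
		break
	fi
done
[ -z "${reqfile}" ] &&
	panic "No free request slot on server $SERVER"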
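# slot available: write one MAC address per line; eth0 comes first and is
# treated by the server as the main interface
for i in /sys/class/net/eth*; do
	cat $i/address >> req
done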
tftp -p -l req -r request/${reqfile} $SERVER
ready=
while [ -z "$ready" ]; do
	echo "<<<< waiting for valid boot files >>>>"
	sleep 10
	tftp -g -l req -r request/${reqfile} $SERVER
	grep -q ready req && ready=yes
done
echo "<<<< Booting files are ready.... REBOOT... >>>>"
reboot
--------------------------------
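The slot search in this script is a small first-fit protocol over TFTP. To try the logic without a cluster, the transfers can be replaced by plain file operations on a local directory. A minimal sketch under that assumption; find_free_slot, claim_slot and their arguments are hypothetical, and local file access stands in for tftp -g and tftp -p:

```sh
#!/bin/sh
# find_free_slot DIR MAX: print the name of the first empty slot among
# DIR/req-1 .. DIR/req-MAX, mimicking the init script's loop.
# Returns 1 if a slot file is missing (like a failed tftp -g) or all are taken.
find_free_slot() {
	dir=$1 max=$2 n=1
	while [ "$n" -le "$max" ]; do
		[ -e "$dir/req-$n" ] || return 1
		if [ ! -s "$dir/req-$n" ]; then
			echo "req-$n"
			return 0
		fi
		n=$((n + 1))
	done
	return 1
}

# claim_slot DIR SLOT MAC...: write the MAC addresses into the slot, one per
# line, as the init script does before uploading it with tftp -p.
claim_slot() {
	dir=$1 slot=$2
	shift 2
	printf '%s\n' "$@" > "$dir/$slot"
}
```

Like the original, this is not atomic: two nodes that read the same empty slot at the same moment would both claim it, so powering the nodes on a few seconds apart is the simple way around that.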
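Put a marker line "#--> insertion" (at the very start of the line) somewhere appropriate inside dhcpd.conf: the handling script inserts the new host entries just before it, each host entry written on _one line_. I also put a lease range 10.255.0.1 to 10.255.0.255 in the subnet block. Here is the handling script: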
----------------------------------
#!/bin/bash
# Runs on the server: turns filled request slots into dhcpd host entries.
DIR=/srv/tftp
die() {
	echo "$@"
	exit
}
cd "$DIR/request" || die "cannot enter $DIR/request"
echo "searching request"
: > /tmp/newnode                      # collects the new host entries
handled=
for i in req-*; do
	[[ -s $i ]] || continue            # empty slot
	grep -q 'ready' "$i" && continue   # already handled
	mainif=$(sed -n 1p "$i")           # line 1: MAC of the main interface
	auxif=$(sed -n 2p "$i")            # line 2: MAC of the second one (may be empty)
	pxef=${mainif//:/-}
	[[ -f ../pxelinux/$pxef ]] && continue
	echo "Handling request for $mainif"
	handled="$handled $i"
	# find node number: highest "host nodeN" in dhcpd.conf and in the entries
	# collected in this run, plus one; v+0 makes the comparison numeric
	# (compared as strings, "9" > "10")
	entry=$(awk 'BEGIN{n=0}
/^[[:space:]]*#/{next}
/^[[:space:]]*host node/{v=$2;gsub(/[^0-9]/,"",v);v+=0;if(v>n)n=v}
END{print "host node"n+1" \
{option host-name \"node"n+1"\";\
hardware ethernet '$mainif';fixed-address 10.10.1."n+1";}"
if("'$auxif'") print "host node"n+1"a \
{hardware ethernet '$auxif';fixed-address 10.10.2."n+1";}"
}' /etc/dhcpd.conf /tmp/newnode)
	echo "$entry" >> /tmp/newnode
done
[[ -z $handled ]] && die "no new node request"
# copy the new entries in front of the "#--> insertion" marker, in file order
# (for (i in x) visits array elements in unspecified order)
awk 'BEGIN{fl=0}
FNR==1{fl++}
fl==1{x[FNR]=$0;n=FNR;next}
/^#--> insertion/{for(i=1;i<=n;i++)print x[i]}
{print}' /tmp/newnode /etc/dhcpd.conf > /tmp/dhcpd-tmp
mv /tmp/dhcpd-tmp /etc/dhcpd.conf
rm -f /tmp/newnode
/root/admin-scripts/refresh
# to reboot the node: put 'ready' in its req-* file in the request directory
for i in $handled; do echo ready > "$i"; done
----------------------------------------------------------------------
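The awk program in the loop does two jobs: find the next node number, and format the one-line host entries. Pulled apart into two shell functions (hypothetical names next_node and host_entries), they can be tested separately. The sketch assumes the same address plan as the script (10.10.1.N for the main interface, 10.10.2.N for the second one); its entries differ from the script's output only in whitespace, which dhcpd ignores:

```sh
#!/bin/sh
# next_node FILE...: print one more than the highest N in "host nodeN" lines,
# skipping comment lines; auxiliary entries ("host nodeNa") count as N too.
next_node() {
	awk '/^[ \t]*#/ { next }
	/^[ \t]*host node/ {
		v = $2
		gsub(/[^0-9]/, "", v)   # "node12a" -> "12"
		v += 0                  # force a number, so 10 > 9
		if (v > n) n = v
	}
	END { print n + 1 }' "$@"
}

# host_entries N MAC [AUXMAC]: print the dhcpd host entries, one line each.
host_entries() {
	n=$1 main=$2 aux=$3
	printf 'host node%s {option host-name "node%s"; hardware ethernet %s; fixed-address 10.10.1.%s;}\n' \
		"$n" "$n" "$main" "$n"
	if [ -n "$aux" ]; then
		printf 'host node%sa {hardware ethernet %s; fixed-address 10.10.2.%s;}\n' \
			"$n" "$aux" "$n"
	fi
}

# Usage: host_entries "$(next_node /etc/dhcpd.conf)" aa:bb:cc:dd:ee:ff
```

With node9 and node10 already declared, next_node prints 11; without the v += 0 the values would be compared as strings and the result would be 10 again.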