Wednesday, 26 October 2011

When PXE saves the day...

It was quite a shock last night when one of our filers crashed. It would not have been so terrible if the mainboard were, say, normal. This system is my self-built NAS station with 6 SATA ports. It is not normal because all the ports are taken by hard disks, so I used the "internal" USB slot to boot. I had no choice. The weird thing is that there is no IDE connector, yet the old-useless-ugly-space-wasting-floppy-disk-connector is there! I will never understand brain-damaged companies that keep this connector around. In my opinion IDE would be much more useful, for example for plugging in a flash disk.

The headache: no direct access to the server room, no replacement USB stick available, and no port to plug a hard disk into. It was time to think of a strategy. The only hope was network boot (PXE), since the boot server is always running on the head node. Thanks to IPMI I could do everything remotely, although the speed was rather irritating. The strategy: boot from the network and use a ramdisk (tmpfs) as root. This should be easy, I thought.

A couple of years ago I worked on automating node installation using PXE, TFTP, HTTP, etc., so I dug out the bunch of scripts I wrote back then. They worked then, and they should still work now. The idea is actually a brilliant one, if I'm allowed to say so: any unregistered node or alien computer is given an installation boot scheme. One only needs to create a PXE boot configuration file named after the node's address (the MAC address, or the IP address given by the DHCP server in hexadecimal form). Deleting that file sends the node back to the default configuration, which is the installation. This default scheme is created by modifying the initrd so that it loads an installation script from the HTTP server on the head node. It is quite efficient and practical, since a user can change the script according to his needs, not only for installation purposes. I call this a respond script, because it is used as a response to the request sent by a booting node. In that request, which is named after the node's first MAC address, the node sends (minimal) information about its configuration, such as its MAC addresses and its hard disks.
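
As a small illustration of that naming convention, here is a sketch in shell. It assumes PXELINUX-style file names: `01-` followed by the MAC address in lowercase with dashes, or the IPv4 address as eight uppercase hexadecimal digits. The function names and the sample addresses are made up for this example.

```shell
#!/bin/sh
# Hypothetical helpers; names and sample addresses are illustrative only.

# IPv4 address -> eight uppercase hex digits, e.g. 192.168.1.10 -> C0A8010A
pxe_name_from_ip() {
    # Unquoted on purpose: the four octets become four printf arguments.
    printf '%02X%02X%02X%02X\n' $(echo "$1" | tr '.' ' ')
}

# MAC address -> "01-" (ARP hardware type for Ethernet) + lowercase, dash-separated
pxe_name_from_mac() {
    printf '01-%s\n' "$(echo "$1" | tr 'A-Z:' 'a-z-')"
}

pxe_name_from_ip 192.168.1.10        # prints C0A8010A
pxe_name_from_mac 00:1A:2B:3C:4D:5E  # prints 01-00-1a-2b-3c-4d-5e
```

Assuming PXELINUX, a file with one of these names in the TFTP server's pxelinux.cfg directory registers the node, and removing it sends the node back to the `default` file.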

The respond script is executed by the node. Installing a node can be a straightforward procedure:

  1. Format the hard disk.
  2. Create a disk image, maybe using debootstrap, etc.
  3. Load the hard disk content from the server and put it on the hard disk. I used wget to fetch a tar-gzipped disk image. Quite efficient.
  4. Write a correct PXE boot configuration and let the node reboot.
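
Steps 1 and 3 run on the node itself, so they could look roughly like the respond script below. This is only a sketch: the partition `/dev/sda1` (assumed to exist already), the image URL, the mount point and the `run` dry-run wrapper are all assumptions made for illustration, and ext3 is just one possible file system. Steps 2 and 4 happen on the head node and are not shown.

```shell
#!/bin/sh
# Hypothetical respond script; every name and path here is an assumption.
DISK=${DISK:-/dev/sda1}                              # existing partition to install onto
IMAGE_URL=${IMAGE_URL:-http://head/images/node.tgz}  # tar-gzipped disk image on the head node
TARGET=${TARGET:-/mnt/target}                        # temporary mount point
DRY_RUN=${DRY_RUN:-1}                                # 1 = only print the commands

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

install_node() {
    run mkfs.ext3 -q "$DISK" &&      # step 1: format the disk
    run mkdir -p "$TARGET" &&
    run mount "$DISK" "$TARGET" &&
    # step 3: stream the image from the HTTP server straight onto the disk
    run sh -c "wget -q -O - '$IMAGE_URL' | tar xzf - -C '$TARGET'" &&
    run umount "$TARGET"
}

install_node
```

With the default DRY_RUN=1 the script only prints what it would do; setting DRY_RUN=0 makes it destructive, so it should only ever run that way on a node that is meant to be wiped.
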
This time I failed to use the scheme for installation. The reason is that mdadm and lvm2 could not be installed in the debootstrapped directory. So I used the scheme just to diagnose the broken NAS: I only needed to put 'sh' in the respond script to drop into a shell. I then first installed an idle computer (one that was not doing any work), made sure that everything worked and that it could load the md RAID and LVM volumes, and took the tar-gzipped image from it.


Then it came to my mind that it should be easier just to put the disk image inside the initrd. Yes, the size explodes, but who cares. One needs only about 250 MB of unpacked space for the NAS. I modified the initrd so that it mounts a tmpfs as root and unpacks the attached disk image there. Now, I actually did not need the scheme I told you about before, but I don't feel guilty telling you about it, because this idea only came after messing around with it. So I need only two files: the kernel and the fat initrd.

In the init script inside the initrd there are two options for loading the root file system: local and nfs. I could have added a new option, say ramdisk, but I decided to do it the rude way: force the system to use the ramdisk.
Adding a simple file called 'ramdisk' to the scripts directory is sufficient:

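# ramdisk

mountroot ()
{
  # give udev up to 10 seconds to settle
  wait_for_udev 10
  # the root file system lives entirely in RAM
  mount -t tmpfs none "${rootmnt}" -o size=2048M
  # unpack the disk image shipped inside the initrd
  tar xzvf /disk.tgz -C "${rootmnt}"
  # free the RAM held by the archive
  rm /disk.tgz
}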

and set the $BOOT variable in the init script to ramdisk. Actually I did it the hard way: I call scripts/ramdisk directly, just before 'maybe_break mountroot'.
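
Putting the pieces together, the fat initrd could be assembled roughly as follows. This is only a sketch under assumptions: the function name and its four arguments are made up for illustration, all paths are assumed to be absolute, and the init script still has to be edited by hand as described above. The unpack and repack commands are the ones listed in the notes at the end of this post.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: build_ramdisk_initrd ORIG_INITRD DISK_TGZ RAMDISK_SCRIPT OUT_INITRD
# All four paths must be absolute, because the work is done in a temp directory.
build_ramdisk_initrd() {
    orig=$1; image=$2; script=$3; out=$4
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        # unpack the original initrd
        gzip -dc "$orig" | cpio -id || exit 1
        # attach the disk image and the mountroot script
        mkdir -p scripts
        cp "$image" disk.tgz || exit 1
        cp "$script" scripts/ramdisk || exit 1
        # repack everything into the new, fat initrd
        find . | cpio -H newc -o | gzip > "$out"
    )
    status=$?
    rm -rf "$work"
    return $status
}

# Example call (paths are placeholders):
# build_ramdisk_initrd /boot/initrd.img /srv/disk.tgz /srv/ramdisk /srv/tftp/initrd-ramdisk
```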

It booted... and I smiled. I then ssh-ed to the node, and all the RAIDs and LVM volumes were there. Mount them, export them over NFS, done. In the end I also copied the kernel and initrd to a USB disk. Now it's better, because the disk is only used for loading the image at boot time.

Notes:
  • Debian squeeze is used.
  • unpacking initrd: gzip -dc path/to/initrd-file|cpio -id
  • repacking initrd: find .|cpio -H newc -o|gzip > ../initrd-ramdisk (run from the unpacked directory)
  • Services on server:
    • dhcp/bootp
  • Required packages for NAS:
    • mdadm
    • lvm2
    • rsync
    • ssh
  • Modifications needed in the disk image:
    • /etc/fstab
    • /etc/udev/rules.d/70-persistent-net.rules
    • /etc/exports
    • cleaning of unnecessary files under /usr and /var