Thursday, July 28, 2011

"Link is not ready" when downloading stage2.img in RHEL/CentOS 5 kickstart install - or watch those typos!

I recently had a kickstart install on an HP BL460c G1 blade that would get partway through booting into the kickstart process but fail when loading the stage2.img file. It seemed to load updates.img and product.img fine.

It was late in a 12-hour day in the data center, and I couldn't figure it out. I updated the firmware. Same issue. The machine booted fine from PXE into the HP diagnostic and firmware images, but not once inside the RHEL 5.6 installer. After much head scratching and Googling, I found this:

RHEL 5.4 kickstart occasionally fails on HP BL460 G1 and G6 blades

https://bugzilla.redhat.com/show_bug.cgi?id=547746

which, lacking a conclusion, didn't help.

I gave up.

Went home. Got a good night's sleep. Went back to the data center the next morning.

Changed to a different release (5.5) in the kickstart and PXE boot file. Same issue.

I got suspicious about the kickstart file. The RHEL installer uses the network configuration it got during PXE boot (DHCP) until it parses the kickstart. Then it reconfigures the network according to what's in the kickstart. I had a static network configuration in the kickstart. I had checked it the day before but hadn't noticed anything wrong.

But this time I saw the issue. I had not defined the gateway in the static configuration. Once I defined it, the machine installed normally.
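For illustration, a static "network" line in a RHEL 5 kickstart looks roughly like the one below. The device name and addresses are hypothetical placeholders (from the 192.0.2.0/24 documentation range), not the values from this install; the point is that with a static configuration the gateway has to be spelled out alongside the IP address and netmask, and nothing warns you if it is missing.

```
# Static network configuration (hypothetical device and addresses).
# Leaving out --gateway is the mistake described above.
network --device=eth0 --bootproto=static --ip=192.0.2.50 --netmask=255.255.255.0 --gateway=192.0.2.1 --nameserver=192.0.2.10
```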

Lessons learned?

  • 12-hour days don't necessarily mean more productivity. Any progress can easily be wiped out by fatigue-induced errors. If your admins are working 12-hour days, you either need more admins or you need to stop what you are doing and evaluate why your infrastructure requires 12-hour days. Fatigue will bite you. I will be evaluating our infrastructure.
  • If your infrastructure requirements allow DHCP (ours do not in this case), use it. Simplicity prevents errors. In this case, the "network" line in the kickstart would simply have said '--bootproto dhcp', and I would not have had to worry about the other settings that can go wrong (gateway, IP address, etc.).
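A minimal sketch of the DHCP variant, assuming the install interface is eth0 (a hypothetical device name). With DHCP the address, netmask, gateway, and name servers all come from the DHCP server, so there are no per-host values left in the kickstart to mistype:

```
# DHCP network configuration: no per-host addresses to get wrong.
network --device=eth0 --bootproto=dhcp
```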