Wednesday, March 11, 2009

The Data Center Water Incident

The False Sense of Security

When I worked for an association I had an experience I will never forget. I was the IT Manager and I managed a network with several servers. The servers were kept in a locked cabinet inside a data center with raised floors, power conditioning, temperature and humidity control, and a Halon fire suppression system designed to put out a fire without damaging the computer systems. The room was engineered with protecting computer systems in mind, and that gave me a false sense of security.

One morning, as I was getting ready to go to work, I received a call that the server was down. When I entered the office area just outside the data center I noticed that ceiling tiles had collapsed onto desks and all over the floor. It was then explained to me that water was leaking from the floor above. That floor had an instant hot water heater installed in a kitchen area for the tenants in that section of the building. It had burst and sent water spewing for hours before anyone noticed. The water had made its way through the floor and was leaking down to our level. Outside the data center there was a lot of damage, but inside the data center that housed the servers only two ceiling panels had collapsed: one on top of my desk and the other right on top of, that's right, the cabinet that housed the servers. Nowhere else in the entire room was more water leaking than right where the cabinet stood. The water made its way into the cabinet, traveled down to the 4th server in the rack, and took it out.

The server that was affected was a very robust machine, designed with redundancy in almost all of its hardware. It had 4 drives in a striped configuration. Despite our best efforts, we could not get the system back online, even after it had been dried out. The RAID controller was damaged by the water, along with other components we never identified.

To top it all off, the backups were not being done because we were still awaiting approval of the expense to replace the failing backup drive. We did have a backup from the Friday before, but this happened on a Wednesday. My recommendation was to restore from the backup we had and have accounting re-enter the work from the missing days. Instead my supervisor, the accounting manager, who had no experience in IT matters, decided to send the drives away to see if the data could be recovered. The total gap came to 5 working days: the days lost since the last backup plus the days we were down because of the damaged server. I found out later that re-entering that work would have taken accounting only about 16 hours.

So what is the lesson in all of this? Make sure you have a good disaster recovery plan for your company. Ensure that your backups are working, and please expedite requests for backup drives and tapes from your IT staff.
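One cheap way to catch a backup job that has quietly stopped running is to check the age of the newest backup file every day. Here is a minimal sketch in Python. The function names, the directory argument and the 24 hour threshold are my own assumptions for illustration, not something we had in place at the time.

```python
import os
import time


def newest_backup_age_hours(backup_dir, now=None):
    """Return the age in hours of the most recently modified file
    in backup_dir, or None if the directory holds no files."""
    now = time.time() if now is None else now
    mtimes = [
        entry.stat().st_mtime
        for entry in os.scandir(backup_dir)
        if entry.is_file()
    ]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir, max_age_hours=24, now=None):
    """True when there is no backup at all, or when the newest one
    is older than max_age_hours (an assumed threshold)."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours
```

Run something like this from a daily scheduled task and send an alert whenever backup_is_stale returns True. Keep in mind that a recent file only shows the job ran. It does not prove the backup can be restored, so a periodic test restore is still needed.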

John Ledyard

Ledyard Consulting
