Server downtime, July 18-19, 2005

  #1  
Old 07-19-2005, 04:04 PM
FTE Ken
Post Fiend
Thread Starter
Join Date: Jan 1997
Location: Enjoying the real world.
Posts: 23,165
Likes: 0
Received 7 Likes on 6 Posts

Sorry folks, we had an enormous amount of downtime in the past 24 hours. Traffic has been way up and I blamed it on that, but it turns out the load wasn't the problem (good old reliable Linux!).

Yesterday morning a 7 ton A/C unit in the hosting facility took a nosedive. By noon the server room was hot enough to crash the web server. I did a remote reboot, not knowing the A/C was down. By 4pm it crashed again. By 7pm, again. Then 4 times from 7 - 10pm. Little did I know the server facility was getting hotter and hotter and causing the FTE web server to die. I arrived at the facility at 10:45pm and found, to my shock, AC repairmen frantically working on the system and FTE's servers frying in a 105-degree room.

I did everything I could with what I had to keep them running cool (box fan, putting 1U of space between each unit, leaving the covers off) but by 4am it was apparent that none of that would do the trick... by this time the system wouldn't stay up for more than 5 minutes. I called it a night and went home for 2 hours of sleep.

So my day has been spent getting 2 monstrously high-speed system cooling fans (the average PC fan moves 10-20 CFM; these move 94 CFM each). The servers are now up and running again, and have been for several hours without a problem, but we're not out of the woods yet. The AC system is a specialized unit and it will take several days to get the parts. They were installing a "portable" 7 ton unit the size of my Ranger in the place; the temp was down to 90 by the time I left and they expect it to stay around 70 (65 is considered optimal) until the permanent unit is replaced.

Sorry folks... if you think you were frustrated with FTE the past 24 hours... put yourself in my shoes!

Well... off to bed and hoping the server monitor doesn't page me with a crashed system again!
 
  #2  
Old 07-19-2005, 04:57 PM
Mil1ion
New User
Join Date: Jan 2008
Posts: 0
Likes: 0
Received 11 Likes on 11 Posts
I'll blame Hurricane Emily

Let's hope you get some quality sleep real soon!
 
  #3  
Old 07-19-2005, 08:47 PM
BigF350
Hotshot
Join Date: May 2004
Location: Melbourne, Aus
Posts: 18,790
Likes: 0
Received 14 Likes on 9 Posts
I gathered that you were having "issues". Hope it's all behind you...
Would one of these do the trick for cooling?
http://www.mining-technology.com/con...th/voith2.html
 
  #4  
Old 07-19-2005, 09:48 PM
rywegh
Cargo Master
Join Date: Apr 2001
Location: northern kentucky
Posts: 2,466
Likes: 0
Received 3 Likes on 3 Posts
Sorry for your loss of sleep, but thanks for your diligence. Also thanks for keeping us in the loop. I wondered what happened last night. Now I know.
 
  #5  
Old 07-19-2005, 10:19 PM
Carlene
Admom
Join Date: Feb 2001
Location: Silver Springs
Posts: 9,400
Received 188 Likes on 116 Posts
I feel for you, Ken, and definitely wouldn't want your job.

My computer was only down for a couple days due to a fried hard drive and between the lost time and reloading everything, it drove me nuts.

Thanks for letting us know - and - get some sleep.
 
  #6  
Old 07-19-2005, 10:45 PM
fordborn
Post Fiend

Join Date: Jan 2004
Location: Northwest, Arkansas
Posts: 6,610
Likes: 0
Received 0 Likes on 0 Posts
I too would not want to be in Ken's shoes in issues like this.

On the other hand, I did get to bed earlier than usual while it was down.
 
  #7  
Old 07-19-2005, 10:56 PM
MemOrex
Postmaster
Join Date: Jun 2004
Location: B/CS, Texas!!
Posts: 2,665
Likes: 0
Received 0 Likes on 0 Posts
Yikes, I never thought heat would actually turn off the systems....they must've been scorchin' hot.
 
  #8  
Old 07-20-2005, 08:11 AM
FTE Ken
Post Fiend
Thread Starter
Join Date: Jan 1997
Location: Enjoying the real world.
Posts: 23,165
Likes: 0
Received 7 Likes on 6 Posts
The system that kept dying had 4 gigabytes of RAM, two CPUs, a very large enterprise-level SCSI hard drive controller (with its own CPU and RAM), four 15K RPM hard drives and four 10K RPM hard drives. The hard drives, because they spin faster than your typical hard drive, generate 2-3 times more heat, so it's like having about 16-20 hard drives in a case. There's a LOT of heat generated... the power supply for this sucker is 650 watts.
 
  #9  
Old 07-20-2005, 10:16 AM
Howdy
Posting Guru
Join Date: Jun 1999
Location: Oregon
Posts: 2,007
Likes: 0
Received 0 Likes on 0 Posts
It figures this would happen just before the Rally..

Amazing what heat will do to equipment. One of the sites I work on is filled with rack mounted microwave Rx/Tx pairs for TV classes around the state. It had an A/C problem a few years ago and the heat caused the power supply to fry in each unit. Lost about 3 a day. Brand new, state of the art PS at $1500 each.

This stuff happens. I've found cheap insurance by keeping 4 box fans with extension cords in storage. Take covers off and keep the air flowing.
 
  #10  
Old 07-20-2005, 12:52 PM
Jag Red 54
Logistics Pro
Join Date: Oct 2002
Location: Valley Center, CA
Posts: 4,485
Likes: 0
Received 2 Likes on 2 Posts
I believe that when the problem is finally eliminated, they will find that there were some Chevy parts installed somewhere in the system. Jag
 
  #11  
Old 07-20-2005, 08:42 PM
fordborn
Post Fiend

Join Date: Jan 2004
Location: Northwest, Arkansas
Posts: 6,610
Likes: 0
Received 0 Likes on 0 Posts
Our main server area at work has to be maintained at 55 degrees. Heat is the biggest killer of IT equipment for sure. They will let an office AC go out and spend a week making sure you do the paperwork right to get a replacement, but let one of the AC units in the IT room go out and you better have a new one in that day. There are three 5-ton units for this room; one will do, but the others are on standby, and they have us cycle through them every week to ensure all will work if and when needed.
 
  #12  
Old 07-20-2005, 10:16 PM
FTE Jason
Site Administrator

Join Date: Jan 2005
Posts: 623
Likes: 0
Received 0 Likes on 0 Posts
Yeah but the good news is that you just saved a bunch of money on car insurance by switching to Geico...


Ba-dump-bump
 
  #13  
Old 07-20-2005, 10:38 PM
MemOrex
Postmaster
Join Date: Jun 2004
Location: B/CS, Texas!!
Posts: 2,665
Likes: 0
Received 0 Likes on 0 Posts
hehehe, lol
 
  #14  
Old 07-21-2005, 02:48 AM
CowboyBilly9Mile
Post Fiend
Join Date: Feb 2003
Location: Eastern WA
Posts: 6,940
Likes: 0
Received 2 Likes on 2 Posts
Just out of curiosity, are there any plans or thoughts to install redundant A/C systems?
 
  #15  
Old 07-21-2005, 10:40 AM
FTE Ken
Post Fiend
Thread Starter
Join Date: Jan 1997
Location: Enjoying the real world.
Posts: 23,165
Likes: 0
Received 7 Likes on 6 Posts
Don't know, it's not our facility so I don't know what their plans are. I do know that the portable unit is going to stay even after this unit is replaced. There are several units in the place already.
 


All times are GMT -5.