Myself, Coding, Ranting, and Madness

The Consciousness Stream Continues…

What do you do when the Network Fails?

5 Sep 2011 16:00 Tags: None

Here's a thought... given the increasing use of the Internet Protocol for just about everything, what do you do when the network goes down? I'm currently sitting in my office and, thankfully, it's only mid-afternoon - normally a time I hate due to the amount of time it feels like I've been here. Today, however, 4PM was punctuated by a switch or router somewhere upline of us going offline. I don't know quite how far up, but high enough that it was noticed very quickly and that, although the device came back on quickly enough, it took several minutes for all the devices to reregister and be assigned IP addresses. Well, that's a guess - I'm kinda still waiting on mine to start working again.

It was really rather odd to hear some life in the corridor as, given the amount of work currently being undertaken here to do with various funding and equipment bids and such, people just immediately switched to working round the problem. I'm probably one of the few who thought to call ICT - not that I could, because all the phones were dead. Yes, we're on an entirely VOIP system; I'll quickly digress onto that actually - it's not only Ethernet controlled, it's also Ethernet powered - with just enough storage for the JVM to attempt a clean shutdown. And, yes, the phones here run on Java. I'm not entirely sure why this should be the case, but it is - probably very much a comfort thing.

It was realising this that made me think about what other systems Imperial has patched through the intertubes and, as one might expect, they've been trying to get as much as possible through it. (Service update, even though I'm not going to be able to post this until after it's all back up - we seem to be in a loop of things attempting to register an IP and failing. Some IP config fault? Out of address space? My boss's phone seems to be about the only thing working...). The thing which really comes to mind is that all the security and alarm systems are now partially IP based - certainly the door authentication is. Much later in the day, and I wouldn't be able to get out of the building without using one of the emergency releases. Which, if a few people do, means that a large number of doors need to be guarded until they can be reset (they are physical overrides of the main locking systems). Automated response to alarms across the campus might also be degraded - I can't actually confirm or deny this without looking up the plans in some form. But the thought is enough to make me nervous.

So, what do you do if the network fails? Phoning the ICT desk is an idea - but only if you can, and they're not already being flooded with calls. If I ever get to run a systems support team, the emergency phone will have a big switch which just diverts all calls to a recorded message along the lines of 'Please be aware that we are currently tackling a major service outage.' to cut down the number of actually answered phone calls and, therefore, the number of people you need to keep back answering the phones.
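For what it's worth, the "big switch" idea is simple enough to sketch. This is only a toy model, not any real phone system's API: `HelpDesk`, `outage_mode` and `handle_call` are names made up for illustration, and a real VOIP exchange would do this in its own call-routing configuration.

```python
from dataclasses import dataclass

# The recorded message suggested above.
OUTAGE_MESSAGE = (
    "Please be aware that we are currently tackling a major service outage."
)


@dataclass
class HelpDesk:
    """Hypothetical support line with a manual outage switch."""

    outage_mode: bool = False  # the "big switch"
    answered: int = 0          # calls that reached a person
    diverted: int = 0          # calls sent to the recording

    def handle_call(self, caller: str) -> str:
        # With the switch thrown, every call gets the recording
        # and nobody has to pick up.
        if self.outage_mode:
            self.diverted += 1
            return f"recording: {OUTAGE_MESSAGE}"
        self.answered += 1
        return f"agent: answering {caller}"


desk = HelpDesk()
desk.handle_call("ext. 1001")      # normal service: a person answers
desk.outage_mode = True            # network dies, throw the switch
for ext in range(1002, 1012):      # ten more callers ring in
    desk.handle_call(f"ext. {ext}")
print(desk.answered, desk.diverted)
```

The point of the counters is just to show the effect: once the switch is on, the number of calls a human has to answer stops growing, so fewer people are stuck on the phones.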

P.S. Total Down-Time: 40 minutes. Slightly shabby without a good circumstance. Will look forward to the explanation.