Dan’s View

2009/09/21

The issues with noisy systems

Filed under: work — dan @ 17:32

Today I felt like I completely failed at my job. I was ‘on shift’, so I was supposed to be keeping an eye on the health of all of the systems at my work. Yet I failed to notice an issue ‘that was staring me in the face’ for over 3 hours, due to the ‘noise factor’.

At the beginning of my shift I noticed that some parts of our monitoring had been semi-broken for a large part of the weekend, so I focused on getting that fixed. Once it was fixed, my monitoring systems were showing about 2000 alarms, which is high. But because a large chunk of the infrastructure had gone unmonitored over the weekend, I didn’t think much of it. Alongside this, there was some database maintenance ongoing.

So how did the noise fail me? Well, the number of alarms for databases should have been glaringly obvious, but because of the monitoring and maintenance issues I didn’t take heed. When you are used to having between 100 and 200 alarms and you have a system reporting 2000, it gets very difficult to get a handle on the real problems.

Overall I know I didn’t fail, but for the next few hours I am going to be beating myself up about it.

2009/05/05

Wasted BogoMIPS

Filed under: Geek, linux, work — dan @ 14:15

So last week I screwed up in a royal fashion. I caused 1 core on each of 12,000 servers to run at 100% with a zombie process. In conversation, a friend asked “So how many BogoMIPS were you wasting?” Of course I had to find out.

So I grepped out the BogoMIPS value for the first core on each box and summed them up.
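For anyone wanting to do something similar, here is a rough Python sketch of the idea rather than the exact commands I ran. It assumes you have already collected the text of /proc/cpuinfo from each box into a dictionary keyed by hostname; the function names and the sample hostnames are made up for illustration.

```python
import re

# Matches a line such as "bogomips : 5091.61". The capitalisation of the key
# varies between architectures, so the match is case-insensitive.
BOGOMIPS_LINE = re.compile(
    r"^bogomips\s*:\s*([0-9.]+)\s*$", re.IGNORECASE | re.MULTILINE
)


def first_core_bogomips(cpuinfo_text):
    """Return the BogoMIPS value of the first core listed, or None if absent."""
    match = BOGOMIPS_LINE.search(cpuinfo_text)
    return float(match.group(1)) if match else None


def total_first_core_bogomips(cpuinfo_by_host):
    """Sum the first core's BogoMIPS across hosts, skipping hosts with no value."""
    values = (first_core_bogomips(text) for text in cpuinfo_by_host.values())
    return sum(value for value in values if value is not None)


# Hypothetical sample data standing in for the real fleet.
sample = {
    "box-a": "processor\t: 0\nbogomips\t: 5000.50\nprocessor\t: 1\nbogomips\t: 4999.00\n",
    "box-b": "processor\t: 0\nbogomips\t: 4000.25\n",
}
print(total_first_core_bogomips(sample))
```

Only the first match per box is used, which mirrors taking just the first core's value.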

Per second: 61099300 BogoMIPS

If you take that figure and multiply it up for 12 hours (43,200 seconds):

Total: 2639489760000 Million Instructions
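The arithmetic is easy to check. A small Python sketch, treating the summed BogoMIPS figure as millions of instructions per second (which is generous, since BogoMIPS is a rough calibration number from the kernel, not a real benchmark); the function name is just for illustration:

```python
def total_million_instructions(summed_bogomips, hours):
    """Multiply a per-second figure (in millions of instructions) over a number of hours."""
    seconds = hours * 60 * 60
    return summed_bogomips * seconds


total = total_million_instructions(61_099_300, 12)
print(total)  # 2639489760000
```

That works out to roughly 2.6 x 10^18 instructions, if you take the figure at face value.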

For more information on BogoMIPS please see this nice FAQ.
