Tuesday, October 24, 2006

Just another bad day...

Disturbed sleep. Woke up 1.5 hrs late. Time only to brush my teeth and grab some cereal. I knew it was going to be a bad, bad day. It was written in big bold letters.

Reached office on time. Felt dirty. Unshaven and unbathed. But things went smoothly in the first half. Time dragged on. Almost like the lull before the storm. Pizza for lunch. Yummy.

Clock strikes 2. As part of install verification, I wait for a file to appear in a specific folder as the output of a job. Nope. 2.01 pm. Nope. Heartbeat quickens. 2.02 pm. Heartbeat quickens even more. The clock ticks like a deadly time bomb. I knew something had gone wrong. First beads of sweat and panic. Feeling faint.

I rush to a prod box to get the logs of the job. Sure enough, it had failed. The logs don’t explain enough. I ping someone else to provide more logs. And then, at 2.10 pm, the truth is out. A mistake in the code. It’s laughing at me, mocking me. A series of ‘Oh my God’s escapes my lips and I let my head fall into my hands and stay like that until the shock passes.

Cubicle mates offer help, try to soften the blow. ‘Want anything from Starbucks?’ So sweet. But no thanks. Things have to be done. Escalations. I shoot out a mail, succinctly describing the issue, the cause and the resolution.

Word spreads fast. Within minutes, I am on the phone with senior management, explaining the situation. People from all over are pulled in to explain the protocol for a production incident and its fix. Quick meeting. Ten powerful people have ten different things to say.

It was 3 pm within minutes. Amazing how time flies in times of crisis. A hundred dependent jobs were waiting for my job to finish. A hundred people waiting to see how their jobs work. Tension was palpable. The meeting decided who had to do what.

The code fix had to go in by 5 pm. Impact analysis and a thorough test needed to be done to ensure that no other wrong code existed. Help was enlisted from all around to ensure things got done. Many VPs were paged at 4 pm on a Friday to approve tickets and ensure the code was pushed to production ASAP. ‘ASAP’ couldn’t have been better realized than today.

Code fixed, tested and checked for any more loopholes while a bevy of senior people stood right behind me to ensure I didn’t screw it up again. Code pushed to prod and all set by 5. The job is set to run again at 5.15 pm. Time is 5.13 pm.

Never had I looked at time like this before. Never had I had this feeling of time ticking away towards another bomb. Had I defused all the wires? Were there any more? 5.16 pm. Job started. No file in the folder yet. 5.17 pm. No file in the folder yet. I swallowed. Ten pairs of eyes were staring at the status of the job, and we knew a hundred others were waiting on the phone for our job to complete. At last, we could see the file in the folder!

A big sigh. Another failure, and I would have shot myself. Huge relief. Congratulations all around. Not a word of blame. Not an iota of ‘you-screwed-it-up’ thought. Everything felt like teamwork. A big ‘Thank you’ from all for having got it fixed. Imagine that! I screw it up and I get thanks for fixing it! Wow.

Post-mortem analysis and more install verifications meant the day’s tasks were completed only at 9 pm. It had been a long day. A day to forget. It was a sad day. But it was good that the issue was resolved, and I could have a good night’s sleep. It was ironic that amidst the thousands of complicated things that were thought out during development of the job, the job failed at one of the silliest points. Damn.

I came out of the office and got hit by a chilly wind. My car was the only one left in the parking lot. I revved up the engine and started on my way back home. A thousand things were on my mind.

How do things work in other professions? A doctor doesn’t have dev int, rel int and acceptance testing before he commences an operation. An engineer doesn’t have dev int, rel int and acceptance testing before thousands of gallons of water hit the dam or hundreds of vehicles go on a bridge.

I mean, work has to be 100% perfect, else it’s not going to be the right world to stay in. And for work to be 100% perfect, one has to be brilliant, truly brilliant, to think about all possible scenarios and be absolutely infallible. A hundred jobs went live today, and it was humiliating to see just my job fail. All hundred were better than me. Says a lot about my brilliance, or the lack of it.

Am I in the right profession? Do I even fit in here? If I were more experienced, would such incidents never occur? I mean, man is not infallible, so I cannot say that, with time, I would not make any mistakes. The incident just showed how truly insignificant I am in front of thousands and thousands of people who have done so much for this world, and I couldn’t even get a small, simple thing working. Jeez!

Mistakes in my profession can be fixed in the next install, and the max harm that was caused is a wait time for a number of people. However, mistakes made by doctors and engineers can cause lives to be lost. How can one live if such a thing occurs? A scene from Raju Ban Gaya Gentleman pops up in my mind.

The mind kept gnawing at such thoughts till I reached home. I wanted to pour out my anguish to someone, but alas, there was no one at home. No one to talk to when I desperately needed one. Tummy rumbled, but there was no food. 10 pm on a bad, bad day, I made do with the yucky frozen parathas. Thank God, next day was a weekend.

I hit the sack and lay there, a defeated man. Cuddled up and tried to sleep with a sad shake of the head.

3 comments:

Anonymous said...

Very nicely written!!

Guru said...

beautiful article kano... u r observing the minute things of life... effort much appreciated.. and regarding ur article.. well i would say.. it is that tension and pressure which makes life challenging... and sometimes threatening as well

i do agree sometimes we tend to get a bit too confident and think we can't make mistakes.. and a check on what we have done will not identify it because we are confident that we haven't made any mistake... um.. i am not sure if it happens with u.. it has happened many times with me

Anonymous said...

My sympathies Harsha. And perhaps in some ways, I am to be blamed for the fiasco. So apologies too.

I am not sure if I quite agree with some of your thoughts though. At the risk of sounding naive, I wish to add that we tend to take software development a tad too seriously. Work cannot always be 100% right.

Don't get me wrong. I am not trying to trivialize your situation. But since when did software development become so important in our lives that a puny little bug in a humongous million LOC application became so critical?

As the authors of Peopleware put it, "Fostering an atmosphere that doesn't allow for error simply makes people defensive....the world isn't going to stop just because a project completes a month late".

I see it as an attitude problem. Mistakes do happen. Mistakes "must" happen. Grady Booch once said, "Software is inherently complex". You can adopt means to harness the complexity but you can't wish it away.

We have been conditioned to think that adopting methodologies and processes will somehow take care of such problems. I have never heard a more stupid argument in the last 7 years of experience. It just doesn't work!!! How often have we seen checklists, test cases, code reviews work?

How much thought do we give to selecting the right people for the job? And giving them the right tools to work with? Or to creating a conducive environment where they are trusted and made to feel important?

It would be interesting to know what the findings of the postmortem were. My experience is that more often than not, the problem lies with the attitude management adopts towards software development and productivity than anything else. At the end of the day, software development is more a sociological field than a technical one.

Adopt a more human approach to the development process and you'll have a much better experience in this industry.