Server Outage

I’d like to apologise to users of the sg server who experienced a 5 hour outage today. This should not have happened and was the result of deficiencies in my IT processes.

The number of surveys submitted to Smap-hosted servers has been increasing over the last 3 years and is now averaging nearly 4,500 per day.

On the free sg server it is around 2,000 per day. However, on the 13th and 14th of June we received 11,897 submissions, 80% of which included a high resolution image of a cocoa tree. This caused a big drop in the available disk space.

The IT process went:

  1. At 5pm UTC I received a text message that we were down to 4GB of free disk space.
  2. I immediately ran some clean-up scripts that freed up 9GB.
  3. I then added 60GB extra disk to the server. However, this disk is not made available until the server is rebooted.
  4. The server was still being heavily utilised so I decided to wait until later in the evening to reboot when, given that most users are in Africa or Asia, the load should have been less.
  5. Then I forgot!

So instead of a 30-second outage we got 5 hours. I will endeavour to ensure that this does not happen again.
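
The two weak points in the process above (a late low-disk warning and a reboot that depended on memory) are the kind of thing a small scheduled check can cover. What follows is only an illustrative sketch, not the script used on the Smap servers. The function names, the 4GB threshold and the 30 minute grace period are all assumptions chosen to mirror the figures in this post.

```python
import shutil
from datetime import datetime, timedelta, timezone

GIB = 1024 ** 3  # bytes in one gibibyte


def free_space_alert(free_bytes, threshold_gb=4.0):
    """Return a warning string if free space is at or below the threshold, else None.

    The 4 GB default is an assumption taken from the figure quoted in the post.
    """
    free_gb = free_bytes / GIB
    if free_gb <= threshold_gb:
        return f"Low disk space: {free_gb:.1f} GB free (threshold {threshold_gb:.1f} GB)"
    return None


def check_path(path="/", threshold_gb=4.0):
    """Check the filesystem that holds `path` using the standard library."""
    usage = shutil.disk_usage(path)
    return free_space_alert(usage.free, threshold_gb)


def reboot_overdue(planned_at, now, grace=timedelta(minutes=30)):
    """Return True once a planned reboot is more than `grace` late.

    `planned_at` and `now` are timezone-aware datetimes. The grace period
    is a hypothetical value, not something stated in the post.
    """
    return now > planned_at + grace


if __name__ == "__main__":
    message = check_path("/")
    if message:
        print(message)
    # Hypothetical usage: record the planned reboot time when the disk is
    # added, then keep alerting until the reboot has actually been done.
    planned = datetime.now(timezone.utc) - timedelta(hours=1)
    if reboot_overdue(planned, datetime.now(timezone.utc)):
        print("Planned reboot is overdue")
```

Run from a scheduler every few minutes, a check like this would keep raising the overdue reboot until someone acts on it, rather than relying on one person remembering later in the evening.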
