DreamHost Outage

DreamHost experienced some pretty significant downtime yesterday. It started with network maintenance. When they brought a switch back online, it was missing configuration files. Seems kinda silly for a hosting company to not have backups of EVERYTHING.

Anyway, longren.org and various other sites I host at DreamHost were down for a good part of the day. Even the DreamHost Status blog was unavailable.

It had been a long time since a significant DreamHost outage prior to this event. Hopefully it will be a long time before we see an outage like this again.


More DreamHost Network Issues

DreamHost is recovering from yet another network outage. Apparently a switch had issues and they had to take a few servers off that switch. Supposedly the servers were moved over to another temporary switch so they’d have network access, but I haven’t had access to longren.org for at least 2 hours now.

We have moved the servers in the affected rack to new switches in other racks temporarily while we get a new switch in place. We are working on recovery efforts now on servers which may be down. The list of affected servers is:

Arrow, mel, caesar, herod, alondra, overland, rossmore, oxnards, cerritos, selma, nala, demeter, jarvis. This is a mix of 3 MySQL servers and the rest webservers.

Unfortunately, longren.org is hosted on oxnard. That box seems to have a lot of issues, even when the DreamHost network is functioning properly. They’ve now managed to replace the problematic switch and are working on bringing servers back online.

Sorry about no timestamp before, it’s 12:36 Pacific time (-0800? I never could keep track of daylight savings time.)

We have completely replaced the switch and are working on getting the servers back online. All networking cables are back in their regularly scheduled switch ports.

I always get a kick out of the angry comments that show up on the DreamHost Status blog whenever a fairly widespread problem occurs. Of course, DreamHost hasn’t had great reliability in the last few months. I feel bad for those trying to run a business with DreamHost.

One quick note, WordPress 2.0.5 should be here soon.

UPDATE: Apparently there’s still trouble. Now DreamHost is saying the affected servers are still behaving abnormally. This box, oxnard, doesn’t appear to be affected though, for once! They’re still updating that same post at the DreamHost Status blog:

Weirdness is afoot. The servers are still exhibiting the same problems, we have even moved two of them into our new datacenter and it’s the same deal. It’s only specific servers in half of one of our racks.

UPDATE 2 @ 8:52 PM CST: DreamHost now says all their problems are over, and that does appear to be the case. I believe the K2 site is hosted at DreamHost; I noticed it was down for a while, but it has since been revived. Here’s the update from the DreamHost Status blog:

As of 1700 Pacific time (GMT-0800 or so, 5PM), we think the issues with this rack are behind us. We moved some machines to new hardware and this seems to have fixed the problems. We will be looking into this more on Monday, and doing some extensive testing on that hardware to make sure it was the root cause of the problem.
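Since these updates mix Pacific and Central times, here’s a quick sketch of the conversion. It assumes fixed offsets of GMT-8 for Pacific (as the status update guesses) and GMT-6 for Central, and the date is a made-up placeholder, since only the clock time comes from the update. The two-hour gap holds either way, as long as both zones are on the same side of daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets (an assumption): daylight saving time is deliberately ignored.
PACIFIC = timezone(timedelta(hours=-8), "PST")
CENTRAL = timezone(timedelta(hours=-6), "CST")

# Placeholder date (hypothetical); only 17:00 Pacific comes from the status update.
resolved = datetime(2000, 1, 1, 17, 0, tzinfo=PACIFIC)

# Convert the same instant to Central time and to UTC.
in_central = resolved.astimezone(CENTRAL)
in_utc = resolved.astimezone(timezone.utc)

print(in_central.strftime("%H:%M %Z"))  # 19:00 CST
print(in_utc.strftime("%H:%M %Z"))      # 01:00 UTC (the following day)
```

So the 1700 Pacific all-clear works out to 7 PM Central, a couple of hours before my update above.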

Hopefully this was just a minor fluke and we don’t see any more problems. Like I said before, DreamHost has been less than stable these last couple of months. Honestly though, I have noticed this site loads much quicker since DreamHost fought with their network last time. Still, there are a lot of recent posts categorized as “system outages” at the DreamHost Status blog.


DreamHost Has A Slow Network

The issues I spoke about in Poor Site Performance probably aren’t related to any content within this site, as I had previously thought. They’re more likely related to the shabby state of DreamHost’s network.

I am now blaming the DreamHost network for the poor performance of this site for the following reasons.

1. Intermittence: This one is simple. Sometimes single posts will load in a snap. Other times they take up to 30 seconds to display. Same with the index page, archives, about, and contact pages. The contact and about pages are relatively void of content. There’s really no reason for slowness or lag when loading those pages, since there’s not much info to display there.

2. Other Pages: Other pages at the longren.org domain also load slowly at times. Take Mint, for example. The Mint dashboard fails to load or loads very slowly when this blog is also responding very slowly. It takes up to a minute to refresh some very basic peppers from within Mint.

3. MySQL Is Usually OK: If I’m not mistaken, DreamHost has its MySQL databases on separate servers. In other words, a machine that serves HTTP requests doesn’t serve MySQL data at the same time; another, entirely separate box handles the MySQL responses. My thought here is that DreamHost might have their MySQL machines on a different network entirely, one that might not be having the issues the rest of their network is experiencing. I could be way off here though; this is entirely speculation.
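If you want to put numbers on the intermittence from reason 1, here’s a rough sketch of how it could be measured. This is not something I ran for this post: the function names, the 5x threshold, and the example.com URL in the note below are all assumptions for illustration. The idea is to fetch the same lightweight page several times and compare the slowest load against the median.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=60):
    """Return seconds taken to fetch url, or None on a network or HTTP error."""
    start = time.perf_counter()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            response.read()
    except OSError:  # URLError, HTTPError, and socket timeouts derive from OSError
        return None
    return time.perf_counter() - start


def sample_page(url, count=10, pause=5.0):
    """Fetch url count times, pausing between requests, and return the timings."""
    samples = []
    for _ in range(count):
        samples.append(time_request(url))
        time.sleep(pause)
    return samples


def summarize(samples):
    """Summarize load times in seconds; None entries count as failures."""
    ok = [s for s in samples if s is not None]
    summary = {"requests": len(samples), "failures": len(samples) - len(ok)}
    if ok:
        summary["fastest"] = min(ok)
        summary["median"] = statistics.median(ok)
        summary["slowest"] = max(ok)
    return summary


def looks_intermittent(summary, ratio=5.0):
    """Rough heuristic (assumed threshold): slowest load is far above the median."""
    if not summary.get("median"):
        return False
    return summary["slowest"] / summary["median"] >= ratio
```

Calling something like `summarize(sample_page("http://example.com/about/"))` on a near-empty page and getting a slowest time many times the median would point away from page content and toward the server or network, though it can’t tell those two apart on its own.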

The DreamHost Status site has been unavailable for most of the day. There’s normally a blog hosted there. Currently, there are simply a few paragraphs of text explaining the current situation:

Network downtime Monday Night (09/11)
Monday night, we will be upgrading our core networking equipment, which will result in some downtime of all services lasting approximately 30-45 minutes.
We’re expecting this to put an end to the network problems that were created by the power outages about a month ago.
– Sep. 8, 2006 3:30 p.m.

Oops! Temporary Status Site
We had a little goof and this will be our status site until the other machine can be resuscitated.
– Sep. 8, 2006 11:40 a.m.

So, as you can see, they’re relating this to the downtime experienced a month or so back when a generator caught fire. Hopefully the work they do this coming Monday will fix the issues we’ve been having. It’s getting real old, real quick. It couldn’t happen at a worse time, either: this site is starting to grow by leaps and bounds. Hopefully growth won’t be affected by this poor performance.

UPDATE: This site was unreachable for about 12 hours last night/this morning. Here are some new items that have been posted to dreamhoststatus.com:

Wilmington Failed Over
Wilmington’s second network card, which carries internet traffic, decided to give up the ghost. The server is on shiny new hardware now and websites are already starting to serve again. Our apologies for the downtime.
– Sep. 9, 2006 1:24 p.m.

Web Control Panel Issue Resolved
The network issues causing slowness for the web control panel have now been resolved. The network overall is more responsive and we will continue to make minor changes to keep everything working as well as possible. The major maintenance on Monday night will be essentially a complete replacement and upgrade of one of our core routers. In the meantime, we have been re-routing as much of our network traffic as possible through our other core router to improve overall performance.
– Sep. 9, 2006 2:48 a.m.

Web Control Panel Slowness
The network issues are also causing problems with the Web Control Panel for several users. We are looking into this and hope to have everything restored shortly.
– Sep. 8, 2006 8:56 p.m.

Network Problems Today
We have been experiencing Network Problems today. These problems are the same problems that have actually been happening since we first reported problems with our network. Unfortunately these problems have gotten worse today and are causing a majority of the downtime and slowness issues you are reporting today. These problems, and our attempts at fixing them, have been an ongoing effort. The maintenance Monday Night will be a big step towards resolving these problems completely. We are currently working on the network and all servers having problems and hope to improve the situation soon. Sorry about the downtime, we hope to have this all resolved soon.
– Sep. 8, 2006 5:45 p.m.