Make it faster

This past week or so I’ve been concentrating on improving the performance of our system. Thankfully this isn’t a solo task, and my teammate really knows what he’s doing. Luckily I was able to contribute based on my past experience with things like cache settings in Apache, ulimit settings and other tweaks here and there. Make it fast, like Pole Position fast!

We’ve uncovered a slew of things to work on. The first obvious step was to have the static content served by Apache rather than the mongrels. This has been on the todo-list for a long time, but has always fallen by the wayside. We knocked that off and looked at the other items that were slowing us down. That’s when I uncovered some TCPSocket weirdness within memcache-client 1.7.4, which comes with activesupport 2.3.5.

During some tests we noticed a severe lag, which we narrowed down to the fact that our list of memcached servers contained one that didn’t exist. It turns out that when a memcached hostname resolves but the host is not pingable, memcache-client 1.7.4 waits 3 seconds before responding with an error message (it also doesn’t mark the server as “down”, which I think is a bug as well). This 3 second delay happens on RHEL 4, and in some brief tests on Ubuntu 10.9 it was even worse, taking over 30 seconds to respond. My guess is that some OS-level setting affects this, but I have yet to locate it. The fun part, however, is that this problem does not exist in our old environment, where we run activesupport 2.1.2, which uses memcache-client 1.5.0.
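For anyone who wants to see how long a bare connect attempt takes on their own box, here is a rough sketch of how it could be timed. The `time_connect` helper is a name I made up for illustration; it is not part of memcache-client, and it only measures the raw TCPSocket.new call, not the client’s own error handling.

```ruby
require 'benchmark'
require 'socket'

# Hypothetical helper (not part of memcache-client): times a single
# TCP connect attempt and reports the error class if it failed.
# Returns [seconds_elapsed, error_class_or_nil].
def time_connect(host, port)
  error = nil
  elapsed = Benchmark.realtime do
    begin
      # Open and immediately close the socket; we only care about the delay.
      TCPSocket.new(host, port).close
    rescue SocketError, SystemCallError => e
      error = e.class
    end
  end
  [elapsed, error]
end
```

Something like `time_connect('cache-that-is-gone.example', 11211)` (the hostname is a placeholder) should show the delay if the host behaves like ours did; how long it takes will depend on the OS.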

It turns out that in 1.5.0, memcache-client uses a 250ms timeout when calling TCPSocket.new, something that was lost on the way to 1.7.4. Some initial tests of simply adding this CONNECT_TIMEOUT back in have been promising. It’s currently not throwing the right exception or marking the server as “down”, but once I fix that, I will see about posting the source somewhere.
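To give a feel for the idea, here is a minimal sketch of a connect wrapped in a timeout, with the server marked as down on failure. This is not the actual memcache-client source or my patch: the `CacheServer` class, its method names and the 30 second `RETRY_DELAY` are all assumptions made up for illustration. Only the 250ms CONNECT_TIMEOUT around TCPSocket.new comes from what I described above.

```ruby
require 'socket'
require 'timeout'

# 250ms, the connect timeout described above for memcache-client 1.5.0.
CONNECT_TIMEOUT = 0.25

# Hypothetical stand-in for a cache server entry; NOT memcache-client's real class.
class CacheServer
  RETRY_DELAY = 30 # seconds before a dead server is tried again (assumed value)

  attr_reader :host, :port

  def initialize(host, port)
    @host = host
    @port = port
    @retry_at = nil
  end

  # True while the server is marked down and the retry window has not passed.
  def down?
    !@retry_at.nil? && Time.now < @retry_at
  end

  # Returns a connected socket, or nil if the server is down or unreachable.
  def connect
    return nil if down?
    sock = Timeout.timeout(CONNECT_TIMEOUT) { TCPSocket.new(@host, @port) }
    @retry_at = nil # connection worked, clear any earlier "down" mark
    sock
  rescue Timeout::Error, SocketError, SystemCallError
    # Mark the server as down so callers skip it until the retry window passes.
    @retry_at = Time.now + RETRY_DELAY
    nil
  end
end
```

With something along these lines, a dead server costs at most one 250ms wait per retry window instead of several seconds on every request. A real fix would also need to raise whatever exception the rest of the client expects, which this sketch simply swallows.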