Deployment success

For the past two years I’ve been leading the charge on some technology that will help expedite some of our deployment processes into our intranet portal. While I can’t brag about the technology itself - it’s nothing unique (AJAX, JSON, REST, etc.) - the positioning of it, as a reusable portlet container that works alongside other JSR168/JSR286 portlets, has the chance to save a lot of people a lot of money (even if it’s just “blue” dollars). We created a combination JSR168 portlet and AJAX proxy that lets us configure the portlet to point at and retrieve content (HTML/CSS/JS) hosted outside the portal environment and bring it in asynchronously - making it appear to the user to be the same as every other portlet on the page. The key features of this solution are:

  1. Content can be hosted outside the portal environment, meaning that the portlet owner can update the content according to their own processes and not be dependent on the portal environment’s update schedule.
  2. Isolation - Since each portlet’s content is fetched asynchronously (via AJAX from the browser), the entire page doesn’t need to wait for every portlet to load before it is sent to the browser. Note: A very similar feature is available in WebSphere Portal 6.1: Client Side Aggregation (CSA)
  3. Reduced complexity - Only HTML, JavaScript and CSS skills are required. No need to have Java development skills, learning about portlet.xml or other portlet specifics.
  4. Easy deployment - New portlets using this framework do not need to deploy any new code to the portal environment. Creating a new portlet becomes as simple as copying/cloning the portlet container and configuring it to point at the new content (see the sketch after this list).
  5. Enforces SOA - Since the portlet is only a container for HTML, JavaScript and CSS, there is no way to, say, directly access a database. This is actually a good thing because it forces developers to create services. It’s one of my not-so-hidden agendas to get more services deployed to get at more data - to make mashups & situational applications easier to create.
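
To make the container idea a little more concrete, here is a rough sketch of what such a portlet could look like. This isn’t our production code - the preference name (“contentUrl”), the proxy path (“/ajaxproxy”), and the generated markup are all placeholders for illustration.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.net.URLEncoder;

import javax.portlet.GenericPortlet;
import javax.portlet.PortletException;
import javax.portlet.RenderRequest;
import javax.portlet.RenderResponse;

/**
 * Hypothetical sketch of the reusable container portlet: it renders only a
 * placeholder div plus a small script that pulls the externally hosted
 * content in through the AJAX proxy after the page has been delivered.
 */
public class AsyncContentPortlet extends GenericPortlet {

    @Override
    protected void doView(RenderRequest request, RenderResponse response)
            throws PortletException, IOException {
        // The external content location is plain portlet configuration, so
        // "deploying" a new portlet is just cloning this one and changing
        // the preference value (names here are illustrative).
        String contentUrl = request.getPreferences().getValue("contentUrl", "");
        String proxyUrl = "/ajaxproxy?url=" + URLEncoder.encode(contentUrl, "UTF-8");
        String divId = response.getNamespace() + "content";

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        // Placeholder markup; the real content arrives asynchronously.
        out.println("<div id=\"" + divId + "\">Loading...</div>");
        out.println("<script type=\"text/javascript\">");
        out.println("  var xhr = new XMLHttpRequest();");
        // The browser talks to our proxy, not to the remote host directly,
        // which keeps the request same-origin and lets the proxy enforce
        // its own timeouts.
        out.println("  xhr.open('GET', '" + proxyUrl + "', true);");
        out.println("  xhr.onreadystatechange = function() {");
        out.println("    if (xhr.readyState === 4) {");
        out.println("      document.getElementById('" + divId + "').innerHTML =");
        out.println("        (xhr.status === 200) ? xhr.responseText");
        out.println("          : 'Sorry, this content is temporarily unavailable.';");
        out.println("    }");
        out.println("  };");
        out.println("  xhr.send(null);");
        out.println("</script>");
    }
}
```

The important part is that the portlet itself renders almost nothing - everything interesting comes from the externally hosted content, fetched through the proxy.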

The point of this story was to talk about how this stuff was successfully deployed on November 15. It actually went much smoother than I had expected, with only a minor wrinkle that still needs to be examined.

Murphy’s Law

Like other deployments, this was done on a Saturday, which meant that Monday would be our true “first test” exposure to the IBM population. And of course, Murphy’s Law was in full effect. By 10am we determined that the service we were accessing was having problems… serious problems. It was hanging. If you develop anything on the web, you know that hanging is the worst of all problems. It would have been better if the server had blown up (seriously). No fear - we had built timeouts into our AJAX proxy to handle just such an occasion (see how smart we are?). As designed, the portlet displayed a spinner, then eventually an error message saying that it couldn’t connect to the service.
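
As an aside, the timeout behavior doesn’t require anything exotic - the standard Java HTTP client gives you most of it. Here’s a minimal sketch of what the proxy’s fetch-with-timeout logic might look like (class name and values are illustrative, not our actual implementation):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.SocketTimeoutException;
import java.net.URL;

/** Illustrative sketch of the proxy's fetch logic, not the production code. */
public class ProxyFetcher {

    // If the backend hasn't answered within this window, give up and let the
    // portlet show its error message instead of hanging the request.
    private static final int CONNECT_TIMEOUT_MS = 10000;
    private static final int READ_TIMEOUT_MS = 10000;

    public static String fetch(String targetUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(targetUrl).openConnection();
        conn.setConnectTimeout(CONNECT_TIMEOUT_MS);
        conn.setReadTimeout(READ_TIMEOUT_MS);
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), "UTF-8"))) {
            StringBuilder body = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
            return body.toString();
        } catch (SocketTimeoutException hung) {
            // A hung backend surfaces here after a bounded wait; the caller
            // turns this into the "couldn't connect to the service" message.
            throw new IOException("Backend did not respond in time", hung);
        } finally {
            conn.disconnect();
        }
    }
}
```

The point is that a hung backend becomes a SocketTimeoutException after a bounded wait, which the portlet can translate into the spinner-then-error behavior described above.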

A major step forward

This is all goodness and marks a major step forward for our production environment. To date we’ve become so stringent in our performance requirements that we don’t even have many portlets that make actual service calls (most use JDBC to talk directly to a database). With this new ability to properly time out backend service requests, we have the opportunity to distribute services in a way that we’ve always wanted. Unfortunately the service we were calling was hung for almost the entire rest of the day. When the servers were finally rebooted, our portlet & AJAX proxy started returning error messages right away - again, as expected. Like I said, a hung server that doesn’t respond to a request for a very long time is the absolute worst case. Once the servers came back up, everything worked as expected and requests/responses were flowing beautifully. While all this was going on, there was zero impact to the portal environment, and other portlets on the same page rendered without a problem. Now that’s cause for celebration!

Configuration recommendations

Even though the portlet handled everything as expected, we have made some recommendations to the production team to handle things in an even better way should this situation arise again:

Server configuration

  1. WebContainer Max Threads - increase from 50 to 125. 125 is more in line with the settings on other WAS clusters.
  2. Session Timeout - decrease from 30 to 10 minutes. This application is stateless, so sessions are not needed, but we can’t turn them off (a WAS limitation).

Connection timeouts - XML updates within the WAR/EAR (see the sketch after this list)

  1. AJAX Proxy - decrease from 20 to 10 seconds. Decreasing this value will let us handle more connections when the service is in a hung state (we’re currently seeing values in the 15-90ms range).
  2. Skeleton Proxy - decrease from 20 to 3 seconds. The content comes from ODW static content and should never take longer than 3 seconds (we’re currently seeing an average of 35ms).
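
Since the connection timeouts are meant to be plain XML edits inside the WAR/EAR, the proxy only needs to read them at startup. Here’s a minimal sketch of how that could work, assuming a hypothetical init parameter in web.xml (again, not our actual code - the parameter name and default are made up for illustration):

```java
import java.io.IOException;

import javax.servlet.ServletConfig;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Sketch of a proxy servlet picking up its timeout from an init parameter
 * in web.xml, so "decrease from 20 to 10 seconds" is an XML edit inside the
 * WAR rather than a code change.
 */
public class AjaxProxyServlet extends HttpServlet {

    private int timeoutMs;

    @Override
    public void init(ServletConfig config) throws ServletException {
        super.init(config);
        // e.g. <init-param><param-name>proxyTimeoutSeconds</param-name>
        //          <param-value>10</param-value></init-param>
        String configured = config.getInitParameter("proxyTimeoutSeconds");
        timeoutMs = (configured != null)
                ? Integer.parseInt(configured.trim()) * 1000
                : 10000; // fall back to the recommended 10 seconds
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        String target = req.getParameter("url");
        if (target == null || target.length() == 0) {
            resp.sendError(HttpServletResponse.SC_BAD_REQUEST, "Missing url parameter");
            return;
        }
        // The fetch logic (see the earlier sketch) would apply timeoutMs as
        // both the connect and read timeout on the outbound connection
        // before copying the response body back to the browser.
    }
}
```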