Retiring a Zombie App

5 Jun

Herewith a short offering on how to kill zombies.  Even in an advanced civilization that has given us penicillin and “Here Comes Honey Boo Boo”, zombie apps, neither growing nor dying, can still wander in and out of our resource matrixes with impunity, consuming our brains, our time, our memories (both human and mechanical), and of course our enterprise’s money.  Such was the case with Custodian Data Interface 1 (CDI 1).

Introduced, in tech-time, several geological epochs ago (okay, it was 2007), CDI 1 was a solid workhorse essential to getting Fortigent’s indigenous back-end suite of data acquisition and core database apps off the ground.  We used to have Advent DataExchange feeding into Advent Axys.  Now instead we would be in a world of CDI 1 feeding into ARI.  Raise the new flag.

Then, in 2009, CDI 2 was born.  Eventually all our Evare feeds were cut over to CDI 2, and its evolution has continued — with our first direct custodial feed, expanded IDC links, and now the beginnings of hedge fund data.

What made CDI 1 a zombie was our tardiness in cutting over one of our last and smaller data providers, DSCS.  When an app neither lets you down nor distinguishes itself with its growth potential, it can simply linger in the twilight.  That’s what CDI 1 did for nigh on 4 years, until last month, when at last the nails started being pounded into the coffin.  Here’s how:

First, we realized that permanently retiring CDI 1 was a down payment on a larger process of cleanup.  Our complement of applications has grown radically in the last 5 years.  Rationalization and streamlining are inevitably needed.  Here was some low-hanging fruit (after all, the replacement app was already working).  What was needed was willpower.  Guru Rao supplied that.

Second, knowing that this was a one-time commitment which would reap permanent benefits meant the resource allocation could be justified.  I was given primary responsibility, joined by Sarada Aragonda for testing and Bob Lewicki for file re-routing.  As to timeframe, we had no hard deadline, but we had a mandate: move slow or move fast, but don’t stop moving.  Get it done.

Third, we needed a plan that was quick, comprehensive, non-invasive to applications and existing operations, and as close a parallel to the expected production experience as possible.

1. In preparation, about 1,500 transaction translation and security translation mappings had to be ported from CDI 1 to CDI 2.  This meant consolidating data from 4 tables in CDI 1 into just 2 tables in CDI 2.

2. We used a ‘pre-positioning’ strategy.  If you know you will need something later and can harmlessly incorporate it into the Production platform now, do it.  In this case, the mappings were loaded to Production and then started flowing out to the cloud test environments with the weekend DB refreshes, eliminating the need to reload them every week.  Jan Corsetty stepped in here to help.  Any necessary tweaks found during testing would be made in Prod at the same time, and those fixes would then propagate outwards in perpetuity.  This contributes to a cycle of across-the-board quality improvement and makes Production and cloud near-exact mirror images of each other (with the primary exception of data obfuscation), improving the realism of the test experience.

3. Then, for each of 42 DSCS custodians, a test file was created to stress test every single mapping against its expected outcome in the new system.  Jenny Zhang from Operations ran point on any questions.  This efficient move acted as both ‘outcome testing’ and a form of user-acceptance testing, with me as proxy for the users.

[Image: MSSDTest]

4. Expected results were mapped out in advance:

[Image: Expected]

5. For each custodian being tested, a dedicated test account was used.  These were easily and efficiently created as ‘TBD’ accounts by the very files being used for testing.  For instance, for Morgan Stanley, whose data arrives in files tagged with the letters “MSSD”, MSSDTEST was the artificial test account (above).  All Morgan Stanley test transactions would go through this account.  Once processed, they could simply be viewed through ARI and compared with expected results.  Note that each transaction was given a unique numerical tag that served as its quantity, its dollar amount and its Comment field contents.  This made sorting and identifying results quick and easy.
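
The unique-tag trick in steps 3 through 5 can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not the actual CDI tooling: the names TaggedTransaction, build_test_transactions and compare_results, the transaction codes (BUY, SELL, DIV), the expected types and the starting tag are all hypothetical.  Only the MSSDTEST account name comes from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedTransaction:
    """One test transaction (hypothetical layout, for illustration only)."""
    account: str      # dedicated test account, e.g. "MSSDTEST"
    source_code: str  # custodian transaction code whose mapping is exercised
    quantity: int
    amount: int
    comment: str


def build_test_transactions(account, source_codes, start_tag=1001):
    """Create one transaction per mapping under test.

    The same unique tag is written to quantity, amount and comment,
    so a result can be traced back to its input from any of the three.
    """
    return [
        TaggedTransaction(account, code, tag, tag, str(tag))
        for tag, code in enumerate(source_codes, start=start_tag)
    ]


def compare_results(expected, observed):
    """Return {tag: (expected, observed)} for every tag that does not match.

    `expected` maps tag -> outcome mapped out in advance.
    `observed` maps tag -> outcome seen in the new system; a tag that
    never showed up is reported with an observed value of None.
    """
    return {
        tag: (want, observed.get(tag))
        for tag, want in sorted(expected.items())
        if observed.get(tag) != want
    }


# Assumed sample data: three mappings for one custodian.
txns = build_test_transactions("MSSDTEST", ["BUY", "SELL", "DIV"])
expected = {1001: "Purchase", 1002: "Sale", 1003: "Dividend"}
observed = {1001: "Purchase", 1002: "Sale", 1003: "Interest"}

mismatches = compare_results(expected, observed)
print(mismatches)  # {1003: ('Dividend', 'Interest')}
```

Because the tag appears in three fields at once, a mistranslated transaction can still be matched to its source row even if one of those fields is altered along the way.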

With this method, we advanced from loading mappings on April 26 to file testing throughout the first week of May.  By the end of that week, all but two anomalies, both localized to one custodian, had been resolved, making cutover a reasonable decision.  Cutover began on May 14 with 40 custodians (2 more followed).  The data flow is now completely shunted to CDI 2, with the data capture component slated to be replaced within a few weeks.  Then the dirt goes on the coffin.

Few cutovers are flawless, and this one did experience minor hiccups, but all problems encountered after the go-live date were easily diagnosed and remedied.  The best sign of a well-done cutover is that it is an operational non-event.  With methodical planning and teamwork, the retirement of CDI 1 falls into that category.

Rest in peace, CDI 1.  And don’t come back.

One Response to “Retiring a Zombie App”

  1. Jamie McIntyre (@Jamie_McI) June 5, 2013 at 4:24 pm #

    The King is Dead! Long Live the King!
