Overview

Most banks run a core ledger that predates the internet. The system works, the regulator is comfortable with it, and attempts to replace it “in one go” have a long track record of overruns, delays, and abandoned programmes. The realistic path is incremental: build a modern integration boundary around the legacy core, route new functionality through it, and retire pieces of the core when they no longer have callers.

This is the strangler fig pattern, named after the vine that grows around a host tree until the tree is gone. The vine in our case is a modern integration platform; the tree is the mainframe.

Don’t modernise what works

The mainframe is not the problem. The problem is that every new product team has to learn COBOL conventions, CICS COMMAREA layouts, and EBCDIC encoding to integrate. Modernising means hiding the mainframe behind a clean contract — not necessarily replacing it. Some banks run a strangled core for a decade before retirement, and that is fine.

What “legacy” means

The term covers a small number of recurring shapes:

  • CICS / COBOL on z/OS. Transactional, COMMAREA-based, called via 3270 emulation, MQ, or CICS Transaction Gateway.
  • IMS DB/DC. Hierarchical database, transactional, called via MQ-IMS bridge or APPC.
  • AS/400 (IBM i). RPG programs, DB2/400, often called via remote-program-call or message queue.
  • File-drop integration. Fixed-width or CSV files dropped on FTP/SFTP, processed nightly.
  • Vendor packages with no API. Closed core banking systems where the only integration point is a database table or a flat file.

The pattern is the same regardless of shape: put a façade in front, normalise the contract, route new traffic through the façade, retire the legacy entry point.

Strangler fig

Three properties make this work:

  • The façade contract is stable. Consumers don’t know which calls hit the new service vs the legacy core. Migration is a backend concern.
  • Routing is feature-flagged. Per-customer or per-percentage routing lets you migrate 1% → 10% → 100% with quick rollback.
  • Each phase is independently shippable. No big bang. If “customers” takes 18 months to migrate, accounts and payments can wait.
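A minimal sketch of the per-customer routing described above. It assumes the rollout percentage comes from some flag store (here it is just a field). `StranglerRouter`, `Backend`, and the CRC32 bucketing are illustrative choices, not a prescribed design. Each customer ID hashes to a stable bucket from 0 to 99, so the same customer always hits the same backend. A customer who is on the new service at 10% stays there at 50%, so ramping up never flips anyone back to legacy.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical façade router: sends a fixed share of customers to the new service.
final class StranglerRouter {

  enum Backend { LEGACY, NEW_SERVICE }

  // Share of customers (0..100) on the new service; a real system reads this from a flag store.
  private volatile int rolloutPercent;

  StranglerRouter(int rolloutPercent) {
    setRolloutPercent(rolloutPercent);
  }

  void setRolloutPercent(int percent) {
    if (percent < 0 || percent > 100) {
      throw new IllegalArgumentException("rollout percent must be 0..100, got " + percent);
    }
    this.rolloutPercent = percent;
  }

  // Stable bucket 0..99 per customer, so the same customer always hits the same backend.
  static int bucket(String customerId) {
    CRC32 crc = new CRC32();
    crc.update(customerId.getBytes(StandardCharsets.UTF_8));
    return (int) (crc.getValue() % 100);
  }

  Backend route(String customerId) {
    return bucket(customerId) < rolloutPercent ? Backend.NEW_SERVICE : Backend.LEGACY;
  }

  public static void main(String[] args) {
    StranglerRouter router = new StranglerRouter(10);  // 10% of customers on the new service
    System.out.println("C-1001 -> " + router.route("C-1001"));
    router.setRolloutPercent(0);                       // rollback: everyone back on legacy
    System.out.println("C-1001 -> " + router.route("C-1001"));
  }
}
```

Setting the percentage to 0 is the rollback. Every customer routes to legacy on their next request, and no redeploy is needed.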

Anti-corruption layer

The façade and the legacy system speak different languages. The anti-corruption layer (ACL) is the translation boundary: legacy concepts (account types as 2-character codes, dates as YYYYDDD Julian, currency as numeric position) get mapped into modern equivalents on the way out, and modern shapes get mapped back on the way in.

The ACL is a separate component, not part of either side. Burying it inside the façade leaks legacy concepts into the API contract; burying it inside the legacy adapter ties the new code to mainframe quirks forever.
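Here is a minimal sketch of one ACL translation that makes its role concrete. It assumes the legacy core encodes account types as 2-character codes. The codes (`CA`, `SV`, `TD`), the `AccountType` enum, and the class name are hypothetical; a real table would come from the legacy copybooks. What matters is the shape. The component knows both vocabularies, neither side knows the other's, and an unmapped code is rejected rather than guessed.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical ACL component for one legacy field: 2-character account type codes.
final class AccountTypeTranslator {

  // Modern vocabulary used by the façade contract; no legacy codes appear here.
  enum AccountType { CURRENT, SAVINGS, TERM_DEPOSIT }

  // Illustrative codes only; the real table would come from the legacy copybooks.
  private static final Map<String, AccountType> FROM_LEGACY = Map.of(
      "CA", AccountType.CURRENT,
      "SV", AccountType.SAVINGS,
      "TD", AccountType.TERM_DEPOSIT);

  // Reverse direction derived from the forward table so the two cannot disagree.
  private static final Map<AccountType, String> TO_LEGACY = new EnumMap<>(AccountType.class);
  static {
    FROM_LEGACY.forEach((code, type) -> TO_LEGACY.put(type, code));
  }

  // Legacy -> modern: applied to responses coming out of the core.
  static AccountType fromLegacy(String code) {
    AccountType type = FROM_LEGACY.get(code.trim());  // PIC X fields arrive space-padded
    if (type == null) {
      // An unknown code is a gap in the translation table: fail loudly, never guess.
      throw new IllegalArgumentException("Unmapped legacy account type: '" + code + "'");
    }
    return type;
  }

  // Modern -> legacy: applied to requests going into the core.
  static String toLegacy(AccountType type) {
    String code = TO_LEGACY.get(type);
    if (code == null) {
      throw new IllegalArgumentException("No legacy code for " + type);
    }
    return code;
  }
}
```

Deriving the reverse map from the forward one keeps the two directions from drifting apart as codes are added.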

MQ ↔ CICS bridge

The most common integration topology: a modern service places a request on an MQ queue, the MQ-CICS bridge picks it up, invokes a CICS transaction with the request as COMMAREA, and puts the response on a reply queue.

JMS producer in the new service

AccountQueryClient.java
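public AccountBalance query(String accountId) throws JMSException {
  // JMS 2.0 (javax.jms.*) plus javax.json; `cf` is an injected ConnectionFactory.
  // try-with-resources closes the context, and its connection, on every path.
  try (JMSContext ctx = cf.createContext()) {
    Queue request = ctx.createQueue("queue:///REQUEST.QUEUE");
    TemporaryQueue reply = ctx.createTemporaryQueue();  // private to this context

    JsonObject body = Json.createObjectBuilder()
        .add("op", "GETBAL")
        .add("accountId", accountId)
        .build();

    // MQMD format is set by the JMS provider; an arbitrary user property does not change it.
    TextMessage msg = ctx.createTextMessage(body.toString());
    msg.setJMSReplyTo(reply);  // the bridge puts the response here

    ctx.createProducer().send(request, msg);

    Message response = ctx.createConsumer(reply).receive(5000);  // 5s timeout
    if (response == null) {
      // JMSException is already declared; java.util.concurrent.TimeoutException was not.
      throw new JMSException("No reply from CICS within 5s for account " + accountId);
    }
    return AccountBalance.fromJson(response.getBody(String.class));
  }
}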

COBOL COMMAREA layout

ACCT001.cbl
      * Request COMMAREA: 80 bytes
       01  WS-REQUEST.
           05  WS-OP            PIC X(8).
           05  WS-ACCOUNT-ID    PIC X(16).
           05  WS-CHANNEL       PIC X(4).
           05  FILLER           PIC X(52).

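      * Response COMMAREA: 200 bytes (72 of data + 128 FILLER)
      * COMP-3 S9(15)V99 = 17 digits + sign nibble = 9 bytes
       01  WS-RESPONSE.
           05  WS-RC            PIC 9(4).
           05  WS-RC-MSG        PIC X(40).
           05  WS-BAL-AVAIL     PIC S9(15)V99 COMP-3.
           05  WS-BAL-LEDGER    PIC S9(15)V99 COMP-3.
           05  WS-CCY           PIC X(3).
      * WS-LAST-TXN-DATE is YYYYDDD (ordinal "Julian" date)
           05  WS-LAST-TXN-DATE PIC 9(7).
           05  FILLER           PIC X(128).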

The bridge marshals JSON to/from this layout. The mapping is rote but worth scripting in CI — field-level test coverage on the marshalling code is the difference between a successful migration and a 3am incident.
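Here is a sketch of the marshalling for the 80-byte request COMMAREA above. It assumes the CICS region uses EBCDIC code page 037 (`IBM037` in Java). Check the actual CCSID, because national EBCDIC variants differ. `Acct001RequestMarshaller` is a hypothetical name, and the sketch covers only the `PIC X` request fields. Each field is written left-justified and space-padded to its exact width. A value that would be silently truncated is rejected instead.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical marshaller for the 80-byte ACCT001 request COMMAREA.
final class Acct001RequestMarshaller {

  // Assumption: the region uses EBCDIC code page 037; confirm the real CCSID.
  static final Charset EBCDIC = Charset.forName("IBM037");
  static final int LENGTH = 80;

  static byte[] marshal(String op, String accountId, String channel) {
    byte[] out = new byte[LENGTH];
    Arrays.fill(out, " ".getBytes(EBCDIC)[0]);      // PIC X and FILLER default to spaces (0x40)
    int offset = 0;
    offset = putField(out, offset, op, 8);          // WS-OP          PIC X(8)
    offset = putField(out, offset, accountId, 16);  // WS-ACCOUNT-ID  PIC X(16)
    putField(out, offset, channel, 4);              // WS-CHANNEL     PIC X(4)
    return out;                                     // bytes 28..79 stay as FILLER spaces
  }

  // Writes value left-justified into a fixed-width field and returns the next offset.
  private static int putField(byte[] buf, int offset, String value, int width) {
    byte[] encoded = value.getBytes(EBCDIC);
    if (encoded.length > width) {
      // Refuse rather than truncate: a clipped account ID is a wrong account ID.
      throw new IllegalArgumentException("'" + value + "' does not fit PIC X(" + width + ")");
    }
    System.arraycopy(encoded, 0, buf, offset, encoded.length);
    return offset + width;
  }
}
```

A CI test per field (offset, width, padding, overflow) gives the field-level coverage described above.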

Migration steps

  1. Inventory the entry points

    List every system that calls the legacy core. Channel, protocol, frequency, peak rate, who owns it. This is usually 50–200 entry points and the inventory itself takes a quarter.

  2. Identify the seams

    Group entry points into bounded contexts (accounts, payments, customers, cards). Each seam becomes a façade.

  3. Build the façade

    One façade per seam. Initially every operation routes to the legacy core via MQ-CICS. The façade exists only to stabilise the contract.

  4. Cut over consumers to the façade

    Migrate every consumer from direct mainframe access to the façade. This is the boring, long phase — budget for it.

  5. Build the new service for one capability

    Pick a low-risk, high-volume capability (read-only first — balance enquiry beats funds transfer). Build it independently, dual-write to legacy during shadow phase.

  6. Switch the façade to route to the new service

    Behind a feature flag, route a small percentage of traffic to the new service. Compare responses against legacy. Ramp up.

  7. Retire the legacy entry point

    Once 100% of the capability is on the new service for some sustained period, remove the legacy code path. This is the only step that frees up real mainframe MIPS.
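Steps 5 and 6 depend on comparing the new service against legacy before it serves anyone. Below is a minimal sketch of a shadow read in the façade, using hypothetical names (`ShadowReader`, `legacy`, `candidate`). Legacy remains authoritative and the new service is called alongside it. Mismatches and errors are counted and logged but never returned to the consumer. For simplicity the shadow call here is synchronous. A production façade would more likely issue it asynchronously so that it adds no latency.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical shadow-read wrapper: legacy answers, the new service is only compared.
final class ShadowReader<K, V> {
  private final Function<K, V> legacy;
  private final Function<K, V> candidate;
  private final AtomicLong calls = new AtomicLong();
  private final AtomicLong mismatches = new AtomicLong();

  ShadowReader(Function<K, V> legacy, Function<K, V> candidate) {
    this.legacy = legacy;
    this.candidate = candidate;
  }

  V read(K key) {
    V authoritative = legacy.apply(key);
    calls.incrementAndGet();
    try {
      V shadow = candidate.apply(key);
      if (!Objects.equals(authoritative, shadow)) {
        mismatches.incrementAndGet();
        System.err.printf("shadow mismatch key=%s legacy=%s new=%s%n", key, authoritative, shadow);
      }
    } catch (RuntimeException e) {
      // A failing new service must never break the customer-facing path.
      mismatches.incrementAndGet();
      System.err.printf("shadow error key=%s: %s%n", key, e);
    }
    return authoritative;  // consumers only ever see the legacy answer
  }

  long calls() { return calls.get(); }
  long mismatches() { return mismatches.get(); }
}
```

If `V` is a `BigDecimal`, note that `equals` treats `10.0` and `10.00` as different. Normalise the scale or compare with `compareTo` before counting a mismatch.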

Data shape

Three legacy data conventions trip up almost every migration:

  • EBCDIC vs ASCII. The mainframe speaks EBCDIC; everyone else speaks ASCII (or UTF-8). Conversion happens in the bridge or the channel; get it wrong and currency symbols, accents, and special characters break silently.
  • Packed decimal (COMP-3). COBOL programs commonly store monetary values as packed BCD (two digits per byte, with the sign in the final nibble), not as native binary or a decimal string. Bridges must unpack to BigDecimal with the scale implied by the PIC clause (V99 means two decimal places).
  • Date formats. Julian dates (YYYYDDD), two-digit year ambiguity (YYDDD, YYMMDD), missing time-of-day. Convert at the boundary, store ISO 8601 internally.
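Here are two of these conversions, sketched in Java under stated assumptions. `unpackComp3` decodes a packed-decimal field such as `WS-BAL-AVAIL`; `PIC S9(15)V99 COMP-3` occupies 9 bytes (17 digits plus a sign nibble). `fromYyyyddd` converts a `YYYYDDD` ordinal date such as `WS-LAST-TXN-DATE` to a `LocalDate`, whose `toString()` is ISO 8601. The class name is hypothetical. The sketch accepts only the preferred sign nibbles (`C` and `F` positive, `D` negative). IBM's packed format also defines `A` and `E` as positive and `B` as negative, which a production decoder may need to handle.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.time.LocalDate;

// Hypothetical boundary decoders for two legacy data conventions.
final class LegacyDecoders {

  // Decodes a COMP-3 field: two BCD digits per byte, sign in the low nibble of the last byte.
  // scale = digits after the implied decimal point in the PIC clause (V99 -> 2).
  static BigDecimal unpackComp3(byte[] packed, int scale) {
    StringBuilder digits = new StringBuilder(packed.length * 2);
    boolean negative = false;
    for (int i = 0; i < packed.length; i++) {
      int high = (packed[i] >> 4) & 0x0F;
      int low = packed[i] & 0x0F;
      digits.append(bcdDigit(high));
      if (i < packed.length - 1) {
        digits.append(bcdDigit(low));
      } else if (low == 0x0D) {
        negative = true;                        // preferred negative sign
      } else if (low != 0x0C && low != 0x0F) {  // preferred positive / unsigned signs
        throw new IllegalArgumentException("Unsupported COMP-3 sign nibble: 0x" + Integer.toHexString(low));
      }
    }
    BigInteger unscaled = new BigInteger(digits.toString());
    return new BigDecimal(negative ? unscaled.negate() : unscaled, scale);
  }

  private static char bcdDigit(int nibble) {
    if (nibble > 9) {
      throw new IllegalArgumentException("Invalid BCD digit nibble: 0x" + Integer.toHexString(nibble));
    }
    return (char) ('0' + nibble);
  }

  // Converts a YYYYDDD ordinal ("Julian") date; LocalDate.toString() yields ISO 8601.
  static LocalDate fromYyyyddd(String yyyyddd) {
    if (yyyyddd.length() != 7) {
      throw new IllegalArgumentException("Expected YYYYDDD, got '" + yyyyddd + "'");
    }
    int year = Integer.parseInt(yyyyddd.substring(0, 4));
    int dayOfYear = Integer.parseInt(yyyyddd.substring(4));
    return LocalDate.ofYearDay(year, dayOfYear);  // rejects day 366 in non-leap years
  }
}
```

COMP-3 bytes are binary, not characters. They must be excluded from any EBCDIC-to-ASCII translation, which would corrupt them; only the `PIC X` and display-numeric fields go through the character conversion.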

Cutover patterns

Pattern        | How it works                                     | When to use
-------------- | ------------------------------------------------ | -------------------------------------
Shadow         | Write to both, read from legacy, compare         | Building confidence in the new system
Dual-write     | Write to both, read from new                     | After shadow proves equivalence
Read-from-new  | Stop writing to legacy; new is system of record  | Once dual-write is stable
Big-bang       | Cut over in one window, no fallback              | Almost never; very small surface only

Reconcile, don’t trust

During shadow and dual-write, run a reconciliation job that compares legacy and new state daily. Any drift — even “just one record” — is a bug. Track drift count as a release-gate metric. Without reconciliation you discover migration bugs from customer complaints, which is too late.
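Here is a minimal sketch of the reconciliation comparison. It assumes each side can produce an account-ID-to-balance snapshot for the same cut-off. `BalanceReconciler`, `Drift`, and `DriftKind` are hypothetical names. A real job would stream sorted extracts rather than hold both sides in memory. The sketch reports three kinds of drift, and the release gate is simply that the returned list is empty.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical daily reconciliation of legacy vs new balances, keyed by account ID.
final class BalanceReconciler {

  enum DriftKind { MISSING_IN_NEW, MISSING_IN_LEGACY, VALUE_MISMATCH }

  record Drift(String accountId, DriftKind kind, BigDecimal legacy, BigDecimal modern) {}

  static List<Drift> reconcile(Map<String, BigDecimal> legacy, Map<String, BigDecimal> modern) {
    List<Drift> drift = new ArrayList<>();
    Set<String> ids = new HashSet<>(legacy.keySet());
    ids.addAll(modern.keySet());  // union: catches records missing on either side
    for (String id : ids) {
      BigDecimal l = legacy.get(id);
      BigDecimal m = modern.get(id);
      if (m == null) {
        drift.add(new Drift(id, DriftKind.MISSING_IN_NEW, l, null));
      } else if (l == null) {
        drift.add(new Drift(id, DriftKind.MISSING_IN_LEGACY, null, m));
      } else if (l.compareTo(m) != 0) {  // compareTo ignores scale: 10.0 equals 10.00
        drift.add(new Drift(id, DriftKind.VALUE_MISMATCH, l, m));
      }
    }
    return drift;  // release gate: drift.isEmpty()
  }
}
```

Each `Drift` carries both values, so the daily report can be triaged without re-querying either system.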

Common pitfalls

Underestimating throughput mismatch

The mainframe is slower per transaction than a modern service but is built to absorb very high concurrent load. The new service is faster per transaction but can fall over at the concurrency the mainframe handles routinely. Load-test the new service to mainframe peak, not average.

The ACL becomes the legacy

If the ACL accumulates business logic (“round amounts to 2 dp for partner X but 4 dp for partner Y”), it ossifies. Keep the ACL purely about translation; push business logic to either side.

No rollback plan

Once the new service is system of record and legacy has stopped writing, rolling back is hard or impossible. Define the “point of no return” explicitly; require explicit sign-off; keep the legacy path warm for a quarter past the cutover.

COBOL knowledge fade

The engineers who wrote the original code retire. Codify their knowledge in tests against the bridge before they leave — otherwise you discover undocumented behaviour by re-implementing it incorrectly.

When not to migrate

Strangler fig is the right pattern when the legacy code is a constraint on shipping new product, scaling, or hiring. It is the wrong pattern when:

  • The legacy works and is invisible. A 30-year-old batch job that runs at 2am, processes 50M transactions, and never fails is not a problem. Don’t modernise it just because it’s old.
  • Regulatory certainty depends on legacy behaviour. If the regulator is comfortable with the current system and a re-implementation would require re-certification, the cost calculus shifts dramatically. Document and ringfence; don’t modernise.
  • The team has no mainframe expertise left. A migration done without anyone who understands the legacy code will introduce subtle bugs that are impossible to diagnose. Hire or train first; migrate second.
  • Volumes are small and the cost of running it is low. A small AS/400 system that costs less to keep than to migrate is a candidate for “leave alone forever.”