Overview
Most banks run a core ledger that predates the internet. The system works, the regulator is comfortable with it, and replacing it “in one go” has failed at every bank that has tried. The realistic path is incremental: build a modern integration boundary around the legacy core, route new functionality through it, and retire pieces of the core when they no longer have callers.
This is the strangler fig pattern, named after the vine that grows around a host tree until the tree is gone. The vine in our case is a modern integration platform; the tree is the mainframe.
The mainframe is not the problem. The problem is that every new product team has to learn COBOL conventions, CICS COMMAREA layouts, and EBCDIC encoding to integrate. Modernising means hiding the mainframe behind a clean contract — not necessarily replacing it. Some banks run a strangled core for a decade before retirement, and that is fine.
What “legacy” means
The term covers a small number of recurring shapes:
- CICS / COBOL on z/OS. Transactional, COMMAREA-based, called via 3270 emulation, MQ, or CICS Transaction Gateway.
- IMS DB/DC. Hierarchical database, transactional, called via MQ-IMS bridge or APPC.
- AS/400 (IBM i). RPG programs, DB2/400, often called via remote-program-call or message queue.
- File-drop integration. Fixed-width or CSV files dropped on FTP/SFTP, processed nightly.
- Vendor packages with no API. Closed core banking systems where the only integration point is a database table or a flat file.
The pattern is the same regardless of shape: put a façade in front, normalise the contract, route new traffic through the façade, retire the legacy entry point.
Strangler fig
Three properties make this work:
- The façade contract is stable. Consumers don’t know which calls hit the new service vs the legacy core. Migration is a backend concern.
- Routing is feature-flagged. Per-customer or per-percentage routing lets you migrate 1% → 10% → 100% with quick rollback.
- Each phase is independently shippable. No big bang. If “customers” takes 18 months to migrate, accounts and payments can wait.
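The per-percentage routing can be sketched as a small router inside the façade. This is illustrative only: the class and field names are invented, and in practice `rolloutPercent` would come from a feature-flag service rather than a constructor argument.

```java
import java.util.function.Function;

// Hypothetical percentage-based router inside the façade.
// Hashing the customer id gives each customer a stable bucket, so a
// customer is always served by the same backend at a given rollout level.
public class BalanceRouter {
    private final Function<String, String> legacyPath;
    private final Function<String, String> modernPath;
    private final int rolloutPercent; // 0..100; flipping back to 0 is the rollback switch

    public BalanceRouter(Function<String, String> legacyPath,
                         Function<String, String> modernPath,
                         int rolloutPercent) {
        this.legacyPath = legacyPath;
        this.modernPath = modernPath;
        this.rolloutPercent = rolloutPercent;
    }

    public String query(String customerId) {
        int bucket = Math.floorMod(customerId.hashCode(), 100);
        return bucket < rolloutPercent
                ? modernPath.apply(customerId)
                : legacyPath.apply(customerId);
    }
}
```

Consistent hashing matters here: random per-request routing would let a single customer see legacy and new behaviour interleaved, which makes response comparison and support tickets much harder to reason about.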
Anti-corruption layer
The façade and the legacy system speak different languages. The anti-corruption layer (ACL) is the translation boundary: legacy concepts (account types as 2-character codes, dates as YYYYDDD Julian, currency as numeric position) get mapped into modern equivalents on the way out, and modern shapes get mapped back on the way in.
The ACL is a separate component, not part of either side. Burying it inside the façade leaks legacy concepts into the API contract; burying it inside the legacy adapter ties the new code to mainframe quirks forever.
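A minimal sketch of the ACL for one field, the legacy two-character account-type codes. The codes and enum values here are invented for the example; the point is that the mapping lives in one place and is applied in both directions.

```java
import java.util.Map;

// Illustrative anti-corruption layer for a single legacy concept:
// two-character account-type codes vs the modern API's enum.
// The codes themselves are hypothetical.
public class AccountTypeAcl {
    public enum AccountType { CURRENT, SAVINGS, LOAN }

    private static final Map<String, AccountType> FROM_LEGACY = Map.of(
            "CA", AccountType.CURRENT,
            "SV", AccountType.SAVINGS,
            "LN", AccountType.LOAN);

    // Legacy -> modern, applied on the way out of the core.
    public static AccountType fromLegacy(String code) {
        AccountType t = FROM_LEGACY.get(code);
        if (t == null) throw new IllegalArgumentException("Unknown legacy code: " + code);
        return t;
    }

    // Modern -> legacy, applied on the way back in.
    public static String toLegacy(AccountType type) {
        return FROM_LEGACY.entrySet().stream()
                .filter(e -> e.getValue() == type)
                .map(Map.Entry::getKey)
                .findFirst().orElseThrow();
    }
}
```

Note that an unknown legacy code fails loudly rather than passing through: a silent pass-through is exactly the kind of leak the ACL exists to prevent.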
MQ ↔ CICS bridge
The most common integration topology: a modern service places a request on an MQ queue, the MQ-CICS bridge picks it up, invokes a CICS transaction with the request as COMMAREA, and puts the response on a reply queue.
JMS producer in the new service
```java
import java.util.concurrent.TimeoutException;
import javax.jms.*;
import javax.json.Json;
import javax.json.JsonObject;

public class BalanceClient {
    private final ConnectionFactory cf;

    public BalanceClient(ConnectionFactory cf) { this.cf = cf; }

    public AccountBalance query(String accountId) throws JMSException, TimeoutException {
        try (JMSContext ctx = cf.createContext()) {
            Queue request = ctx.createQueue("queue:///REQUEST.QUEUE");
            TemporaryQueue reply = ctx.createTemporaryQueue();
            JsonObject body = Json.createObjectBuilder()
                    .add("op", "GETBAL")
                    .add("accountId", accountId)
                    .build();
            TextMessage msg = ctx.createTextMessage(body.toString());
            msg.setJMSReplyTo(reply);
            msg.setStringProperty("JMS_IBM_Format", "MQSTR"); // tag body as MQSTR text for the bridge
            ctx.createProducer().send(request, msg);
            Message response = ctx.createConsumer(reply).receive(5000); // 5s timeout
            if (response == null) throw new TimeoutException("No reply from CICS within 5s");
            return AccountBalance.fromJson(response.getBody(String.class));
        }
    }
}
```
COBOL COMMAREA layout
```cobol
      * Request COMMAREA: 80 bytes
       01  WS-REQUEST.
           05  WS-OP             PIC X(8).
           05  WS-ACCOUNT-ID     PIC X(16).
           05  WS-CHANNEL        PIC X(4).
           05  FILLER            PIC X(52).
      * Response COMMAREA: 200 bytes
       01  WS-RESPONSE.
           05  WS-RC             PIC 9(4).
           05  WS-RC-MSG         PIC X(40).
           05  WS-BAL-AVAIL      PIC S9(15)V99 COMP-3.
           05  WS-BAL-LEDGER     PIC S9(15)V99 COMP-3.
           05  WS-CCY            PIC X(3).
           05  WS-LAST-TXN-DATE  PIC 9(7).
           05  FILLER            PIC X(128).
```
The bridge marshals JSON to and from this layout. The mapping is rote but worth testing in CI: field-level coverage of the marshalling code is the difference between a successful migration and a 3am incident.
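As one example of that marshalling work, here is a sketch of unpacking a `COMP-3` field (such as `WS-BAL-AVAIL` above) into a `BigDecimal`. The nibble layout is standard packed decimal; treat this as an illustration of the technique, not a drop-in replacement for the bridge vendor's codec.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

public class Comp3 {
    // Unpack an IBM packed-decimal (COMP-3) field into a BigDecimal.
    // Each byte holds two BCD digits, except the last byte, whose low
    // nibble is the sign: 0xC positive, 0xD negative, 0xF unsigned.
    // PIC S9(15)V99 COMP-3 is 17 digits + sign = 9 bytes, scale 2.
    public static BigDecimal unpack(byte[] field, int scale) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < field.length; i++) {
            digits.append((field[i] >> 4) & 0x0F);     // high nibble: always a digit
            if (i < field.length - 1) {
                digits.append(field[i] & 0x0F);        // low nibble: digit, except in last byte
            }
        }
        int sign = field[field.length - 1] & 0x0F;     // sign nibble
        BigDecimal value = new BigDecimal(new BigInteger(digits.toString()), scale);
        return sign == 0x0D ? value.negate() : value;
    }
}
```

Getting the scale wrong here does not throw; it silently shifts the decimal point, which is why field-level tests with known byte patterns are worth the effort.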
Migration steps
1. Inventory the entry points. List every system that calls the legacy core: channel, protocol, frequency, peak rate, and owner. This is usually 50–200 entry points, and the inventory itself takes a quarter.
2. Identify the seams. Group entry points into bounded contexts (accounts, payments, customers, cards). Each seam becomes a façade.
3. Build the façade. One façade per seam. Initially every operation routes to the legacy core via MQ-CICS; the façade exists only to stabilise the contract.
4. Cut over consumers to the façade. Migrate every consumer from direct mainframe access to the façade. This is the boring, long phase — budget for it.
5. Build the new service for one capability. Pick a low-risk, high-volume capability (read-only first — balance enquiry beats funds transfer). Build it independently, and dual-write to legacy during the shadow phase.
6. Switch the façade to route to the new service. Behind a feature flag, route a small percentage of traffic to the new service. Compare responses against legacy, then ramp up.
7. Retire the legacy entry point. Once 100% of the capability has run on the new service for a sustained period, remove the legacy code path. This is the only step that frees up real mainframe MIPS.
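The response-comparison step can be sketched as a shadow wrapper in the façade. Names are hypothetical; in production the candidate call and comparison would run asynchronously, and the mismatch counter would feed a metrics system rather than an in-memory field.

```java
import java.util.function.Function;

// Hypothetical shadow-phase wrapper: serve the legacy answer, also call
// the new service, and count disagreements. The caller never sees the
// candidate's result or its failures while shadow is running.
public class ShadowCompare<T> {
    private final Function<String, T> legacy;
    private final Function<String, T> candidate;
    private long mismatches = 0;

    public ShadowCompare(Function<String, T> legacy, Function<String, T> candidate) {
        this.legacy = legacy;
        this.candidate = candidate;
    }

    public T query(String key) {
        T legacyResult = legacy.apply(key);
        try {
            T newResult = candidate.apply(key);
            if (!legacyResult.equals(newResult)) mismatches++;
        } catch (RuntimeException e) {
            mismatches++; // a candidate failure is a mismatch, never a caller error
        }
        return legacyResult; // legacy stays the system of record during shadow
    }

    public long mismatches() { return mismatches; }
}
```

A sustained mismatch count of zero is the evidence that lets the ramp-up proceed; any non-zero count pauses it.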
Data shape
Three legacy data conventions trip up almost every migration:
- EBCDIC vs ASCII. The mainframe speaks EBCDIC; everyone else speaks ASCII (or UTF-8). Conversion happens in the bridge or the channel; get it wrong and currency symbols, accents, and special characters break silently.
- Packed decimal (`COMP-3`). COBOL stores monetary values as packed BCD — not native binary or decimal strings. Bridges must unpack to `BigDecimal` with the correct scale.
- Date formats. Julian dates (`YYYYDDD`), two-digit years with ambiguous centuries, missing time-of-day. Convert at the boundary, store ISO 8601 internally.
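Two of these conversions can be sketched directly in Java. The EBCDIC decode assumes the JVM ships the `IBM1047` charset (present in standard JDK builds via the `jdk.charsets` module); the Julian conversion matches the `PIC 9(7)` `WS-LAST-TXN-DATE` field above.

```java
import java.nio.charset.Charset;
import java.time.LocalDate;

// Boundary conversions for two legacy data conventions.
// Assumes the IBM1047 EBCDIC charset is available in this JVM.
public class BoundaryConvert {
    private static final Charset EBCDIC = Charset.forName("IBM1047");

    // Raw EBCDIC bytes from the bridge -> Java String.
    public static String fromEbcdic(byte[] bytes) {
        return new String(bytes, EBCDIC);
    }

    // YYYYDDD Julian date -> ISO LocalDate.
    // Throws if the day-of-year is out of range, rather than wrapping silently.
    public static LocalDate fromJulian(int yyyyddd) {
        int year = yyyyddd / 1000;
        int dayOfYear = yyyyddd % 1000;
        return LocalDate.ofYearDay(year, dayOfYear);
    }
}
```

Doing both conversions in one place, at the boundary, is what keeps EBCDIC and Julian dates from leaking into the new service's domain model.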
Cutover patterns
| Pattern | How it works | When to use |
|---|---|---|
| Shadow | Write to both, read from legacy, compare | Building confidence in the new system |
| Dual-write | Write to both, read from new | After shadow proves equivalence |
| Read-from-new | Stop writing to legacy, new is system of record | Once dual-write is stable |
| Big-bang | Cut over in one window, no fallback | Almost never; very small surface only |
During shadow and dual-write, run a reconciliation job that compares legacy and new state daily. Any drift — even “just one record” — is a bug. Track drift count as a release-gate metric. Without reconciliation you discover migration bugs from customer complaints, which is too late.
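A minimal sketch of the reconciliation core: diff legacy state against new state keyed by account id and return the drift count that gates the next rollout step. Real jobs would stream from both stores and report which records drifted, not just how many; the names here are illustrative.

```java
import java.util.Map;

// Hypothetical daily reconciliation: count records that differ between
// the legacy extract and the new service's state. Zero drift is the
// release gate; anything else is a bug to chase, not noise to accept.
public class Reconciler {
    public static int driftCount(Map<String, String> legacy, Map<String, String> modern) {
        int drift = 0;
        // Different value, or present in legacy but missing from new.
        for (Map.Entry<String, String> e : legacy.entrySet()) {
            if (!e.getValue().equals(modern.get(e.getKey()))) drift++;
        }
        // Present in new but never written to legacy.
        for (String key : modern.keySet()) {
            if (!legacy.containsKey(key)) drift++;
        }
        return drift;
    }
}
```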
Common pitfalls
The mainframe is slower per transaction than a modern service but absorbs huge concurrent loads. The new service is faster per transaction but falls over at the legacy concurrent load. Load-test the new service to mainframe peak, not average.
If the ACL accumulates business logic (“round amounts to 2 dp for partner X but 4 dp for partner Y”), it ossifies. Keep the ACL purely about translation; push business logic to either side.
Once the new service is system of record and legacy has stopped writing, rolling back is hard or impossible. Define the “point of no return” explicitly; require explicit sign-off; keep the legacy path warm for a quarter past the cutover.
The engineers who wrote the original code retire. Codify their knowledge in tests against the bridge before they leave — otherwise you discover undocumented behaviour by re-implementing it incorrectly.
When not to migrate
Strangler fig is the right pattern when the legacy code is a constraint on shipping new product, scaling, or hiring. It is the wrong pattern when:
- The legacy works and is invisible. A 30-year-old batch job that runs at 2am, processes 50M transactions, and never fails is not a problem. Don’t modernise it just because it’s old.
- Regulatory certainty depends on legacy behaviour. If the regulator is comfortable with the current system and a re-implementation would require re-certification, the cost calculus shifts dramatically. Document and ringfence; don’t modernise.
- The team has no mainframe expertise left. A migration done without anyone who understands the legacy code will introduce subtle bugs that are impossible to diagnose. Hire or train first; migrate second.
- Volumes are small and the cost of running it is low. A small AS/400 system that costs less to keep than to migrate is a candidate for “leave alone forever.”