Overview

IBM API Connect is a policy-driven API management platform. The runtime that actually enforces policies and proxies traffic is DataPower Gateway — an appliance (physical, virtual, or container) that terminates TLS, applies policies expressed as an assembly, and forwards to a backend. Around the gateway sit three control-plane services: API Manager (authoring, plans, catalogs), Developer Portal (consumer onboarding), and Analytics (event ingest, reporting).

In financial services it shows up because three things matter at once: TLS/mTLS termination with strong cipher control, deterministic rate-limit and quota enforcement, and an audit trail that maps every request to a consumer, plan, and policy version. DataPower’s purpose-built XML/JSON processing engine and FIPS‑validated cryptography make those properties enforceable rather than aspirational.

Naming

The product family is IBM API Connect. The gateway component is DataPower Gateway, deployed as either the v5-compatible (v5c) gateway service or the newer API Gateway service. When people say “IBM API Gateway” they almost always mean DataPower configured as an API Connect gateway.

Components

An API Connect deployment has four logical subsystems. Each can be scaled independently, and each owns a distinct concern in the lifecycle.

Subsystem          | Concern                                                               | Scaling
DataPower Gateway  | Data plane: TLS, policy enforcement, routing, transformation          | Horizontal, stateless; size by RPS & payload
API Manager        | Authoring of APIs, products, plans, catalogs; publish to gateway      | Active/active; size by author count, publish rate
Developer Portal   | External-facing site for consumer onboarding, app keys, docs          | Horizontal; size by consumer base
Analytics          | Event ingest (OpenSearch) and dashboards for traffic, errors, latency | Cluster; size by event volume and retention

Request lifecycle

Every request that traverses DataPower runs through the same pipeline: connection acceptance, policy assembly, backend invocation, response policy assembly, and response. Each stage can short-circuit. The lifecycle below is the one to internalise — almost every operational decision (where to log, where to enforce a quota, where to mask a field) is really a question about which stage.

Two practical implications:

  • Rate-limit before transform. A rejected request should consume zero CPU on payload parsing. Always place quota and security checks before any transformation policy.
  • Mask in the response stage, not the request stage. PII redaction belongs in post-invoke — the gateway has the full backend response and can drop fields before they cross the trust boundary back to the caller.

Deploying an API

The path from a local OpenAPI file to a published, consumable API runs through six steps. Each one is scriptable via the apic CLI, which is the only sane way to manage anything beyond a handful of APIs.

  1. Author the OpenAPI definition

    Write the spec as plain OpenAPI 3.0 YAML. The IBM‑specific behaviour (security, rate limits, assembly) lives under the x-ibm-configuration extension — the spec itself stays portable and reviewable.

  2. Define the assembly (policy chain)

    Inside x-ibm-configuration.assembly.execute, list the policies in order: set-variable, oauth, rate-limit, map, invoke, gatewayscript. The assembly is what DataPower actually executes; the rest of the OpenAPI spec is metadata.

  3. Wrap the API in a Product with Plans

    A Product groups one or more APIs. Plans inside the Product attach quotas (e.g. 1000/hour, burst 50/sec). Consumers subscribe to a plan, not an API directly — this is what makes per-consumer rate limits work.

  4. Stage to the Catalog

    A Catalog is an environment (sandbox, test, prod). Staging publishes the Product to that catalog’s gateway service in a pending state. At this point the gateway has the configuration but is not routing live traffic yet.

  5. Publish

    Publishing flips the staged Product to active and makes it visible in the Developer Portal. Existing in-flight requests are not affected; new requests pick up the new policy version.

  6. Subscribe and test

    Consumers register an application in the Portal, subscribe to the plan, and receive a client ID and secret. Calls to the gateway must include both, plus the agreed auth (OAuth bearer or mTLS client cert); a sample call is sketched below.
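
A quick end-to-end check of step 6 is a scripted call. The sketch below assumes client credentials from the Portal app registration, the Keycloak token endpoint used in the security examples later in this section, and an illustrative /payments/{id} path (the spec excerpt further down does not list paths); the X-IBM-Client-Id header is the API Connect convention for identifying the subscribing app.

smoke-call.sh
# Obtain a bearer token via client_credentials (endpoint and scope are
# assumptions matching the OAuth examples further down; requires jq).
TOKEN=$(curl -s -u "$CLIENT_ID:$CLIENT_SECRET" \
  -d 'grant_type=client_credentials' -d 'scope=payments.read' \
  https://idp.acme-bank.com/realms/prod/protocol/openid-connect/token \
  | jq -r .access_token)

# Call the gateway: the client ID ties the request to the subscription,
# the bearer token satisfies the oauth policy in the assembly.
curl -i "https://api.acme-bank.com/payments/v1/payments/12345" \
  -H "X-IBM-Client-Id: $CLIENT_ID" \
  -H "Authorization: Bearer $TOKEN"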

CLI: end-to-end publish

The pattern below is what runs in CI for every catalog promotion. Token, org, and catalog vary by environment; everything else is the same.

publish.sh
# 1. Authenticate against the management cluster
apic login --server manager.api.example.com \
  --username $APIC_USER --password $APIC_PASS \
  --realm provider/default-idp-2

# 2. Validate the OpenAPI + assembly locally before pushing
apic validate payments-api.yaml

# 3. Push API and Product into the org draft area
apic apis:publish payments-api.yaml      --org acme-bank
apic products:publish payments-product.yaml --org acme-bank

# 4. Stage to the target catalog (no live traffic yet)
apic products:publish payments-product.yaml \
  --org acme-bank --catalog production --stage

# 5. Promote to active after smoke tests pass
apic products:publish payments-product.yaml \
  --org acme-bank --catalog production

Idempotency

products:publish without --stage will replace the active version. In a CI pipeline, always stage first, run a synthetic-traffic check against the staged endpoint, and promote in a separate step. Never let a single command both stage and activate in production.
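
A hedged sketch of that gate, reusing the product, org, and catalog names from the script above; the staged smoke-test URL and client ID are placeholders for whatever synthetic check the team actually runs.

promote-gate.sh
set -euo pipefail

# 1. Stage only: the gateway receives the new config, live traffic is unchanged
apic products:publish payments-product.yaml \
  --org acme-bank --catalog production --stage

# 2. Synthetic check against the staged configuration
#    (URL and credentials are placeholders)
if curl -sf --max-time 5 "$STAGED_SMOKE_URL" \
     -H "X-IBM-Client-Id: $SMOKE_CLIENT_ID" > /dev/null; then
  # 3. Promote to active as a separate, deliberate step
  apic products:publish payments-product.yaml \
    --org acme-bank --catalog production
else
  echo "Smoke test failed; product left staged, nothing promoted." >&2
  exit 1
fi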

OpenAPI & assembly

Below is a representative OpenAPI 3.0 file for a payments lookup. The gateway behaviour is entirely in x-ibm-configuration — the rest is the contract clients see.

payments-api.yaml
openapi: 3.0.0
info:
  title: Payments Lookup
  version: 1.2.0
servers:
  - url: https://api.acme-bank.com/payments/v1

x-ibm-configuration:
  cors: { enabled: false }
  enforced: true
  testable: false
  phase: realized
  activity-log:
    enabled: true
    success-content: activity
    error-content: payload
  gateway: datapower-api-gateway

  assembly:
    execute:

      # Tag the request with a correlation ID for downstream tracing
      - set-variable:
          version: 2.0.0
          title: tag-correlation
          actions:
            - { set: message.headers.x-correlation-id,
                value: $(api.id)-$(message.headers.x-request-id) }

      # OAuth 2.0 bearer validation against an external authorization server
      - oauth:
          version: 2.0.0
          title: verify-bearer
          oauth-provider-settings-ref:
            name: acme-keycloak-introspect
          supported-oauth-components: [ OAuthValidateRequest ]

      # Plan-level quota; key derives from client_id + plan id
      - rate-limit:
          version: 2.0.0
          title: plan-quota
          source: plan

      # Strip internal headers before backend hop
      - set-variable:
          title: strip-internal
          actions:
            - { clear: message.headers.authorization }
            - { clear: message.headers.x-ibm-client-secret }

      # Forward to the internal core service
      - invoke:
          version: 2.0.0
          title: backend
          target-url: https://payments-core.svc.internal/v1$(request.path)
          timeout: 30
          verb: keep
          cache-response: protocol
          tls-profile: internal-mtls

      # Response post-processing: redact account numbers, drop debug
      - gatewayscript:
          version: 2.0.0
          title: redact-pii
          source: |
            // Guard against an empty body (e.g. a 204 from the backend)
            var body = apim.getvariable('message.body');
            if (body) {
              if (body.account)
                body.account = body.account.replace(/.(?=.{4})/g, '*');
              delete body.__debug;
              apim.setvariable('message.body', body);
            }

The source: plan on rate-limit is the load-bearing detail. It tells DataPower to derive the quota key from the subscribed plan and consumer app, not from a fixed string — which is what makes the same API enforce different limits for “internal” vs “partner” vs “public” consumers without forking the assembly.

Security policies

Three patterns cover almost every real API Connect deployment in financial services.

OAuth 2.0 with external authorization server

The gateway is a resource server, not the AS. It introspects bearer tokens against an external IdP (Keycloak, Ping, ForgeRock). This keeps token lifecycle, refresh, and consent flows out of DataPower — which is purpose-built to validate, not to issue.

oauth-introspect-provider.yaml
name: acme-keycloak-introspect
title: Keycloak Token Introspection
provider-type: third_party
third-party-config:
  introspection-endpoint: https://idp.acme-bank.com/realms/prod/protocol/openid-connect/token/introspect
  authentication-method: client-secret-basic
  client-id: datapower-rs
  client-secret: $(env.IDP_CLIENT_SECRET)
  cache-type: time-to-live
  cache-ttl: 300
scopes:
  - payments.read
  - payments.write
grants:
  - accessCode
  - clientCredentials

The cache-ttl: 300 is critical. Without it, every gateway request becomes a synchronous call to the IdP. With it, the gateway introspects once and caches the active/inactive decision for five minutes — trading immediate revocation for an order-of-magnitude latency improvement. For revocation-sensitive flows (high-value transfers, admin operations), set the TTL to 0 on the specific assembly, not globally.

Mutual TLS for partner traffic

For B2B traffic the contract is “present a client certificate signed by our partner CA, or you don’t reach the listener.” This is configured at the gateway service level (not in the API assembly), because TLS happens before any policy runs.

tls-profile-partner-mtls.cfg (DataPower CLI)
crypto profile partner-mtls
  idcred           gw-server-cert
  valcred          partner-ca-bundle
  ciphers          TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384:TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
  protocols        TLSv1.2 TLSv1.3
exit

ssl-server partner-listener
  crypto-profile   partner-mtls
  request-client-auth  on
  require-client-auth  on
  send-client-auth-ca-list  on
exit
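
A quick way to verify the listener behaves as configured is a pair of curl calls from outside the trust boundary; the hostname and key-material paths below are placeholders.

mtls-check.sh
# Positive test: present a partner-CA-signed client certificate.
curl -iv "https://partner-api.acme-bank.com/payments/v1/payments/12345" \
  --cert partner-client.pem --key partner-client.key \
  --cacert acme-gw-ca.pem \
  -H "X-IBM-Client-Id: $CLIENT_ID"

# Negative test: no client cert. Expect the TLS handshake itself to fail;
# an HTTP 401 here would mean the request reached the assembly, which it
# should never do with require-client-auth on.
curl -iv "https://partner-api.acme-bank.com/payments/v1/payments/12345" \
  --cacert acme-gw-ca.pem \
  || echo "handshake rejected as expected"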

JWT validation (offline, no introspection)

When the issuer is trusted and tokens are short-lived (under 5 minutes), validate the JWT signature locally using the issuer’s JWKS endpoint. No round-trip to the IdP, sub-millisecond verification.

jwt-validate.yaml
- jwt-validate:
    version: 2.0.0
    title: verify-jwt
    jwt: request.headers.authorization
    output-claims: decoded.claims
    jwk: https://idp.acme-bank.com/realms/prod/protocol/openid-connect/certs
    iss-claim: "https://idp.acme-bank.com/realms/prod"
    aud-claim: "payments-api"
    verify-crypto: true

Rate limiting

Rate limiting is the most operationally consequential policy. The model is a two-level hierarchy:

  • Plan-level quota — the contract with the consumer (e.g. 10 000 calls/day). Enforced by the gateway, counted per (consumer-app, plan).
  • Burst limit — protection for the backend (e.g. 50 requests/second). Enforced regardless of plan, counted per (consumer-app, API).

Both are configured on the Plan, not the API:

payments-product.yaml
product: 1.0.0
info:
  name: payments
  title: Payments Product
  version: 1.2.0

plans:
  internal:
    title: Internal (no quota)
    approval: false
    rate-limits:
      unlimited: { value: unlimited }
    apis: { payments: {} }

  partner-tier1:
    title: Partner Tier 1
    approval: true
    rate-limits:
      daily:  { value: 100000/1day,  hard-limit: true }
      burst:  { value: 50/1second,    hard-limit: true }
    apis: { payments: {} }

  public:
    title: Public
    approval: true
    rate-limits:
      daily:  { value: 5000/1day,    hard-limit: true }
      burst:  { value: 10/1second,    hard-limit: true }
    apis: { payments: {} }

hard-limit: true rejects with HTTP 429 once exceeded. With hard-limit: false (the default in older versions) the gateway emits a warning header but still passes the request — useful for soft launches, dangerous in production. Be explicit.
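
What a hard limit looks like from the consumer side, as a rough sketch: fire more concurrent requests than the partner-tier1 burst allows and count status codes. The hostname, path, and credentials are the same assumptions as in the earlier smoke test, and the exact 200/429 split depends on timing.

burst-check.sh
# 60 requests, 20 in flight at a time, against a plan with burst 50/1second.
# The tail of the batch should come back 429 rather than 200.
seq 1 60 | xargs -P 20 -I{} curl -s -o /dev/null -w '%{http_code}\n' \
  "https://api.acme-bank.com/payments/v1/payments/12345" \
  -H "X-IBM-Client-Id: $CLIENT_ID" \
  -H "Authorization: Bearer $TOKEN" \
  | sort | uniq -c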

Counter consistency

DataPower shares rate-limit counters across the gateway peer group through its gateway-peering mechanism, and replication is asynchronous. Across two data centres with high latency the count can drift; use per-DC plan quotas sized to each DC's share of traffic, plus a global burst limit, instead of trying to enforce a single global daily quota across geographies.

HA topology

Production deployments run active/active across two data centres. Each DC has its own DataPower peer group; API Manager and Portal sit in one DC with a warm standby. Analytics is a stretched cluster.

Three operational rules follow from this topology:

  • The data plane is active/active; the control plane is active/standby. Authoring an API in DC2 while DC1 is the primary breaks — the change won’t replicate back. Authoring is restricted to the primary; failover is a deliberate, runbook-driven event.
  • DataPower peer groups don’t span DCs. Counter gossip across high-latency links is unreliable. Each DC maintains its own quota counters; this is the trade you make to keep enforcement deterministic.
  • Analytics is stretched, not active/standby. Both DCs write events to the same OpenSearch cluster. A single dashboard shows global traffic; the cluster handles partition tolerance via shard placement rules.

Observability

Three signals are non-negotiable in production: per-API latency percentiles, per-consumer error rate, and policy-failure attribution. The first two come for free from Analytics. The third requires deliberate instrumentation.

Logging configuration

Set activity-log to payload for errors only — logging full payloads on success blows up storage and creates a PII footprint that compliance will refuse to certify. Errors get the payload because that’s where you actually need it.

activity-log.yaml
activity-log:
  enabled: true
  success-content: activity     # headers + meta only
  error-content:   payload      # headers + meta + body

Forwarding events to a SIEM

DataPower writes to its own analytics endpoint, but most banks also need events in Splunk or QRadar for security correlation. The pattern: a log target on the gateway publishes to a syslog endpoint, which feeds the SIEM. Don't try to make the SIEM the primary analytics store; its ingest and retention model is built for security correlation, not for API traffic reporting.

log-target-splunk.cfg (DataPower CLI)
log target splunk-tcp
  type          syslog-tcp
  remote-host   splunk-hec.acme-bank.com
  remote-port   514
  format        json
  event         apic-gw-error
  event         apic-gw-throttle
  priority      warning
exit
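
Before debugging on the Splunk side, a basic reachability check from a host on the same network segment as the gateway rules out firewall or routing problems; nc is assumed to be available, and this only approximates what the gateway itself can reach.

siem-connectivity.sh
# Confirm the syslog-tcp endpoint is reachable on the configured port
nc -vz splunk-hec.acme-bank.com 514

# Optionally push a throwaway JSON line and confirm it shows up in the index
printf '{"event":"apic-gw-log-target-test"}\n' | nc -w 2 splunk-hec.acme-bank.com 514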

Health checks

Every gateway exposes /2018-04-01/diag over the management port. Wire it into the LB’s health probe with a 2-second timeout and 3-failure threshold — that’s aggressive enough to drain a bad node within 6 seconds, lenient enough to survive a GC pause.
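
The same probe, scripted; the host and management port are placeholders, and the path is the diagnostics endpoint mentioned above.

health-probe.sh
# Mirrors the LB probe semantics: 2-second budget, fail on any non-2xx.
# GW_HOST and GW_MGMT_PORT are placeholders for the gateway's management interface.
curl -skf --max-time 2 \
  "https://${GW_HOST}:${GW_MGMT_PORT}/2018-04-01/diag" > /dev/null \
  && echo "healthy" \
  || echo "unhealthy"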

Common pitfalls

Activity log bloat

Setting success-content: payload in production will fill the analytics disk inside a week on any decent traffic level. Every team I’ve seen do this also fails to set retention policies, which compounds the problem.

OAuth introspection without caching

Default cache-ttl: 0 means every request introspects against the IdP. At 1 000 RPS that’s 1 000 IdP calls/sec — the IdP becomes the bottleneck and the bank’s auth service falls over before the gateway breaks a sweat. Always set a non-zero TTL; reserve 0 for revocation-sensitive endpoints only.

Catalog drift

Editing an API directly in the API Manager UI on the production catalog creates a divergence the source repo can’t reproduce. Make the production catalog read-only via RBAC; force all changes through the staging catalog and a CI promotion.

Hard-coding URLs in target-url

Use $(target-url) with a property-defined backend instead of literal hostnames. Otherwise every promotion to a new catalog requires editing the assembly — and editing assemblies in CI is exactly what catalog properties exist to prevent.

When not to use IBM API Gateway

API Connect is excellent at what it’s designed for. It is a poor fit when:

  • Traffic is internal east-west only. A service mesh (Istio, Linkerd) gives you mTLS, observability, and traffic policy without the licensing cost or operational footprint of DataPower.
  • You need request-stream rewriting at the byte level. Envoy or NGINX with custom Lua/Wasm gives you primitives DataPower doesn’t expose. DataPower’s strength is policy, not raw stream manipulation.
  • The use case is webhook ingestion. Webhook receivers benefit from an event-driven runtime (Kafka + a small consumer) more than a synchronous gateway. Forcing webhooks through API Connect adds latency without buying you policy enforcement that matters at the edge of the bank.
  • You don’t have an IBM relationship. The cost only makes sense alongside other IBM software (MQ, ACE, DataPower, Cloud Pak for Integration). As a standalone purchase the value-for-money argument is hard to win against Kong, Apigee, or AWS API Gateway.

Where it does win: when policy is the product. Regulated environments where the gateway must be certifiable, where consumer-facing rate limits are part of the commercial contract, where TLS termination has to be FIPS-validated, and where the audit trail needs to be queryable by examiners. That is the environment API Connect was built for.