DevOps & Platform Engineering — Tekton & ArgoCD — Lab

Stack: OpenShift / Tekton / ArgoCD / Vault
Audience: Engineers
Read: ~14 min
Author: Waleed Albadawi

Overview

Integration platforms in financial services have CI/CD problems that vanilla microservice teams don’t face: BAR files for IBM ACE, Kafka topics that can’t be re-created without data loss, MQ queues that need clustering aware of network zones, OAuth clients that need IdP configuration synchronised with deployments, and a regulator who wants every change traced to an approver.

The pattern that holds: GitOps for state, Tekton for events. Git is the single source of truth for what should be deployed; Tekton runs the pipelines that build artifacts; ArgoCD reconciles the cluster to git. Promotions happen by merging a PR, not by a kubectl from a developer’s laptop.

Platform engineering is not just CI

A platform engineering team owns the runtime, the pipeline, the dev experience, and the contracts with consumer teams. The deliverable is a self-service path: a service team can ship a new integration without raising a ticket. If teams still file tickets to deploy, you have a CI team, not a platform team.

Pipeline shape

Figure 1 — CI builds artifacts, GitOps reconciles state, environments promote via PR

The crucial property: nothing reaches a cluster except via git. No kubectl apply from a laptop, no manual oc rsh-and-edit. Every change is a commit; every commit has an author and a reviewer. This is what regulators care about and what most banks fail to implement consistently.

Tekton

Tekton is Kubernetes-native CI: pipelines and tasks are CRDs, and each task run executes as a pod. The benefit over Jenkins is operational, because the runtime is the cluster you already operate. The cost is that Tekton only provides primitives; you compose the pipeline yourself.

apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: build-ace-flow
spec:
  params:
    - { name: git-url }        # app repo to clone
    - { name: git-revision }   # branch, tag, or commit SHA to build
    - { name: image }          # full image reference to build, scan, and sign
  workspaces:
    - name: source             # shared volume that carries the checkout between tasks
  tasks:

    # Fetch the source into the shared workspace.
    - name: clone
      taskRef: { name: git-clone }
      params:
        - { name: url,      value: $(params.git-url) }
        - { name: revision, value: $(params.git-revision) }
      workspaces: [{ name: output, workspace: source }]

    # Run the ACE flow unit tests before anything is packaged.
    - name: unit-test
      runAfter: [clone]
      taskRef: { name: ace-unit-test }
      workspaces: [{ name: source, workspace: source }]

    # Package the integration flow as a BAR file.
    - name: build-bar
      runAfter: [unit-test]
      taskRef: { name: ace-build-bar }
      workspaces: [{ name: source, workspace: source }]

    # Build the container image that carries the BAR file.
    - name: image-build
      runAfter: [build-bar]
      taskRef: { name: buildah }
      params:
        - { name: IMAGE, value: $(params.image) }
      workspaces: [{ name: source, workspace: source }]

    # Scan the built image; HIGH and CRITICAL findings are the ones reported.
    - name: trivy-scan
      runAfter: [image-build]
      taskRef: { name: trivy-scan }
      params:
        - { name: image, value: $(params.image) }
        - { name: severity, value: "HIGH,CRITICAL" }

    # Sign only after the scan has passed.
    - name: sign-image
      runAfter: [trivy-scan]
      taskRef: { name: cosign-sign }
      params:
        - { name: image, value: $(params.image) }

    # Hand over to GitOps: open a PR against the dev env repo with the new image.
    - name: bump-env-manifest
      runAfter: [sign-image]
      taskRef: { name: git-pr-bump }
      params:
        - { name: env-repo, value: https://git/.../env-dev }
        - { name: image,    value: $(params.image) }

Each task is reusable; the pipeline is one team’s composition. Push reusable tasks to a shared catalog repo so platform updates apply everywhere.
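One way to consume a shared catalog without copying task definitions into every namespace is Tekton's git resolver. The fragment below is a sketch of a single pipeline task entry, assuming the git resolver is enabled on the cluster; the repository URL, the revision tag, and the path inside the repo are hypothetical placeholders.

```yaml
# Sketch: a pipeline task resolved from a shared catalog repo at run time.
# Assumes the Tekton git resolver is enabled; url, revision, and pathInRepo
# are hypothetical values, not taken from the original pipeline.
- name: build-bar
  runAfter: [unit-test]
  taskRef:
    resolver: git
    params:
      - name: url
        value: https://git.example.com/platform/tekton-catalog.git
      - name: revision
        value: v1.4.0        # pinned tag: catalog changes roll out deliberately
      - name: pathInRepo
        value: tasks/ace-build-bar/ace-build-bar.yaml
  workspaces:
    - name: source
      workspace: source
```

Pinning a tag trades automatic rollout for reproducibility. Pointing `revision` at a branch instead makes platform updates apply on the next run, at the cost of pipelines changing underneath teams.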

GitOps with ArgoCD

Two kinds of repo are involved: the app repo (source) and an env repo for each environment (manifests). The CI pipeline opens a PR against the env repo with the new image tag; ArgoCD watches the env repo and reconciles the cluster.

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payments-prod
  namespace: argocd
spec:
  project: integration
  source:
    repoURL: https://git/acme/env-prod
    targetRevision: main
    path: apps/payments
    helm:
      valueFiles: [values.prod.yaml]
  destination:
    server: https://prod-cluster
    namespace: payments
  syncPolicy:
    automated:
      selfHeal: true
      prune: false      # prod: never auto-prune; humans approve deletes
    syncOptions:
      - CreateNamespace=true
      - PrunePropagationPolicy=foreground
    retry:
      limit: 3
      backoff: { duration: 10s, factor: 2, maxDuration: 2m }
  revisionHistoryLimit: 10
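The Application above references a project named integration. A minimal sketch of that AppProject follows, assuming the project should only accept the prod env repo and the payments namespace shown above; the description text and the decision to allow only Namespace as a cluster-scoped kind are assumptions.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: integration
  namespace: argocd
spec:
  description: Integration platform applications (prod)
  # Applications in this project may only pull manifests from the env repo.
  sourceRepos:
    - https://git/acme/env-prod
  # ...and may only deploy to this cluster and namespace.
  destinations:
    - server: https://prod-cluster
      namespace: payments
  # Namespace is the only cluster-scoped kind allowed, so CreateNamespace=true
  # keeps working while other cluster-scoped resources are rejected.
  clusterResourceWhitelist:
    - group: ""
      kind: Namespace
```

Scoping the project this way means an Application that points at an app repo, or at another namespace, is rejected by ArgoCD rather than caught in review.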

Environment promotion

Three environments are enough for almost any platform: dev (open to everyone, no gatekeeping), test (integration testing, fixed data), and prod.

Environment	Sync	Approval	Data
dev	Automated on commit to main	None	Synthetic, freely reset
test	Automated on PR merge from dev branch	1 platform reviewer	Anonymised prod-like
prod	PR with image tag + change record	Service owner + change manager	Real

Promotion is image-tag-only. The same image that passed test is what runs in prod. Configuration that varies by environment lives in env-specific Helm values files; never rebuild between environments.
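As an illustration of tag-only promotion, the two values files below differ in environment configuration but carry the same image tag. The key names (`image.repository`, `image.tag`, `replicaCount`, `mq.queueManager`), the registry host, and the tag are hypothetical and depend on the chart in use.

```yaml
# values.test.yaml (in the test env repo)
image:
  repository: registry.example.com/acme/payments
  tag: "1.8.3"              # the build that passed test
replicaCount: 1
mq:
  queueManager: QM_TEST
---
# values.prod.yaml (in the prod env repo)
image:
  repository: registry.example.com/acme/payments
  tag: "1.8.3"              # the promotion PR changes only this line
replicaCount: 4
mq:
  queueManager: QM_PROD
```

A promotion PR whose diff touches anything other than the tag line is a signal to split the change: configuration changes and image promotions are easier to review and revert separately.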

Secrets

Three things must be true: secrets are not in git, secrets are versioned, and pods can read their secrets at runtime without a human in the loop.

Vault as source of truth. HashiCorp Vault or IBM Cloud Pak for Integration secrets; pods authenticate via Kubernetes service account JWT.
External Secrets Operator. Pulls from Vault, creates Kubernetes Secrets in the namespace; ArgoCD doesn’t see the secrets, just ExternalSecret objects.
Sealed Secrets as fallback. Only the encrypted form of the secret is committed to git; a controller in the cluster decrypts it. Acceptable for low-risk env-specific configuration, not for production credentials.

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: payments-mq-creds
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-prod
    kind: ClusterSecretStore
  target:
    name: payments-mq-creds
    creationPolicy: Owner
  data:
    - { secretKey: username, remoteRef: { key: payments/mq, property: user } }
    - { secretKey: password, remoteRef: { key: payments/mq, property: pwd } }
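The ExternalSecret above refers to a ClusterSecretStore named vault-prod. A sketch of that store is shown below, assuming a KV v2 engine and Vault's Kubernetes auth method; the Vault URL, the mount paths, the role name, and the service account are hypothetical.

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
  name: vault-prod
spec:
  provider:
    vault:
      server: https://vault.example.com     # hypothetical Vault address
      path: secret                          # KV mount; assumed name
      version: v2
      auth:
        # The operator logs in to Vault with a service account JWT,
        # so no static Vault token is stored in the cluster.
        kubernetes:
          mountPath: kubernetes             # Vault auth mount; assumed name
          role: external-secrets            # Vault role; assumed name
          serviceAccountRef:
            name: external-secrets
            namespace: external-secrets
```

Because the store is cluster-scoped, the Vault role behind it should be limited to the paths the platform actually serves. A namespaced SecretStore per team is the tighter alternative.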

Observability

The platform itself needs the same telemetry it gives its consumers. Three signals at minimum:

Pipeline duration & success rate. Slow pipelines erode developer trust faster than failing pipelines. Track p50/p95 by stage.
Sync drift. ArgoCD’s out-of-sync status — track per app per environment. Persistent drift means manual changes to the cluster, which means GitOps is bypassed.
Image freshness. Time since the image in production was built. An image older than 30 days probably has unpatched CVEs.
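The sync drift signal can be turned into an alert from ArgoCD's Prometheus metrics. The rule below is a sketch, assuming the Prometheus Operator is installed and ArgoCD metrics are being scraped; the rule name, the 30-minute window, and the severity label are assumptions to tune per environment.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: gitops-drift
  namespace: argocd
spec:
  groups:
    - name: argocd-sync
      rules:
        - alert: ArgoAppOutOfSync
          # argocd_app_info is exported per application with its sync status as a label.
          expr: argocd_app_info{sync_status="OutOfSync"} == 1
          for: 30m                  # ignore the short drift that every deploy causes
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.name }} has been OutOfSync for 30 minutes"
```

The `for` window matters more than the expression: every legitimate sync passes through OutOfSync briefly, so only persistent drift should page anyone.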

Release patterns

Three release patterns cover almost every integration deployment:

Rolling. Default for stateless services; replace pods one at a time behind a load balancer. Fine for HTTP/REST.
Blue/green. Stand up green alongside blue, switch traffic at the gateway. Use for releases that change message contracts where in-flight requests with the old contract must drain.
Canary. Route a small percentage of traffic to the new version; ramp on metrics. Use for high-risk changes or anywhere business metrics are sensitive (fraud detection, risk scoring).
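For the rolling case, the behaviour is controlled by the Deployment's update strategy. The manifest below is a minimal sketch; the names, image, port, and probe path are hypothetical.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: payments-api
  namespace: payments
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # add at most one extra pod during the rollout
      maxUnavailable: 0     # never drop below the desired replica count
  selector:
    matchLabels: { app: payments-api }
  template:
    metadata:
      labels: { app: payments-api }
    spec:
      containers:
        - name: app
          image: registry.example.com/acme/payments:1.8.3
          ports:
            - containerPort: 8080
          # An old pod is only removed once its replacement reports ready.
          readinessProbe:
            httpGet: { path: /healthz, port: 8080 }
            periodSeconds: 5
```

With `maxUnavailable: 0` the rollout is only as safe as the readiness probe; a probe that passes before the service can take traffic defeats the strategy.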

Stateful services need pre-release coordination

Kafka topic partition increases, MQ queue config changes, schema registry compatibility shifts — none of these are pod-level rollouts. Encode them as separate, idempotent platform tasks (Tekton or Ansible) that run before the pod-level deploy. Don’t bake them into the application Helm chart.
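As a sketch of what such a platform task might look like, the Tekton Task below raises a topic's partition count only when the current count is lower, so re-running it is a no-op. The task name, the image, and the use of the `kafka-topics.sh` script name (some distributions ship it as `kafka-topics`) are assumptions; client authentication is omitted.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: kafka-ensure-partitions
spec:
  params:
    - name: bootstrap
    - name: topic
    - name: partitions
  steps:
    - name: ensure
      image: registry.example.com/platform/kafka-cli:latest   # hypothetical image
      env:
        # Pass params as env vars so they are never spliced into the script text.
        - { name: BOOTSTRAP,  value: $(params.bootstrap) }
        - { name: TOPIC,      value: $(params.topic) }
        - { name: PARTITIONS, value: $(params.partitions) }
      script: |
        #!/bin/sh
        set -eu
        # Read the current partition count from the describe output.
        current=$(kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
          --describe --topic "$TOPIC" \
          | sed -n 's/.*PartitionCount: *\([0-9][0-9]*\).*/\1/p' | head -n 1)
        if [ -z "$current" ]; then
          echo "could not read partition count for $TOPIC" >&2
          exit 1
        fi
        # Idempotent: do nothing when the topic already has enough partitions.
        if [ "$current" -ge "$PARTITIONS" ]; then
          echo "$TOPIC already has $current partitions; nothing to do"
          exit 0
        fi
        kafka-topics.sh --bootstrap-server "$BOOTSTRAP" \
          --alter --topic "$TOPIC" --partitions "$PARTITIONS"
```

The task never reduces partitions and never creates the topic; both are treated as decisions for a human rather than a pipeline.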

Audit & compliance

The regulator’s question is the same in every audit: “show me who approved this change.” GitOps answers that question by construction: every change is a commit with an author; every prod commit was a PR with a reviewer; every reviewer maps to an identity. Make sure the chain is unbroken.

Branch protection on prod env repo: require 2 reviewers, 1 from a different team, no force-pushes, signed commits.
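Image provenance: sign images with cosign; an admission controller in the prod cluster refuses unsigned images. ArgoCD does not perform admission control itself, so this needs a separate policy engine.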
SBOM per artifact: attach a Software Bill of Materials to every image; the next CVE response is a query against SBOMs, not a reverse-engineering exercise.
Change record link: require a ServiceNow / Jira change ticket reference in the prod PR title; auditors want to trace from change ticket to commit to deploy.
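The image-provenance item needs an admission controller to enforce it at the cluster boundary. The policy below is a sketch assuming Kyverno is the policy engine, which the original does not specify; the policy name, the registry pattern, and the public key are placeholders, and field names can differ between Kyverno versions.

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce    # reject, rather than only audit
  webhookTimeoutSeconds: 30
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds: [Pod]
              namespaces: [payments]
      verifyImages:
        - imageReferences:
            - "registry.example.com/acme/*"    # hypothetical registry path
          attestors:
            - entries:
                - keys:
                    # Placeholder: the cosign public key matching the pipeline's signing key.
                    publicKeys: |-
                      -----BEGIN PUBLIC KEY-----
                      <cosign public key>
                      -----END PUBLIC KEY-----
```

Enforcing at admission covers images deployed outside ArgoCD as well, which is the case the audit trail otherwise misses.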

Common pitfalls

Cluster admin shortcuts

One platform engineer with cluster admin can fix anything in 30 seconds. They can also undo six months of GitOps discipline. Restrict cluster admin to break-glass; require justification per use; alert on every non-ArgoCD apply.

Pipeline secrets in CI logs

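Debug output in a CI pipeline is a common cause of credential leaks. Tekton does not redact step logs for you, so the discipline has to live in the tasks: mount secrets as files rather than environment variables where possible, avoid shell tracing in steps that touch credentials, reject any task that prints its environment, and rotate secrets immediately if a leak is suspected.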
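One pattern that reduces the chance of a credential reaching the logs is to mount the secret as a file and feed it to the tool on stdin. The Task below is a sketch; the task name, the secret name and its keys (`username`, `password`), the mount path, and the registry host are hypothetical.

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: registry-login
spec:
  volumes:
    - name: registry-creds
      secret:
        secretName: registry-creds        # hypothetical Secret with username/password keys
  steps:
    - name: login
      image: quay.io/buildah/stable
      volumeMounts:
        - name: registry-creds
          mountPath: /var/run/secrets/registry
          readOnly: true
      script: |
        #!/bin/sh
        # No `set -x` here: shell tracing would echo expanded commands into the log.
        set -eu
        # The password is read from the mounted file on stdin, so it never
        # appears in the process arguments or in the environment.
        buildah login \
          --username "$(cat /var/run/secrets/registry/username)" \
          --password-stdin registry.example.com \
          < /var/run/secrets/registry/password
```

The sketch illustrates the mounting pattern only; in a real pipeline the login and the push would run in the same step or share an auth file, since state does not carry between steps by default.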

Per-team golden path

If every team has its own pipeline, you have N pipelines to maintain and audit. Define a small number (1–3) of paved paths; allow teams to deviate only with explicit approval. The flexibility cost is real but the audit and operational savings are larger.

No rollback runbook

Rollback in GitOps is a revert PR. That sounds easy — until ArgoCD has auto-pruned a resource and the revert recreates it with a different IP, breaking downstream service-mesh routing. Test rollback regularly; never assume it works.

Production checklist

App repo and env repo separated; only env repo drives deploys.
Branch protection on prod env repo; 2 reviewers; signed commits.
ArgoCD with selfHeal on, auto-prune off in prod.
Tekton tasks signed with cosign; only signed tasks run in prod.
Trivy/Clair scan on every image; HIGH/CRITICAL fail the pipeline.
Cosign image signing; admission policy enforces signatures in prod.
SBOM attached to every image; queryable by CVE.
Vault + External Secrets; no plaintext secrets in git.
One golden path per service shape; deviations need approval.
Pipeline duration p95 alert; sync drift alert; image freshness alert.
Documented and quarterly-tested rollback runbook.
Cluster admin restricted; non-ArgoCD apply alerts wired to oncall.