Overview
Built and operated the entire cloud infrastructure and DevOps platform for an AI-powered SaaS startup from zero to production. The platform serves a React web app, 3 Node.js microservices, and a Flutter mobile app — all running on Kubernetes with full GitOps automation.
As the sole DevOps and platform engineer, I designed, implemented, and maintained every layer of the stack — from Terraform modules provisioning Azure resources to Flux CD automating deployments to Kubernetes.
Architecture
Terraform Layered State Architecture
The infrastructure uses a layered state architecture where each layer references the previous via terraform_remote_state:
base/ → VNet, NSGs, NAT, Storage, SQL, Key Vault, Entra ID
↓ remote_state
identity/ → Managed Identities, Role Assignments, Federated Credentials
↓ remote_state
aks/ → AKS Cluster, ACR, Front Door, AGFC, Document Intelligence
↓ remote_state
flux/ → Flux Extension, Git Repository Source
CI/CD & GitOps Flow
Developer Push → Azure DevOps Pipeline (Build + Test + DB Migration)
↓
ACR (Container Registry)
↓
Flux Image Reflector (polls every 60s)
↓
Flux Image Policy (selects latest build)
↓
Flux Image Update Automation (commits new tag to GitOps repo)
↓
Flux Kustomize Controller (reconciles)
↓
AKS Cluster
├── user-service (Pod)
├── subscription-service (Pod)
├── llm-service (Pod)
├── AGFC Gateway (HTTPS + WAF)
├── External Secrets (← Key Vault)
├── cert-manager (TLS)
├── OpenSearch + Fluent Bit (Logging)
└── Prometheus + Grafana (Monitoring)
↓
Azure Front Door CDN → React SPA (Blob Storage)
↓
Users (Web + Mobile)
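The image-automation part of this flow maps onto three Flux resources. The following is a minimal sketch of what they might look like for one service, not the production manifests: the registry name, tag format (numeric build IDs), branch, and repository path are all assumptions, and the API version depends on the installed Flux release.

```yaml
# Hypothetical names throughout: registry, tag format, branch, and paths
# are illustrative assumptions, not values from the real platform.
---
# 1. Image Reflector: scan the ACR repository for new tags every 60s.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
metadata:
  name: user-service
  namespace: flux-system
spec:
  image: exampleregistry.azurecr.io/user-service
  interval: 1m0s
  provider: azure          # authenticate to ACR with the cluster's Azure identity
---
# 2. Image Policy: select the highest numeric tag (assumes tags are CI build IDs).
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
metadata:
  name: user-service
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: user-service
  filterTags:
    pattern: '^(?P<build>[0-9]+)$'
    extract: '$build'
  policy:
    numerical:
      order: asc           # ascending order, so the largest number wins
---
# 3. Image Update Automation: commit the selected tag back to the GitOps repo.
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageUpdateAutomation
metadata:
  name: services
  namespace: flux-system
spec:
  interval: 1m0s
  sourceRef:
    kind: GitRepository
    name: flux-system
  git:
    checkout:
      ref:
        branch: main
    commit:
      author:
        name: fluxcdbot
        email: fluxcdbot@example.com
      messageTemplate: 'chore: update image tags'
    push:
      branch: main
  update:
    path: ./services       # hypothetical directory holding the service manifests
    strategy: Setters
```

With the Setters strategy, the automation only rewrites fields that carry a marker comment, for example an image line ending in # {"$imagepolicy": "flux-system:user-service"}.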
Infrastructure (Terraform)
- 16 custom, reusable Terraform modules provisioning the entire Azure footprint
- Layered state architecture: base → identity → aks → flux, each referencing prior state via terraform_remote_state
- Modules include: AKS cluster (OIDC, Workload Identity, Node Auto-Provisioning); VNet with private/public subnets, NSGs, and NAT Gateway; Azure SQL (serverless); PostgreSQL Flexible Server; Key Vault (RBAC + private endpoint); Container Registry (zone-redundant, trust policies); Azure Front Door CDN (custom domains, WAF with rate limiting + managed rulesets); Application Gateway for Containers (AGFC with WAF + Bot Manager); Document Intelligence (AI cognitive service); Entra External ID (B2C; 433-line module for OAuth2/OIDC with social providers); Flux GitOps extension; Azure DevOps OIDC service connection; frontend static storage; customer data storage (Data Lake Gen2)
- Zero static credentials — Workload Identity Federation and Managed Identities throughout
- Private endpoints for all data services (SQL, PostgreSQL, Key Vault, Cognitive Services)
    # Example: AKS Module with Workload Identity
    module "aks" {
      source = "../../modules/aks"

      cluster_name       = "pryvasee-${var.environment}"
      kubernetes_version = "1.29"
      node_count         = 3
      vm_size            = "Standard_D4s_v5"

      oidc_issuer_enabled       = true
      workload_identity_enabled = true
      node_auto_provisioning    = true

      network = {
        vnet_id   = data.terraform_remote_state.base.outputs.vnet_id
        subnet_id = data.terraform_remote_state.base.outputs.aks_subnet_id
      }
    }
Kubernetes Platform
- Custom Helm chart (pryvasee-microservices-chart): a single reusable chart deployed per service with value overrides
- 161-line helpers template with intelligent secret type detection (splitting External Secrets into data vs dataFrom groups)
- Security hardened: runAsNonRoot, readOnlyRootFilesystem, drop ALL capabilities, resource limits enforced
- Gateway API with AGFC: HTTPS routing, path-prefix rewriting, TLS termination with cert-manager
- HPA (CPU/memory), PodDisruptionBudgets, liveness/readiness/startup probes
- External Secrets Operator syncing secrets from Azure Key Vault via ClusterSecretStore
- Azure Key Vault CSI driver for direct volume-mounted secrets
- Cost optimization: AKS Node Auto-Provisioning (NAP, built on Karpenter) with spot node pools
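To make the data vs dataFrom distinction concrete, here is a rough sketch of a ClusterSecretStore backed by Azure Key Vault and an ExternalSecret that uses both groups. It is an illustration under assumptions, not output of the actual chart: the vault URL, namespaces, and secret names are hypothetical, and the API version depends on the installed External Secrets Operator release.

```yaml
# Hypothetical names: vault URL, namespaces, and secret keys are assumptions.
---
apiVersion: external-secrets.io/v1     # older ESO releases serve v1beta1 instead
kind: ClusterSecretStore
metadata:
  name: azure-key-vault
spec:
  provider:
    azurekv:
      authType: WorkloadIdentity       # no static credentials stored in the cluster
      vaultUrl: https://example-vault.vault.azure.net
      serviceAccountRef:
        name: external-secrets
        namespace: external-secrets
---
apiVersion: external-secrets.io/v1
kind: ExternalSecret
metadata:
  name: user-service
  namespace: services
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: azure-key-vault
  target:
    name: user-service-secrets         # Kubernetes Secret created by the operator
  # "data": one Key Vault secret mapped to one explicitly named key.
  data:
    - secretKey: DATABASE_PASSWORD
      remoteRef:
        key: user-service-db-password
  # "dataFrom": one JSON-valued Key Vault secret expanded into many keys.
  dataFrom:
    - extract:
        key: user-service-config
```

A helper template that sorts secrets into these two groups lets each service's values file list its secrets once, without every service author needing to know which form a given Key Vault entry requires.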
    # Example: Security Context
    securityContext:
      runAsNonRoot: true
      runAsUser: 1000
      readOnlyRootFilesystem: true
      allowPrivilegeEscalation: false
      capabilities:
        drop:
          - ALL
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
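The Gateway API routing with path-prefix rewriting mentioned in the list above could be expressed roughly as follows. This is a sketch using the standard HTTPRoute resource; the gateway name, listener name, hostname, path, and port are hypothetical.

```yaml
# Hypothetical gateway, hostname, path, and port; shown for illustration only.
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: user-service
  namespace: services
spec:
  parentRefs:
    - name: agfc-gateway               # Gateway fronted by AGFC
      namespace: gateway-system
      sectionName: https               # attach to the HTTPS listener only
  hostnames:
    - api.example.com
  rules:
    - matches:
        - path:
            type: PathPrefix
            value: /users
      filters:
        # Strip the public prefix so the service sees /profile, not /users/profile.
        - type: URLRewrite
          urlRewrite:
            path:
              type: ReplacePrefixMatch
              replacePrefixMatch: /
      backendRefs:
        - name: user-service
          port: 8080
```

TLS termination is not part of the route itself: it would be configured on the Gateway's HTTPS listener, which references a certificate Secret issued by cert-manager.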
GitOps (Flux CD)
- Three-tier Kustomization hierarchy: flux-system → addons (14 add-ons with dependency ordering) → services
- Closed-loop automated deployment: code push → Docker build → ACR push → Flux image reflector polls ACR every 60s → ImagePolicy selects latest build → ImageUpdateAutomation commits updated tag → Flux reconciles → deployment rolls out
- 14 cluster add-ons managed via GitOps, including External Secrets Operator, cert-manager, AGFC Gateway, OpenSearch logging cluster, Fluent Bit (DaemonSet log shipping), kube-prometheus-stack (Grafana + Prometheus + Alertmanager), custom Prometheus alerts, NAP spot NodePool, RBAC roles, and a self-hosted Azure DevOps agent
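The dependency ordering between tiers can be sketched with Flux Kustomization resources and dependsOn. The paths and resource names below are hypothetical, since the actual repository layout is not shown here.

```yaml
# Hypothetical paths and names; the real GitOps repository layout may differ.
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: addons
  namespace: flux-system
spec:
  interval: 10m0s
  path: ./clusters/production/addons
  prune: true
  wait: true                 # become Ready only once the add-ons are healthy
  sourceRef:
    kind: GitRepository
    name: flux-system
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: services
  namespace: flux-system
spec:
  dependsOn:
    - name: addons           # services reconcile only after add-ons are Ready
  interval: 5m0s
  path: ./clusters/production/services
  prune: true
  sourceRef:
    kind: GitRepository
    name: flux-system
```

The same dependsOn mechanism applies inside the add-ons tier, for example so that resources relying on a CRD are applied only after the operator that installs it.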
CI/CD Pipelines (Azure DevOps)
- Backend: 606-line, 4-stage pipeline — Build (matrix strategy for parallel multi-service builds) → Database Migration (TypeORM with automatic rollback on failure) → Pod Rollout Monitoring (waits for Flux reconciliation, verifies image tags, checks pod health, detects crash loops) → Automated Rollback (kubectl rollout undo, ACR image cleanup, DB migration revert, diagnostic log collection as pipeline artifacts)
- Frontend: React + Vite build → Azure Blob Storage $web deployment → Front Door CDN cache purge
- Mobile: 3-stage pipeline (Staging → Production Firebase → App Stores) with parallel Android + iOS jobs, Fastlane, Firebase App Distribution, Match code signing for iOS
- Self-hosted build agents: custom multi-arch (amd64/arm64) Docker image running on the AKS cluster itself
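A heavily condensed sketch of the backend pipeline's four stages is shown below. It is not the 606-line pipeline itself: the pool name, service connection, Dockerfile paths, data-source path, and namespace are hypothetical, and the monitoring and rollback steps are reduced to a single service.

```yaml
# Condensed illustration; pool, service connection, and paths are hypothetical.
trigger:
  branches:
    include:
      - main

pool:
  name: aks-self-hosted              # hypothetical self-hosted agent pool

stages:
  - stage: Build
    jobs:
      - job: BuildImage
        strategy:
          matrix:                    # one parallel job per microservice
            user_service:
              serviceName: user-service
            subscription_service:
              serviceName: subscription-service
            llm_service:
              serviceName: llm-service
        steps:
          - task: Docker@2
            displayName: Build and push $(serviceName)
            inputs:
              command: buildAndPush
              containerRegistry: acr-service-connection
              repository: $(serviceName)
              Dockerfile: services/$(serviceName)/Dockerfile
              tags: $(Build.BuildId)

  - stage: Migrate
    dependsOn: Build
    jobs:
      - job: RunMigrations
        steps:
          - script: npx typeorm migration:run -d dist/data-source.js
            displayName: Run TypeORM migrations

  - stage: Monitor
    dependsOn: Migrate
    jobs:
      - job: WaitForRollout
        steps:
          - script: kubectl rollout status deployment/user-service -n services --timeout=10m
            displayName: Wait for the new pods to become ready

  - stage: Rollback
    dependsOn:
      - Migrate
      - Monitor
    condition: failed()              # run only when a stage it depends on failed
    jobs:
      - job: Revert
        steps:
          - script: |
              kubectl rollout undo deployment/user-service -n services
              npx typeorm migration:revert -d dist/data-source.js
            displayName: Roll back deployment and last migration
```

Tagging images with the numeric build ID keeps tags monotonically increasing, which makes "latest build" straightforward for an image policy to select.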
Observability
- OpenSearch cluster (operator-managed) for centralized logging
- Fluent Bit DaemonSet shipping logs from all nodes
- kube-prometheus-stack: Grafana dashboards, Prometheus metrics, Alertmanager
- Custom Prometheus alert rules with Microsoft Teams integration
- HTTPRoutes exposing Grafana, Prometheus, Alertmanager dashboards through AGFC Gateway
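As an example of a custom alert rule, a crash-loop alert for the service namespace might look like the sketch below. The namespace, threshold, and release label are assumptions; the label has to match whatever rule selector the Prometheus instance is configured with. Routing the alert to Microsoft Teams happens in the Alertmanager configuration and is not shown.

```yaml
# Hypothetical namespace, threshold, and labels; adjust to the actual cluster.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: service-alerts
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed to match the Prometheus ruleSelector
spec:
  groups:
    - name: workload.rules
      rules:
        - alert: PodCrashLooping
          # More than 3 container restarts within 15 minutes.
          expr: increase(kube_pod_container_status_restarts_total{namespace="services"}[15m]) > 3
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod is restarting repeatedly"
            description: "{{ $labels.namespace }}/{{ $labels.pod }} restarted more than 3 times in 15 minutes."
```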
Key Achievements
- Single-handedly built the entire DevOps platform from zero
- 16 reusable Terraform modules provisioning the entire Azure footprint
- Fully automated deployment pipeline with zero-downtime rollouts
- Complete observability stack with alerting
- Cost-optimized with spot instances and right-sized workloads
- Production-grade security with zero static credentials and private endpoints