Senior Site Reliability Engineer with 20+ years of infrastructure experience, including 3+ years hands-on with AWS EKS, Kubernetes, and GitOps at production scale. Grounded in enterprise datacenter operations (VMware, SAN, multi-DC management) and now contributing to cloud-native platform engineering for a ~100-engineer organization. Execution-focused engineer with depth in Kubernetes troubleshooting, incident response, cluster lifecycle management, and infrastructure as code. Comfortable taking ownership of hard problems and delivering results independently.
Workload kind against the team's new v2 platform standard, establishing the working pattern for all subsequent team workload deployments. Deployed ngri-analytics-ai as the first consumer of the new standard.

| Category | Technologies |
|---|---|
| Cloud & IaC | AWS (EKS, EBS, EFS, S3, ECR, RDS, ALB/NLB, TGW, Route53, VPC, IAM), Terraform, Terragrunt, Atlantis, Crossplane v2, Docker |
| Kubernetes | EKS, ArgoCD, Helm, Karpenter, KEDA, Kyverno, cert-manager, external-secrets, Traefik, Cilium, node-local-dns |
| CI/CD | GitHub Actions, GitOps workflows, GitHub Packages, secrets management |
| Observability | PagerDuty, Coralogix, OTEL, kube-prometheus-stack, nOps, FireHydrant, Incident.io, Better Stack, CheckMK, Nagios |
| Networking | AWS VPC/TGW/VPN, Cilium, Traefik, OpenVPN, DNS, load balancers (ALB/NLB) |
| Legacy / Other | Linux, Windows Server, VMware vSphere, SAN Storage (Nimble, EqualLogic, Compellent), MongoDB, HashiCorp Vault |