Exploring the intersection of AI and DevOps, covering best practices, insights, and practical applications for modern infrastructure teams.
9 Articles in this series
How to Detect Terraform Drift Across Multi-Cloud
A development RDS instance had its `publicly_accessible` flag flipped on a Friday afternoon. The team ran drift detection once per weekday, so 60+ hours passed before anyone caught it. A walkthrough of the audit-log subscription architecture that would have caught it in two minutes across AWS, GCP, and Azure, with every config block paste-able into your own account.
How to Trace a Production Incident Back to the Commit
Burned 25 minutes on a Friday-morning page before I realized the responsible commit was in another team's repo. This is the four-command sequence I now run when an alert lands and `git log` on my own service comes up empty, with the outputs at each step and where the search space gets cut.
My Workers Stopped Polling: a K8s + Temporal Whodunit
Temporal workflows are stuck in Running with zero pollers, yet Temporal still reports a healthy task queue. The root cause lives one layer down: a CrashLoopBackOff in the Kubernetes worker pod, caused by a single bad environment variable. A walkthrough of debugging Temporal workers on Kubernetes the manual way (10 minutes), then with an infrastructure context layer that bridges the two systems (seconds).
Top 10 AI SRE Tools in 2026: A Comprehensive Comparison
The 10 best AI SRE tools in 2026, compared by architecture, root cause analysis, remediation, and change awareness, from Anyshift's versioned graph to Resolve AI's autonomous agents.
Why AI-SRE Needs Topology, Not Just Telemetry
The limits of telemetry-only AI approaches to SRE and why topology is the missing piece.
Navigating AI in your Infrastructure: Dos, Don'ts, and Why It Matters
GenAI is everywhere, but the demos that look exciting often don't work the same way in production.
Common Weak Points in Infrastructure Management: An In-Depth Guide
Managing infrastructure at scale is a complex endeavor that demands meticulous planning, robust tooling, and continuous adaptation.
Anyshift.io at Awesome AI Dev Tools September
Anyshift's presentation at the Awesome AI Dev Tools event in September 2024.
5 Key Reasons You're Struggling to Debug Your Infrastructure in Under an Hour
Most infrastructure debugging sessions blow past the one-hour mark for the same five structural reasons: scattered visibility across cloud accounts, missing historical state, `terraform plan` output that hides downstream impact, runbooks that lag the live infrastructure, and post-merger environments that no one has fully mapped. A walkthrough of each, with concrete examples and what shortens the debugging time.