Series

Exploring the intersection of AI and DevOps, covering best practices, insights, and practical applications for modern infrastructure teams.

9 Articles in this series

1

How to Detect Terraform Drift Across Multi-Cloud

A development RDS instance had its publicly_accessible flag flipped on a Friday afternoon. The team's drift-detection cadence was once per weekday, so 60+ hours passed before anyone caught it. This walkthrough covers the audit-log subscription architecture that would have caught it in two minutes across AWS, GCP, and Azure, with every config block ready to paste into your own account.

Louis Fradin · May 15, 2026 · 9 min read
2

How to Trace a Production Incident Back to the Commit

I burned 25 minutes on a Friday-morning page before I realized the responsible commit was in another team's repo. This is the four-command sequence I now run when an alert lands and `git log` on my own service comes up empty, with the output at each step and where each one cuts the search space.

Louis Fradin · May 15, 2026 · 8 min read
3

My Workers Stopped Polling: a K8s + Temporal Whodunit

Temporal workflows are stuck in Running with zero pollers, yet Temporal still reports a healthy task queue. The root cause lives one layer down: a CrashLoopBackOff in the Kubernetes worker pod, caused by a single bad environment variable. A walkthrough of debugging Temporal workers on Kubernetes the manual way (10 minutes), then with an infrastructure context layer that bridges the two systems (seconds).

Louis Fradin · Apr 8, 2026 · 6 min read
4

Top 10 AI SRE Tools in 2026: A Comprehensive Comparison

The 10 best AI SRE tools in 2026, compared by architecture, root cause analysis, remediation, and change awareness, from Anyshift's versioned graph to Resolve AI's autonomous agents.

Roxane Fischer · Mar 13, 2026 · 15 min read
5

Why AI-SRE Needs Topology, Not Just Telemetry

The limits of telemetry-only AI approaches to SRE and why topology is the missing piece.

Roxane Fischer · Jan 27, 2026 · 8 min read
6

Navigating AI in your Infrastructure: Dos, Don'ts, and Why It Matters

GenAI is everywhere. But very often, the cool and exciting demos don't work the same way in production.

Roxane Fischer · Oct 15, 2024 · 7 min read
7

Common Weak Points in Infrastructure Management: An In-Depth Guide

Managing infrastructure at scale is a complex endeavor that demands meticulous planning, robust tooling, and continuous adaptation.

Roxane Fischer · Sep 19, 2024 · 4 min read
8

Anyshift.io at Awesome AI Dev Tools September

Anyshift's presentation at the Awesome AI Dev Tools event in September 2024.

Roxane Fischer · Sep 1, 2024
9

5 Key Reasons You're Struggling to Debug Your Infrastructure in Under an Hour

Most infrastructure debugging sessions blow past the one-hour mark for the same five structural reasons: scattered visibility across cloud accounts, missing historical state, terraform plan output that hides downstream impact, runbooks that lag the live infrastructure, and post-merger environments that no one has fully mapped. A walkthrough of each, with concrete examples and what shortens the debugging time.

Roxane Fischer · Jul 30, 2024 · 4 min read