Think in layers: is it the application (slow queries, resource leaks), the platform (pod limits, connection pools), or the network (DNS, load balancer, firewall)? Narrow down the layer before diving deep.
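The layer-by-layer narrowing can be sketched in Python as follows. This is a minimal illustration, not a prescribed tool: the helper names (`dns_resolves`, `tcp_reachable`, `first_failing_layer`) are hypothetical, and the order in which the layers are checked is an assumption left to the caller.

```python
import socket
from typing import Callable, Optional, Sequence, Tuple

# A check is a (layer name, zero-argument probe) pair; the probe returns True if healthy.
Check = Tuple[str, Callable[[], bool]]


def dns_resolves(host: str) -> bool:
    """Network layer: does the hostname resolve at all?"""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Network layer: can a TCP connection be opened within the timeout?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failing_layer(checks: Sequence[Check]) -> Optional[str]:
    """Run the probes in order and return the first layer whose probe fails.

    Returns None when every probe passes. Later probes are not run once
    a failure is found, so the cheapest checks should come first.
    """
    for layer, probe in checks:
        if not probe():
            return layer
    return None
```

A caller might pass `[("network", lambda: dns_resolves(host)), ("network", lambda: tcp_reachable(host, 443)), ...]` followed by platform and application probes of its own; the returned layer name indicates where to dig deeper.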
Strong answers describe a systematic approach: checking metrics (latency percentiles, error rates), tracing requests through logs with correlation IDs, verifying DNS resolution, testing network connectivity and load balancer health, inspecting the downstream service's health, reviewing recent changes, and considering resource exhaustion (connection pool, file descriptors). The best candidates think in layers: application, platform, network.
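The metrics step can be illustrated with a short Python sketch that computes latency percentiles and an error rate from raw samples. The function names are hypothetical, the percentile uses the nearest-rank method (one of several common definitions), and treating only HTTP 5xx responses as errors is an assumption.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Multiply before dividing to keep the rank arithmetic exact for whole-number inputs.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def error_rate(status_codes: Sequence[int]) -> float:
    """Fraction of responses that are server errors (assumed here to mean HTTP 5xx)."""
    if not status_codes:
        return 0.0
    errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return errors / len(status_codes)
```

Comparing p50 against p99 helps separate a uniformly slow dependency from a small share of slow requests, which often points at different layers.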
Tests troubleshooting methodology. Real-world incidents require systematic diagnosis, not guessing. Candidates who can articulate a structured debugging approach tend to be effective on-call engineers.