How do you set up monitoring and observability for a production system? What do you monitor and what alerts do you set?

Accepted Answer

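Strong answers cover the three pillars of observability: metrics (system-level, such as CPU, memory and disk, plus application-level, such as request rate, error rate and latency), logs (structured and centralised), and traces (distributed tracing to follow a request across microservices). On alerting, they alert on symptoms rather than causes (for example, users seeing errors or slow responses, not high CPU on a single host), keep alerts actionable to avoid alert fatigue, and base alerts on SLIs and SLOs. The best candidates also mention dashboards tailored to different audiences and runbooks for common alerts.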
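To make SLO-based, symptom-driven alerting concrete, here is a minimal Python sketch of a multi-window burn-rate check, the pattern described in Google's SRE Workbook. It assumes you can already query total and failed request counts per time window from your metrics system. `WindowCounts`, `burn_rate` and `should_page` are hypothetical names. The 99.9% target and the 14.4 threshold are illustrative defaults, not universal values. The threshold corresponds to spending 2% of a 30-day error budget in one hour.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts observed over one time window (hypothetical input shape)."""
    total: int
    errors: int


def burn_rate(counts: WindowCounts, slo_target: float) -> float:
    """How fast the error budget is being consumed.

    1.0 means errors arrive at exactly the rate that would use up the whole
    budget by the end of the SLO period; 14.4 means 14.4 times faster.
    """
    if counts.total == 0:
        return 0.0  # no traffic, so no budget is being burned
    error_ratio = counts.errors / counts.total
    error_budget = 1.0 - slo_target  # e.g. 0.001 for a 99.9% SLO
    return error_ratio / error_budget


def should_page(long_window: WindowCounts, short_window: WindowCounts,
                slo_target: float = 0.999, threshold: float = 14.4) -> bool:
    """Page only if BOTH windows exceed the burn-rate threshold.

    The long window (e.g. 1h) shows the problem is significant; the short
    window (e.g. 5m) shows it is still happening right now.
    """
    return (burn_rate(long_window, slo_target) > threshold
            and burn_rate(short_window, slo_target) > threshold)


if __name__ == "__main__":
    # 2% errors over the last hour, 3% over the last 5 minutes: page
    print(should_page(WindowCounts(10_000, 200), WindowCounts(1_000, 30)))  # True
    # Short spike: 5m window is bad but the 1h window is fine: no page
    print(should_page(WindowCounts(10_000, 20), WindowCounts(1_000, 30)))  # False
```

Requiring both windows to fire is a deliberate design choice: the long window filters out brief blips, while the short window lets the alert clear soon after the problem stops. In practice this logic usually lives in the alerting rules of your monitoring system rather than in application code.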
