Start small and in staging. Define what "normal" looks like (the steady state) in measurable terms, then introduce one failure at a time and observe whether the system stays within it. Move to production chaos only once you have confidence in your observability and rollback capabilities.
Strong answers cover: starting with game days in non-production environments, defining steady-state hypotheses, using tools like Chaos Monkey or Litmus, running experiments with clear blast radius limits, having abort conditions, and documenting findings. The best candidates also discuss the cultural prerequisites for chaos engineering and how to build organisational buy-in.
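A candidate might sketch the experiment loop along the following lines. This is a minimal illustration, not the API of Chaos Monkey, Litmus, or any other tool: `run_experiment`, `FakeService`, and the thresholds are hypothetical names and values chosen for the example, and the "service" is an in-memory stand-in for a staging system.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ExperimentResult:
    hypothesis_held: bool   # did every observation stay within steady state?
    aborted: bool           # did the experiment stop early (or never start)?
    observations: List[float] = field(default_factory=list)  # findings to document


def run_experiment(
    measure: Callable[[], float],   # returns the steady-state metric, e.g. success rate
    inject: Callable[[], None],     # introduces exactly one failure
    rollback: Callable[[], None],   # undoes the failure
    steady_min: float,              # hypothesis: metric stays >= this value
    abort_below: float,             # abort condition: stop if metric drops below this
    samples: int = 5,
) -> ExperimentResult:
    # Confirm the system is in steady state before touching anything.
    baseline = measure()
    if baseline < steady_min:
        return ExperimentResult(False, True, [baseline])

    observations = [baseline]
    aborted = False
    inject()
    try:
        for _ in range(samples):
            value = measure()
            observations.append(value)
            if value < abort_below:
                aborted = True
                break
    finally:
        # Roll back whether the experiment completed, aborted, or raised.
        rollback()

    held = not aborted and all(v >= steady_min for v in observations)
    return ExperimentResult(held, aborted, observations)


class FakeService:
    """Hypothetical stand-in for a staging service with N replicas."""

    def __init__(self, replicas: int) -> None:
        self.initial = replicas
        self.replicas = replicas

    def success_rate(self) -> float:
        # Assumption for the sketch: two replicas are enough to serve all traffic.
        return min(1.0, self.replicas / 2)

    def kill_one(self) -> None:
        # Blast radius is limited to a single replica.
        self.replicas -= 1

    def restore(self) -> None:
        self.replicas = self.initial


service = FakeService(replicas=3)
result = run_experiment(
    service.success_rate, service.kill_one, service.restore,
    steady_min=0.99, abort_below=0.9,
)
print(result)
```

The rollback sits in a `finally` block so that an abort or an unexpected exception still restores the system, and the returned observations give the team something concrete to write up afterwards.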
This is an advanced SRE practice. Candidates with hands-on chaos engineering experience usually come from teams with mature reliability practices. Ask: "What did your first chaos experiment reveal that you did not expect?" to test for genuine experience.