Day 225
Week 33 Day 1: If You Are Always Putting Out Fires, You Built a Flammable Organization
Firefighting is not leadership. It is the evidence that leadership failed to build systems that prevent fires in the first place.
Every organization has occasional emergencies. That is normal. What is not normal is an organization where emergencies are the primary mode of operation -- where the team spends more time reacting to crises than doing planned work. When that pattern persists, the problem is not the fires. The problem is the organization's architecture. Something in the design is producing fires faster than the team can extinguish them.
Here is how to tell the difference between a team with occasional fires and a team that is structurally flammable. The occasional-fire team: emergencies are rare, the team has documented response procedures, the root cause is investigated and addressed, and the same fire does not occur twice. The structurally flammable team: emergencies are weekly or daily, the response is ad hoc (different people do different things each time), the root cause is never investigated because the team is already fighting the next fire, and the same problems recur on a predictable cycle.
I managed a structurally flammable team for 18 months before I diagnosed the pattern. Every week we had at least one production incident. Every incident required the same three senior engineers to drop their planned work and respond. After every incident, I said 'we should fix the root cause.' Every time, the root cause investigation was displaced by the next incident. The team was so busy fighting fires that we could not invest in fire prevention. We were stuck in what systems thinkers call a 'doom loop': the fires consumed the time and energy needed to prevent the fires.
The breakthrough came when I made a decision that felt irresponsible at the time: I pulled one senior engineer off the firefighting rotation entirely and assigned them exclusively to root cause analysis and system hardening. For two months, every incident was handled by two people instead of three. It was painful. Response times were slower. Some incidents escalated further than they would have otherwise. But the engineer who was freed from firefighting identified and fixed the six systemic issues that were causing 80% of the incidents. By month three, our incident rate had dropped by 70%. The team that had been firefighting for 18 months suddenly had time to do planned work for the first time in anyone's memory.
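The loop in this story can be sketched as a toy simulation. This is an illustration only, not a calibrated model: the function name `simulate` and every parameter value (incidents per latent issue, effort per incident, effort per fix) are hypothetical numbers chosen to make the dynamics visible, not figures from the team described above.

```python
def simulate(weeks, engineers=3, dedicated_to_prevention=0,
             latent_issues=6.0, incidents_per_issue=0.5,
             effort_per_incident=1.0, fix_effort=2.0):
    """Return the weekly incident count for a toy firefighting model.

    All parameters are hypothetical. Effort is measured in engineer-weeks.
    """
    history = []
    for _ in range(weeks):
        # Each unresolved systemic issue produces a steady stream of incidents.
        incidents = incidents_per_issue * latent_issues
        history.append(incidents)

        # Responders fight fires first; only leftover time goes to prevention.
        responders = engineers - dedicated_to_prevention
        firefighting_demand = incidents * effort_per_incident
        spare = max(0.0, responders - firefighting_demand)

        # Dedicated prevention capacity is protected from firefighting demand.
        prevention_effort = dedicated_to_prevention + spare
        fixed = min(latent_issues, prevention_effort / fix_effort)
        latent_issues -= fixed
    return history


baseline = simulate(12)                             # everyone firefights
hardened = simulate(12, dedicated_to_prevention=1)  # one engineer protected

print("baseline:", [round(x, 2) for x in baseline])
print("hardened:", [round(x, 2) for x in hardened])
```

Under these assumed numbers, the baseline team never has spare capacity, so the incident rate stays flat. Protecting one engineer leaves the responders overloaded at first, but each fixed issue reduces next week's incidents, and the decline accelerates once the responders themselves regain spare time.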
The structural flammability pattern is an instance of what Senge (1990) calls the 'fixes that fail' systems archetype: a pattern where short-term solutions (firefighting) produce side effects (depleted capacity for prevention) that reinforce the original problem (more fires). The archetype predicts that intensifying the short-term fix (hiring more firefighters, working longer hours) will worsen the long-term problem because it further diverts resources from prevention. The only escape is what Meadows (2008) calls 'leverage point intervention': changing the system at a point where small effort produces large effect. In the firefighting context, the leverage point is the allocation of capacity to prevention rather than response. Research by Repenning and Sterman (2001) on 'firefighting in product development' provides quantitative support: they found that when organizations allocated more than 50% of engineering capacity to firefighting, the organization entered a self-reinforcing vicious cycle where firefighting demand increased exponentially, eventually consuming 100% of capacity. Their model predicted that the only way to break the cycle was a deliberate, counterintuitive reallocation of scarce capacity from firefighting to prevention, which is exactly the intervention described above. The 80/20 finding (six root causes driving 80% of incidents) is consistent with Pareto analysis (Juran, 1951), which demonstrates that quality problems in complex systems follow a power law distribution where a small number of root causes produce a disproportionate share of failures.
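A Pareto analysis of this kind is simple to run on an incident log. The sketch below assumes each incident has already been tagged with a root cause; the function name `pareto_causes`, the cause labels, and the counts are all hypothetical and exist only to show the mechanics.

```python
from collections import Counter


def pareto_causes(root_causes, threshold=0.8):
    """Return the most frequent causes that together reach `threshold`
    of all incidents, plus the share of incidents they cover."""
    counts = Counter(root_causes)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0

    selected, covered = [], 0
    # Walk causes from most to least frequent until the threshold is met.
    for cause, n in counts.most_common():
        if covered / total >= threshold:
            break
        selected.append((cause, n))
        covered += n
    return selected, covered / total


# Hypothetical incident log: one root-cause tag per incident.
incident_log = (["flaky deploy script"] * 9 + ["disk full"] * 6 +
                ["expired certificate"] * 2 + ["dns misconfig"] * 2 +
                ["unknown"] * 1)

top_causes, coverage = pareto_causes(incident_log)
for cause, n in top_causes:
    print(f"{cause}: {n} incidents")
print(f"coverage: {coverage:.0%}")
```

The output is the short list worth handing to whoever is protected from the firefighting rotation. It is only as reliable as the root-cause tags, which is one more reason the investigation step cannot be skipped.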