Discussion about this post

Neural Foundry:

The CrowdStrike example perfectly illustrates your multiply-by-zero framework. What's fascinating is that CrowdStrike wasn't a failed company - they were a leader in cybersecurity. Their tech, their team, their processes were all strong (7s and 8s). But a single untested edge case in a config file update became the zero that crashed 8.5 million systems. This is what makes multiplicative systems so dangerous - you can't compensate for a zero by being stronger elsewhere. No amount of technical excellence in their detection algorithms could overcome that single deployment weakness.

The real insight here is that organizations spend most of their resources optimizing their 7s into 8s (better features, more training, faster deployment), when the ROI is actually in finding and eliminating the hidden zeros. The zeros are usually in the boring stuff nobody wants to talk about - deployment testing, rollback mechanisms, canary processes. Those aren't exciting sprint goals, but they're the difference between world-class execution and $10B in damages. Your framework gives teams permission to stop adding and start removing, which is counterintuitive but mathematically correct.
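To make the arithmetic concrete, here's a minimal sketch of additive versus multiplicative scoring; the component scores are made up for illustration, not CrowdStrike's actual numbers:

```python
# Illustrative only: compares an additive model of execution quality with a
# multiplicative one, using made-up component scores on a 0-10 scale.

def additive_score(components):
    # Additive intuition: strengths can average out a weakness.
    return sum(components) / len(components)

def multiplicative_score(components):
    # Multiplicative reality: any single zero collapses the whole product.
    score = 1.0
    for c in components:
        score *= c / 10.0  # normalize each component to 0..1
    return score * 10.0

# Hypothetical scores: strong detection, team, and features, with an
# untested deployment path as the hidden zero.
components = [8, 8, 7, 0]

print(additive_score(components))        # 5.75 -- looks "pretty good"
print(multiplicative_score(components))  # 0.0  -- total failure
```

The additive view says the organization is above average; the multiplicative view says it is one bad update away from zero.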

Neural Foundry:

The CrowdStrike example perfectly captures your multiply-by-zero thesis. What strikes me most is how the system appeared resilient right up until it wasn't - 8.5 million systems, thousands of expert IT teams, billions in redundancy spending... all canceled by one bad config file. The failure wasn't in the endpoint protection itself but in the assumption that "tested code" meant "safe deployment."

Your framework (Name It → Quantify It → Choose Your Move → Test Small) would have caught this. If CrowdStrike had asked "what single thing, if it fails tomorrow, would cause catastrophic damage?" the answer would have been: our auto-update mechanism with insufficient canary testing. The zero was hiding in plain sight - the deployment process everyone trusted precisely because it had never failed before.
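For a sense of what "sufficient canary testing" could look like in practice, here's a minimal sketch of a promotion gate - a generic pattern, not CrowdStrike's actual pipeline; the class, metrics, and thresholds are all hypothetical:

```python
# Minimal sketch of a canary promotion gate. Generic pattern only; the
# class, metrics, and thresholds are hypothetical.

from dataclasses import dataclass

@dataclass
class CanaryResult:
    hosts: int                  # hosts that received the candidate update
    crashes: int                # hosts that crashed or failed to recover after it
    error_rate: float           # observed error rate on canary hosts
    baseline_error_rate: float  # error rate on hosts still running the old version

def should_promote(result: CanaryResult,
                   max_crashes: int = 0,
                   max_error_delta: float = 0.001) -> bool:
    # Promote to the full fleet only if the canary cohort shows no crashes
    # and no meaningful regression versus the baseline cohort.
    if result.crashes > max_crashes:
        return False
    if result.error_rate - result.baseline_error_rate > max_error_delta:
        return False
    return True

# A single crashed canary host is enough to block the fleet-wide rollout.
print(should_promote(CanaryResult(hosts=500, crashes=1,
                                  error_rate=0.002, baseline_error_rate=0.001)))  # False
```

The point isn't the specific thresholds; it's that the gate exists at all, so a bad update hits hundreds of machines instead of millions.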

The AWS race condition is equally instructive. Two systems writing to the same DNS entry simultaneously seems like such an obvious zero in retrospect, but in complex distributed systems, these edge cases multiply faster than you can enumerate them. The real lesson: your zeros aren't where you're looking (the code) but where you've stopped looking (the handoffs, the assumptions, the "it's always worked this way" infrastructure).
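To illustrate the class of bug (a generic concurrency pattern, not AWS's actual DNS internals), here's a sketch contrasting a blind last-writer-wins update with a version-checked write that turns the conflict into an explicit retry instead of a silent overwrite:

```python
# Illustrative sketch: a blind last-writer-wins update vs. a version-checked
# (compare-and-swap) write. Generic pattern, not AWS's DNS implementation.

import threading

class DnsRecord:
    def __init__(self, value):
        self.value = value
        self.version = 0
        self._lock = threading.Lock()

    def blind_write(self, value):
        # Last writer wins: the slower writer silently clobbers the faster one.
        self.value = value

    def conditional_write(self, expected_version, value):
        # The write succeeds only if nobody else has written since we read
        # the record; otherwise the caller sees the conflict and must retry.
        with self._lock:
            if self.version != expected_version:
                return False
            self.value = value
            self.version += 1
            return True

record = DnsRecord("10.0.0.1")
v = record.version
print(record.conditional_write(v, "10.0.0.2"))  # True: first writer succeeds
print(record.conditional_write(v, "10.0.0.3"))  # False: stale version, re-read and retry
```

With the blind write, both systems "succeed" and one update vanishes; with the versioned write, the second writer is forced to notice the conflict.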

One question though: in heavily regulated environments (healthcare, finance), the "remove for one sprint" test becomes harder - you can't just experiment with HIPAA compliance or payment processing. How would you adapt Step 4 when the zero is embedded in a compliance requirement that can't be temporarily removed?
