The Failure of Happy Path Testing: A Post-Mortem
Even after 19 years, the "obvious" can still bite you.
I almost shipped a data-loss bug.
Not because of a logic error or a syntax mistake, but because I fell for the oldest trap in the book: The Happy Path. I was so focused on getting my sync logic working that I completely ignored the physical reality of where the data actually lives.
The "Duh" Moment
I was designing the sync feature. I wanted to give users the choice: "Sync via P2P, Google Drive, or your own NAS."
In my head, a NAS is just a drive. I test on my local NVMe, and everything is lightning fast. I test on a simulated "slow" drive, and it still works. Green checkmarks across the board.
Then it hit me. NAS... POSIX locks... dang.
The Blind Spot
If you’re building a database-backed app (like I am with DuckDB and SQLite), you rely on the OS to handle file locking. On your local machine, this is a solved problem. But as soon as you move that database file to a NAS via SMB or NFS, you aren't in Kansas anymore.
Consumer-grade NAS boxes are notoriously flaky about POSIX advisory locks over SMB and NFS; on some setups a lock request is silently granted to everyone who asks. If I had stayed on my "Happy Path," I would have let users point their live DuckDB store at a network drive.
The result? Two devices trying to write at once would have produced an immediate I/O error at best, and a completely mangled, unrecoverable database file at worst.
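The failure mode is easy to sketch. On a healthy local filesystem, the second exclusive lock below is refused, which is the behavior an embedded database depends on. On a flaky network mount (for instance NFS mounted with nolock), the same call can quietly succeed for both writers. This is a minimal Unix-only sketch using flock; it stands in for the byte-range locks databases actually take, and both suffer the same problem on network shares:

```python
# Two writers, one file. On a local filesystem the second exclusive
# lock attempt is refused; on a broken network mount it may not be,
# and that is exactly when two devices mangle the same database.
import fcntl
import tempfile

path = tempfile.NamedTemporaryFile(delete=False).name

writer_a = open(path, "wb")
fcntl.flock(writer_a, fcntl.LOCK_EX | fcntl.LOCK_NB)  # first writer wins

writer_b = open(path, "wb")
locked_twice = False
try:
    # Non-blocking attempt: should fail immediately while writer_a holds it.
    fcntl.flock(writer_b, fcntl.LOCK_EX | fcntl.LOCK_NB)
    locked_twice = True  # only reachable on a filesystem that lies
except BlockingIOError:
    pass  # locking actually works here

print("lock granted twice:", locked_twice)
```

Run that on your NVMe and `locked_twice` is False. Run it across two machines against a cheap NAS and you may get True, with no error, no warning, nothing.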
Why Experience is a Double-Edged Sword
The irony is that I know this. I’ve dealt with network file systems for two decades. But when you’re in "builder mode," you develop a sort of tunnel vision. You assume the environment will behave because your code is "correct."
I forgot to test the Adversarial Environment. I assumed the "Local" in "Local-First" extended to anything mounted as a local volume.
The Pivot: From State to Events
This "Duh" moment forced a complete rethink of the persistence layer.
The Happy Path approach: Sync the binary database file. (Result: Corruption).
The Resilient approach: The database must stay on the local NVMe. The NAS and Cloud are relegated to being "dumb" mailboxes for immutable Event Logs (Protobuf).
By shifting to an event-sourcing model, I stop caring whether the NAS supports POSIX locks. I'm just dropping small, uniquely named files into a folder. If the NAS fails, a file just isn't there yet; the failure doesn't break the files that are already there.
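The mailbox idea fits in a few lines. This is a sketch, not my actual implementation: the helper names are invented, and the payload is plain bytes where the real thing would be a serialized Protobuf message. Each event gets a unique name (timestamp, device, random suffix) and is written to a temp file first, then renamed into place:

```python
# Sketch of a "dumb mailbox": the share only has to store uniquely
# named immutable files, never coordinate locks between writers.
import os
import tempfile
import time
import uuid

def drop_event(mailbox, device_id, payload):
    """Write one immutable event file via write-to-temp + rename."""
    name = f"{time.time_ns()}-{device_id}-{uuid.uuid4().hex}.event"
    tmp = os.path.join(mailbox, f".{name}.tmp")
    with open(tmp, "wb") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())  # make sure the bytes hit disk before rename
    final = os.path.join(mailbox, name)
    os.rename(tmp, final)  # atomic on a local POSIX filesystem
    return final

def read_events(mailbox):
    """Replay completed events in name order; .tmp leftovers are ignored."""
    out = []
    for name in sorted(os.listdir(mailbox)):
        if name.endswith(".event"):
            with open(os.path.join(mailbox, name), "rb") as f:
                out.append(f.read())
    return out

mailbox = tempfile.mkdtemp()
drop_event(mailbox, "laptop", b"note:created:42")
drop_event(mailbox, "desktop", b"note:edited:42")
```

One caveat worth naming: rename is only guaranteed atomic on a local POSIX filesystem, not over every SMB/NFS stack. But even if the rename itself gets torn, the worst case is a stray `.tmp` file that readers never pick up, which is a far better worst case than a half-written database.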
The 6:30 PM Reality Check
So here I am. It’s 6:30 PM. I've already put in a full day of regular work, and I’m sitting here back at the desk having to rebuild this piece because I missed a fundamental edge case.
This means more hours chasing weirdness. It means writing an entirely new set of adversarial tests to make sure the system survives the environments I forgot to check. It's a stark reminder: if you aren't testing every environment at every step, especially the ones that aren't "polite," you're just building a house of cards.
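What does an adversarial test even look like here? A hedged sketch, with invented names: instead of trusting the share, simulate one that drops the connection mid-write, then assert that every event completed before the failure is still readable. The unique-name, temp-suffix convention from the event log makes the check trivial:

```python
# Fault-injection sketch: a write that dies partway through, like a
# NAS dropping off the network. Finished events must survive untouched.
import os
import tempfile

def flaky_write(path, payload, fail_after=None):
    """Write payload; optionally die after fail_after bytes (torn write)."""
    with open(path, "wb") as f:
        if fail_after is not None:
            f.write(payload[:fail_after])
            raise OSError("simulated NAS dropout mid-write")
        f.write(payload)

mailbox = tempfile.mkdtemp()
flaky_write(os.path.join(mailbox, "0001.event"), b"good event")
try:
    # The doomed write targets a .tmp name, so a torn file never
    # masquerades as a finished event.
    flaky_write(os.path.join(mailbox, "0002.event.tmp"),
                b"doomed event", fail_after=3)
except OSError:
    pass  # the sync layer would simply retry later

# Only completed events count; the torn .tmp file is invisible to readers.
events = [n for n in os.listdir(mailbox) if n.endswith(".event")]
```

The same harness can inject slow writes, reordered directory listings, or zero-byte files. None of those should ever cost you an event that already landed.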
My new rule: If a feature depends on the filesystem, test it on the shittiest, most non-compliant network share you can find. If it survives that, it might just survive the users.