An Abstraction Is Lossless Only If It's Context-Free
When can you abstract something away for free? The condition has two names, context-free and zero mutual information, and almost nothing in a real system meets it.
In the last post I argued that every abstraction is a debt, and I buried a claim in a footnote that I want to dig back up, because it’s the part I haven’t been able to put down. The footnote said that some abstractions compress without losing anything and some don’t, and that the difference comes down to context. That’s true, but it’s the kind of true that’s hiding a real result underneath it. There’s a precise condition for when you can abstract something away for free, and it has two names depending on which building you ask in. The computer scientists call it context-free. The information theorists call it independence. They aren’t quite the same statement, but they rhyme for a reason that becomes exact once you say what you’re measuring, and once you see it a lot of the leaks in this field stop looking like bad luck and start looking like arithmetic.
Here’s the claim, stated plainly so we can spend the rest of the post earning it: an abstraction is lossless, with respect to whatever you’re measuring, exactly when the part it hides tells you nothing about that measure the interface hasn’t already told you. That is the context-free condition in disguise, and the moment it breaks, abstracting the part in isolation throws information away, and the information you threw away is the bill that arrives later. The abstraction didn’t malfunction. It told the truth about the part and stayed quiet about the relationship, and the relationship was the part that mattered.
What lossless even means here
Start with what we mean by lossless, because it’s easy to wave at. An abstraction takes something, a component, an instruction, a service, a page of memory, and gives you a smaller stand-in: an interface, a signature, a name. You act through the stand-in and you get to not think about what’s underneath. Call the abstraction lossless if everything you need to predict the behavior is recoverable from the stand-in alone, without peeking at what surrounds it. If you can always swap the real thing for its interface and still reason correctly, nothing was lost. If you sometimes have to look at the neighbors to know what the thing will do, the interface was a lie of omission, and the size of the lie is precisely the part you had to go look up.
That word swap is the whole game. An abstraction is a promise of substitutability: this stands for that, here and everywhere, no questions about the surroundings. When the promise holds, you have genuinely made the world smaller. When it doesn’t, you have only hidden part of it, and hidden is not the same as gone.
One thing to nail down before we go further: lossless is always relative to what you’re measuring. Correctness, latency, throughput, memory, energy, security, those are different properties, and an abstraction can be flawless for one and a pack of lies for another. A type signature can be a perfect abstraction of what a function computes and a useless one for how long it runs. So every claim below is quietly of the form lossless for this property, never lossless, full stop.
The first name: context-free
The computer-science name for the promise holding is context-free, and it comes from one of the quietly load-bearing ideas in the field. Starting with a 1956 paper and sharpened over the next few years, Noam Chomsky ranked grammars, ways of describing which strings of symbols are legal in a language, by exactly one thing: how much context a rule is allowed to consult. A context-free rule looks at a single symbol and rewrites it the same way regardless of what sits to its left or right. A becomes this, full stop, wherever A shows up. A context-sensitive rule is allowed to peek at the neighbors: in this surrounding, A becomes one thing; in that surrounding, A becomes another. The entire hierarchy is organized around how much of the context a rule has to know to do its job.
This is not really a story about grammar. It’s a story about what you can reason about locally. Context-free grammars are a big part of why much programming-language syntax parses efficiently, though the full legality of a real program always spills outside the grammar, into name binding, type rules, and the rest. The deeper appeal is local reasoning: when a piece means the same thing regardless of its neighbors, you can understand it once and reuse the understanding everywhere. Context-sensitive is the world where you can’t do that, where the same symbol in two places is two different things, and you have to carry the surroundings with you to stay correct. Translate it into systems and the claim is immediate. An abstraction is safe to reason about in isolation exactly when the thing beneath it behaves context-free. If the behavior is context-sensitive, the interface that hid the context hid the thing you needed.
My favorite example is one Curtis Dunham and I tripped over years ago and only half understood at the time. A processor’s instruction set is supposed to be a clean, context-free contract: here is an add, here is a load, here is a vector operation, each with a meaning that doesn’t depend on the company it keeps. The ISA was largely context-free for what each instruction means. It was not context-free for what each instruction costs. A vector op’s architectural contract can be perfectly local while its latency, its throughput, and whether it pays off at all depend on the loads and stores around it, on the memory access pattern it’s embedded in, on whether the data is laid out the way the hardware quietly prefers. The instruction-by-instruction interface presented each vector op as if it stood on its own, and the part it left out, the relationship to its neighbors, was exactly the part a compiler needs to use it well. And the dependence runs deeper than adjacency. A vector op may never touch memory itself, yet using it implies a whole train of loads and stores around it, and that train implies more underneath: the coherence traffic, the ordering, the cache-line movement that has to happen for the operation to mean anything at all. None of that is in the instruction’s contract, and it was left out on purpose, because writing it all into the interface would drown the very thing the interface exists to simplify. Most days that is the right call. But left out is not gone, and the implied side-effects, and the side-effects of those, are exactly the context the clean op pretends it doesn’t have. The ISA wasn’t wrong about any single instruction. It was context-free about something that wasn’t.
The second name: zero mutual information
Now the same condition from a different building. In 1948 Claude Shannon gave us a formalized way to measure information in bits, and with it a quantity called mutual information: how much knowing one thing tells you about another. If two things are independent, learning one tells you nothing about the other, and their mutual information is zero. If they’re entangled, knowing one narrows down the other, and the mutual information counts, in bits, exactly how much.
Point that at an abstraction and the lossless condition sharpens from a yes-or-no into a number. When you abstract a part in isolation, you are encoding it without its context. You lose nothing when the hidden part carries no information about the property you care about beyond what the interface, and whatever context you do expose, already carry.1 The instant it does carry more, encoding the part alone leaves those task-relevant bits on the floor. That is not only a metaphor: fix a workload and fix a property to measure, and the missing information is a real quantity, countable in bits. The vector instruction shares real, behavior-relevant information with its surrounding memory pattern. Lift it out and present it on its own, and the bits you dropped are precisely the bits the compiler now has to reconstruct, guess, or do without.2
They are one claim, not two
So the computer scientist says context-free and the information theorist says no behavior-relevant information hiding in the context, and these aren’t two unrelated ideas. They rhyme, and once you fix a property to measure and a distribution over contexts they line up into the same shape. The thing that connects them has a name too, older than either field: compositionality, the principle (usually hung on Frege, and central to formal semantics including Montague’s) that the meaning of a whole is built from the meanings of its parts and the way they are combined, with no part secretly depending on the rest. A grammar rule is context-free when it applies without consulting its neighbors; an abstraction is context-free in the engineering sense when the context carries no extra information about the property the interface promises. Three vocabularies, one geometry: for the thing you’re measuring, the part is independent of its surroundings, or it isn’t. When it is, abstraction is free and lossless and you have actually shrunk the world. When it isn’t, abstraction is a loan against the context you pretended away, and we are back, exactly, at the debt from last time.
Why almost nothing is context-free
Here is the uncomfortable part. Almost nothing interesting is context-free with respect to every property we care about. The clean cases, the ones we hold up as the triumphs of abstraction, are clean because we worked hard to strip the context out, and even then it leaks back in through the one door we could never lock: performance. A page of virtual memory is a gorgeous context-free story about correctness. Every address means the same thing, your process owns a private and uniform expanse, and none of it seems to depend on what anyone else is doing. That last part is the tell, because the clean address is really two abstractions stacked into one. The private uniform space is virtual memory; the guarantee that a location reads the same no matter which core touches it, even while other cores are writing to it, is memory coherence, a second hidden machine with as many designs as virtual memory has fabrics, snooping and directories and the schemes in between, each with its own cost and its own ways to bite. The address hides both. And both are an outright lie about cost, because what a memory access actually costs depends entirely on context: on what else is resident, on the locality of everything around it, and on the coherence traffic your neighbors just set in motion. The correctness abstraction is context-free; the cost, and under contention even the timing of correctness, is wildly context-sensitive; and the bill, when your working set stops fitting, is thrashing. The same fault line runs through caches, branch predictors, and TLBs, every structure whose entire purpose is to exploit the context, the history, the neighbors, that the clean interface above it swears doesn’t exist.
And this is not only a hardware disease. Climb to the very top of the stack and the same lie is waiting. An object-relational mapper promises you can treat rows in a remote database as ordinary objects in memory: ask a user for its orders, write user.getOrders(), no SQL, no joins, no network in sight. That promise is context-free for one call and a catastrophe in a loop. Fetch the orders for a single user and it is one query; do it for a hundred users inside a for-loop and you have quietly fired a hundred and one, the infamous N+1, because the cost of a fetch was never a property of the fetch. It was a property of how and when you asked: the batching, the joins, the eager-versus-lazy choice the object interface swore you would never have to make. The method call dressed a network round-trip as a memory access, and under load the difference is the whole story.
This is the real reason Joel Spolsky’s law holds, the one that says all non-trivial abstractions are to some degree leaky. It isn’t carelessness and it isn’t bad luck. A non-trivial system has parts that depend on each other, and once you put a distribution over its runs those dependencies show up as mutual information between its pieces, which is just to say it isn’t context-free for everything at once, which is just to say no single interface over it can be lossless for everything at once. The leak was never a defect in the abstraction. It is the behavior-relevant information you declined to pay for, finding its own way out.
What it changes for compilers
This reframes a couple of fights I care about. Take compilers. A compiler is in the strange business of using an abstraction, the ISA, to do the very thing the abstraction was built to make unnecessary: reason about the machine underneath. To schedule instructions well, to allocate registers, to decide whether a vectorization pays for itself, it needs the context the ISA worked so hard to hide, the latencies, the ports, the true cost of a memory access in its actual surroundings. An ISA that is genuinely context-free is easy to target and hard to target well, because the same independence that makes it clean is the independence that starves the compiler of what it needs. In practice we paper over this by smuggling the context back in through side doors: cost models, scheduling tables, performance counters, profile-guided feedback. Every one of those is an admission that the interface lost something, plus a hand-built channel to ship the lost bits back upstairs. Which suggests the honest engineering question. When an interface can’t be lossless, the choice is not whether to expose the context but how to expose it on purpose, rather than leak it by accident.
What it changes for the machines writing the code
And then there is the version of this that is eating the industry right now. When a model writes you a function, or an explanation, or a plan, it is handing you an abstraction: a clean, context-free-looking artifact that stands in for a tangle of reasoning and constraints it never showed you. The artifact reads as if it is self-contained, as if you could lift it out and drop it into your system and it would mean what it appears to mean. But the correctness of that output almost always depends on context the output does not carry, the invariants of your particular system, the assumptions baked into the surrounding code, the failure modes nobody wrote down. The mutual information between the generated answer and the reality it has to survive in is high, and the answer, presented on its own, contains almost none of it. That gap, measured in bits you can’t see, is the entire risk. It is the same gap as the vector instruction, the virtual page, and the coherence fabric, scaled up to the level of whole reasoning and handed to people who were never shown the context in the first place. I am going to come back to that one in its own post, because it deserves more than a paragraph. For now the only point is that it isn’t a new problem. It is the oldest problem in abstraction, wearing the newest clothes.
What to do about it
So what do you do with a result that says most of your abstractions can’t be lossless? Not despair, and not the other failure mode either, the one where you decide abstraction is a lie and try to hold the whole machine in your head at once. You ship nothing that way. The move is humbler and more useful: learn to see, for each abstraction you lean on, roughly how context-free it actually is, and therefore roughly how much it is hiding. Treat the cost side of a clean interface as context-sensitive until it proves otherwise. When you build an abstraction over something that genuinely depends on its surroundings, don’t pretend the dependence away. Give the layer above an honest channel to the context it is going to need, and where you can’t hand over the context, hand over a contract instead: not a promise that the underside is simple, but a stated operating envelope, the bounds inside which the abstraction behaves and outside which all bets are off. This is the same thing that makes human organizations work. People coordinate best with defined roles and responsibilities, when everyone knows the edges of their own authority and whom to call when they reach one, and functions and systems are no different. You may never grok the full complexity above you and below you, but you can bound it, and a bounded leak is a survivable one: declare the envelope, enforce the limits at the edges, and you shrink the blast radius of the abstraction on the day it finally leaks. The truly lossless abstraction, the one whose underside you can actually forget, is a rare and beautiful thing, and most of engineering is the negotiation you run when you can’t have one. The skill, the same one from last time, is knowing the exchange rate: which bits are safe to hide, and what it costs you when you hide the ones that aren’t.
If you have an abstraction in your own stack that looks context-free and absolutely isn’t, the kind that only bites under load or only in production, I’d love to hear it. Those are the ones worth collecting.
Sources & Further Reading
- The Chomsky hierarchy: Chomsky’s 1956 ranking of grammars by how much surrounding context a rule may consult, and the origin of “context-free” and “context-sensitive.”
- Claude E. Shannon, “A Mathematical Theory of Communication” (1948): information measured in bits, and mutual information, the bits two things share.
- The principle of compositionality (Stanford Encyclopedia of Philosophy): the meaning of a whole built from its parts and how they combine; traditionally Frege’s, made rigorous by Montague.
- Curtis Dunham and Jonathan Beard, “This Architecture Tastes Like Microarchitecture” (WP3, 2018): the vector-ISA argument this grew out of, that an interface can leak the very microarchitecture it was meant to hide.
- Peter J. Denning, “Virtual Memory” (Computing Surveys, 1970): the working-set model and thrashing, the canonical case of a context-free correctness story sitting on a context-sensitive cost.
- Joel Spolsky, “The Law of Leaky Abstractions” (2002): the original line, “all non-trivial abstractions, to some degree, are leaky,” which this post is trying to give a reason for.
- Thomas M. Cover and Joy A. Thomas, Elements of Information Theory (2nd ed., Wiley, 2006): the standard reference for entropy and mutual information, including the conditional version the footnote leans on.
Footnotes
-
The precise version, for the reader who wants it. Call the hidden thing P, its interface A, the context you still expose C, and the property you care about B. The abstraction loses nothing about B when, given A and C, the hidden details of P tell you nothing more about B; in information-theory notation, the conditional mutual information I(B; P | A, C) is zero. This is sharper than the slogan in two ways that matter. Zero mutual information between a part and its context is not sufficient: let P and C be independent coin flips and let the behavior be their XOR, and nothing is shared between part and context, yet you still can’t predict the behavior from P alone. And it is not necessary: a part and its context can share heaps of information that has nothing to do with B. The loss is never all the mutual information between a part and its surroundings; it is only the slice of it that bears on what you’re measuring. ↩
-
There’s a third name for compressibility worth a nod, this one for a single finite object rather than a stream: Kolmogorov complexity, the length of the shortest program that reproduces a thing. An object is compressible when it has structure a shorter description can capture, and incompressible (random) when it doesn’t. The tie to context is direct, up to the usual additive constants: a part is losslessly separable from its surroundings when describing the two together is no shorter than describing them apart, which is mutual information again, wearing the algorithmic coat instead of the probabilistic one. (Kolmogorov complexity isn’t computable in general, so this is a way to think, not a meter you can attach.) Same geometry, third vocabulary. ↩