Week of 2022-04-25

Apr 29, 2022

Model compression

At the end of each journey in our process of understanding, we have an effective solution to the problem we were presented with. Here’s an interesting thing I am noticing. We still have a diverse, deeply nuanced mental model of the problem that we developed by cycling through the solution loop. However, we don’t actually need the full diversity of the model at this point. We found the one solution that we actually need when approaching the given problem.

This is a pivotal point at which our solution becomes shareable. To help others solve similar problems, we don’t need to bestow the full burden of our trials and errors upon them. We can just share that one effective solution. In doing so, we compress the model, providing only a shallow representation of it that covers just enough to describe the solution.

This trick of model compression seems simple, but it ends up being nothing short of astounding. Let’s start with an example of simple advice, like that time when an expert showed me how to properly crack an egg and I almost literally felt the light bulb go off in my head. It would have taken me a lot of cycling through the solution loop to get anywhere close to that technique. Thanks to the compressed model transfer, I was able to bypass all of that trial and error.

Next, I invite you to direct your attention to the wonder of a modern toothbrush. Countless separate solution loop iterations went into finding the right shape and materials to offer this compressed model of dental hygiene. To keep my teeth healthy, I don’t have to know any of that. I only need to have a highly compressed model: how to work the toothbrush. This ability to compound is what makes model compression so phenomenally important.

We live in a technological world. We are surrounded by highly compressed mental models that are themselves composed of other highly compressed models, recursing on and on. I am typing this little article on a computer, and if I stop to imagine an uncompressed mental model of this one device, from raw materials scattered unfound across the planet to the cursor blinking back at me, my mind boggles in awe. To type, I don’t have to know any of that. Despite us taking it for granted, our capacity to compress and share models might just be the single most important gift that humanity was given – aside from being able to construct these models, of course.

Model compression introduces a peculiar extra stage to the process of understanding. At this fifth stage, our solution effectiveness is high, flux is low, but our model diversity is low as well. When we acquire a compressed model – whether through technology or a story – we don’t inherit the rich diversity of the model. We don’t get the full experiential process of constructing it. We just get the most effective solution.

It feels like a reasonable deal, yet there is a catch. As we’ve learned earlier, things change.

When my solution is at this newly discovered “compressed” stage, a new change will expose this stage’s brittleness: I don’t have the diversity of the model necessary to continue climbing the stair steps of understanding. Instead, it appears that I need to start problem-solving from scratch. This does make intuitive sense, and the compressed model compounding makes this even more apparent. When a modern phone suddenly stops working, we have only a couple of different things we can try to resuscitate it: plug in the charger and/or maybe try to hold down the power button and hope it comes back. If it doesn’t, the vastness of crystallized model compression makes it as good as a pebble. Chuck it into a drawer or into a lake – not much else can happen here.

Lucky for us, this phenomenon of compressed models being brittle in the face of change is a problem in itself – which means that we can aim our solving ability at it. If we’re really honest about it, software engineering is not really about writing software. It’s about writing software that breaks less often and, when it does break, does so in graceful ways. So we’ve come up with a neat escape route out of this particular predicament. If my toothbrush breaks or wears out, I just replace it with a new one from the five-pack in which they usually come. If my laptop stops working, I take it to a “genius” to have it fixed. Warranties, redundancies, and repair facilities – all of these solutions rely on the presence of someone else possessing – and maintaining! – their diversity of the mental model for me to lean on.

This shortcut works great in so many cases that I probably need to draw a special arrow on our newly updated diagram of the process of understanding. There are two distinct cycles that emerge: the already-established cycle of learning, and the applying cycle, where I can only use compressed models obtained through learning – even if I didn’t do the learning myself! Both are available to us, but the applying cycle feels much more (like orders of magnitude) economical to our force of homeostasis. As a result, we constantly experience the gravitational pull toward this cycle.

🔗 https://glazkov.com/2022/04/26/model-compression/

Model compression and us

Often, it almost seems like if we run the process of understanding long enough, we could just stay in the applying cycle and not have to worry about learning ever again. Sure, there’s change. But if we study the nature of change, maybe we can find the underlying causes of it and incorporate it into our models – thus harnessing the change itself? It seems that the premise of modernism was rooted in this idea.

If we imagine that learning is the process of excavating a resource of understanding, we can convince ourselves that this resource is finite. From there, we can start imagining that all we have to do is – simply – run everything through the process of understanding and arrive at the magnificent state where learning is more or less optional. History has been rather unkind to these notions, but they continue to hold great appeal, especially among us technologists.

Alas, when we combine technology and a large-enough number of people, it seems that we unavoidably grow our dependence on the applying cycle. In organizations where only compressed models are shared, change becomes more difficult. There’s not enough mental model diversity within the ranks to continue the cycle of understanding. If such organizations don’t pay attention to the attrition of their veterans, the ones who knew how things worked and why, they find themselves in the Chesterton’s fence junkyard. At that point, their only options are to anxiously continue holding on to truisms they no longer comprehend or to plunge back to the bottom of the stairs and re-learn, generating the necessary mental model diversity by grinding through the solution loop cycle, all over again.

I wonder if the nadir of the hero’s journey is marked by suffering in part because the hero discovers first-hand the brittleness of model compression. Change is much more painful when most of our models are compressed.

At a larger scale, societies first endure horrific experiences and acquire embodied awareness of social pathologies, then lose that knowledge through compression as it is passed along to younger generations. Deeply meaningful concepts become monochrome caricatures, thus setting up the next generation to repeat the mistakes of their ancestors. More often than not, the caricatures themselves become part of the same pathology that the uncompressed models were meant to prevent.

In a highly compressed environment, we often experience the process of understanding in reverse. Instead of starting with learning and then moving on to applying, we start with the application of someone else’s compressed models and only then – optionally – move on to learning them. Today, a child is likely to first use a computer and then understand how it works, more than likely never grasping the full extent of the mental model that goes into creating one. Our life can feel like an exploration of a vast universe of existing compressed models with a faint hope of someday fully understanding them.

From this vantage point, we can even get disoriented and assume that this is all there is, that everything has already been discovered. We are just here to find it, dust it off, and apply it. No wonder the “Older is Better” trope is so resonant and prominent in fiction. You can see how this feeds back into the “excavating knowledge as a finite resource” idea, reinforcing the pattern.

In this way, pervasive model compression appears pretty trappy. Paradoxically, the brittle nature of highly compressed environments makes them less stable. The very quest to conquer change results in more – and more dramatic – change. To thrive in these environments, we must put conscious effort into mitigating the compression trap. We are called to strive to deepen our diversity of mental models and let go of the scaffolding provided by the compressed models of others.

🔗 https://glazkov.com/2022/04/27/model-compression-and-us/

Nicklas Berild Lundblad

May 1, 2022

This is really interesting - and reminded me of a recent podcast from the Santa Fe folks, where Krakauer noted that we have a tendency to gravitate towards one- or two-dimensional models.

What you are describing, in a way, is the reduction of the dimensionality of a problem to a single dimension, where one selected variable varies with another and can be expressed simply in a visual chart. And then - as you point out - when one of the dimensions that was reduced away changes, well, then we are stuck. In many senses, I think of model compression as a general case of image compression, and the loss rate is the challenge. The asymmetry is the cost: reconstructing a highly compressed piece of music or an image requires a lot of work, if it is possible at all.

Another question is whether there is a way for us to compress models and then provide a way to unpack the compression (a bit like file compression), or whether all compression, to be useful, implies the loss of dimensionality.


What Dimitri Learned
