At work someone recently claimed that "entropy always increases, and the same is often true for software systems", and asked whether there was anything we could do about that.
This is of course a very interesting question, so I started thinking. When I started to formulate an email in reply to him it turned out to be way too long and slightly off topic, so instead of replying by mail I morphed my reply into a couple of blog posts (this one and another in our internal wiki).
But getting back to the topic: actually, it's only true that entropy always increases if you study closed systems (it says so right in the Wikipedia definition linked to above). In an open system you can move entropy out of the system and thus decrease it; that's why refrigerators work.
In statistical mechanics, entropy is essentially defined as the number of states a system can be in that are compatible with what you observe. Technically it's the logarithm of that number of possible states that's used (Boltzmann's S = k log W), but that's just a technicality to avoid dealing with really large numbers all the time.
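To see why that technicality helps, here's a tiny sketch (the two-state particles are just a made-up toy system): the raw state counts explode, while their logarithms stay perfectly manageable.

```python
import math

# Toy system: each of n two-state particles can be in one of two
# positions, so there are 2**n possible microstates in total.
for n in (10, 100, 1000):
    microstates = 2 ** n
    print(f"{n} particles: {microstates:.3g} microstates, "
          f"entropy = log2 = {math.log2(microstates):.0f}")
```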
So a refrigerator will move entropy out of the refrigerator, and by doing so will reduce the uncertainty about the exact locations of the atoms inside it, so the number of possible combinations of locations for those atoms becomes smaller. See how it all fits together? ;-)
How do we bring that metaphor back to the domain of software? Well, if we define "program entropy" via the number of different things a human perceives a program as possibly being capable of doing, then that number of things defines the program's entropy. If you think the program can only do one thing, it has an entropy of zero (log(1) = 0). If instead you have some code that you can't wrap your head around and it could do a gazillion things for all you know, then log(gazillion) is the entropy of that piece of code. So if you have two implementations, functionally equivalent, but one has fewer possible interpretations to a human reader than the other, then the less ambiguous code has lower entropy by this definition.
Note that since this is about the perception of programs by humans, not by computers, it makes sense to talk about a program having different entropies for different people. It also makes a difference whether the program is short enough to wrap a human mind around: if you can't wrap your mind around it, you can't know what it does, so its entropy will be high, since in theory it could do many things. Entropy jumps in large leaps when things get incomprehensible, but conversely also drops quickly when incomprehensible pieces become comprehensible.
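Just to make the bookkeeping concrete, here's a minimal sketch of this definition in Python. The interpretation counts are of course hypothetical and reader-dependent; the point is only how the log behaves:

```python
import math

def program_entropy(perceived_interpretations: int) -> float:
    """Entropy of a piece of code for one particular reader: the log
    of the number of behaviours that reader cannot rule out."""
    return math.log2(perceived_interpretations)

print(program_entropy(1))      # 0.0 -- the reader is certain: log(1) = 0
print(program_entropy(4))      # 2.0 -- four plausible behaviours
print(program_entropy(10**9))  # ~29.9 -- "could do a gazillion things"
```

Note that the same piece of code gets a different number for each reader, which is exactly the point above.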
The same kind of argument can be applied to configurations of systems (which are essentially graphs, not necessarily lines of code), so anything that brings us clearer code or configurations will lower our entropy.
Does this make sense to you?
One topic I just thought of is what this definition of entropy does with respect to abstractions. By the definition above, abstractions must have high entropy, or do they? At first glance they must, since they can conceivably do their thing in many different ways; that's the very essence of abstraction. Let's take an ordered collection as an example: ordered collections can be implemented in a gazillion ways, but that doesn't matter, since you're treating it only as an ordered collection and all those gazillion ways are essentially equivalent. So perhaps it works the other way: an abstraction makes it unnecessary to think of all the different ways the program implements it, so instead of thinking about the many different implementations one thinks only about the abstraction, and the entropy is in fact lowered by using good abstractions.
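Here's a small sketch of the ordered-collection example in Python (the median function is just an illustrative stand-in): by programming against the abstract Sequence type, the reader only has to hold one concept in mind, no matter which of the gazillion implementations sits behind it.

```python
from typing import Sequence

def median(xs: Sequence[float]) -> float:
    """Only assumes 'ordered collection': the reader never has to
    think about how the collection is actually implemented."""
    ordered = sorted(xs)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Two different implementations of an ordered collection, but behind
# the abstraction they collapse into a single concept for the reader:
print(median([3.0, 1.0, 2.0]))  # 2.0 -- a list
print(median((3.0, 1.0, 2.0)))  # 2.0 -- a tuple
```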
Actually I think both these perspectives are valid, but if we let an individual reader's subjective perspective be the central one, then I think it makes the most sense to claim that good abstractions will lower a program's entropy. That is, it will be true if the reader is the kind of person who finds one abstraction less complex than a set of direct implementations, and I know from experience that not everyone is like that, so opinions may differ systematically depending on how well the reader is able to internalize abstractions.