This is just a musing; it should not be mistaken for an attempt at insight. I am not an information scientist. Two separate topics popped up recently, which made me think of an old idea I’d thought about a long, long time ago. Can we generate all possible interesting images?
I can’t remember where I first heard about this concept. It’s possible I came up with it on my own back when I was playing with graphics APIs in high school, but I cannot be sure. See, back in those days, you rendered not to a 4K Ultra HD TV via a 46-core graphics processor, but through a software renderer on a single-core CPU to a CRT monitor that would be barely recognizable as a computer screen to a 6-year-old these days. And you’d do it at a resolution of 320 by 200 pixels. Brutally archaic, right? Really, though, 320×200 is bad but not The Worst; it’s still recognizable enough to make fun images and games. That led me to this idea. Take any photo. Now re-size and/or crop it to 320×200 and make it grey-scale. It may not be perfect, but it’s passable enough to tell what it is. It’s also very small in data size, and simple in structure. For example, say you’ve got a picture of a dog in front of a rainbow.
Not bad. So think on that. What can I do with that? What if I write a program that generates every possible 320×200 grey-scale picture, one for each combination of pixel values? I’d have every possible image a human being could ever see. At least, a small, black and white version of every possible image.[1] Every frame of every movie. Every family vacation photo. Every comic book panel. Every possibility is contained in those 320×200 pixels. Our every-pic-generating program doesn’t work, though. At least, it doesn’t work unless you have a hard drive the size of the universe and a few forevers to wait for the algorithm to churn.
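To put a rough number on "a few forevers", here is a minimal sketch that counts how many distinct pictures there are. It assumes 8-bit grey-scale (256 shades per pixel), and the names `LEVELS` and `count_digits` are just made up for illustration. The count itself is too long to be worth printing, so the sketch only works out how many decimal digits it has.

```python
import math

WIDTH, HEIGHT = 320, 200   # resolution used in the text
LEVELS = 256               # assumption: 8-bit grey-scale, 256 shades per pixel


def count_digits(levels: int, pixels: int) -> int:
    """Number of decimal digits in levels ** pixels, via logarithms."""
    return math.floor(pixels * math.log10(levels)) + 1


pixels = WIDTH * HEIGHT
digits = count_digits(LEVELS, pixels)

print(f"{pixels} pixels, so {LEVELS}^{pixels} possible pictures")
print(f"that count is a number with {digits} decimal digits")
```

For comparison, the number of atoms in the observable universe is usually estimated at around 80 digits long, so there is nowhere to store these pictures even at one atom each.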
Darn. So what else could we do? Well let’s try to cheat a bit and skip over all those boring mostly-dark / mostly-light pictures the algorithm starts with. Let’s randomly generate our pictures! We can just pick a grey-scale value for each photo pixel, and then check if the output is interesting. Unfortunately, if you fire up your favorite random pic generator, you’ll see not someone’s trip to Yosemite, but rather something that looks like this:
Total noise. The number of possible “noise” pictures so overwhelms the number of “interesting” pictures that you’ll almost certainly never get a good one this way. The algorithm will run for a practical eternity before the selfie you took last week pops out. A thousand monkeys at a thousand typewriters won’t make any Shakespeare works after all. When I originally pursued this idea many years ago it blew my mind that something so seemingly small, like a 320×200 grey-scale pic, actually defines this gargantuan space of useless noise. Any single picture is exactly as likely to come out as any other, whether it’s “dog in front of rainbow” or “rectangle of noise #1023910239109231312388845322”, but there are so many more rectangles of noise out there that they are all you see.
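If you want to try the random approach yourself, a minimal sketch could look like the following. It assumes one random byte per pixel (8-bit grey-scale) and wraps the result in a binary PGM header, a very simple image format that many viewers can open. The function names are hypothetical.

```python
import os

WIDTH, HEIGHT = 320, 200   # resolution used in the text


def random_grey_image(width: int = WIDTH, height: int = HEIGHT) -> bytes:
    """Return width * height random bytes, one 8-bit grey value per pixel."""
    return os.urandom(width * height)


def to_pgm(pixels: bytes, width: int = WIDTH, height: int = HEIGHT) -> bytes:
    """Prefix raw grey values with a binary PGM (P5) header."""
    header = f"P5\n{width} {height}\n255\n".encode("ascii")
    return header + pixels


image = random_grey_image()
pgm_bytes = to_pgm(image)
print(f"generated {len(image)} random grey values, {len(pgm_bytes)} bytes as PGM")
```

Writing `pgm_bytes` to a file opened in binary mode gives you something to look at. Run it as many times as you like; expect static every time.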
Which brings me to the first topic that made me think about this old idea. Passwords. The same concept applies to passwords, specifically the difference between pass-phrases made up of a few real-life words, and passwords made of truly random character choices. A friend of mine insisted pass-phrases were just as good as using random characters. People don’t understand how random random really is. There are, depending on who you ask, about 250,000 words in the English language. If you pick 4 of them for your pass-phrase, you’re at 250,000^4 possible choices (in real life you’re much more likely to choose from among around 10,000 common words, though). If you’d just picked 1 of 96 common keyboard characters for each of 24 characters, you’d be at 96^24 choices. You might think 250,000 to the 4th is somewhat close to 96 to the 24th, but unfortunately that’s not the case. There is a difference of 26 orders of magnitude between them. One is guessable within the lifespan of the universe, the other is not. Pass-phrases are far superior to single-word passwords for security, but they don’t come close to matching the confounding power of real randomness.[2] There is just so much more stuff out there that isn’t a word than is.
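The arithmetic is easy to check. Here is a small sketch using the figures from the paragraph above (250,000 words, 10,000 common words, 96 keyboard characters, 24 characters long); the helper name `search_space` is made up, and "bits" is just the base-2 logarithm of the number of possibilities.

```python
import math


def search_space(symbols: int, length: int) -> int:
    """Number of possible secrets when each of `length` slots has `symbols` options."""
    return symbols ** length


passphrase_full = search_space(250_000, 4)    # 4 words from the whole language
passphrase_common = search_space(10_000, 4)   # 4 words from a common-word list
random_chars = search_space(96, 24)           # 24 random keyboard characters

for label, size in [
    ("4 words of 250,000", passphrase_full),
    ("4 words of 10,000", passphrase_common),
    ("24 chars of 96", random_chars),
]:
    print(f"{label}: about 10^{math.log10(size):.1f} choices, {math.log2(size):.0f} bits")

gap = math.log10(random_chars) - math.log10(passphrase_full)
print(f"gap between random characters and the best-case pass-phrase: 10^{gap:.1f}")
```

The gap comes out just under 10^26, which is where the "26 orders of magnitude" comes from. With the more realistic 10,000-word list, the pass-phrase falls even further behind.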
And that leads to the other topic that I’ve been reading about lately, functions. As in programming functions that take in some values and spit back some result. What does this idea of random noise overwhelming the quantity of “interesting” data have to do with functions? There was this great article by the creator of the new Unison development environment, where he mentions de-coupling functions from their packages – freeing us from Apps and APIs. I’ve had similar thoughts in the past, though not as elegantly stated or thoroughly thought-out. If we tried, we could get to a point where we’re all working in a common “type” format and (if we’re touching pure functions) we can reliably create and distribute them to everyone in the world. Why are we wasting all this time writing the same code over and over? Other developers and programming environments have pushed towards this idea of unique, universal, and available functions, too.[3]
It’s such a fun idea. If we happen to be the first to need some function, once we’re done making it, anyone else can and would feel free to re-use it. We would post functions up on our Twitters or Facebooks just like we do our family vacation photos or pithy pop culture commentary. “Here is my new awesome photo filter thing! It takes in a few parameters and returns your photo all hipstergrammized!” “Here is my new economic model based on Gaussian sub-market fields. It takes in some starting parameters and returns a time-series you can step through to see the results.” “My function takes two numbers and returns their sum.”
Now our job, as developers, becomes identifying behaviors that are useful, plucking the diamonds out of the rough; we’d be pulling functions from the noise.
1. You don’t really even have to limit yourself to black and white. 8-bit color isn’t great, but you can get passable results with it using the same amount of data as 8-bit grey-scale.
2. If you really need a practical take-away from this article, it’d be this: use a good pass-phrase for access to your password keeper software of choice, and use auto-generated 24+ character completely random passwords for everything else stored in said keeper. It’s the best we can do until someone finally comes up with a better way to manage security than passwords.
3. I swear I read an article from Joe Armstrong of Erlang along these lines, but all I can find now are his arguments on reducing entropy in documents using hashing. Not quite the same, but interesting.