It got me thinking about various things, which I'm sure (especially corpus-based) NLP people have actually thought pretty hard about. I mean, just what can you figure out from co-occurrence patterns of tokens alone? The fact that visualization is so helpful suggests that we have reached a point where the algorithms we know how to write (for visualization or for massive statistical analysis) and the algorithms in our brains we know how to use (i.e. stare at something and see if any patterns reveal themselves) are no longer in a relationship where one simply beats the other, but are complementarily useful.
A specific open question for those of you who know a lot of NLP/machine learning stuff: how hard, or how well solved already, is the following problem? You are given some random, typical Western European text from which all the spaces have been removed, and all the letters have been permuted (that is, a random substitution cipher has been applied, just to keep you from cheating and using dictionaries). Just from the statistical properties of letter co-occurrence alone, figure out where the spaces should go.
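For what it's worth, here is a minimal sketch of one obvious baseline, assuming nothing beyond the problem statement above: compute the pointwise mutual information (PMI) of each adjacent letter pair and guess a boundary wherever the two letters are only weakly associated. Since PMI is computed purely from co-occurrence counts, it is invariant under the letter permutation, which is exactly the point of the cipher. The name `segment_by_pmi` and the zero threshold are placeholders for illustration, not a claim that this solves the problem.

```python
from collections import Counter
from math import log

def segment_by_pmi(text, threshold=0.0):
    """Guess word boundaries in space-free text from bigram statistics alone.

    Heuristic: adjacent letter pairs that co-occur less often than chance
    (low PMI) are more likely to straddle a word boundary.
    """
    n = len(text)
    unigrams = Counter(text)
    bigrams = Counter(text[i:i + 2] for i in range(n - 1))

    def pmi(a, b):
        # PMI(a, b) = log( P(ab) / (P(a) * P(b)) )
        p_ab = bigrams[a + b] / (n - 1)
        p_a = unigrams[a] / n
        p_b = unigrams[b] / n
        return log(p_ab / (p_a * p_b))

    out = [text[0]]
    for i in range(1, n):
        if pmi(text[i - 1], text[i]) < threshold:
            out.append(" ")  # weak association: guess a boundary here
        out.append(text[i])
    return "".join(out)

# Toy demo on a tiny repeated "corpus"; a real test would use long,
# genuinely varied text, where the statistics are far more informative.
sample = "thequickbrownfoxjumpsoverthelazydog" * 50
print(segment_by_pmi(sample)[:60])
```

Of course, this only uses bigram statistics; the interesting version of the question is how much better you can do with longer-range co-occurrence structure, and whether the answer is "essentially perfectly" for languages like the ones in question.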