Has anybody pointed out that AI language models are William Burroughs cut-ups?
This is how I was thinking about it when I first started doing Markov chains. Sampling from the likelihood of words appearing next to each other is less random than pure shuffling, but any text you cut up has a built-in “language model” just from which words it uses and how often.
You can get some semblance of an author’s mind just by sampling from the frequencies with which words are paired together. You can get more by sampling with a better method. Transformers use larger windows and more complex attention patterns, but they’re still next-word predictors.
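Here’s a minimal sketch of that word-pair sampling in Python, just to make the idea concrete. The corpus string is a placeholder; swap in any text you want to cut up.

```python
import random
from collections import defaultdict

def build_chain(text):
    """Record which word follows which -- the 'built-in language model' of any text."""
    words = text.split()
    chain = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        chain[prev].append(nxt)
    return chain

def generate(chain, start, length=20):
    """Walk the chain from `start`, sampling each next word by observed pair frequency."""
    word = start
    out = [word]
    for _ in range(length - 1):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)  # duplicates in the list make this frequency-weighted
        out.append(word)
    return " ".join(out)

corpus = "any text you cut up has a built-in language model of its own"  # placeholder corpus
chain = build_chain(corpus)
print(generate(chain, "any"))
```

That’s the whole trick: the “model” is nothing but counts of adjacent words, and generation is just sampling from those counts.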
Unfortunately, when I’ve tried to fine-tune large pre-trained models on Philip K. Dick, it has gone awry. As soon as the model drifts a little out of distribution, it rolls downhill toward generic and even problematic outputs. https://deepfates.com/sciops/2019/05/14/turbulent-priest.html
Which is to say, the model inherits biases, stereotypes and clichés from its parent language. Just as we all do, of course. But the inputs to PKD’s brain over his lifetime were definitely different from the modern language and world-model contained in internet datasets.
Language models find dimensions in semantic space that correlate with our own usage, and they can compress ideas into coordinates along those dimensions! That is itself a kind of analogy-making, even before you get to the fact that they can also literally use analogies.
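A toy sketch of that coordinate idea, with vectors invented for illustration rather than taken from any real model: if “royalty” and “gender” were two of those dimensions, the classic king − man + woman analogy falls out of simple arithmetic.

```python
import numpy as np

# Made-up coordinates along two hypothetical semantic dimensions: (royalty, gender).
# These numbers are illustrative only, not from any trained model.
embeddings = {
    "king":  np.array([0.9,  0.8]),
    "queen": np.array([0.9, -0.8]),
    "man":   np.array([0.1,  0.8]),
    "woman": np.array([0.1, -0.8]),
}

def nearest(vec, vocab):
    """Return the word whose embedding has the highest cosine similarity to vec."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(vocab, key=lambda w: cos(vec, vocab[w]))

# king - man + woman lands on the "queen" coordinates.
query = embeddings["king"] - embeddings["man"] + embeddings["woman"]
print(nearest(query, embeddings))  # queen
```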