text and language intersect. in some ways, text is a superset of language, mostly due to social (also called pragmatic) factors that complement semantics. also, the semantics/syntax interface is anything but clear-cut, at least in natural human languages.
Any text corpus is a subset of the language, under the usual definition that a language is the set of all possible sentences (or a set of rules that recognizes or generates that set). This subset has an intrinsic bias in which sentences were selected to represent real language use, and that bias matters when the corpus is used as a training set for an ML model.
So, perhaps you are saying that the text corpus carries more "world" information than the language, because of the implications you can draw from this selection process? The full language tells us how to encode meaning into sentences, but not what sentences are important to a population who uses language to describe their world. So, if we took a fuzz-tester and randomly generated possible texts to train a large language model, we would no longer expect it to predict use by an actual population. It would probably be more like a Markov chain model, generating bizarre gibberish that merely has valid syntax.
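To make the fuzz-tester idea concrete, here is a minimal sketch, assuming a tiny hypothetical context-free grammar (the names `GRAMMAR`, `generate` and `fuzz_sentences` are invented for illustration). Strictly speaking it is random grammar expansion rather than a Markov chain: every output is syntactically valid by construction, but which sentences appear depends only on uniform random choices, not on what any population actually says.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to its possible expansions.
# Any symbol that is not a key in GRAMMAR is treated as a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "Adj", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "Adj": [["colorless"], ["green"], ["sleepy"]],
    "N": [["idea"], ["ocean"], ["font"], ["shape"]],
    "V": [["eats"], ["paints"], ["predicts"]],
}


def generate(symbol: str, rng: random.Random) -> list[str]:
    """Expand a symbol by picking one of its productions uniformly at random."""
    if symbol not in GRAMMAR:  # terminal: emit the word itself
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words: list[str] = []
    for part in production:
        words.extend(generate(part, rng))
    return words


def fuzz_sentences(n: int, seed: int = 0) -> list[str]:
    """Generate n grammatical but meaning-agnostic sentences (seeded for repeatability)."""
    rng = random.Random(seed)
    return [" ".join(generate("S", rng)) for _ in range(n)]


if __name__ == "__main__":
    for sentence in fuzz_sentences(5):
        print(sentence)
```

Output looks like "the sleepy font eats a green ocean": valid syntax, no world behind it. A model trained only on such samples would learn the grammar's uniform choices, not the biased selection that real usage carries.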
And this also seems to apply if you train the model on a selection from one population but then use the model to predict a different population. Wouldn't it be progressively less able to predict usage as the two populations overlap less in their own biased use of language?
regarding the relationship: yes, and in most ways it probably is a subset. but is there really such a set of rules that generates all possible sentences? in any case, i wanted to say that materiality and cultural activity heavily influence what can and will be put into text, and that is not strictly language. "selection process" might capture some of it, though i'm not sure it captures all of it!
I think about this as shape and color. No one has ever seen a shape that wasn't colored, and likewise there are no colored things that do not have a shape.
Also, displaying text without a font is not possible.
Text is the surface of the ocean where waves emerge, and while the waves have their own properties and may naively seem to have agency, they are an expression of the underlying ocean.
nicely put! many aspects of text, at least historically, have much to do with its materiality (also in a cognitive-development sense: learning how to write, etc.). what we can consider nowadays is that text and speech might not be a necessary materiality of language. language might depend more on conceptual systems, acting more like a substrate of intelligence, and that substrate might just as well be nonhuman (to stay on topic).