I think it could infer the meaning of words composed of tokens it has already seen, the same way you might infer the meaning of an unknown word from its prefix/suffix, country of origin, context, etc.
For an entire token that it hasn't seen before, it would have to rely only on context. Presumably it could do this, since early in training every token is effectively unseen and has to be learned from context alone.
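To make the two cases concrete, here is a toy sketch, not how any particular model's tokenizer actually works. It assumes a made-up vocabulary and a simple greedy longest-match split; real tokenizers (BPE, WordPiece, and so on) differ in detail. A new word built from known pieces breaks down into tokens the model already has, while a string with no known pieces gives it nothing to go on except the surrounding context.

```python
# Toy vocabulary; every entry here is made up for illustration.
VOCAB = {"un", "break", "able", "re", "play", "ing"}
UNK = "<unk>"  # hypothetical marker for text with no known piece


def tokenize(word, vocab=VOCAB):
    """Greedy longest-match split of `word` into pieces from `vocab`."""
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate starting at position i first.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            # No known piece starts here: emit an unknown marker
            # for this character and move on.
            pieces.append(UNK)
            i += 1
    return pieces


print(tokenize("unbreakable"))  # ['un', 'break', 'able']
print(tokenize("replaying"))    # ['re', 'play', 'ing']
print(tokenize("xyz"))          # ['<unk>', '<unk>', '<unk>']
```

Neither "unbreakable" nor "replaying" is in the vocabulary as a whole word, but both split into familiar pieces. "xyz" does not, so context is all that's left.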