
I keep reading that GPT is just a "smarter", more complex Markov chain: in the end, a function spitting out the next word with some probability.

But from my experience that cannot be true - it has to learn somehow. Here is an easy test: tell it something that happened today and contradicts the past (I used to test this with the Qatar World Cup), then ask questions that are affected by that event, and it answers them correctly. How is that possible? How can a single sentence (the information I provide) shift the probabilities for the next token that far?



There are two kinds of knowledge at play here.

1. The knowledge baked into the model's parameters during training

2. The context of the conversation

The 'learning' you are experiencing here is due to the conversation context retaining the new facts. Historically, context windows were very short, and as the conversation continued the model would quickly forget the new facts.

More recently context windows have grown to rather massive lengths.
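A rough way to see #2 in action yourself (just a sketch, assuming a small local model such as GPT-2 via the Hugging Face transformers library; the model, prompts, and candidate words are placeholders): compare the next-token probabilities for the same question with and without the new fact prepended. The parameters are identical in both calls; only the input changes.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    def next_token_probs(prompt, candidates):
        # One forward pass; read the probability of each candidate as the
        # next token after the prompt. Multi-token words are approximated
        # by their first sub-token. No weights are updated anywhere.
        ids = tok(prompt, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]
        probs = torch.softmax(logits, dim=-1)
        return {c: probs[tok.encode(" " + c)[0]].item() for c in candidates}

    question = "The 2022 World Cup was won by"
    fact = "New information: Argentina won the 2022 World Cup, held in Qatar. "

    print(next_token_probs(question, ["Argentina", "France", "Brazil"]))
    print(next_token_probs(fact + question, ["Argentina", "France", "Brazil"]))

Whatever the absolute numbers come out to, the second distribution is conditioned on the extra sentence; that conditioning is all the 'learning' that happens inside a conversation.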


Mostly #2 of what the other response says. The entire input to the NN is what it's "learning" from - the weights are static; you are changing the output probabilities based on which context is provided and its length.
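You can also convince yourself the weights really are frozen during generation (again only a sketch using GPT-2 and transformers; any causal LM would do):

    import hashlib
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    def weight_hash(m):
        # Hash every parameter tensor; any in-place change would alter the digest.
        h = hashlib.sha256()
        for p in m.parameters():
            h.update(p.detach().cpu().numpy().tobytes())
        return h.hexdigest()

    before = weight_hash(model)
    ids = tok("Argentina won the Qatar World Cup. Who won it?", return_tensors="pt").input_ids
    model.generate(ids, max_new_tokens=20)
    after = weight_hash(model)

    print(before == after)  # True: generating text never touches the parameters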


Markov chains also depend on the initial state. In GPTs, the initial state is your conversation history.
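To make the analogy concrete: in a plain Markov chain the next-word distribution is a fixed lookup keyed on the current state, and picking a different starting state gives a different trajectory even though the table never changes. A toy sketch with a made-up transition table (purely illustrative); a GPT does the same thing, except the "state" is the whole conversation so far and the "table" is computed by the network:

    import random

    # Toy word-level Markov chain: the next word depends only on the
    # current state (the previous word). Probabilities are invented.
    transitions = {
        "cup": {"was": 0.7, "final": 0.3},
        "was": {"won": 0.6, "held": 0.4},
        "won": {"by": 1.0},
        "by":  {"Argentina": 0.5, "France": 0.5},
    }

    def generate(state, n=4):
        out = [state]
        for _ in range(n):
            if out[-1] not in transitions:
                break
            words, probs = zip(*transitions[out[-1]].items())
            out.append(random.choices(words, probs)[0])
        return " ".join(out)

    # Same fixed table, different initial states -> different continuations.
    print(generate("cup"))
    print(generate("won"))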



