
That's just the default. You can set max_seq_len to 8k. From the readme [0]:

> All models support sequence length up to 8192 tokens, but we pre-allocate the cache according to max_seq_len and max_batch_size values. So set those according to your hardware.

[0] https://github.com/meta-llama/llama3/tree/14aab0428d3ec3a959...
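For example, a minimal sketch based on the repo's example scripts (the checkpoint and tokenizer paths below are placeholders, and max_batch_size=1 is just an assumption to keep the pre-allocated cache small):

    from llama import Llama

    # Pre-allocates the KV cache for 8192-token sequences;
    # raise max_seq_len/max_batch_size only if your GPU memory allows it.
    generator = Llama.build(
        ckpt_dir="Meta-Llama-3-8B-Instruct/",                         # placeholder path
        tokenizer_path="Meta-Llama-3-8B-Instruct/tokenizer.model",    # placeholder path
        max_seq_len=8192,
        max_batch_size=1,
    )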
