Does it work? (edit: IT WORKS)
Does it work? (I'm planning on keeping my repo the same as the original model given to me by Facebook, but it would be funny if substituting the Llama 3.3 RoPE config back in just works and extends the usable context length back to 128K.)
Thanks, I'll also run some evals later. I did some vibe checks with ~100K input tokens and they looked OK, so I think it does work.
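For anyone wanting to do the same kind of vibe check, here is a rough sketch using transformers: build a ~100K-token haystack with a single "needle" fact buried in the middle and ask the model to retrieve it. The repo id below is a placeholder and the token count is only approximate.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual model under test.
model_id = "your-org/your-long-context-model"

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Roughly 100K tokens of filler with one retrievable fact in the middle.
filler = "The sky was clear and the market was quiet that day. " * 8000
needle = "The secret passcode is 7341."
prompt = (
    filler[: len(filler) // 2]
    + needle
    + filler[len(filler) // 2 :]
    + "\n\nQuestion: What is the secret passcode?\nAnswer:"
)

inputs = tok(prompt, return_tensors="pt").to(model.device)
print(f"prompt length: {inputs['input_ids'].shape[1]} tokens")

out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```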
Ran some evals, results here.
There is a statistically significant improvement, though it's not much.
Will run some long context evals later.
What was the technique? I'm planning a 1M extension.
Just updated rope_scaling and the generation config, and added a chat template (by comparing with the older Llama config). Config changes only.
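For reference, a minimal sketch of what the rope_scaling part of the change can look like. The values below are the stock Llama 3.1/3.3 ones; I'm assuming the target config.json is in the working directory, so adjust paths and verify against the upstream meta-llama config before applying it to your repo.

```python
import json

# Load the model's config.json and swap in a Llama 3.3-style RoPE scaling block.
with open("config.json") as f:
    cfg = json.load(f)

cfg["rope_scaling"] = {
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
    "rope_type": "llama3",
}
cfg["max_position_embeddings"] = 131072  # 128K usable context

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```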
In the Reddit thread we suspected that the 8K context length was just an artifact of the fine-tuning API, not a limitation of the actual model.
Interpreting the evals correctly, there is only a small improvement, and it is not statistically significant.
I think that if there isn't a noticeable decline in performance, it's probably the intended configuration.
Did anyone try testing MRCR, RULER, or some other long-context benchmark?
