do it work (edit: IT WORK)

#1
by Fizzarolli - opened

do it work? (i'm planning on keeping my repo the same as the original model given to me by facebook, but it would be funny if substituting the llama 3.3 rope config back in just works and extends usable context length back to 128k)

update: did the benches myself, this version appears to be ~universally better (though possibly within standard deviation? i'm not sure)

[image: benchmark comparison results]

shb777 pinned discussion

Thanks, I'll also run some evals later. I did some vibe checks with ~100K input tokens and it looked OK, so I think it does work.

Fizzarolli changed discussion title from do it work to do it work (edit: IT WORK)

Ran some evals, results here.
There is a statistically significant improvement, though it's not much.
Will run some long-context evals later.

What was the technique? I'm planning a 1M extension.

Just updated rope_scaling and the generation config, and added the chat template (by comparing against the older Llama config). All changes are config-only.
In the Reddit thread we suspected that the 8K context length was just an artifact of the fine-tuning API, not a limitation of the actual model.
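For anyone who wants to replicate it, here's a minimal sketch of the rope_scaling part of that change (the values shown are the stock Llama 3.3 ones and the model path is a placeholder; I haven't diffed this against the exact config in this repo):

```python
# Hedged sketch: re-applying the stock Llama 3.3 RoPE scaling settings to a
# local copy of the model's config via transformers. Path is a placeholder.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("path/to/local-model")

# Llama 3.1/3.3-style long-context RoPE scaling block.
cfg.rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
cfg.max_position_embeddings = 131072  # ~128K usable context after the change

cfg.save_pretrained("path/to/local-model")
```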

If I'm interpreting the evals correctly, there is only a small improvement, and it is not statistically significant.

i think that if there isn't a noticeable decline in performance, it's probably the intended option

did anyone try testing MRCR/RULER/some other long context bench?
