do it work (edit: IT WORK)

#1
by Fizzarolli - opened

do it work? (i'm planning on keeping my repo the same as the original model given to me by facebook, but it would be funny if substituting the llama 3.3 rope config back in just works and extends usable context length back to 128k)

update: did the benches myself, this version appears to be ~universally better (though possibly within standard deviation? i'm not sure)

[image: benchmark comparison results]

shb777 pinned discussion

Thanks, I'll also run some evals later. I did some vibe checks with ~100K input tokens and it looked OK, so I think it does work.

Fizzarolli changed discussion title from do it work to do it work (edit: IT WORK)

Ran some evals, results here.
There is a statistically significant improvement, though it's not much.
Will run some long-context evals later.

What was the technique? I'm planning a 1M extension.

Just updated rope_scaling and the generation config, and added the chat template (by comparing against the older Llama config). All changes are config-only.
In the Reddit thread we suspected that the 8K context length was just an artifact of the fine-tuning API, not a limitation of the actual model.
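For anyone who wants to replicate it, here's a minimal sketch of the rope_scaling part of that change (the values shown are the stock Llama 3.3 ones and the model path is a placeholder; I haven't diffed this against the exact config in this repo):

```python
# Hedged sketch: re-applying the stock Llama 3.3 RoPE scaling settings to a
# local copy of the model's config via transformers. Path is a placeholder.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("path/to/local-model")

# Llama 3.1/3.3-style long-context RoPE scaling block.
cfg.rope_scaling = {
    "rope_type": "llama3",
    "factor": 8.0,
    "low_freq_factor": 1.0,
    "high_freq_factor": 4.0,
    "original_max_position_embeddings": 8192,
}
cfg.max_position_embeddings = 131072  # ~128K usable context after the change

cfg.save_pretrained("path/to/local-model")
```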

If I'm interpreting the evals correctly, there is only a small improvement, and it is not statistically significant.

i think that if there isn't a noticeable decline in performance, it's probably the intended option

did anyone try testing MRCR/RULER/some other long context bench?
