• zbyte64@awful.systems
    9 hours ago

    Adding the benchmark back into the training process doesn’t mean you get an LLM that can weed out irrelevant data. What you get is an LLM that can pass the new metric, and you then have to design another metric with different semantic patterns to actually know whether it’s “eliminating red herrings”.