• ayaya@lemmy.fmhy.ml
    1 year ago

    It is impossible for an AI to cite its sources, at least in the current way of doing things. The AI itself doesn’t even know where any particular text comes from. Large language models are essentially really complex word predictors: they look at the previous words and then predict the word that comes next.
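
    To make the "word predictor" idea concrete, here is a toy sketch. It is nowhere near a real LLM (those use neural networks over tokens, not a lookup table of word pairs), and the names `train_bigram` and `predict_next` plus the tiny corpus are made up for illustration. It only shows the basic loop of "look at the previous word, guess the next one":

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical training text, purely for illustration.
corpus = "the cat sat on the mat and the cat slept"
model = train_bigram(corpus)

print(predict_next(model, "the"))  # "cat" follows "the" twice, "mat" once
```

    Note that `model` only holds counts. Nothing in it records which sentence or document a given count came from.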

    During training, it’s adjusting weights that relate different words and phrases to each other. If one source makes a certain weight go up by 0.0001% and then another does the same, and then a third makes it go down a bit, and so on, how do you determine which ones affected the outcome? Multiply this over billions if not trillions of words and there’s no realistic way to track where any particular text is coming from unless it happens to quote something exactly.
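
    A rough sketch of why the attribution gets lost, assuming a single weight that each source simply nudges up or down. The source names and nudge sizes are invented, and real training is gradient descent over billions of interacting weights, which only makes the problem worse:

```python
def train(weight, updates):
    """Apply each source's small nudge to one weight, in order."""
    for delta in updates:
        weight += delta
    return weight

# Hypothetical per-source nudges to the same weight.
run_a = {"source_1": 0.25, "source_2": 0.25, "source_3": -0.125}
run_b = {"source_4": 0.5, "source_5": -0.125}

w_a = train(0.0, run_a.values())
w_b = train(0.0, run_b.values())

# Two completely different sets of sources leave the same final weight.
print(w_a, w_b, w_a == w_b)
```

    Once training is done, all you keep is the final number. Given only that number, there is no way to tell whether it came from the sources in `run_a` or the ones in `run_b`.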

    And if it did happen to quote something exactly, which is basically just random chance, the AI wouldn’t even be aware it was quoting anything. When it’s running, it doesn’t have access to the data it was trained on; it only has the weights on its “neurons.” All it knows is that certain words and phrases either do or don’t show up together often.