• bionicjoey@lemmy.ca
    English
    arrow-up
    70
    ·
    2 months ago

    Makes sense. AAVE is mostly a spoken thing, and LLMs are mostly trained on the corpus of written text on the internet and in books. It’s pretty rare for people to write in an AAVE style in those contexts.

    • givesomefucks@lemmy.world
      English
      arrow-up
      9
      arrow-down
      30
      ·
      2 months ago

      Except it has no difficulty reading and understanding AAVE, because people use it online frequently…

      Like, the article makes that abundantly clear, but everyone commenting just read the headline and assumed what it meant was it couldn’t understand it…

      • bionicjoey@lemmy.ca
        English
        arrow-up
        24
        arrow-down
        1
        ·
        2 months ago

        I never said it can’t understand it. I am agreeing with the notion that it has a bias against using it.

        • givesomefucks@lemmy.world
          English
          arrow-up
          4
          arrow-down
          23
          ·
          2 months ago

          You said it’s rarely used online, which just isn’t true.

          But like even this:

          “I am agreeing with the notion that it has a bias against using it”

          I’m not sure if you understand the bias is against users who use AAVE, or if you’re saying an LLM doesn’t want to use AAVE.

          Maybe you did understand everything, and you’re just being vague.

          But almost everything you said could be interpreted multiple ways.

          • sugar_in_your_tea@sh.itjust.works
            English
            arrow-up
            12
            arrow-down
            1
            ·
            2 months ago

            Well, if the training data is largely standard English, AAVE could look like less educated English, because it doesn’t follow the normal rules and conventions. And there’s probably a higher correlation between AAVE use and lower means and/or education, because people from the black community who have higher means and/or education probably use standard English more often, since that’s how they’re trained.

            So I don’t think this is evidence about the model being “racist” or anything of that nature; it’s just the model doing model things. If you type in AAVE, chances are higher that you fit the given demographic, because that’s likely what the training data shows.

            So, I guess I don’t really see the issue here? This just sounds like people thinking the model does more than it does. The model merely matches input text to data in the model. That’s it. There’s no “understanding” here, it’s just matching inputs to outputs.

  • Ghyste@sh.itjust.works
    English
    arrow-up
    29
    arrow-down
    1
    ·
    2 months ago

    They can’t possibly encounter much of it in training material… Of course they’re not going to like it.

    • givesomefucks@lemmy.world
      English
      arrow-up
      9
      arrow-down
      13
      ·
      2 months ago

      What?

      It trains off social media, and even white kids use AAVE online. And kids make the most social media comments.

      A lot of times when someone posts a text screenshot and everyone talks about how kids talk crazy, it’s just a patois of AAVE mixed in with “regular” English.

      It should be able to “read” it fine.

      The bias part (as clearly stated in the article…) is when you ask an LLM to describe the person who would phrase something in AAVE, and the LLM replies with stereotypes about Black people.

      So it can read and interpret it fine; it just has a bias against people who talk like that.

      • TexMexBazooka@lemm.ee
        English
        arrow-up
        1
        arrow-down
        1
        ·
        2 months ago

        LLMs don’t have a bias against anyone; it’s literally just data. And those models are by and large fed with traditionally grammatically correct data. They don’t understand dialects. You’re looking soooooo hard for something to be offended over.

        • givesomefucks@lemmy.world
          English
          arrow-up
          1
          arrow-down
          1
          ·
          2 months ago

          If you’re going to revive a 3+ day old thread…

        At least read the article first so you have a clue what other people were talking about.

    • flamingo_pinyata@sopuli.xyz
      English
      arrow-up
      9
      ·
      2 months ago

      I’d say this is exactly where the LLMs’ problems with it come from. For most of us outside of the US, and even a lot of people there, it’s exactly that - a caricature of a lower class black person. However, for many people it’s a legit dialect of English they speak every day.

    • sailingbythelee@lemmy.world
      English
      arrow-up
      6
      arrow-down
      1
      ·
      2 months ago

      I don’t live in America either, but I went on a cruise once and there were many Americans, including a black American couple who were very obviously urban. By which I mean, the wife wore high heels and a tight jeweled mini-skirt on a sea-kayaking excursion…clearly signalling that she hadn’t spent much time outside of a city.

      Anyway, I was shocked when they spoke exactly like The Jeffersons, with all the exaggerated whooping, non-stop vernacular, and stage-like mannerisms. It was so over-the-top that I honestly thought they were play acting, but after chatting with them for a while I realized that was just how they were. They were very nice people and clearly having a great time.

    • Dasus@lemmy.world
      English
      arrow-up
      2
      arrow-down
      2
      ·
      2 months ago

      Not to be confused with African-American Vernacular English.

      Aave is what I’d say is more “the kind of language a stereotypical black character in a movie would use”.

      African-American Vernacular English[a] (AAVE)[b] is the variety of English natively spoken, particularly in urban communities, by most working- and middle-class African Americans and some Black Canadians.[4] Having its own unique grammatical, vocabulary and accent features, AAVE is employed by middle-class Black Americans as the more informal and casual end of a sociolinguistic continuum. However, in formal speaking contexts, speakers tend to switch to more standard English grammar and vocabulary, usually while retaining elements of the non-standard accent.[5][6] AAVE is widespread throughout the United States, but is not the native dialect of all African Americans, nor are all of its speakers African American.

      • sanpo@sopuli.xyz
        English
        arrow-up
        6
        ·
        2 months ago

        Well, “not to be confused”, but the same page says AAVE is just a dialect of AAE, so mostly not much of a difference, I think.

    • givesomefucks@lemmy.world
      English
      arrow-up
      4
      arrow-down
      4
      ·
      2 months ago

      I was wondering where the V went…

      Apparently African American Vernacular English (said AAVE, pronouncing each letter) is just a dialect and there’s a couple of others that fit under just AAE? I never knew about any of those besides AAVE.

      Seems to be proper name for the kind of language a stereotypical black character in a movie would use. Can’t say about real world, since I don’t live in the USA.

      AAVE is the “relaxed” English you’re talking about. And with the interconnectedness of the Internet, AAVE is kind of displacing the rest.

      But honestly from an etymological standpoint I think it makes sense to view AAVE as the base and then just having other flavors of it. From that link they’re trying to break it down into multiple distinct groups.

  • Grimy@lemmy.world
    English
    arrow-up
    19
    arrow-down
    3
    ·
    2 months ago

    So for those that didn’t read the article, it basically explains how LLMs have a negative connotation about AAE. When asked to associate words with AAE written phrases, it used words like “aggressive”. When given a normal English phrase and the same phrase but in AAE and then asked what jobs would suit this person, the LLM gave low income jobs for the AAE statement with broader options for the normal English one.

    It’s a serious problem because people that naturally write in AAE are most likely getting worse results. It stems mostly from old racist newspaper articles and similar things.
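
To make the comparison described above concrete, here is a rough Python sketch of the paired-text idea: score the same statement in AAE and in standard English against a list of trait words, then look at the gap. This is not the article's actual method or code. `association_gap`, `toy_score`, and the example sentences are all hypothetical names and data made up for illustration, and `toy_score` is a deliberately biased stand-in so the sketch runs without a model.

```python
from statistics import mean
from typing import Callable, Dict, List, Tuple

# Hypothetical signature: score_fn(text, trait) -> how strongly a model
# associates `trait` with the writer of `text` (e.g. a log-probability).
ScoreFn = Callable[[str, str], float]


def association_gap(
    score_fn: ScoreFn,
    pairs: List[Tuple[str, str]],
    traits: List[str],
) -> Dict[str, float]:
    """For each trait, return mean(score on AAE text) - mean(score on standard text).

    `pairs` holds (aae_text, standard_text) tuples meant to carry the same meaning.
    A positive gap means the trait is tied more strongly to the AAE version.
    """
    gaps = {}
    for trait in traits:
        aae_scores = [score_fn(aae, trait) for aae, _ in pairs]
        std_scores = [score_fn(std, trait) for _, std in pairs]
        gaps[trait] = mean(aae_scores) - mean(std_scores)
    return gaps


def toy_score(text: str, trait: str) -> float:
    """Stand-in scorer (assumption, not a real measurement).

    It is biased on purpose: it rates "aggressive" higher whenever the
    text contains a bare "be", so the gap below is non-zero.
    """
    if trait == "aggressive" and " be " in f" {text} ":
        return 1.0
    return 0.0


# Illustrative pair only; a real test would use many matched sentences.
pairs = [("he be working late", "he is usually working late")]
gaps = association_gap(toy_score, pairs, ["aggressive", "friendly"])
print(gaps)
```

With a real model you would swap `toy_score` for something that queries the model, and the interesting part is whether the gaps for negative traits come out consistently positive across many pairs.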

    • thedirtyknapkin@lemmy.world
      English
      arrow-up
      11
      ·
      2 months ago

      i bet it’s honestly more from like 4chan and other modern online racist communities, where they would mock aave with racist caricatures. agree with the rest, but if it’s related to aave then i doubt the old newspapers were the source.

    • TexMexBazooka@lemm.ee
      English
      arrow-up
      2
      arrow-down
      2
      ·
      2 months ago

      “It’s a serious problem because people that naturally write in AAE are most likely getting worse results”

      Person using LLM built on grammatical rules of the English language has subpar results when operating outside of those rules. More at 6.

    • TheRealKuni@lemmy.world
      English
      arrow-up
      12
      ·
      2 months ago

      Essentially, yes. Ebonics isn’t inherently offensive or inappropriate, as far as I can tell, but it has connotations that are not attached to AAE. Linguists avoid the term today, and modern uses of it tend to be derogatory.

      Source

  • randon31415@lemmy.world
    English
    arrow-up
    9
    arrow-down
    1
    ·
    2 months ago

    African Americans have a weak bias against writing in African American English -> Colleges have weak bias against accepting African Americans as graduate students -> Academic texts have strong bias for text written by graduate students -> LLM training data has bias for academic texts -> LLMs have a strong bias for writing like training data.

    The error occurs upstream a bit, don’t point at the coders.

    • TexMexBazooka@lemm.ee
      English
      arrow-up
      1
      arrow-down
      2
      ·
      edit-2
      2 months ago

      Writing in AAVE is silly, just like someone from the Deep South including southern drawl in their writing would be, or someone from Boston spelling “car keys” as “kha kees”

      So

      “African Americans have a weak bias against writing in African American English -> Colleges have weak bias against accepting African Americans as graduate students”

      Is a bit of a jump. Someone writing in AAVE probably wouldn’t get accepted to college, because written word is supposed to transcend dialects and follow a set of rules to be universally understandable.

  • madcat@lemm.ee
    English
    arrow-up
    17
    arrow-down
    26
    ·
    2 months ago

    Because there is no such thing as “African American English”. There is proper English and then there is slang.

      • madcat@lemm.ee
        English
        arrow-up
        1
        arrow-down
        1
        ·
        2 months ago

        Slang is slang. It’s always used verbally. I am not sure why someone would expect an LLM to generate proper slang. Not sure at all how stating that fact makes one a “bigot”.

    • Wanderer@lemm.ee
      English
      arrow-up
      2
      arrow-down
      1
      ·
      2 months ago

      It’s bad enough the Americans are too stupid to use the proper one that we have to have two.

      But people talking incorrectly is not a reason to write like that. Unless it’s a character speaking or whatever.