Why The New York Times might win its copyright lawsuit against OpenAI
The AI community needs to take copyright lawsuits seriously.

  • SkyNTP@lemmy.ml
    English · 42 upvotes · 14 downvotes · 9 months ago

    You are not wrong that monopolies granted by copyright are regularly and unfairly abused.

    That being said, AI trainers are getting away with plagiarism right now. More importantly, it’s not just the violation of a single copy, it’s potentially the creation of tools that enable mass derivative copies. Authors that create training data need to be compensated.

    • General_Effort@lemmy.world
      English · 10 upvotes · 14 downvotes · 9 months ago

      > Authors that create training data need to be compensated.

      There should not be a problem with that. The people who work on training datasets are already being paid.

      The reason you are getting downvoted is that these lawsuits are not about that. These are about giving money to corporations like the NYT - or Reddit, or Facebook, etc - for the “intellectual property” that they already have lying around. It’s pure grift.

      Because the creation of all that is already paid for, that leaves all the more money for lawyers and PR campaigns to extract money for nothing from society.

      • tb_@lemmy.world
        English · 18 upvotes · 3 downvotes · 9 months ago

        > There should not be a problem with that. The people who work on training datasets are already being paid.

        How are the people whose articles and comments are being scraped compensated?

        > Because the creation of all that is already paid for

        “This perfectly good movie has already been made and paid for, that means I can watch it without compensating the studio.”

        I do not agree with Reddit selling the comments of their users. Even so, that’s a ridiculous statement to make.

        • General_Effort@lemmy.world
          English · 3 upvotes · 11 downvotes · 9 months ago

          > How are the people whose articles and comments are being scraped compensated?

          By people who work on training datasets I mean, e.g., the people on Amazon Mechanical Turk. I am not working on a dataset by writing this comment. I’m putting some things straight and getting exactly the payment I was promised, i.e. none.

          > “This perfectly good movie has already been made and paid for, that means I can watch it without compensating the studio.”

          Let’s take the NYT as an example. To publish their newspapers, they need to pay reporters, but also editors and assistants. They also need offices, and for those they need to pay maintenance and janitorial staff. To get it out there, they need printers, server admins and such.

          In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that’s older than a few days. Before the internet, either an issue sold enough or it didn’t. No one cares about yesterday’s news. I doubt the internet changes that very much. That’s what I mean by “it’s already paid for”.

          For a movie, the time horizon is probably a year or so. IDK to be honest. AFAIK, it used to be that if a blockbuster did not make a profit in cinemas, it was over. Maybe the time horizon was longer for direct-to-DVD productions. I guarantee you that no corporation plans ahead more than a few years. Patents last only 20 years and that’s more than enough to finance all the R&D expense that has created modern tech.

          I think it is absolutely ridiculous that corporations can still extract money for something that was made in the 1940s and even earlier. That does not pay for movies, because it’s not money that was ever calculated with. It only pays for the creation of paywalls. As long as enforcing a copyright pays for that enforcement, it will be done.

          • tb_@lemmy.world
            English · 4 upvotes · 9 months ago

            > In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that’s older than a few days. Before the internet, either an issue sold enough or it didn’t. No one cares about yesterday’s news. I doubt the internet changes that very much. That’s what I mean by “it’s already paid for”.

            So the moment a property breaks even, + makes “some profit”, you should no longer need to pay for it? Only when people still “care”, in that case they should pay?

            Just because it’s a news article or a comment doesn’t mean it’s fair game all of a sudden.

            And movies can make back their budget in the opening week(end) when they’re popular. The timeframe is irrelevant for your argument. At least if we’re talking about anything less than a decade or two old, because…

            > I think it is absolutely ridiculous that corporations can still extract money for something that was made in the 1940s and even earlier.

            … with this I do agree.

            • General_Effort@lemmy.world
              English · 1 upvote · 1 downvote · 9 months ago

              > So the moment a property breaks even, + makes “some profit”, you should no longer need to pay for it? Only when people still “care”, in that case they should pay?

              That’s not what I wrote, is it?

              The problem with your idea here is that some movies/games/etc never make back the investment. That would mean that they would never run out of copyright if we did it that way. That some movies are duds also means that, on average, the rate of return on such investments is dragged down.

              In a functioning market, the average ROI should be the same across the board. If something has a lower return, then people simply don’t invest in it. That’s clear, I hope. This means that putting a cap on the profit that may be expected will reduce investments.

              Obviously, the only returns that matter for this reasoning are expected returns. Only the expected returns fund movies, etc., and that’s why the timeframe matters.


              At this point, ideology (or philosophy) becomes important. One has to ask: What is property about?

              There are different philosophical views around this subject, but I am really only concerned with the practical outcome. The political right tends to hold an expansive, absolute view of property rights, to the point of rejecting taxation as illegitimate. The original definition of the political right was as supporters of the monarchy. It makes sense that the right would morph into something that supports all kinds of heritable privilege or right. The anti-capitalistic right seems to have largely disappeared. They often don’t agree that intellectual property is property.

              The left tends to hold more nuanced and pragmatic views. Property rights are balanced against other rights; the interests of other people and society at large. The US Constitution takes this view of copyrights and patents: “[The United States Congress shall have power] To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.”

              This latter view is, more or less, the one to which I subscribe. Without copyright, there would be only public domain. Copyright integrates creative works into a capitalistic system by turning them into capital. Intellectual property enables people to make a profit with intellectual products. However, there is a clear limit to how much profit one may extract. One may only expect the profit that actually incentivizes intellectual production. I actually hold the general view that all (commercial) property is only legitimate as long as it works beneficially for society.

              So, I do not believe that anyone is entitled to windfall profits. I have published stuff on the web for my own reasons. Other people have found a new use for that by creating AI training datasets. I do not believe I have any moral justification to demand a share of their work.

              I hope that clears it up.


              Clearly you do not agree with this view. Obviously, you have some more absolutist view of intellectual property. I would appreciate it if you laid out your view of things. You don’t need to answer these questions, I’m just putting them here to say what is unclear: How does one create or obtain intellectual property? What can be intellectual property and what are the limits? To what does this property entitle one?

              Finally: How come these threads bring out so much support for right wing views? Looking at other threads, I would have expected left wing views to dominate. Looking at the piracy community around the corner, I would have thought that even among the right, copyright abolitionist views would dominate here. What gives?

              • tb_@lemmy.world
                English · 2 upvotes · 9 months ago

                That’s exactly what you wrote?

                > In order for this to work, the NYT needs to make back the money that they have paid these people, plus some profit for the owners. This has already been achieved for any issue that’s older than a few days. Before the internet, either an issue sold enough or it didn’t. No one cares about yesterday’s news. I doubt the internet changes that very much. That’s what I mean by “it’s already paid for”.

                Your argument was that the sources that get scraped have already been paid for. I don’t see how it’s any different for newspapers than it is for movies and such. It’s not like news agencies are eternally profitable and never go bankrupt. Nor do I want corporations to profit for free off the comments I wrote, even if I may or may not have signed my soul away in some EULA nobody reads.

                • General_Effort@lemmy.world
                  English · 1 upvote · 1 downvote · 9 months ago

                  I take it that my post was too long to read. The only thing I can do is write more, which obviously will not help. So there’s nothing I can do.

                  I don’t believe you actually want that right-wing hellhole you are clamoring for. But in the end, what counts is what you vote for, what you ask for, and not what you want inside.

                  • tb_@lemmy.world
                    English · 1 upvote · 9 months ago

                    You seem to have misinterpreted my “alignment”, if you will. I do agree my arguments here leaned pretty heavily on the corporate side.

                    But many of these AI are either run or backed by these same massive corporations. Corporations who staunchly defend their own copyright, yet don’t mind taking from the little guy and breaking their own unfair rules even further.

                    I am, generally, anti-AI. As may have been apparent. I wish not for my words to be vacuumed up into a black box to be spat back out at me.
                    Whilst I think some amount of copyright is fair, 80 years is far too long. Putting a cap on how profitable any property can be is an interesting take.

                    But that’s not part of the conversation. It’s wrong for AI companies to take whatever data they can get their hands on just because it’s out there for human eyes to read. Whether that content has outlived its newsworthy usefulness or not.

    • littleblue✨@lemmy.world
      English · 14 upvotes · 20 downvotes · 9 months ago

      > AI trainers are getting away with plagiarism right now.

      No. They fucking aren’t. 🤦🏼‍♂️

      • Adanisi@lemmy.zip
        English · 12 upvotes · 8 downvotes · 9 months ago

        It looks like someone hasn’t seen the video of Copilot spitting out the Quake inverse sqrt algorithm verbatim.

        • barsoap@lemm.ee
          English · 4 upvotes · 4 downvotes · 9 months ago

          While it got popularised as “Carmack’s reverse” the algorithm is actually significantly older.

          Also you’d have to show that it was literally copy+pasted, including comments and all, to even have a chance at a copyright claim: Algorithms are not subject to copyright, similar to how story structures aren’t. This is like saying “I asked an author to write a book and they plagiarised the hero’s arc!”. And even if it was copied straight-out you’d have an uphill battle to fight, to wit, Wikipedia is quoting the thing verbatim.

          That said, Copilot seems to be severely over-fitted in places, and I don’t like the thing one single bit, and the only thing it’s generally good at is more quickly writing code that shouldn’t have been written in the first place, but inverse sqrt isn’t a good example.

          • Adanisi@lemmy.zip
            English · 4 upvotes · 1 downvote · edited · 9 months ago

            It didn’t just get the gist of the algorithm though, it literally had the same magic number (which isn’t even the optimal one iirc), the same COMMENTS (//what the fuck?), same variable names, etc.

            It didn’t produce the algorithm logically, it copied it.

            Wikipedia is also adhering to the GPL license of the code. Copilot is not, especially if it’s working on proprietary code or adding an MIT license header to copied GPL code (lol)

            • barsoap@lemm.ee
              English · 2 upvotes · 1 downvote · 9 months ago

              > It didn’t produce the algorithm logically, it copied it.

              The magic number is part of the logic of the thing.

              But yes, as I said, Copilot is overfitted. Inverse sqrt still isn’t a good example, it’s nearly as bad as Oracle trying to claim to have found copyright infringement in Android’s standard Java library by saying that Math.average or whatnot is identical. There are way better examples of why Copilot is fucked up.

              • Adanisi@lemmy.zip
                English · 2 upvotes · 1 downvote · 9 months ago

                The magic number is part of the logic, yes, but that’s not even the best magic number for the job iirc, and nobody remembers how they got it.

                I just used this as an example because it’s incredibly clear that it was copied verbatim (again, comments like “what the fuck?” showing up, you can’t tell me it came up with that itself)
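
For readers who have not seen it, the technique being argued about can be sketched roughly as follows. This is a fresh illustration written in Python, not the Quake source: the function name `fast_inv_sqrt` and its `magic` parameter are made up for this example, and the default constant 0x5F3759DF is the widely cited value that the thread calls the magic number.

```python
import struct


def fast_inv_sqrt(x: float, magic: int = 0x5F3759DF) -> float:
    """Approximate 1/sqrt(x) for positive x with the bit-level trick.

    `magic` is exposed as a parameter (an assumption of this sketch)
    so that other constants can be tried.
    """
    # Reinterpret the 32-bit float's bit pattern as an unsigned integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Halving the bits and subtracting from the constant gives a rough
    # first guess at 1/sqrt(x).
    i = (magic - (i >> 1)) & 0xFFFFFFFF
    # Reinterpret the integer bit pattern as a float again.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson step refines the guess.
    return y * (1.5 - 0.5 * x * y * y)
```

With the default constant, a single refinement step lands within a fraction of a percent of the true value, so `fast_inv_sqrt(4.0)` comes out close to 0.5. Passing a different `magic` is one way to check the claim that the famous constant is not the best possible choice.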

            • tb_@lemmy.world
              English · 2 upvotes · 1 downvote · 9 months ago

              I had Bing Chat spit back at me the question I posted on Stack Overflow the day before. You know, the example code I provided which didn’t exactly work as I wanted.

          • tb_@lemmy.world
            English · 9 upvotes · 1 downvote · 9 months ago

            “They aren’t getting away with plagiarism”

            - “There has been some plagiarism”

            “Some plagiarism doesn’t count!”

            • littleblue✨@lemmy.world
              English · 2 upvotes · 10 downvotes · 9 months ago

              Your lack of understanding of the facts of the situation, much less the definition of plagiarism, isn’t a strong argument.

              • tb_@lemmy.world
                English · 3 upvotes · 1 downvote · 9 months ago

                > isn’t a strong argument.

                Because “no it’s totally not like that” is?

              • Adanisi@lemmy.zip
                English · 3 upvotes · 1 downvote · 9 months ago

                Go on then. If copying a whole function of code verbatim INCLUDING comments like // what the fuck? is not plagiarism in the context of software, what is?