Generative artificial intelligence synthesizes, not copies
OpenAI, and by proxy generative artificial intelligence as a whole, wins its first round in the court battle brought by Raw Story Media Inc. and AlterNet Media Inc. over the use of their data to train algorithms.
The two companies, which run news sites, sued OpenAI last February for copyright infringement, for having used thousands of articles from their pages to train ChatGPT so that it could respond to human prompts. According to the companies, OpenAI's chatbot reproduced their copyrighted material “verbatim or near-verbatim” when prompted.
The lawsuits accused OpenAI of violating the Digital Millennium Copyright Act (DMCA) by removing copyright-identifying information, such as authors' names and titles, in order to facilitate infringement, and asked the court for monetary damages of at least $2,500 per violation and for an order requiring OpenAI to cease the allegedly improper use of their work.
Judge Colleen McMahon's response? That the removal of copyright management information, such as the names of authors or outlets, for the purpose of training generative artificial intelligence tools (without involving the dissemination of those works) cannot qualify as the concrete injury required to establish standing, and the case is therefore dismissed. The plaintiffs can refile, but in principle, the result of this first battle favors OpenAI.
What is fundamental here? That generative AI synthesizes, it does not copy. It works like our brains: we see, hear and read things, but our memory does not carry out a copying process; it carries out a synthesis process, and what we retain is not, and cannot be, a copy, but a reconstruction.
Furthermore, the data sets used to train these algorithms are so huge that plagiarism of any specific part is extremely unlikely. Given that it would take around 170,000 years of continuous reading, eight hours a day, to get through GPT-4's training data set, any single part of it is quantifiably tiny.
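The order of magnitude of that figure can be checked with a back-of-envelope calculation. The corpus size, words-per-token ratio and reading speed below are my own assumptions for illustration, not figures from the article or the ruling:

```python
# Back-of-envelope check of the "170,000 years" reading-time figure.
# All three constants are assumptions, not sourced numbers:
TOKENS = 10e12          # assumed training-corpus size, in tokens (~10 trillion)
WORDS_PER_TOKEN = 0.75  # rough average for English text
WPM = 250               # brisk adult reading speed, words per minute
HOURS_PER_DAY = 8

words = TOKENS * WORDS_PER_TOKEN
minutes = words / WPM
years = minutes / 60 / HOURS_PER_DAY / 365

print(f"{years:,.0f} years")  # roughly 171,000 years under these assumptions
```

Under these assumed values the result lands at roughly 171,000 years, so the article's 170,000-year figure is at least internally plausible; changing any assumption shifts the answer proportionally, but the conclusion that any single article is a vanishing fraction of the corpus holds across any reasonable choice.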
To the many who believe that generative artificial intelligence tools such as ChatGPT or Midjourney are immoral because they supposedly steal from the artists whose works they are trained on, claiming that there is nothing new in what they generate and that they rely on data scraped from the internet without creators' consent, it must be argued that in no case do these algorithms obtain enough information from any specific source for it to be considered theft. From a legal perspective, copyright law requires that any infringement not only show substantial similarity to an original work, but also that the alleged infringer had access to that work. And while it is obvious that they had access to the original works, it does not follow that anything produced by this procedure is, as such, theft.
From its origins with the Statute of Anne, the basic foundations of copyright law have depended on copying and distribution, and neither process occurs in the case of generative artificial intelligence. All newly created content is, of course, based to some extent on previous content, but here we are talking not only about the degree of similarity, which is extremely small when diluted across the enormous magnitude of the data used, but also about the methods involved, which constitute neither copying nor distribution. If artificial intelligence is theft, then so is every work of art since the Stone Age: today's writers and artists were influenced by those of yesterday, and will influence those of tomorrow.
Let's stop this nonsense of trying to turn copyright into protection against absolutely everything. Anything publicly available on the internet, that is, any page a person can legitimately access, can also be accessed, and in fact is, by automated procedures: for indexing, for consultation and, from there, for training any algorithm, in the same way that it is accessed by a person. It cannot be claimed that a person who visits a page makes a copy of it in their memory, just as it cannot be claimed that they violate any copyright when they tell another person what they saw. The algorithm is in exactly the same position.
Of course generative artificial intelligence challenges copyright, and it is a very good thing that it does. Copyright has to be limited, not extended, because otherwise it leads us to all kinds of absurdities and to an utterly retrograde way of reasoning, contrary to all innovation. If you do not want certain works to be used to train algorithms, do not make them available on the internet to anyone: keep them locked up where no one can see them. But since you probably did not create them for that, you would be falling into your own contradiction, as we have been doing for years under intellectual property laws supposedly designed to protect authors and encourage creation, but which in reality protected the owners of the means by which those works were distributed, even when the authors had been dead for many years (and, therefore, could no longer be incentivized to create anything).
Let us hope that the judge's doctrine continues to prevail in future instances. The opposite would simply be outrageous.
This article is also available in English on my Medium page, “At last, it seems the law recognizes that AI synthesizes, not copies”.