ChatGPT's creator OpenAI is being sued for secretly scraping 300 billion words from the internet, including books, articles, websites, posts, and personal information that was obtained without consent.

(post is archived)

[–] • 1 pt

Yes and no. There is a file called "robots.txt" that sets crawling limits for a site. Nothing technically stops a crawler from going past those limits (unless account restrictions exist), but the file also sets a legal standard. Many sites' contents are crawled or indexed on the basis of this de facto standard.
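For anyone curious what "sets crawling limits" looks like in practice, here is a rough sketch using Python's standard-library robots.txt parser. The rules, the "ExampleBot" user agent, the example.com URLs, and the is_allowed helper are all made up for illustration and are not taken from any real site. Note that the check is purely voluntary: a crawler has to choose to run it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # A well-behaved crawler would run this check before each request.
    for url in ("https://example.com/index.html",
                "https://example.com/private/data.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

A crawler that skips this check can still request the /private/ pages, which is the point above: the file states the site's wishes but does not enforce them.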

That said, copyright, which is the actual claim here, is pretty cut and dried. The AI is digesting the copyrighted contents to form at least part of its language model. Legally, this means the language model is a derivative work, which means the AI is in violation of copyright law.
