ATLANTA – Issues with generative AI, including how materials are collected to expand LLMs (large language models), continue to have an impact. Lawsuits continue to be filed against companies such as OpenAI, alleging that original works have been obtained and used without the creators’ permission.
In November of last year, a complaint was amended and refiled naming Stability AI Ltd., Stability AI, Inc., DeviantArt, Inc., Midjourney, Inc., and Runway AI, Inc. as defendants, alleging that their products were trained on the works of thousands of artists without consent or compensation. The original lawsuit was partially dismissed by a federal judge in October.
According to reports, a leaked Google document purporting to be a “Midjourney Style List” contained the names of more than 16,000 artists whose work was used to train the systems owned by the listed defendants.
Overall, the number of artists, creators, and writers affected by AI scraping likely runs into the hundreds of thousands. Beyond filing lawsuits, there seems to be no clear answer as to how to protect original works.
Some sites have simply chosen to remove all original work, effectively shuttering their websites. Sara Amis, whose work was frequently published by Luna Station Quarterly (LSQ), felt the impact directly when LSQ took down its website archive last month to prevent its articles from being scraped and used to train LLMs.
LSQ publisher Jennifer Lyn Parsons outlined her reasons for moving the publication to Patreon in her March editorial:
Like many folks, I’ve been thinking about AI, specifically generative AI, a lot for the last year. As a software engineer, I’m decently well-informed on the guts of how AI works. As an author and publisher, I’m doing my best to stay on top of the impact AI is having on our world, with a focus on how creative people are being considered and the damage being caused to them and their work.
Many of you will remember when Clarkesworld had to close their submissions last year due to the massive influx of AI-generated stories. When that happened, one of our editors went into ChatGPT and asked it to generate an LSQ story. While amateurish and something we would have definitely rejected, the story had the themes and focus we’ve come to expect from our entries and the stories we publish. Our site had clearly been scraped for training fodder for ChatGPT.
On the surface, this may not seem like a big deal, but for writers like Amis, it eliminated one of the few places where her work could be read for free. Writing news and poetry supports Amis’ academic career and work: “As a working writer, it either has to pay me actual money or it needs to advance my career in some way.”
Having her work stolen to power AI systems? “At best it’s wrong, at worst it’s dangerous,” Amis said. She also believes that the rise of generative AI has accelerated the theft of written work.
The number of potential problems generative AI can cause seems endless. Just last week, 404 Media reported that Instagram had been profiting from ads for an AI application that allowed users to create non-consensual nude images.
Apple removed three such apps soon after the article exposing them was published, and Google followed suit, removing the apps from its Play Store. Meta, the parent company of Instagram, then removed the ads from its Ad Library, but similar ads apparently remain on both Facebook and Instagram.
Meanwhile, Microsoft released its first annual Responsible AI Transparency Report on May 1. The 39-page report outlines how Microsoft intends to manage the expansion of generative AI in ways it hopes will curtail major disruptions in how AI is applied and used.
Microsoft’s Chief Responsible AI Officer, Natasha Crampton, was interviewed by The Washington Post ahead of the report’s release. Her responses regarding the many legal complaints over the use of copyrighted works to train LLMs were rather opaque:
We believe that there are strong grounds under existing laws to train models. It’s not surprising that there are a bunch of different pieces of litigation on foot today to really understand where those boundaries are. We’ve seen these types of cases in every past technology transformation of its kind. They’re asking legitimate questions for which the courts will provide answers in due course.
We’re building a new AI economy, and there will be both existing laws that apply to that economy and new laws, as well. And one thing that’s very clear from Microsoft’s perspective is that a successful AI economy involves benefits being spread broadly.
The Washington Post article noted that, just a day before the interview, eight major daily newspapers had joined forces to file a lawsuit alleging that OpenAI and Microsoft had used their published works without permission.
The potential impact of generative AI on online search is yet another area of concern. Offering AI-generated responses to search queries, rather than simply providing web links, could seriously undermine advertising revenue for Google, Microsoft, and other search engines, even if the responses still include a link to the source material. Businesses pay a lot of money to make sure their websites appear prominently in online searches, driving traffic to their sites and, they hope, increasing revenue.
Crampton’s response to those concerns is, “It’s really important to maintain a healthy information ecosystem and recognize it is an ecosystem. And so part of what I will continue to guide our Microsoft teams toward is making sure that we are citing back to the core web pages from which the content is sourced. Making sure that we’ve got that feedback loop happening. Because that is part of the core bargain of search, right? And I think it’s critical to make sure that we are both providing users with new engaging ways to interact, to explore new ideas — but also making sure that we are building and supporting the great work of our creators.”
The accuracy of generative AI chatbots has also been repeatedly called into question. The early rollouts of Bing’s AI chatbot and ChatGPT (Chat Generative Pre-trained Transformer) were rife with mis- and disinformation, often producing what came to be called “hallucinations.”
In February, The Associated Press reported on a research project conducted by election officials and AI researchers at Columbia University. All five LLMs tested (OpenAI’s GPT-4, Meta’s Llama 2, Google’s Gemini, Anthropic’s Claude, and Mixtral from the French company Mistral) gave incorrect answers to some degree in response to basic queries about voting, polling places, and the democratic process.
The researchers found that more than half of the responses were inaccurate and rated “40% of the responses as harmful, including perpetuating dated and inaccurate information that could limit voting rights.”
On health questions, researchers at Stanford University found that LLMs could not reliably cite factual sources to support the answers they produced.
To circle back to the spiritual implications, Amis had this to say: “There are a host of practical and philosophical reasons why we should be wary of so-called AI…the danger of misinformation, the damage to what used to be open community spaces on the internet, the stifling of free information. But there’s also an underlying spiritual reason.
“The Pagan religious movement as I’ve understood it and lived it these last 35 years (!) is about valuing life and the world as it is, as it really is. Divinity is not off in some rarified imaginary realm somewhere, but right here, in this leaf, this rock, this human being, and in our relationship to the actual living breathing world around us.”
She continued, “Spirituality is one expression of that relationship; art is another. AI cannot express a relationship that it doesn’t have. It can only imitate, the feast in a fairy tale that turns out to be toadstools under the illusion. There’s no nourishment in it, and in some cases it’s poison.”
The Wild Hunt is not responsible for links to external content.
To join a conversation on this post:
Visit The Wild Hunt subreddit! Point your favorite browser to https://www.reddit.com/r/The_Wild_Hunt_News/, then click “JOIN”. Make sure to click the bell, too, to be notified of new articles posted to our subreddit.