Generative AI tools ingest vast amounts of data, leading to potential violations of privacy, copyright, and other data regulations.
Using synthetic data, or data that's been modified to retain general features without specific details, can be a solution.
Generative AI can mine treasures from vast troves of unstructured data at previously unfathomable scale. But something else is lurking in those data sets: risk. The same AI tools that unlock value can also destroy it if the data safety risks aren’t mitigated.
“If you built your solution with bad data, you basically violated the law but haven't been caught yet,” said Michael Rispin, General Counsel at Sprout Social, speaking at A.Team’s recent webinar on Data Safety in the Age of AI.
At all levels of industry, the stakes are huge. Startups race to automate tasks and speed their time to market with help from AI plugins and bots. Meanwhile, 80% of Fortune 500 companies already have teams working with generative AI. And as AI adoption goes mainstream with the release of ChatGPT Enterprise, the question of data safety looms larger than ever.
To highlight the biggest risks facing AI practitioners, A.Team assembled a panel of experts that also included Ander Steele, Data Scientist and Head of AI at Tonic.AI, and Anjana Harve, Global Chief Information Officer at BJ’s Wholesale Club and A.Team CxO. The webinar, hosted by Angela Wu, a data scientist at Twitch and member of the A.Team network, identified the primary ways product builders may already be endangering their companies by employing the latest AI tools, and discussed how they might mitigate the risks.
So, what should come first in AI development: safety or innovation?
We put this question to the webinar audience, and the vote was a near-even split: 53% for safety, 47% for innovation.
“Risk and innovation are flip sides of the same coin,” said Harve. “AI puts a spotlight on challenges that we’ve had for a long time.” Namely, a company’s success or failure ultimately comes down to the quality of its data.
Don’t Cross Regulators—Even If the Regulations Aren’t Clear Yet
Fundamentally, generative AI tools are what they eat: They indiscriminately gobble up masses of data, drawing useful signals from the noise in ways no human can. That is why privacy and copyright regulations, or really any laws that govern data, are the rules most likely to ensnare AI practitioners.
“Once you've trained AI models with private data, they can emit private data. That's not a theoretical problem. It's real,” said Steele.
Regulators are already cracking down. In May, the Federal Trade Commission fined Amazon and ordered it to delete huge stores of data captured by its Ring cameras, citing "egregious violations of users' privacy." Over the summer, the FTC halted operations of a company called Automators AI, saying its promises of AI-powered ecommerce solutions amounted to “nothing more than old-school deception.” “For business opportunity sellers,” wrote Lesley Fair, a senior FTC attorney, “‘AI’ stands for ‘allegedly inaccurate.’”
IP Ain’t Free
Violating copyright, or even appearing to, can cripple future growth opportunities. European companies governed by GDPR may decline to do business with you. Major platforms can block sales of your product.
The latter is already happening, observed Rispin: This summer, Valve, the operator of the massively popular game marketplace Steam, stated that it would not publish games that used AI to repurpose existing content and open source code in ways that infringed on existing copyrights. Game developers, commenting on Reddit, have said they believe this amounts to a wholesale ban on anyone using AI tools (Valve disputes this).
You Won’t Exit If Your Data Is Scammy
“You can't sell what you don't own...and you don't own anything created by AI,” said Rispin. “You don't have copyright ownership, and it's not patented either. When you tell that to executives who are always open to sales talks and want the big check, they get very shocked. They think you own something when you create it and save it to your desktop. You don't.”
Even if a company solidly owns its IP, it can make itself a target for litigation if it misuses AI tools or fails to adopt company-wide AI policies. Investors will mark down a company’s value if they believe its data presents future legal risk.
Nobody, observed Rispin, “wants to invest in a bunch of pending lawsuits.”
Investors are watching the civil courts closely: In July, comedian Sarah Silverman sued OpenAI and Meta, saying that they trained their large language models on illegally obtained data used without the copyright holders' permission, including Silverman's book, The Bedwetter. In September, George R. R. Martin, John Grisham, Jodi Picoult, and 14 other authors sued OpenAI for what they called “systematic theft.” Big settlements could mean big haircuts for the valuations of AI-powered startups down the road.
The Key to Data Safety in the Age of AI
The only completely safe way to use AI tools is to feed them only data you have the rights to. That’s a tough trade-off: Limiting the data an AI can access also limits its usefulness and power. One alternative, suggested Steele, is to incorporate “synthetic data”: data that began its life in real-world datasets full of sensitive and personally identifiable information, but has been manipulated so that all of it is plausible and none of it is specific.
“At the end of the day, knowing what's in your data at a thousand-foot level is useful for understanding what's next, and what you can potentially use the data for,” said Steele.
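Commercial synthetic data tools are far more sophisticated, but the basic idea Steele describes, keeping general features while dropping specific details, can be illustrated with a small Python sketch. It assumes a hypothetical `Customer` record and replaces direct identifiers (name, email) with fictional values while keeping coarse features such as age range, state, and an approximate order total. This is simple rule-based masking, not Tonic.ai's method or a full synthetic data generator; the name pools, the ±10% jitter, and the `synthesize` helper are all illustrative assumptions.

```python
import hashlib
import hmac
import random
from dataclasses import dataclass

# Hypothetical replacement pools; a real tool would draw from much larger ones.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Casey"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak", "Patel"]


@dataclass(frozen=True)
class Customer:
    name: str
    email: str
    age: int
    state: str
    order_total: float


def _rng_for(value: str, secret: bytes) -> random.Random:
    # Seed an RNG from a keyed hash (HMAC-SHA256) of the real value, so the same
    # person always gets the same fake identity across tables, but the mapping
    # cannot be recomputed without the secret key.
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    return random.Random(int.from_bytes(digest, "big"))


def synthesize(record: Customer, secret: bytes) -> Customer:
    rng = _rng_for(record.email, secret)
    first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
    # Replace direct identifiers with plausible but fictional values.
    fake_name = f"{first} {last}"
    fake_email = f"{first}.{last}{rng.randint(100, 999)}@example.com".lower()
    # Keep coarse, non-identifying features: age becomes the midpoint of its
    # decade (37 -> 35), state is kept, and the order total is jittered by up
    # to +/-10% so aggregate trends survive but exact values do not.
    age_midpoint = (record.age // 10) * 10 + 5
    noisy_total = round(record.order_total * rng.uniform(0.9, 1.1), 2)
    return Customer(fake_name, fake_email, age_midpoint, record.state, noisy_total)


if __name__ == "__main__":
    real = Customer("Maria Gonzalez", "maria.g@gmail.com", 37, "TX", 120.00)
    print(synthesize(real, secret=b"store-this-key-outside-the-dataset"))
```

Seeding from a keyed hash keeps the substitution consistent, so the same real customer maps to the same fake one everywhere and joins between tables still work. A production approach would also need to check that combinations of the retained fields cannot re-identify anyone.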
There will always be a trade-off, noted Harve. To help manage it, the National Institute of Standards and Technology published its AI Risk Management Framework earlier this year, intended to help organizations minimize the risks posed by AI-based technologies that could harm people, organizations, and the broader financial and environmental ecosystems.
Rispin suggested inventorying your data and getting sign-off from senior engineers or committees that understand the risks. If you buy data, insist on strong terms and conditions that give you real legal assurances that what you're putting into your model complies with data privacy laws. Or simply limit AI to smaller applications, which reduces the damage if they turn out to be problematic.
“It's OK to use AI code, but don't use it on your secret sauce,” said Rispin.
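Rispin's advice to inventory data and require sign-off before it reaches a model can be made concrete as a simple pre-training checklist. The Python sketch below is one hypothetical way to encode it: the `DatasetEntry` fields, the `blockers` function, and the one-approval default are assumptions for illustration, not a standard or a legal test.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    # One row in a hypothetical data inventory; field names are illustrative.
    name: str
    source: str                # e.g. "internal CRM export" or "vendor purchase"
    rights_confirmed: bool     # do we hold the rights to train on this data?
    contains_pii: bool         # does it still hold personal information?
    approvals: list[str] = field(default_factory=list)  # who signed off


def blockers(entry: DatasetEntry, required_approvals: int = 1) -> list[str]:
    """Return reasons this dataset should not yet be fed to a model."""
    problems = []
    if not entry.rights_confirmed:
        problems.append("usage rights not confirmed")
    if entry.contains_pii:
        problems.append("contains PII; mask or synthesize it first")
    if len(entry.approvals) < required_approvals:
        problems.append(
            f"needs {required_approvals} sign-off(s), has {len(entry.approvals)}"
        )
    return problems


if __name__ == "__main__":
    inventory = [
        DatasetEntry("support_tickets", "internal CRM export", True, True, ["eng-lead"]),
        DatasetEntry("product_reviews", "vendor purchase", False, False),
    ]
    for entry in inventory:
        issues = blockers(entry)
        print(entry.name, "OK" if not issues else issues)
```

Returning a list of reasons rather than a single yes/no keeps the record useful for whoever signs off, since it shows exactly what still needs fixing before a dataset is cleared.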
Watch the full recording from this webinar here.