Gemini’s Mishaps Were Years in the Making

Is there such a thing as an LLM that’s too big? Google is finding out in real time.

If you’re new here, this is the latest edition of the Build Mode newsletter, where we gather the collective wisdom of the people building with AI, designing the future of work, and leading the most important companies of the next decade. Subscribe here to get the top insights in your inbox every week.

The Big Idea: Gemini’s mess-up was four years in the making

Sundar Pichai is having a rough week.

First, Gemini started pumping out images of racially diverse Nazis and Indigenous founding fathers. Cue massive backlash. So Google shut down the image generator. But the wolves were already circling. Everyone who wanted to score points against Google’s “wokeness” came up with a new way to generate clearly incorrect and embarrassing outputs.

Someone asked if it would be OK to misgender Caitlyn Jenner if it meant avoiding a nuclear apocalypse. Gemini said, “No.” Jenner replied to a screenshot of the chat on X with the word, “Yes.” Pichai said Google is “working around the clock” to fix this. (For the record, ChatGPT gave similar answers to the same question.)

How did this happen? Google’s original intention was to ensure that Gemini wouldn’t fall into some of the known traps of image-generation technology: violent or sexually explicit images, for one, but also images that might reinforce harmful biases around race, gender, and identity. To make sure, in other words, that not every request for images of lawyers turned up white men, even if white men were over-indexed in pictures of lawyers in Gemini’s training data. So they used RLHF (reinforcement learning from human feedback) to steer the model in the right direction.

It’s not so simple, though. As Dan Balsam, Head of AI at RippleMatch and Build Mode editorial board member, said, “This is more clear evidence that current techniques to align AI systems are insufficient and crude.”

Google tried to combat historical biases by training Gemini to appreciate diversity and advocate for underrepresented people—but, as Balsam points out, “It didn’t provide enough nuance for the model to understand when that was appropriate versus not. So the end behavior was extreme and devoid of nuance.”

In that sense, RLHF is a little bit like the “whack-a-mole” arcade game: suppress one bad behavior, and another pops up somewhere you weren’t looking.
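For the technically curious, here’s roughly what “steering with human feedback” looks like under the hood. This is a minimal sketch of the standard pairwise preference loss used to train an RLHF reward model, not Google’s actual pipeline; the scores below are toy stand-ins:

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_chosen: torch.Tensor,
                    reward_rejected: torch.Tensor) -> torch.Tensor:
    """Pairwise (Bradley-Terry) loss for training an RLHF reward model.

    Labelers pick which of two responses they prefer; this loss pushes
    the reward model to score the preferred response higher. The trained
    reward model is then used to fine-tune the LLM itself.
    """
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy scores a reward model might assign to two candidate responses.
chosen = torch.tensor([1.2, 0.4])
rejected = torch.tensor([0.3, 0.9])
print(preference_loss(chosen, rejected))  # lower loss = better ranking
```

The whack-a-mole problem lives in the labels: the model only learns whatever preferences the labelers happened to express, so a blanket “prefer diverse imagery” signal gets applied everywhere, including historical contexts where it makes no sense.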

When Gemini was first released, the tech community was awed by its power. Gemini 1.5 Pro can process up to 1M tokens at once, allowing it to understand up to 700,000 words in a single go. But plot twist: that size may also be working against it.
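(Quick back-of-envelope on the tokens-to-words math, using OpenAI’s open-source tiktoken tokenizer as a stand-in, since Google’s tokenizer isn’t public; the ~0.7 words-per-token ratio for English prose is the assumption here:)

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")
text = "Large language models read text as tokens, not words."
tokens = enc.encode(text)
words = text.split()

# English prose typically runs ~0.7 words per token, which is how a
# 1M-token context window works out to roughly 700,000 words.
print(f"{len(tokens)} tokens, {len(words)} words, "
      f"{len(words) / len(tokens):.2f} words/token")
```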

Back in 2020, Timnit Gebru was ousted as co-lead of Google’s ethical AI team after co-authoring a paper suggesting that Google’s LLMs might be too big.

The paper, “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?”, laid bare the significant risks of massive LLMs—from climate impact to social bias.

Point being: The training sets are so large that it’s basically impossible to audit them. “A methodology that relies on datasets too large to document is therefore inherently risky,” the researchers wrote.

In other words, Gebru predicted this fiasco four years in advance.

It might take Google longer than we think to fix this. And it reveals a sticky issue with AI moving forward: as these systems become larger and more powerful, they also become harder to control.

CHART OF THE WEEK

Some jobs are disappearing in the age of AI—some are spiking

Writing gigs on the freelance platform Upwork are down 33% since the dawn of ChatGPT. That’s painful for freelance writers, but not exactly shocking: many writing jobs on Upwork are focused on churning out shitty SEO posts to satiate the algorithm — and AI can do that quite well now, with some light human editing at the end. Customer service is down 16% and translation is down 19%—results your average LinkedIn thought leader would have been able to predict.

But look at the 39% increase in video editing and production. This might reflect people using new AI-powered editing tools like CapCut. Whatever the cause, it signals a boom year for video creators—just when the release of Sora would have suggested the opposite.

Graphic design—one of the industries where anxiety about AI seemed highest—is up 8%. How this shakes out remains to be seen; people find novel ways to use new technology. But while SEO writing and basic translation get slammed, there’s evidence that creatives will play a bigger role in the age of AI than many of us feared.

AI EXPERT CORNER

The top GenAI use case for HR Tech (layoffs not included)

The companies that will separate themselves from the rest of the pack will have a good answer to the question: What will you do differently if you can ask GenAI a question about your company and get the answer right away?

We sat down with Ian O’Keefe, People Analytics and HR Transformation leader, to crack this nut.

HR teams need tools to handle the vast amount of policies, regulations, and curricula within their company. More specifically: a knowledge management tool that leverages GenAI to better understand a company’s proprietary data, the golden source of information.

But it’s more than just repackaging knowledge. A good tool can assist with the nuanced, high-judgment decision-making an HR VP has to do. It won’t know your business partner better than you do. But it’ll take the legwork out of getting an answer, and you get that time back.
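What does “leverages GenAI to understand proprietary data” actually look like? Usually some flavor of retrieval-augmented generation (RAG): embed your policies, retrieve the ones relevant to a question, and hand them to the model as context. Here’s a deliberately toy sketch; the policies, the word-count “embeddings,” and the function names are all hypothetical, and a real system would use a proper embedding model and vector store:

```python
import numpy as np

# Toy HR knowledge base. A real deployment would index thousands of
# policy documents, not three strings.
POLICIES = [
    "Parental leave is 16 weeks, paid, for all full-time employees.",
    "Expense reports must be filed within 30 days of purchase.",
    "Annual compliance training is due every January.",
]

VOCAB = sorted({w for doc in POLICIES for w in doc.lower().split()})

def embed(text: str) -> np.ndarray:
    """Fake embedding: word counts over the policy vocabulary."""
    tokens = text.lower().split()
    return np.array([tokens.count(w) for w in VOCAB], dtype=float)

def retrieve(question: str) -> str:
    """Return the policy whose embedding is closest to the question's."""
    q = embed(question)
    def cosine(policy: str) -> float:
        v = embed(policy)
        return (q @ v) / (np.linalg.norm(q) * np.linalg.norm(v) + 1e-9)
    return max(POLICIES, key=cosine)

question = "How long is parental leave?"
context = retrieve(question)
# This prompt would then go to the LLM of your choice.
print(f"Answer using only this policy:\n{context}\n\nQ: {question}")
```

The retrieval step is what grounds the model in your company’s actual policies rather than its training data, which is the whole point of the “golden source” framing above.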

The big question facing many teams: How do you get started?

O’Keefe says to start with small experiments that are contextually narrow—leadership, training, HR ops, compliance—instead of building a catch-all chatbot that does a little bit of everything, but poorly. Involve the internal experts who have spent decades in the subject matter. Coalesce around a smart analytics and tech strategy for a tool that makes life easier, gets outcomes faster, and isn’t about taking anyone’s job away.

Then you’ll have to ask yourself: What would you do differently if you had all of your institutional knowledge at your fingertips and could get answers instantly?

LIVE AT ViVE

Health systems need to increase their risk tolerance

A few quick updates from our team on the ground this week at ViVE, the prime-time digital health conference in LA:

  • A CIO of a major health system said we can’t afford to sit back and see how the early adopters fare with AI. It’s a 9-inning game and we’re only in inning 1, but it’s going to be a fast 9 innings. If you want to stay competitive, you can’t wait it out.
  • Big companies need to change their risk posture.
  • Healthcare has a culture in which failure is bad. In tech, failure is good: you want to fail fast. Health systems need to increase their tolerance for failure and start experimenting.
  • Health systems prefer to build technology rather than buy it—but they can’t move fast and they don’t have the talent.
  • ViVE attendees got to see Billy Idol perform at the Hollywood Palladium—we heard he’s obsessed with generative AI solutions in healthcare.

WATERCOOLER

GPT-3-generated propaganda is almost as convincing as human-generated propaganda, according to this study with over 8,000 participants. It’s a little unsettling to imagine how effective bigger, more advanced models will be at propaganda. Until then, we’ll have to rely on Russian bots.

EVENTS

Join our Invite-Only AI Speakeasy @ SXSW 2024

At SXSW, we’re carving out a moment for reflection and connection by inviting a select circle of industry innovators—executives and VCs leading the AI revolution—for an intimate evening of rich discussions, breakthrough ideas, and meaningful networking, all centered around shaping AI’s tomorrow.

We’ll have an informal fireside chat with a special guest from Google, and you’ll leave having met the fascinating people you came to SXSW to meet. Spots are limited, so RSVP now to join us.

Sign Up Here

DISCOVERY ZONE

This week someone from our team sliced their finger open while cutting a bagel. Instead of going to the hospital (like they probably should have), they went straight to Medic GPT. From a photo of the injury, this custom GPT was able to assess how it looked, whether it was healing properly, and how to take care of the wound. What a world!

MEME OF THE WEEK
