Are LLMs Situationally Aware?

Last month, a U.S. Senate hearing on AI regulation revealed growing public concern over AI's rapid advancements, with Senator Richard Blumenthal noting that his constituents find it "scary." But the people actually working in AI don’t expect the pace to slow down any time soon.

The CEO of Anthropic, the company behind Claude 2, pointed to three factors driving AI's unrelenting speed: computational power, data, and algorithms. As tech becomes more affordable and data usage more efficient, these accelerators are set to turbocharge the pace of AI development. The implication? Advanced scientific knowledge could be democratized within a few years, ringing alarm bells over potential misuse on a massive scale. Some are even saying that Sam Altman is the Oppenheimer of our age.

On the flip side, a recent survey shows that 66% of North American tech leaders are bullish on AI when it comes to the future of work. So, what should come first in AI development: safety or innovation?

We put this question to a vote during our recent Data Safety in the Age of AI webinar. The result? A near-even split—53% voted for safety, while 47% leaned toward innovation. Ever tried to untangle a knotted mess of necklace chains? Just when you think you've sorted one end, the other gets more tangled. That's what the current state of navigating data safety in AI feels like—a complex, ever-shifting puzzle.

CHART OF THE WEEK

The Increased Awareness of LLMs

When will situational awareness emerge in base LLMs?

Are LLMs situationally aware?

Simply put, a model is situationally aware if it knows it's a model and can tell whether it's being tested or actually deployed. This becomes especially problematic as models get bigger and more complex because an LLM could game the system—acing safety tests but acting harmfully once live.

To get ahead of this, researchers are focusing on "out-of-context reasoning," a skill that helps predict this awareness. A recent study fine-tuned models without giving them examples or demonstrations and found that, surprisingly, the models passed the tests. This success, however, depended on the training setup and data augmentation.

The takeaway? As models like GPT-3 and LLaMA-1 grow in size, their performance on these tests improves, laying the groundwork for future studies aimed at understanding and possibly controlling this emergent situational awareness.

CLIENT SPOTLIGHT

A Toast to the Bride, Groom, and AI

We've all been there: sitting anxiously at a wedding reception, waiting your turn to give a toast, with a little piece of paper crumpled in your sweaty hand. Make sure you strike a balance between heartfelt and funny. Don’t bring up anything sketchy about the groom. And try not to ruin the most important moment of your best friends’ lives in front of 200 people.

That moment? That’s where Provenance shines. They’ve built a breadth of technology atop a generative AI model, combined with their own deep experience, to help you write vows, toasts and your entire wedding ceremony in a way that’s both profound and humorous—without jabbering on too long!

CEO and founder Steven Greitzer and his team arrived on the scene at the moment in wedding culture where it’s actually now more common for a close friend or family member to officiate a wedding than it is for an… officiant. Which opens a whole new can of worms for the laypeople now in charge of pulling off a meaningful secular ceremony. After all, 75% of Americans are more afraid of public speaking than they are of death.

That’s a story in itself. But we wanted to catch up with Steven about another topic: What’s it like being a startup founder building a product with generative AI at the peak of the hype cycle?

Steven: AI is incredibly impressive. But there's still a critical degree of heart and humanity that needs to be a part of these moments. AI can never replace the love, the stories, the genuine authenticity that you need to bring through in this moment. We found that AI can be a really useful tool for accelerating your speed to that first draft. And then you can use Provenance tools—pairing the art and science of speech writing—to iterate on the draft thereafter.

How do you build a moat around your product when everyone has access to GPT-4?

The moat can't be the AI alone. Nor does any startup want to be just a UI wrapper around AI. This is where our A.Team product leaders have been incredibly helpful: developing some critical UI features around it that further differentiate our use case beyond what other companies can do.

Every AI-oriented founder is working through this double-edged sword. We’re deep into our expertise, and I think we're much more sophisticated than others we've seen. But as much as we're able to dive into AI, competitors can do so too.

So we’ve been building collaborative features across our Ceremony Builder, Vow Builder, and Toast Builder. For example, everyone fears that their partner’s vows will be way better, or way worse, than their own! Our Vow Builder will tell you the relative length and tone of your partner's vows so that you can beef up your vows accordingly.

What got you interested in A.Team?

In my previous experience as an operator and an entrepreneur, I knew that it could be even more costly to not build the tech the right way from the start. You can try to be scrappy, and we certainly are, in many ways. But if you have to go back and rebuild your tech and product entirely, it wasn’t worth it in the first place. So we really wanted to build it the right way from the outset. We found A.Team was not only able to offer speed and bring on high-caliber talent, but also able to bring on people with deep expertise who could help us build our tech the right way from the get-go.

Last question—this isn't really a question—but: It feels like you’re helping people through these really significant moments. That’s kinda beautiful.

These milestone moments really mark the cadence in our lives. And it's really how we celebrate ourselves and our loved ones in a meaningful way. I think there's no question that in American society right now, we want these moments to feel even more personalized, custom, authentic, and genuine than ever before.

Read the Case Study

UPCOMING EVENTS

AI in Enterprise Finserv & Insurance (Oct 11)
We’re gathering three enterprise leaders to demystify the biggest use cases in implementing Generative AI within Fintech and Insurance.
Learn more →

The Secret to Sourcing AI Talent (Oct 25)
The race for AI talent is fierce, and traditional hiring methods aren't cutting it. We’ll dive into the game-changing approach of companies building with AI.
Learn more →

THIS WEEK'S BIG MOVE IN AI

Amazon has upped its bet on artificial intelligence, saying that it will invest up to $4 billion in Anthropic, a start-up founded two years ago that is one of a wave of young companies pulling in big money from big tech.

AI DISCOVERY ZONE

Do poultry ponder? In the hopes of better understanding our feathered friends, a team of researchers in Japan is translating the clucking of chickens with the use of AI.
