About this episode
Welcome to episode 303 of The Cloud Pod – where the forecast is always cloudy! Justin, Ryan and exhausted dad Matt are here (and mostly awake) ready to bring the latest in cloud news! This week we’ve got more news from Nova, updates to Claude, earnings news, and a mini funeral for Skype – plus a new helping of Cloud Journey!
Titles we almost went with this week:
Claude researches so Ryan can nap
The best AI for Nova Corps, Amazon Nova Premiere JB
If you can’t beat them, change the licensing terms and make them fork, and then reverse course… and profit
Q has invaded your IDE!!
Skype bites the dust
A big thanks to this week’s sponsor:
We’re sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You’ve come to the right place! Send us an email or hit us up on our Slack channel for more info.
Follow Up
02:50 Sycophancy in GPT-4o: What happened and what we’re doing about it
OpenAI published a blog post about last week’s sycophantic GPT-4o update in ChatGPT, and they wanted to set the record straight.
They made adjustments aimed at improving the model’s default personality to make it feel more intuitive and effective across a variety of tasks.
When shaping model behavior, they start with the baseline principles and instructions outlined in their Model Spec.
They also teach their models how to apply these principles by incorporating user signals like thumbs up and thumbs down feedback on responses.
In this update, though, they focused too much on short-term feedback and did not fully account for how users’ interactions with ChatGPT evolve. This skewed the results towards responses that were overly supportive – but disingenuous.
Beyond rolling back the changes, they are taking steps to realign the model behavior, including refining core training techniques and system prompts to explicitly steer the model away from sycophancy.
They also plan to build more guardrails to increase honesty and transparency, both of which are principles in the Model Spec.
Additionally, they plan to expand ways for users to test