Just like that, we’re facing the imminent normalization of conversing with AI chatbots by talking out loud.
OpenAI today demoed a new AI model called GPT-4o, along with a voice assistant that sounds human, reads your facial expressions and even handles interruptions gracefully (it stops talking, then responds earnestly to whatever you interrupted with, which is already superior to humans).
The tech press — taking its cue from OpenAI CEO Sam Altman, who tweeted the monosyllabic “her” — is implying that we all get to fall in love with AI Scarlett Johansson and that ChatGPT is now like that Joaquin Phoenix movie. Yeah, no, not really.
Speaking of repressed, lonely nerds, Google engineers tomorrow will introduce (among other AI things) their new multimodal voice assistant, which can converse about live video captured by the phone and the contents therein. They teased the announcement today in a video posted on Twitter.
All this casual chit-chat with AI reminds me of the Pi chatbot, which used to lead the natural-sounding audible conversation pack. I told you about Pi in this space and also said: This is the future of AI glasses, which this week’s announcements by major players bear out.
By the way, the Ray-Ban Meta glasses product was the Flavor of the Month ten minutes ago. Suddenly, chatting with Meta AI through the sunglasses seems slow, stilted and dated.
Three points stand out:
The OpenAI, Pi and Google chatbot voices are intensely Californian. Silicon Valley now needs to work on voices that don’t sound like Scarlett Johansson, or valley girls, or vocal-fry sorority girls. 99% of the world’s population doesn’t talk like that.
These demos are happening on smartphones, but the tech clearly belongs on glasses.
And, finally, it’s obvious that talking to a natural-sounding AI is going to eat the world, as an emergent human behavior.
Don’t believe me? Consider:
In the 90s, people felt self-conscious and embarrassed about talking on cell phones in public. It felt weird for a while. Until it didn’t. Talking on a phone anywhere, anytime became perfectly normal.
Once that became accepted, Bluetooth headsets emerged. While holding an unsightly Nokia 5110 against your head signaled “I’m talking on the phone now,” using a Bluetooth headset made wireless conversationalists indistinguishable from insane homeless drug addicts muttering to the voices in their heads. Over time, using a Bluetooth device in public to talk on your phone became normalized. So did insane homeless drug addicts.
In the early 2000s, camera phones got better, and for years people felt uncomfortable taking pictures with their phones — until everyone got used to it.
Then, after the iPhone shipped in 2007, people started taking a lot of selfies. The selfie-takers were mocked, criticized and ostracized, but everyone got used to that behavior, too.
Eventually, some brave pioneers started taking pictures in public with tablets. It was wrong and unacceptable then, and it will always be wrong and unacceptable.
In the past ten years, extreme posing, duck-faces, public twerking, emotional performances (crying for the camera and other stupid bullshit) and all manner of shameless pandering to an unseen audience and to-hell-with-the-people-around-me-physically became commonplace and normalized, though still mocked to hilarious effect by the Influencers in the Wild guy.
Point is: People initially think some new behavior enabled by mobile devices is weird and bizarre, and then it becomes normal and accepted. It’s hard to believe now, but when the iPhone came out, the majority consensus was that nobody would use a phone without physical keys like the BlackBerry had. Everybody hated on-screen keyboards. Now they’re the only acceptable option. “Normal” changes.
My Location: Sitges, Spain
(Why Mike is always traveling.)