Multimodal AI video glasses get closer
Meta rolled out Llama 3.1 today. Here's why that means multimodal AI video glasses are just around the corner.
At its “Spring Update” event on May 13, OpenAI dazzled attendees with GPT-4o and its ability to accept multimodal input, with video among the inputs. (Instead of just typing a prompt, as with the original ChatGPT, input to GPT-4o can include text, audio, pictures and streaming video.)
The next day at “Google I/O 2024,” Google freaked everyone out with a demo of Project Astra, which delivers multimodal AI through glasses. (I speculated in this space that the Astra team is using the same prototype hardware seen in another impressive demo two years ago, for Google’s now-defunct translation glasses.)
Glasses are perfect for multimodal AI with video input because camera-equipped glasses can see what you’re seeing (and remember what you saw recently), letting you have rich conversations with the chatbot about whatever you’re both looking at.
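To make that concrete, here’s a rough sketch of the loop in Python: grab a frame from a camera (a stand-in for the glasses), send it to GPT-4o along with a question, and print the answer. The OpenAI chat-completions calls with image input are real; the camera capture and everything glasses-specific are assumptions for illustration, and today’s public API takes sampled still frames rather than a live video stream.

```python
# Minimal sketch: one "look and ask" round trip with a multimodal model.
# Assumes OPENAI_API_KEY is set; the webcam here stands in for a glasses camera.
import base64

import cv2  # pip install opencv-python
from openai import OpenAI  # pip install openai

client = OpenAI()

def describe_current_view(question: str) -> str:
    # Capture a single frame (real glasses would sample frames continuously).
    cam = cv2.VideoCapture(0)
    ok, frame = cam.read()
    cam.release()
    if not ok:
        raise RuntimeError("No frame captured from camera")

    # Encode the frame as a base64 JPEG so it can be sent as an image input.
    _, jpg = cv2.imencode(".jpg", frame)
    image_b64 = base64.b64encode(jpg.tobytes()).decode()

    # Send the question plus the frame; the model answers about what it "sees".
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

print(describe_current_view("What am I looking at right now?"))
```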
The OpenAI and Google demos were nice. But now Meta has released an extremely powerful AI designed for multimodal input, including video, and that makes it far more likely we’ll see the capability in glasses soon.
What Meta announced today
Meta rolled out Llama 3.1 today. The most powerful version of the new model boasts 405 billion parameters. (Compare that to the original publicly released ChatGPT, which had 20 billion parameters.) Like GPT-4o, Llama 3.1 405B is designed to support video input, along with text, audio and images.
Unfortunately, Meta isn’t rolling out the video feature quite yet. From Meta’s documentation:
“As part of the Llama 3 development process we also develop multimodal extensions to the models, enabling image recognition, video recognition, and speech understanding capabilities. These models are still under active development and not yet ready for release.”
Obviously, Meta AI is part of the now-multimodal Ray-Ban Meta Glasses (although for that product, video is not one of the modes; it handles only still images). But the really interesting part is that Llama 3.1 is “open source.” Sure, it’s not truly open source, but the model is available to other companies. In fact, Perplexity added Llama 3.1 to Perplexity Pro just hours after its release.
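If you’re wondering what “available to other companies” looks like in practice, here’s a minimal sketch of running an open-weights Llama 3.1 model with Hugging Face’s transformers library. It uses the small 8B variant (the 405B model needs a serious GPU cluster), and the exact model identifier and output handling are assumptions; check the Hugging Face listing and Meta’s license gate before trying it.

```python
# Minimal sketch of running open-weights Llama 3.1 locally with Hugging Face transformers.
# The repo name below is an assumption; access requires accepting Meta's license on the hub.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3.1-8B-Instruct",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "In one sentence: why are camera glasses a good fit for multimodal AI?"},
]

# The pipeline applies the model's chat template; the last message holds the reply.
result = chat(messages, max_new_tokens=60)
print(result[0]["generated_text"][-1]["content"])
```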