AI Can Now See, Hear, And Speak
If you’ve not already heard, ChatGPT has received a major upgrade. The makers at OpenAI have given their bot some significant new capabilities: it can now see, hear, and speak. It’s not C-3PO, but it’s a good start.
We’ve taken the new features for a test drive and, let’s just say, ChatGPT is now several steps closer to becoming a very real digital assistant that can understand and interact with the world like a human.
While still effectively in beta testing, these new capabilities offer a glimpse into a mind-blowing AI-powered future. But (there’s always a but), an AI-powered future raises some important questions around ethics, privacy, and the potential societal impact of advancing artificial intelligence with little to no regulation (as of November 2023).
Seeing The World Through AI Eyes
ChatGPT’s new image recognition tool allows users to upload a photo and ask ChatGPT to describe or make sense of what it sees. You could snap a picture of your Dad’s messy garage and have it generate a list of items to sell or donate (Hi Dad). Or take a photo of the contents of your fridge and request recipe ideas.
Having tested this feature in multiple settings, we think it’s fair to say that ChatGPT performed surprisingly well at comprehending and describing photos, though it still makes some obvious mistakes. Using the photo feature, it’s also capable of reading text within images, summarising articles, and even drafting social media posts or classifieds based on photographed objects.
It’s worth mentioning that ChatGPT cannot analyse human faces or anything it considers to be private information. This can only be a good thing! Complex visual puzzles like Ikea furniture guides still confuse ChatGPT, too. We feel your pain, AI.
The possibilities are endless but it’s early days and we’re certain there are far greater use cases than sorting out Dad’s garage. It will be incredibly interesting to watch how users make the most of the tool. We will also have to see how OpenAI manages restrictions to prevent facial recognition and other violations of privacy.
Testing, Testing, Is This Thing On?
Even more intriguing are ChatGPT’s new voice capabilities. Using the mobile app, you can now speak to the AI assistant and get audible responses in one of several natural-sounding synthesised voices. The voices feel authentic and mostly natural. They’re certainly a long way away from the muted tones of Siri and more in keeping with the awesome realness of ElevenLabs.
Our early tests with this feature have been interesting. Without the barrier of written words, it’s far easier to open up more and have very real, free-flowing conversations on both simple and complex matters. It also told my nephew a fabulous story about a walking, talking character called Mr Bum Bum Head. Lovely.
The voice feature may make ChatGPT feel like a supportive therapist, tutor, banter-heavy mate, or even a companion to some lonely users. Whether this is good or bad is up in the air, but it has certainly opened up some interesting opportunities.
The seamless speech abilities give further weight to the idea that we are headed toward artificial general intelligence. Once algorithms are able to fully comprehend and respond to images, sounds, and language like a human, the possibilities get much more unsettling or interesting, depending on your camp.
Should We Be Afraid or Excited?
So... do ChatGPT’s latest updates bring us closer to a utopian world of digital assistants? Or are we heading to hell in a handbasket? That depends on who you ask.
As with any rapidly advancing tech, there are reasonable concerns around how AI like ChatGPT could be misused. Its voice mimicry skills could potentially enable scams, fake news, and political manipulation. The image comprehension features could empower surveillance and invasive data collection if not carefully managed.
At Start, we believe there are many positive applications that outweigh the risks. AI assistants with multimodal understanding could help people in countless ways each day. The technology just needs governance to guide it forward in a thoughtful and responsible way.
Time will tell if ChatGPT fulfils its promise to make life simpler, safer, and better for humanity. The developers at OpenAI appear to be taking care to limit potential harms, but pressure from users and governments will help guarantee that.
Whether we like it or not, the AI revolution is accelerating. Fast. As users, it’s our job to steer it in an ethical direction that benefits as many people as possible. ChatGPT may not be perfect yet, but it provides a glimpse of what we could all be interacting with soon.