AI Can Now See, Hear, And Speak
If you haven’t already heard, ChatGPT has received a major upgrade. The makers at OpenAI have given their bot some significant new capabilities: it can now see, hear, and speak. It’s not C-3PO, but it’s a good start.
We’ve taken the features for a test drive and, let’s just say, ChatGPT is now several steps closer to becoming a very real digital assistant that can understand and interact with the world like a human.
These capabilities offer a glimpse into a mind-blowing AI-powered future. But (there’s always a but), an AI-powered future raises some important questions around ethics, privacy, and the potential societal impact of advancing artificial intelligence with little to no regulation (as of July 2024).
Seeing The World Through AI Eyes
ChatGPT’s image recognition tool allows users to upload a photo and ask ChatGPT to describe or make sense of what it sees. You could snap a picture of your Dad’s messy garage and have it generate a list of items to sell or donate (Hi Dad). Or take a photo of the content of your fridge and request recipe ideas.
Having tested this feature in multiple settings, we found that ChatGPT performed surprisingly well at comprehending and describing photos, though it still makes some obvious mistakes. Using the photo feature, it can also read text within images, summarise articles, and even draft social media posts or classifieds based on photographed objects.
It’s worth mentioning that ChatGPT cannot analyse human faces or anything it considers to be private information. This can only be a good thing! Complex visual puzzles like Ikea furniture guides still confuse ChatGPT, too. We feel your pain, AI.
The possibilities are endless but it’s early days and we’re certain there are far greater use cases than sorting out Dad’s garage. It will be incredibly interesting to watch how users make the most of the tool. We will also have to see how OpenAI manages restrictions to prevent facial recognition and other violations of privacy.
Testing, Testing, Is This Thing On?
Even more intriguing are ChatGPT’s new voice capabilities. Using the mobile app, you can now speak to the AI assistant and get audible responses in one of several natural-sounding synthesised voices. The voices feel authentic and mostly natural. They’re certainly a long way from the muted tones of Siri and more in keeping with the awesome realness of ElevenLabs.
Our early tests with this feature have been interesting. Without the barrier of written words, it’s far easier to open up more and have very real, free-flowing conversations on both simple and complex matters. It also told my nephew a fabulous story about a walking, talking character called Mr Bum Bum Head. Lovely.
The voice feature may make ChatGPT feel like a supportive therapist, tutor, banter-heavy mate, or even a companion to some lonely users. Whether this is good or bad is up in the air, but it has certainly opened up some interesting opportunities.
The seamless speech abilities give further weight to the idea that we are headed toward artificial general intelligence. Once algorithms are able to fully comprehend and respond to images, sounds, and language like a human, the possibilities get much more unsettling or interesting, depending on your camp.
SearchGPT: A Google Alternative?
In a recent development, OpenAI announced SearchGPT, a prototype of new AI search features. This is designed to combine the strength of their AI models with information from the web to provide fast and timely answers with clear and relevant sources. It’s long been rumoured that OpenAI would be creating a competitor for Google – now we know.
SearchGPT aims to enhance the search experience by:
- Directly responding to questions with up-to-date information
- Allowing follow-up questions in a conversational manner
- Providing clear links to relevant sources
- Partnering with publishers to ensure proper attribution and discovery of high-quality content
While still in its testing phase, SearchGPT represents a significant step towards integrating AI more deeply into our daily search activity.
It will certainly be an interesting one to watch, especially as it relates to Google and SEO.
The AI Revolution: A Defining Moment for Humanity
In a recent opinion piece for The Washington Post, Sam Altman, co-founder and CEO of OpenAI, argues that we are at a crucial juncture in the development of AI (as of July 2024). In his words, “The rapid progress being made on artificial intelligence means that we face a strategic choice about what kind of world we are going to live in.”
Altman presents two potential futures:
- A world where the United States and allied nations advance a global AI that spreads the technology’s benefits and opens access to it.
- An authoritarian world, where nations or movements that don’t share democratic values use AI to cement and expand their power.
He argues that there is no third option, and the time to decide which path to take is now.
As with everything in the AI space, there is a lot of uncertainty and misunderstanding. This certainly isn’t helped by the lack of regulation and the news and gossip circulating around OpenAI HQ.
Should We Be Afraid or Excited?
So, do OpenAI and ChatGPT’s latest updates bring us closer to a utopian world of digital assistants? Or are we heading to hell in a handbasket? That depends on who you ask.
As with any rapidly advancing tech, there are reasonable concerns around how AI like ChatGPT could be misused. Its voice mimicry skills could potentially enable scams, fake news, and political manipulation. The image comprehension features could empower surveillance and invasive data collection if not carefully managed.
The AI team at Start believe there are many positive applications that outweigh the risks. AI assistants with multimodal understanding could help people in countless ways each day. The technology just needs governance to guide it forward in a thoughtful and responsible way.
Altman suggests the need for a U.S.-led group of like-minded countries and an innovative strategy to ensure that AI benefits the most people possible. This includes tight security measures, infrastructure development, workforce training, and the introduction of global standards for AI use.
Whether we like it or not, the AI revolution is accelerating. Fast. As users, it’s our job to steer it in an ethical direction that benefits as many people as possible. ChatGPT may not be perfect yet, but it provides a glimpse of what we could all be interacting with soon – a future where AI not only understands and responds to our queries but also sees, hears, speaks, and searches the vast expanse of human knowledge on our behalf. The question is: are we ready to shape this future responsibly?
Last updated July 2024