ChatGPT Is Talking and Texting. What Does That Mean for CRE?

New features, but the old problems are probably still at hand.

Choosing the time to adopt technology can be tricky. Users want the most advantageous features, but smart purchasing usually means waiting for a while to see what problems arise.

OpenAI, the company behind ChatGPT, has released a new version, GPT-4o, which the company claims can “reason across audio, vision, and text in real time.” The company further calls it “a step towards much more natural human-computer interaction,” saying that “it accepts as input any combination of text, audio, image, and video and generates any combination of text, audio, and image outputs. It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation. It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API. GPT-4o is especially better at vision and audio understanding compared to existing models.”

First, take a step back to think clearly, because there is a lot of rhetorical technique in play to shape how an audience thinks and responds, otherwise known as subtle selling.

Software, even generative artificial intelligence, doesn’t reason. That is not to dismiss what such systems do accomplish: they are highly complex statistical systems that can match inputs to massive databases of linguistic connections and formulate answers. But “a step towards” improved human-computer interaction is not an arrival at perfect systems. “As little as” in timing means a response can also take a lot longer. An average response time that is “similar to human response time” means there is a distribution, with some responses slower than the average. If responses were often much faster, chances are someone would have mentioned the median time, because the slow responses that pull the average up would be deemphasized. And “especially better at vision and audio understanding compared to existing models” means next to nothing because the reader doesn’t know the baseline or the degree of improvement.
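To make the point about averages concrete, here is a minimal sketch in Python. The response times are invented for illustration and are not measurements of GPT-4o; they were chosen only so that the fastest is 232 milliseconds and the average is 320, matching the two figures in the announcement.

```python
from statistics import mean, median

# Hypothetical response times in milliseconds. These are made-up numbers,
# not OpenAI data: most responses are quick, two are slow.
latencies_ms = [232, 240, 250, 255, 260, 265, 270, 280, 498, 650]

fastest_ms = min(latencies_ms)     # the "as little as" figure
average_ms = mean(latencies_ms)    # pulled up by the two slow responses
median_ms = median(latencies_ms)   # the typical response
slowest_ms = max(latencies_ms)     # what an unlucky caller experiences

print(f"fastest: {fastest_ms} ms")
print(f"average: {average_ms:.1f} ms")
print(f"median:  {median_ms:.1f} ms")
print(f"slowest: {slowest_ms} ms")
```

In this made-up sample, the same “as little as” and average figures sit alongside a slowest response of more than twice the average. The two headline numbers alone say little about how often a user would hit the slow end.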

Demos are always created to look as good as possible. Still, having a conversation with an automated voice that sounds pretty human is potentially impressive, and it would seem like a tool that could work in many CRE situations. The immediate impulse might be to deploy persuasive computerized systems that can handle customers via voice, text, or email. But the desire to meld generative AI with voice assistant capabilities still leaves “many hurdles,” as the New York Times writes. Chatbots are inclined to make things up, a phenomenon called “hallucination.”

“Those flaws are migrating into voice assistants,” the Times noted. “While chatbots can generate convincing language, they are less adept at taking actions like scheduling a meeting or booking a plane flight.”

Maybe things will improve, but they need to on a deep level. Wait to see what happens over time. Sparkly technology can be fun and attractive, but in a business setting, you can’t take a chance on an application whose mistakes could cost a lot.