This bit of deepfakery comes from the bizarre world of synthesized media: a digital Albert Einstein with a synthesized voice, created using AI voice cloning technology trained on audio recordings of Einstein's actual voice.
Aflorithmic is the startup behind Einstein's "uncanny valley" audio deepfake, which we covered back in February.
The video engine behind the "digital human" version of Einstein comes from UneeQ, a synthesized media company that hosts the interactive chatbot on its website.
Aflorithmic says the "digital Einstein" is intended as a demonstration of what conversational social commerce can do. That is a fancy way of saying that deepfakes resembling historical figures will likely be trying to sell you pizza soon enough, as industry watchers have warned.
The startup also says it sees educational value in bringing long-deceased celebrities to interactive "life".
Or, at least, an artificial approximation of it. The "life" here is purely virtual, and Digital Einstein's voice is not a pure tech-powered copy: Aflorithmic says it also worked with an actor who did voice modeling for the chatbot (it was the only way to get Digital Einstein to say words the real deal would never have dreamed of, like "blockchain"). So there's more to this than AI artifice.
Aflorithmic CEO Matt Lehmann said: "This is the next milestone in showcasing the technology to make conversational social commerce possible. There are still many technical challenges to overcome and flaws to fix, but we believe this is a great way to show where things are heading."
In a blog post discussing how it recreated Einstein's voice, the startup says it was able to reduce the turnaround time from input text leaving its computational knowledge engine to its API rendering a voiced reply, cutting it from 12 seconds to under three (which it calls "near-realtime"). That lag is still enough to make conversation with the bot feel a little tedious.
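The turnaround the startup describes is essentially an end-to-end latency budget: input text goes through a knowledge engine, and the reply is then rendered as synthesized speech. A minimal sketch of timing such a two-stage pipeline, with stub functions standing in for the real services (the function names and stages here are illustrative assumptions, not Aflorithmic's actual API):

```python
import time

def answer_query(text, knowledge_engine, tts_render):
    """Run a hypothetical text -> knowledge -> voice pipeline and time it."""
    start = time.perf_counter()
    reply_text = knowledge_engine(text)   # stage 1: generate the textual answer
    audio = tts_render(reply_text)        # stage 2: render the answer as audio
    latency = time.perf_counter() - start
    return audio, latency

# Stub stages for illustration only; real services would make network calls.
fake_engine = lambda q: f"Answer to: {q}"
fake_tts = lambda t: b"WAV" + t.encode()

audio, latency = answer_query("What is relativity?", fake_engine, fake_tts)
print(f"rendered {len(audio)} bytes of audio in {latency:.3f}s")
```

With real services, most of the budget would sit in the two stage calls, which is why shaving the voice-rendering step is what moved the startup's number from 12 seconds toward "near-realtime".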