In this episode of AI Unveiled, I speak with Dominik Mate Kovacs, founder and CEO of Colossyan, an AI video platform for workplace learning that generates life-like avatars.
Before starting Colossyan, Dominik was working at the intersection of AI and video, as the Co-Founder of Defudger, a visual content authentication platform for the insurance industry. So we start this episode by discussing his background and why he then chose to build technology to create synthetic videos.
Colossyan's customers use the platform for formal learning, sales enablement, onboarding, and other internal communications. He discusses two direct benefits: lower costs than studio-produced video and higher engagement than text and screen-recorded videos. These benefits enable more localization and personalization and help manage content chaos. Synthetic videos can be edited programmatically at a later time, so instead of creating duplicates, outdated videos are updated.
I asked Dominik about the end user’s perception, how close we are to passing the Turing test, and what the implications may be. I also asked Dominik to compare what Colossyan does to general text-to-video services like Runway and PikaLabs.
We then get into the nitty-gritty of the models that power life-like avatars and how one can create a personalized avatar. They just released the ability to do this over a webcam, and the feature is available to everyone.
This was a fun episode discussing an impressive technology, with many powerful use cases, but also many questions about deepfakes and ethics.
I hope you enjoy this episode.
Timestamps
* 2:36 – The journey to generating personalized avatars
* 6:59 – The future of workplace training
* 12:55 – Evaluating a human’s perception of avatars
* 19:05 – How to build a lifelike avatar
* 23:09 – Measuring accuracy in the uncanny valley
* 27:19 – Innovating AI with Product and Engineering
Highlighted Excerpts
GAURAV: What do you think workplace training will look like and how it will evolve in the next five to ten years?
DOMINIK: Customizability and personalization are huge factors, not just in marketing, but also in communication. So if you are creating content personalized for an audience or a team, that just scales the complexity of your initiative.
If you are building an easy solution for the creation of complex videos, as in our case, that just falls in line with the high-level industry trends, which are all about personalizing the experience, because it also helps the engagement rate, which is directly correlated with human revenue. So that’s incredibly important.
Additionally, scalability in terms of localization. So the fact that we support more than 120 languages is a big selling point to all these large firms because they want to create content in their [native language]. Departments in specific countries have much better loyalty towards the employer if they hear native content, which is proven by data.
…A final factor that’s worth mentioning that is trending is more of an on-demand experience. So currently, if you were creating content, it’s a very complex process…But by providing easy software you can iterate on an existing piece of
DOMINIK: Content chaos is definitely a big problem within our space. There is simply too much content created with the solutions that exist today, and that’s in line with what we are trying to solve. For example, imagine that you create some kind of tutorial about an onboarding process and six months later it gets outdated. That content will live somewhere in your system, and you will either have to delete it or people will keep watching it even though it is outdated.
This is what content chaos is about. But if you can easily update your videos in our case and update that automatically across all your systems, then this content chaos effect is reduced.
GAURAV: How do organizations perceive this technology? How do the end users perceive avatars? They probably know that it’s a synthetic avatar versus a real-life human. What learnings do you have in terms of human perception of your technology?
DOMINIK: For external use cases, external training, or communication, they try to hide the fact that it’s so-called synthetic. Most of the companies claim that this is AI-generated or these are avatars. In some cases, they don’t and surprisingly the audience members believe these are real people because the quality is already so high for these use cases.
There is a mix there that I see on the market, but overall people appreciate it because compared to the content they received previously, it’s much more engaging. Also, we are making the product in a way that you can create different scenes which are different from one another.
GAURAV: Can you talk a little bit about what I need to do if I want my own avatar?
DOMINIK: It’s a fairly straightforward process. We need five minutes of recording from a person where they are reading English tongue twisters. The reason is that we can then understand their facial expressions and how they make specific sounds. We use that recording data to train their avatar, and then, in an inference step, if you type in new data, which is new text in our case, we can move the facial expressions accordingly.
We are researching heavily into other sorts of different architecture-based technologies, which will allow even more customization on the facial regions. I think in the future, people should be able to generate their avatar and how it looks.
So companies could create their own branded characters, which would look 100 percent realistic. In that case, my statement that we work with real content wouldn’t be true, because then it would be fully synthetic. But with the same results as today. So that is the future.