Over the last few months, a number of AI-based or AI-enhanced consumer-grade services have come out that really show how much artificial intelligence (or more specifically, machine learning) has to contribute if it’s deployed right.
They’re kinda magic, and also kinda banal in a way that makes it obvious that yes, this is going to be a part of our mental toolkit going forward.
Three services in particular stand out to me. All of them are in the larger realm of media, or more specifically, synthetic media. They’re a glimpse of the future, only they’re already here and available cheaply to anyone who wants to play with them.
If you haven’t tried them, I strongly encourage you to take them for a spin.
In order of ascending (subjective!) mind-blowing potential:
DeepL is a translation tool based on machine learning. It’s been around for some time and has a free tier. Nothing I’ve used translates as quickly and reliably as DeepL. I know folks who constantly have it open in a window; who drop whole Word docs in (and get formatted Word docs back out); who, when they’re too tired to read in a second or third language, translate whole academic papers to read in their native language.
With Synthesia, you can generate videos of talking heads based on text input: You type it, they say it. Only they’re AI-generated. (Or maybe they were real actors and just their facial movements are AI-generated, I’m not sure.) Within minutes, you can have a fully AI-generated video of a very real-looking face saying your words.
Descript is… an audio and video editing tool? I think that’s correct but it doesn’t convey the magic going on here. In Descript you can record or edit audio and video files. Descript then transcribes them fairly reliably and very quickly, in nearly real-time.
And then… lets you edit the audio/video by editing the transcript in a text editor.
You cut out a paragraph or move it somewhere else in the text editor, and the audio/video file will be cut accordingly. Oh, within a couple of clicks you can also remove filler words like uhms and errrms and coughs, and to a degree, you can even “overdub”, meaning you can type a few new words and it’ll generate them in your voice.
This is an incredibly empowering tool. How much so is hard to convey without experiencing it firsthand.
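For the curious, here’s a rough sketch of how the “edit the text, and the audio follows” idea could work in principle. To be clear, this is my own toy illustration, not how Descript actually does it: I’m assuming a transcript where every word carries a start and end timestamp, and all the names here (Word, keep_ranges, filler_indices, the FILLERS list) are made up for the example. Deleting words from the transcript then just boils down to working out which stretches of the recording to keep.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the recording (hypothetical format)."""
    text: str
    start: float  # seconds
    end: float    # seconds


# Assumed filler list for this sketch; a real tool would be far smarter.
FILLERS = {"uhm", "um", "uh", "erm", "errrm"}


def filler_indices(words):
    """Indices of words that look like filler words."""
    return {
        i for i, w in enumerate(words)
        if w.text.lower().strip(".,!?") in FILLERS
    }


def keep_ranges(words, deleted_indices):
    """Return the (start, end) time ranges to keep after deleting some words.

    Consecutive kept words are merged into a single range, so the result
    is the list of clips you would stitch together.
    """
    ranges = []
    prev_kept = None
    for i, w in enumerate(words):
        if i in deleted_indices:
            continue
        if ranges and prev_kept == i - 1:
            # Directly follows the previous kept word: extend the current clip.
            ranges[-1] = (ranges[-1][0], w.end)
        else:
            # Something was cut in between: start a new clip.
            ranges.append((w.start, w.end))
        prev_kept = i
    return ranges


transcript = [
    Word("So", 0.0, 0.3),
    Word("uhm", 0.3, 0.9),
    Word("this", 0.9, 1.2),
    Word("is", 1.2, 1.4),
    Word("magic", 1.4, 2.0),
]

# Remove the filler word and see which parts of the audio survive.
cuts = keep_ranges(transcript, filler_indices(transcript))
print(cuts)
```

Here the “uhm” disappears from the text and the recording gets split into two clips around it. Moving a paragraph would be the same trick with the clips reordered; the actually hard parts (accurate transcription, clean cuts, overdubbing in your voice) are exactly where the magic of the real tool lives.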
And yes, you can generate videos in Synthesia and then edit them in Descript. You could probably daisy-chain all of them together and go straight from a German text file into an English AI-generated video transcribed and cut via text editor and turned into a video podcast, or something.
What a time to be alive.