## Why this is even possible now
Two years ago, the claim in the headline would have been a lie. The video quality was bad, the AI voices sounded like bored announcers, and the captioning tools needed manual cleanup.
That changed sometime in 2025. Between ElevenLabs v3, decent stock footage libraries, and Descript's transcript-based editing, the pipeline from written content to finished video clip is now genuinely 30 minutes of attention per piece, if your source material is already good.
This is the workflow I use for my own client content and for the weekly update videos on aitoolradar.io. It is not the best possible workflow. It is the one I actually stick with.
## Before you start: what the source needs
The whole pipeline falls apart if the blog post is weak. I learned this the hard way. A bad article becomes a bad video, and no amount of fancy editing hides it.
My minimum bar before I repurpose a post:
- At least 1,200 words of real substance
- Clear section headings that map to visual moments
- At least one concrete example or number in each section
- A specific reader in mind, not a generic audience
If the post fails any of those, I rewrite before I repurpose. Do not skip this.
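If you want to automate part of that gate, a rough pre-flight check is easy to script. This is a sketch with illustrative thresholds of my own choosing, not a real tool; the "specific reader" criterion can't be automated, so only the mechanical checks appear here.

```python
import re

MIN_WORDS = 1200    # substance threshold from the checklist above
MIN_HEADINGS = 3    # illustrative: "clear section headings" is hard to quantify

def passes_minimum_bar(markdown: str) -> dict:
    """Rough pre-flight check on a markdown post: word count, headings, numbers."""
    words = len(re.findall(r"\b\w+\b", markdown))
    headings = re.findall(r"^#{2,}\s+.+$", markdown, flags=re.MULTILINE)
    has_numbers = bool(re.search(r"\d", markdown))  # proxy for concrete examples
    return {
        "enough_words": words >= MIN_WORDS,
        "has_headings": len(headings) >= MIN_HEADINGS,
        "has_concrete_numbers": has_numbers,
    }
```

If any value comes back `False`, that is the rewrite-first signal.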
## Step 1 (5 min): Strip the blog into a script
I open the blog post and strip it down to the parts that work spoken aloud. Written articles have a lot of connective tissue that sounds strange when read. Phrases like "as mentioned above" and "we will cover this below" disappear.
I also break every sentence over 20 words. Written prose tolerates long sentences. Spoken narration does not.
Claude does about 70 percent of this work. My prompt is roughly:
> Take the following blog post and rewrite it as a spoken narration. Keep the structure and facts identical, but shorten sentences, remove references to "this article" or "below", and make it sound like one person talking to one other person. Do not add any new content.
I then read the output once and manually cut anything that still sounds written.
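The 20-word rule is the one part of this step that is trivially mechanical. A minimal sketch (naive sentence splitting, good enough for a quick pass over a draft):

```python
import re

MAX_WORDS = 20  # spoken narration tolerates roughly this much per sentence

def flag_long_sentences(text: str) -> list[str]:
    """Return sentences that exceed the spoken-word limit and need splitting."""
    # Naive split on ., !, ? followed by whitespace; fine for a quick check.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > MAX_WORDS]
```

Anything it returns gets broken into two sentences before the script goes to ElevenLabs.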
## Step 2 (3 min): Generate the voice in ElevenLabs
I use my own cloned voice on ElevenLabs, which I set up once months ago. If you do not want to clone your voice, their default voices are good enough that most viewers will not notice.
The key setting is stability. I run mine around 35 percent for the main narration. Too low and the voice gets weird; too high and it sounds flat. Test on a 30-second sample before you render the full script.
For a 1,500-word script, render time is maybe two minutes. I export as MP3 at 128 kbps, which is overkill for YouTube but avoids any conversion headaches later.
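If you drive this from the API instead of the web app, the stability setting maps to a 0-1 value in the request body. The field names below follow ElevenLabs' public text-to-speech REST API as I understand it; `VOICE_ID` and `API_KEY` are placeholders for your own credentials, and the actual call is shown only as a commented sketch.

```python
def build_tts_request(script: str, stability: float = 0.35) -> dict:
    """Build the JSON body for an ElevenLabs text-to-speech request."""
    return {
        "text": script,
        "voice_settings": {
            "stability": stability,     # ~35 percent, per the note above
            "similarity_boost": 0.75,   # a common default; tune to taste
        },
    }

# The actual call (requires an API key) would look roughly like:
# requests.post(
#     f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
#     headers={"xi-api-key": API_KEY},
#     json=build_tts_request(script),
# )
```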
## Step 3 (10 min): Assemble visuals in Descript
This is where I used to spend three hours. Now it is 10 minutes because of one change: I stopped trying to produce bespoke B-roll.
My current rule: stock footage plus screen recordings plus text overlays, nothing custom. Descript handles all of this in one timeline. I import the MP3 from ElevenLabs, paste the script as captions, and Descript auto-aligns them.
For visuals, I use a mix of:
- Stock clips from Descript's built-in library (Storyblocks)
- Screenshots of the tools I mention, recorded at 1920x1080
- Text overlays for numbers and key phrases
- The occasional branded title card
The rule I enforce on myself: one visual change every 5-8 seconds. Static visuals on YouTube kill retention within 30 seconds.
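The 5-8 second rule translates directly into a visual budget for the timeline. A quick back-of-envelope sketch (the 150 words-per-minute speaking pace is my assumption, not a Descript number):

```python
WORDS_PER_MINUTE = 150  # rough spoken pace; an assumption, not from any tool

def narration_seconds(word_count: int) -> float:
    """Estimate narration length from the script's word count."""
    return word_count / WORDS_PER_MINUTE * 60

def visual_change_range(seconds: float) -> tuple[int, int]:
    """How many visual changes fit at one change every 5-8 seconds."""
    return (int(seconds // 8), int(seconds // 5))
```

For a 1,500-word script that works out to roughly ten minutes of narration and somewhere between 75 and 120 visual changes, which is why "nothing custom" is the only rule that keeps this step at ten minutes.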
For the full feature set, see the Descript guide. I still use it on the Pro plan, which is enough for weekly videos.
## Step 4 (5 min): Auto-edit with Descript's AI features
This is where the time savings compound. Descript's filler word removal, magic edit, and auto-chapters run in parallel in maybe two minutes. I manually review the cuts (the AI occasionally removes words it should not), but 90 percent of the edits are fine.
I also run Descript's auto-captioning, which is more accurate than YouTube's native captions. I export the SRT file to upload separately.
## Step 5 (3 min): Short-form clips with Opus Clip
If the full video is longer than three minutes, I also feed it through Opus Clip to generate short clips for TikTok, Instagram Reels, and YouTube Shorts. Opus picks the moments and adds vertical captions automatically.
The output is not perfect. I reject about 40 percent of the suggested clips. But the 60 percent I keep would have taken me an hour each to produce manually.
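The arithmetic behind that trade-off, as a tiny sketch (the one-hour-per-clip figure is my own estimate from the manual days):

```python
def clips_kept(suggested: int, keep_rate: float = 0.6) -> int:
    """Clips surviving review at the ~60 percent keep rate."""
    return round(suggested * keep_rate)

def hours_saved(kept: int, manual_hours_each: float = 1.0) -> float:
    """Manual editing time those kept clips would have cost."""
    return kept * manual_hours_each
```

Ten suggested clips means six keepers, which would have been six hours of manual vertical editing.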
## Step 6 (4 min): Upload and schedule
YouTube, Shorts, and any other platforms. I use YouTube's native scheduling, so this is just uploading the MP4 plus the SRT, writing a description that includes one link back to the blog post, and picking a thumbnail.
For thumbnails, I stopped using AI generation. The click-through rates were bad. I use Canva with a simple template: face on the left, 3-5 word headline on the right, one colour accent. Nothing fancy.
## What this replaces
Before this workflow, my "blog to video" process was either:
- Ignoring video entirely (which I did for years)
- Recording myself on camera for 45 minutes, editing for three hours, hating the result
The 30-minute pipeline is not as personal as a proper talking-head video. But I publish consistently, which matters more than any single video being great.
## Where it falls apart
I want to be honest about the limits. This workflow does not work for:
- Tutorials that need actual screen recording with live interaction
- Content where my face on camera is doing persuasion work (sales pages, trust-heavy pitches)
- Anything under 60 seconds where the entire game is the hook (pure TikTok content)
For those, I still record manually. The AI-assisted pipeline is for informational content where the information is the value, not the delivery.
## Total cost
Monthly, for this workflow:
- Claude Pro: $20
- ElevenLabs Creator: $22
- Descript Creator: $15
- Opus Clip Starter: $20
- Canva Pro (for thumbnails): $12.99
Total: roughly $90 per month. I produce four to six videos per month, so the per-video tool cost is $15-22. Cheaper than hiring any editor, obviously, and the cost scales to zero if I stop publishing.
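For anyone adapting the stack, the per-video math is just the subscription total over output volume:

```python
# Monthly subscriptions from the list above.
MONTHLY_TOOLS = {
    "Claude Pro": 20.00,
    "ElevenLabs Creator": 22.00,
    "Descript Creator": 15.00,
    "Opus Clip Starter": 20.00,
    "Canva Pro": 12.99,
}

total = sum(MONTHLY_TOOLS.values())   # 89.99
per_video_low = total / 6             # ~15 at six videos per month
per_video_high = total / 4            # ~22.50 at four videos per month
```

Swap in your own tool prices and publishing cadence; the structure is the point, not the exact figures.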
## The real lesson
The videos are not as good as what a professional video team would produce. But the alternative is not professional video. The alternative is no video, because the time cost was too high to justify. This pipeline turns a binary (video or no video) into a gradient (good enough video, consistently).
For most content creators, consistent good-enough beats occasional great. The AI pipeline is what makes consistent good-enough affordable.
