I'm a decent scriptwriter and a strong prompt engineer. So when we needed launch videos for InnovAItion Partners, I figured: how hard could video production be with AI doing the heavy lifting?
Very hard, it turns out.
After 20+ hours with Google's Veo 3, I produced something watchable—barely. Then my friend and go-to editor Liz Phelps, who edits television programs for networks like Bravo, TLC, MTV, and VH1, took the same AI tools and created something professional in a fraction of the time.
The difference wasn't the technology. It was the fifteen-plus years of expertise behind her decisions.
Meet the Editor
Liz Phelps has spent over 15 years in post-production, working on everything from reality TV to documentaries. She knows why a cut works before she makes it. She hears problems in audio that I don't even notice. When she looks at raw footage, she sees story structure.
I write scripts and engineer prompts. Useful skills, but they don't teach you why a three-frame hold sells a joke, or how to cut to music so transitions feel inevitable rather than jarring.
This expertise gap matters because AI generates raw material. It doesn't make editorial judgments. Someone still has to decide which clips tell the story and which ones distract from it.
Video 1 — Solo Build, End to End
The Plot:
A marketer burns the midnight oil on an impossible deadline, then we cut to an alternate reality where AI handles the night shift and she sleeps peacefully.
Tools used to generate video components: Veo 3 only
Edit: Me (CapCut, after Descript defeated me)
The generation process was brutal. Veo 3 has eight-second amnesia: every shot requires re-specifying character, wardrobe, setting, and props in explicit detail. "Watch on left wrist" sometimes became "watch on both wrists." Hairstyle shifted mid-scene. The model mangled any on-screen text.
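If I were to script that re-brief, it would look something like this: one canonical character description, restated verbatim in every prompt, plus whatever is new in the shot. A minimal sketch in TypeScript; the character details and helper names are illustrative stand-ins, not my actual prompts.

```ts
// A "character bible": every visual detail the model must hold, stated once.
// These specifics are illustrative, not the actual prompts from the video.
const CHARACTER_BIBLE = [
  "a woman in her early 30s with shoulder-length dark brown hair",
  "charcoal blazer over a white t-shirt",
  "silver watch on her LEFT wrist only",
  "dim open-plan office lit by a single desk lamp",
].join(", ");

// Every prompt is self-contained: the full bible plus this shot's action.
// Nothing is assumed to carry over from the previous eight-second clip.
function shotPrompt(action: string, camera: string): string {
  return `${CHARACTER_BIBLE}. ${action}. Camera: ${camera}. No on-screen text.`;
}

console.log(shotPrompt(
  "she rubs her eyes and checks the watch on her left wrist",
  "slow push-in from a medium shot",
));
```

Tedious to read, but that redundancy is the point: the model gets no chance to improvise a second watch.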
Descript, which everyone calls "easy," assumed I understood concepts like sound mixing and multi-track editing. I don't. CapCut at least let me drag clips around without a film degree.
Is it watchable? Yes. Does it look like an amateur cut? Also yes.
Video 2 — Same Generator, More Sources, Pro Edit
The Plot:
The story opens with a frazzled employee getting the dreaded request from their boss: “Write the article”—on a technical topic they know nothing about. InnovAItion Partners’ AI assistant, Drafter, saves the day, transforming the empty page into a full article.
Tools used to generate video components: Veo 3 + Loom
Edit: Liz + CapCut
I still did all the generation. I used Veo 3 for people shots, but I got smarter about it. For any shots that didn’t require dialogue, I took a screengrab of my Veo 3 video, then moved to Veo 2 and used its image-to-video capability for character and scene consistency. That enabled me to generate significantly more iterations of the character as raw material.
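I grabbed those frames by hand, but the handoff is easy to script. A minimal sketch, assuming ffmpeg is installed on your PATH; the filenames are placeholders.

```ts
import { execFileSync } from "node:child_process";

// Pull a single clean frame from an approved Veo 3 clip to use as the
// seed image for Veo 2's image-to-video mode.
function grabSeedFrame(clip: string, outImage: string): void {
  execFileSync("ffmpeg", [
    "-sseof", "-0.5",  // seek to half a second before the end of the clip
    "-i", clip,
    "-frames:v", "1",  // write exactly one frame
    "-q:v", "2",       // high JPEG quality for the still
    outImage,
  ]);
}

grabSeedFrame("approved-veo3-shot.mp4", "seed-frame.jpg");
```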
But because the script called for content transformation, I needed a different solution. Veo 3 point-blank cannot generate scenes with legible numbers or text (I learned that the hard way). So instead, I used Loom to record my computer screen for a live Google search, product motion, and cursor flows.
Then I handed everything to Liz. She typically operates in professional-grade tools like Adobe Premiere Pro and After Effects, but used CapCut here so we could truly deliver AI-powered videos end-to-end.
What changed immediately with a pro making the cut:
- Story sense. From ~20 clips and screen passes, she built a coherent narrative and discarded everything that didn't serve it.
- Continuity cover. She used cuts and transitions to hide the AI model's inconsistencies—problems I couldn't even see, let alone solve.
- Sound. She added sound effects and music that made every transition feel intentional.
Video 3 — Bigger System, Same Division of Labor
The Plot:
The video begins with a marketer eagerly posting on a partner’s behalf on LinkedIn. He waits for likes and comments, but the post receives little to no engagement. Disappointment sets in, followed by desperation, and the marketer pings colleagues and even texts his mother to drum up engagement.
InnovAItion Partners’ AI social media assistant jumps in and helps him draft a new post that goes viral.
Tools used to generate video components: Veo 3 + Lovable + Loom + Crayo (tested) + iPhone Screen Recording + Canva
Edit: Liz + CapCut
This piece needed more than people shots:
- Veo 3 and Veo 2 for character scenes (still eight-second chunks, still managing consistency by prompt and image-to-video).
- Lovable to vibe-code a mock version of LinkedIn. I didn’t want to post a fake LinkedIn post to my real account, nor did I want to create a fake LinkedIn profile lest the LinkedIn gods ban me forever. So I made an app that mimicked the core LinkedIn functionality I needed to record for the video: the compose view, live post, and a performance panel that changes (bad → viral). It’s a real app that you can use to make mock LinkedIn posts (a sketch of that panel follows this list).
- Loom for screen recording and true product motion.
- Crayo to mock a text thread with the main character’s mom. It forced a TikTok-style background and awkward VO; we replaced it with a real phone capture (Liz temporarily renamed me "Mom" in her phone). It looks like an actual text exchange because it is.
- Canva to mock a Teams exchange using screenshots + overlays. This was very manual. The only AI element here was the transition between panels.
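About that Lovable mock: the bad → viral flip in the performance panel is just a timed state change, so one screen recording captures the whole turn in a single take. A minimal React/TypeScript sketch of the idea; the component name, numbers, and delay are my illustration, not the app's actual code.

```tsx
import { useEffect, useState } from "react";

// Mock performance panel: renders flatline numbers, then flips to
// "viral" stats after a delay, so the bad -> viral turn happens
// live while the screen is being recorded.
interface Stats { likes: number; comments: number; reposts: number; }

const FLATLINE: Stats = { likes: 2, comments: 0, reposts: 0 };
const VIRAL: Stats = { likes: 4800, comments: 312, reposts: 97 };

export function PerformancePanel({ delayMs = 5000 }: { delayMs?: number }) {
  const [stats, setStats] = useState<Stats>(FLATLINE);

  useEffect(() => {
    const timer = setTimeout(() => setStats(VIRAL), delayMs);
    return () => clearTimeout(timer); // clean up if unmounted early
  }, [delayMs]);

  return (
    <div className="performance-panel">
      <span>{stats.likes} likes</span>
      <span>{stats.comments} comments</span>
      <span>{stats.reposts} reposts</span>
    </div>
  );
}
```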
Time reality: My generation + rough assembly on this one was ~20 hours before Liz even touched it. Anyone waving around “two hours, done” is hiding the retries and the judgment calls.
Why the Edit Matters (And What “Edit” Actually Means)
People hear “editing” and picture someone dragging clips to a timeline. That’s the least of it.
What Liz actually did:
- Clip selection: From a large pile of disjointed video clips, she kept the ones that carried meaning and discarded the ones that looked “AI-ish” in the bad way, trimming and recombining the keepers as needed.
- Order and rhythm: Where to hold, where to cut, where to breathe.
- Continuity smoothing: Cuts and transitions that hide model inconsistencies without calling attention to themselves.
- Music selection: Choosing tracks that matched the emotional arc of each video. Beyond picking the track, she cut to the beat—lining up transitions with downbeats or swells so the music cues reinforce the visual rhythm.
- Sound design: Tiny cues that trick the brain into reading a scene as intentional. Those little Teams beeps convince your ear the UI is real.
How long would it take me to replicate what she did at the same skill level? A decade at least, if ever. It doesn’t matter that I’m good with AI. This is an art form that requires expertise, talent, and taste.
Concrete Lessons
- Write the story before you touch a model. The script tells you what to fake, what to film, and what to skip. And yes, you can use AI for this (I did).
- Design for failure modes. Reduce the continuity anchors (wardrobe, props, hairstyle) each shot has to hold. Avoid on-screen text and numbers unless you plan to replace them in the edit.
- Split your surfaces. Use real captures (Loom/phone) for places where motion fidelity matters; use fakes (Lovable/Canva) when control and safety matter more than pixel-perfect authenticity.
- Treat Veo 3 as a talented but forgetful actor. It can deliver, but you have to re-brief every single time.
- When a shot won’t cooperate, change the ask. New angle, new distance, new action. Stop pouring time into a composition the model can’t hold.
- Sound sells the cut. Even simple room tone and a few authentic effects will improve perceived quality more than one more hour of prompt polish.
- Know when to hand off. If you can bring in a real editor, do it. Give them coverage, choices, and clean assets; let them build the piece.