From Zero to Cinema: How I Actually Made a Short Film with AI Tools in 2026
Introduction
I made a 7-minute short film in six days. I have never been to film school. I own a single camera (a Sony a6400 I barely know how to use) and I did not touch it once during this project. Every frame, every line of dialogue, every note of the score came from AI tools. The short is called Static Glass and it is about a woman who discovers she can hear conversations from parallel timelines through old CRT televisions. It is not perfect. Parts of it are genuinely janky. But people who watched it did not guess it was made by one person with no crew and no budget.
Here is what I used, what worked, what broke, and what I wish someone had told me before I started.
The Tools I Actually Used
I tried more than a dozen tools over two months of experimenting. Five of them made it into the final workflow. Everything else was either too expensive, too inconsistent, or produced output I could not control well enough to tell a coherent story.
- Shot generation and scene assembly: LTX Studio
- Score and sound design: Suno
- Dialogue and voice acting: ElevenLabs
- Character design and reference images: Midjourney
- Final assembly, color grade, subtitles: Veed.io
Step 1: Getting the Character to Look Like the Same Person
This was the hardest part of the entire project. Not the script. Not the editing. Just getting my protagonist to look consistent across 23 different shots.
I used Midjourney for character design. The --cref (character reference) feature is supposed to lock a character's appearance across generations. In practice, I generated 47 images before I got 5 that looked like the same person in different lighting. Here is what I learned.
Upload a reference photo of a real person or a previous generation you are happy with. Use --cref with that image URL. Then set --cw (character weight): 100 carries over face, hair, and clothing, while lower values keep the face and let the outfit and hair vary. I settled on --cw 80 for most shots and it was the sweet spot for my protagonist.
The bigger problem was outfits. Midjourney kept changing her jacket color between shots. The fix: I generated a separate "costume reference" image, a full-body shot of her in the exact clothes I wanted, and used both --cref (for the face) and --sref (style reference) pointing to that costume image. Even then, maybe 30% of the outputs were usable. I threw away the rest.
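The flag combination described above can be sketched as a small helper that assembles the prompt text the same way for every shot. This is only an illustration: the function name, the shot description, and the example.com URLs are hypothetical, and the only Midjourney parameters used are the --cref, --cw, and --sref flags mentioned above.

```python
def build_prompt(description, face_ref_url, costume_ref_url=None, character_weight=80):
    """Assemble one Midjourney prompt string with consistent reference flags.

    All names here are illustrative; only the --cref, --cw, and --sref
    flags come from the workflow described in the article.
    """
    if not 0 <= character_weight <= 100:
        raise ValueError("--cw must be between 0 and 100")

    parts = [
        description.strip(),
        f"--cref {face_ref_url}",      # character reference (face)
        f"--cw {character_weight}",    # how much of the reference carries over
    ]
    if costume_ref_url:
        parts.append(f"--sref {costume_ref_url}")  # costume image as style reference
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical URLs standing in for uploaded reference images.
    print(build_prompt(
        "woman in a dim kitchen, warm tungsten light from the left",
        "https://example.com/face.png",
        "https://example.com/costume.png",
    ))
```

Keeping the flags in one place means a change to the character weight applies to every shot prompt at once, instead of being retyped 23 times.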
I also built a small 12-page "lookbook" in Figma with the 5 approved character shots, the color palette I wanted (mostly desaturated blues and warm amber practicals), and reference frames from movies I was stealing the look from (heavily Blade Runner 2049 for the interiors, Her for the close-ups). I pasted these into LTX Studio later as visual anchors.
This took a full day and it was worth it. Every shortcut I took here bit me later.
Step 2: Turning a Script Into Shots with LTX Studio
I wrote a 3-act script in Google Docs. Nine scenes. Two characters, mostly dialogue, one chase sequence. I kept it small on purpose. Every additional character or location is another thing the AI can get wrong.
LTX Studio is not a prompt-to-video toy. It works more like a project management tool for video. You import your script as text, upload your character references and lookbook frames, and it builds a storyboard automatically. Each scene gets broken into numbered shots with camera positions, character placements, and basic blocking.
The automatic storyboard was maybe 60% usable on the first pass. I had to manually rework 12 of the 23 shots. LTX gives you a director mode where you type instructions for each shot: "track the protagonist as she walks from the kitchen to the living room, keep the camera at eye level, key light from the window on the left, warm tungsten." When it worked, it was fast. I got a 15-second tracking shot that genuinely impressed me, with consistent lighting across the entire move and no morphing artifacts on the character's face.
When it did not work: hands. Still bad. One shot had her hand passing through a doorframe. Another had six fingers for three frames. I fixed these by trimming the offending frames in Veed later, or by generating alternate takes in LTX and cutting between them.
The chase sequence was the worst. Running is still hard for AI video. Her legs would blur and occasionally swap positions. I ended up shooting the chase mostly in tight close-ups (face, feet, hands gripping a railing) and cutting fast. Old filmmaker tricks still work when the tool cannot deliver a clean wide.
I spent about 3 days in LTX generating, re-generating, and arranging shots. Total cost was around $40 in compute credits.
Step 3: Music That Actually Fit the Scene
I have used Suno before for generating songs for fun. Using it to score a film is a different thing entirely.
I started by generating a main theme. I described the mood in the prompt: "melancholic synthwave with warm analog pads, slow build, no drums for the first 30 seconds, Blade Runner soundtrack influence." I generated 8 versions and picked the one that felt right.
Then I used Suno's Extend feature to create scene-specific variations. For the dialogue-heavy kitchen scene I extended from the 0:15 mark of the main theme and stripped the melody back to just pads and a single bass note. For the chase sequence I extended from the 0:45 mark and added percussion. The transitions between these cues are not seamless, but I smoothed them with crossfades in Veed and it works.
As of 2026, Suno lets you export stems: drums, bass, melody, and vocals as separate WAV files. This matters because the default mix often buries dialogue. I brought the stems into Veed, dropped the melody stem by 6 dB under dialogue, and pushed it back up during the silent montage at the end.
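For a sense of what a 6 dB drop means in practice, the standard decibel-to-amplitude conversion is gain = 10^(dB/20). The sketch below applies it to a list of sample values. It is not how Veed implements its volume control, and the function names and sample values are made up for illustration.

```python
def db_to_gain(db):
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def apply_gain(samples, db):
    """Scale every sample by the gain that corresponds to the given dB change."""
    gain = db_to_gain(db)
    return [s * gain for s in samples]


if __name__ == "__main__":
    # A -6 dB change roughly halves the amplitude.
    print(round(db_to_gain(-6), 3))  # 0.501

    melody = [0.8, -0.4, 0.2]        # hypothetical sample values in [-1, 1]
    print(apply_gain(melody, -6))
```

So a melody stem pulled down 6 dB sits at about half its original amplitude, which is usually enough to let dialogue sit on top without the music disappearing.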
The one thing I could not get right: a sting. You know, that sharp orchestral hit when the character discovers something shocking. Suno is not great at isolated one-second cues. I ended up pulling a sting from a royalty-free library instead. Not everything needs to be AI-generated.
Step 4: Dialogue That Does Not Sound Like a Robot
ElevenLabs is the part of this stack that most impressed me, and also the part that required the most human effort per second of output.
The basic text-to-speech is fine. You paste a line, pick a voice, it reads it. But "fine" dialogue kills a film. Flat line readings make your audience check their phones.
The feature that changed everything is Speech-to-Speech. You record yourself performing the line into a microphone. You do not need to be a good actor. What matters is the timing, the pauses, the breath, the stress on particular words. ElevenLabs takes that performance and applies the target character's voice to it. I recorded every line of dialogue myself on a USB microphone in my apartment, doing maybe three takes per line. Some sessions I was whisper-yelling at 1 AM so I would not wake my neighbors.
My film has two characters. For the protagonist I used ElevenLabs' "Adelaide" voice. For the antagonist (who appears only as a voice through a television) I used a custom voice I cloned from a 3-minute sample of my friend doing a low, gravelly register. The clone quality is good. Not perfect. There is a slight digital grain in some words, especially sibilants. I masked it partially by running the antagonist's dialogue through a radio static filter in Veed, which the story justified anyway.
SFX generation is a newer ElevenLabs feature. It can generate sound effects from text prompts: "footsteps on wet concrete," "glass breaking in an empty warehouse," "old TV static hum." I used it for about 60% of my Foley. The footsteps were good. The glass breaking was passable. The static hum was exactly what I needed. For the remaining 40%, particularly the more specific sounds (a CRT powering on, a cassette tape ejecting), I grabbed samples from a sound library. The AI SFX tool is genuinely useful for ambient beds and generic Foley, but it cannot yet do highly specific mechanical sounds reliably.
Step 5: Assembly, Color, and the Final Polish
I brought everything into Veed.io: the 23 LTX shots, the Suno stems, the ElevenLabs dialogue tracks, the Foley layers, and a couple of royalty-free stingers and SFX samples.
Veed is a browser-based editor and it is not Premiere. The timeline gets sluggish above about 8 tracks. I had 11 audio tracks at one point and had to bounce some of them down. But for a short film with a simple cut structure, it works fine. The AI features that actually helped:
The color match tool. I picked a reference frame from Blade Runner 2049, specifically a wide shot of K walking through an orange-hazed interior. Veed analyzed the color distribution and applied a LUT across all my clips. It was too aggressive at 100% strength. I backed it off to 65% and it unified the look without turning everything into a neon soup. My LTX shots had slightly different color temperatures because I had changed the lighting prompts between scenes. The color match smoothed most of that out.
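One common way a "strength" slider like this behaves is as a linear blend between the original pixel and the fully graded pixel. The sketch below shows that idea on a single RGB value. It is an assumption about the general technique, not a description of Veed's internals, and the function name and pixel values are hypothetical.

```python
def blend_pixel(original, graded, strength):
    """Linearly interpolate each RGB channel between original and graded.

    strength=0.0 returns the original pixel, strength=1.0 the fully
    graded pixel. This is an illustrative model of a strength slider.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return tuple(
        round(o + (g - o) * strength)
        for o, g in zip(original, graded)
    )


if __name__ == "__main__":
    cool_original = (100, 120, 140)   # hypothetical bluish source pixel
    orange_graded = (200, 120, 40)    # hypothetical fully graded result
    print(blend_pixel(cool_original, orange_graded, 0.65))  # (165, 120, 75)
```

Under this model, backing off to 65% keeps roughly a third of each clip's own color, which is why the look can unify without every shot collapsing into the same orange.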
The auto-subtitle tool got maybe 85% of the dialogue right. I had to manually fix the remaining 15%, mostly words it misheard because of the radio filter on the antagonist's voice. Exporting burned-in subtitles was straightforward.
The clean audio tool reduced background hiss in two dialogue clips where I had recorded too quietly. It introduced a slight metallic quality I did not love, so I used it sparingly.
Total edit time was about 8 hours spread across two evenings. Export took 4 minutes for a 1080p file.
What I Would Do Differently
I would cast fewer characters. Two was manageable. Three would have been harder. Four would have broken something.
I would build my lookbook before writing the script, not after. I wrote the script first, then discovered that some of the visual ideas I had written were hard to generate consistently. Working backward from what the tools can do reliably would have saved me a day of rewriting shots.
I would spend more time on the dialogue recordings. There is a line in the final film where my performance has a weird upward inflection at the end of a sentence that makes it sound like a question. I should have caught it in the recording phase. Re-recording one line after the film was assembled meant re-exporting the entire dialogue stem and re-syncing it. Annoying.
I would not try to make a chase scene again without a lot more practice. Action sequences expose every weakness in current AI video generation. Static shots, slow pans, dialogue scenes, close-ups: these all work well. Running, fighting, fast camera movement: the tech is not there yet for solo creators who need consistency across 20+ shots.
Honestly, the hardest part of this entire project was not technical. It was creative decision-making. AI tools can generate 50 versions of a shot in an hour, but you still have to decide which one works. That decision fatigue is real. After 4 hours of reviewing LTX generations, I could not tell if a shot was good or bad anymore. I had to step away, sleep on it, and come back with fresh eyes. The tools accelerate production, but they do not accelerate taste. If anything, they make taste more important because you have more choices to make, faster. Budget for mental breaks in your production schedule.
I also underestimated how much I would need to learn about traditional filmmaking to direct the AI effectively. I spent roughly 15 hours reading about shot composition, lighting ratios, and blocking before I could give LTX Studio useful directions. The AI is not a replacement for knowing what a good shot looks like. It is a very fast executor of your creative vision, but if your creative vision is "I want it to look like a movie, I guess," the output will look like exactly that level of thought was put into it. Spend the time learning the craft. The AI will reward you for it.
The Production Checklist I Actually Followed
Pre-Production
- [ ] Write a short script (under 10 pages) with minimal characters and locations.
- [ ] Build a visual lookbook: 5 approved character shots, a color palette, and 3-5 reference frames from existing films.
- [ ] Generate the character reference images in Midjourney using --cref and --sref flags. Expect to discard at least two-thirds of the outputs.
- [ ] Lock the costume reference image separately.
Production
- [ ] Paste the script into LTX Studio and review the automatic storyboard shot by shot.
- [ ] For each shot, write specific camera and lighting directions. Re-generate shots that do not match.
- [ ] Use tight close-ups for action. Use wide shots only when the scene is static.
- [ ] Do not trust hands. Check every frame. Trim problem frames or re-generate the shot.
Audio
- [ ] Generate a main theme in Suno and extend it into scene variations.
- [ ] Export stems and plan your volume automation before assembly.
- [ ] Record all dialogue performances yourself using Speech-to-Speech in ElevenLabs. Do multiple takes per line.
- [ ] Generate ambient Foley and generic SFX in ElevenLabs. Source specific mechanical sounds from libraries.
Post-Production
- [ ] Assemble everything in Veed.io. Keep your track count low or bounce groups down.
- [ ] Apply color match at partial strength (50-70%) across all clips.
- [ ] Generate auto-subtitles and manually correct errors.
- [ ] Use clean audio sparingly. It can degrade voice quality.
- [ ] Export at 1080p. Test on multiple screens before considering it done.
I spent about $110 total on credits across all five tools and six days of work. The film exists. It tells a story someone might actually watch. Not because the AI is magic, but because I had to sit there and make a thousand small decisions the tools could not make for me. The tools are fast, but they are not tasteful. That part is still on you.
Written by the LaunchToolsAI Creative Editorial Team.
