The fastest way to generate compelling character videos with Sora 2 is to combine three elements in every prompt: a specific lighting setup (e.g., "single side light casting half the face in shadow"), a precise shot type (e.g., "medium close-up with slow push-in"), and a physical emotion cue (e.g., "jaw tense, eyes glistening"). The 10 ready-to-copy templates below cover drama, fantasy, sci-fi, documentary, and performance styles.
Character-driven AI video lives or dies on specificity. Generic prompts produce generic faces. These templates are built around the same principles professional portrait photographers and cinematographers use—light direction, shot framing, emotional subtext, and environmental context—translated into language that Sora 2 and other AI video tools respond to reliably.
Every effective portrait prompt is built from four layers that work together to produce a coherent, emotionally resonant character video.
Each template is ready to copy and use. Adjust names, settings, and durations to fit your project.
Extreme close-up of an elderly war veteran's face, deep wrinkles and weathered skin telling decades of stories, single dramatic side light casting half the face in shadow, a single tear forming in the corner of one eye, shallow depth of field with soft bokeh background, desaturated color grade with warm skin tones, slow subtle camera push-in, 4K quality, 20 seconds duration.
A powerful warrior queen in ornate golden armor standing on a cliff edge at dusk, long silver hair flowing in the wind, glowing magical runes on her gauntlets, epic wide shot transitioning to medium close-up, dramatic rim lighting from the setting sun, volumetric fog in the valley below, cinematic color grade with deep blues and warm golds, 4K quality, 28 seconds duration.
A young street musician playing violin in a busy subway station, eyes closed in deep concentration, commuters blurring past in the background, warm tungsten light from overhead lamps illuminating her face, medium shot with shallow depth of field isolating the subject, authentic documentary style, natural ambient sound implied, 4K quality, 22 seconds duration.
A focused scientist in her 40s examining glowing blue samples in a dark laboratory, face illuminated by the cool blue light of the specimen, reflections in her glasses, medium close-up with slow rack focus from the sample to her expression of discovery, clinical yet dramatic lighting, high contrast between the glowing sample and dark surroundings, 4K quality, 18 seconds duration.
A young woman standing alone at a rain-soaked bus stop at night, city lights reflecting in puddles around her, rain streaking through the frame, medium shot with the camera slowly pulling back to reveal her isolation in the urban landscape, cool blue and amber color palette, melancholic mood, cinematic anamorphic lens flares from passing headlights, 4K quality, 25 seconds duration.
An ancient-looking elder with kind eyes sitting by a fireplace, hands gesturing as he tells a story, warm firelight dancing across his deeply lined face, close-up shot slowly rotating around him, rich amber and shadow tones, wisps of smoke from the fire, intimate and timeless atmosphere, shallow depth of field with soft background, 4K quality, 30 seconds duration.
A young hacker in a dark room surrounded by multiple glowing monitors, face bathed in shifting blue and green neon light, holographic data streams reflected in her eyes, medium close-up with subtle camera drift, high contrast cyberpunk aesthetic, rain visible through a grimy window behind her, intense focused expression, 4K quality, 20 seconds duration.
A ballet dancer mid-performance on a dark stage, single spotlight creating a dramatic pool of light, tutu catching the light as she executes a slow pirouette, medium shot transitioning to full body, high contrast between the bright spotlight and deep black background, graceful motion blur on the spinning fabric, theatrical and elegant atmosphere, 4K quality, 24 seconds duration.
A weathered desert nomad on camelback silhouetted against a vast orange sunset sky, wide establishing shot slowly zooming to a medium close-up revealing his sun-scorched face and piercing eyes, warm golden and amber tones, heat haze distorting the horizon, dust particles catching the last light, epic and solitary atmosphere, 4K quality, 26 seconds duration.
A battle-worn hero kneeling in the rain over a fallen comrade, head bowed, armor cracked and bloodied, dramatic overhead shot slowly descending to eye level, cold desaturated color grade with a single warm light source from a distant fire, rain streaking through the frame, emotionally devastating atmosphere, cinematic wide aspect ratio, 4K quality, 32 seconds duration.
Use these descriptor banks to customize any template or build your own portrait prompts from scratch.
Small prompt adjustments produce dramatically different results. These four principles separate good character videos from great ones.
"Dramatic" is vague. "Single tungsten lamp from the left casting a hard shadow across the right cheek" is actionable. The AI generates what it can visualize, not what it can feel.
Emotions live in micro-expressions and posture. Specify jaw tension, eye moisture, shoulder position, and hand placement. These physical anchors produce more consistent emotional reads across generations.
The space around a character tells their story. A cluttered desk implies obsession; an empty room implies loss. Let the environment do narrative work so your prompt stays focused on the character.
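Always close with resolution and duration: "4K quality, 20 seconds duration." Ending on technical specs tells the model that the descriptive part of the prompt is finished and states your targets explicitly. If the tool also offers resolution and duration settings, match them to the prompt, since those settings typically determine the actual output.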
A strong portrait prompt combines three layers: subject description (age, appearance, emotion), lighting specification (direction, quality, color temperature), and camera instruction (shot type, movement, depth of field). The more precisely you describe the light source and its relationship to the subject's face, the more cinematic the result. Always include a mood or atmosphere keyword to anchor the emotional tone.
Describe the emotion through physical details rather than abstract labels. Instead of 'sad expression,' write 'eyes glistening with unshed tears, lips pressed together, jaw slightly tense.' Sora 2 responds better to observable physical cues. Pairing the expression with a matching environment—rain, dim light, isolation—reinforces the emotional read and produces more consistent results.
Extreme close-up (ECU) and medium close-up (MCU) are the most effective framings for emotional character work. ECU captures micro-expressions and skin texture; MCU balances the face and upper body for context. Avoid holding a wide shot in portrait-focused prompts, because it dilutes the character's presence; if a prompt opens wide to establish the setting, end it on a closer frame. Add a slow push-in or subtle camera drift to give static portraits a cinematic feel without distracting movement.
Ground fantasy characters in physical reality first: describe the weight and texture of their armor, the way light catches the material, and how wind affects their hair or cloak. Then layer in the fantastical elements, such as glowing runes, magical effects, and an otherworldly atmosphere. This approach helps steer the AI away from generic fantasy clichés and produces characters that feel like they exist in a real world.
Single-source side lighting is one of the most reliable ways to get dramatic portrait results in AI video. For a hard split, describe it as 'single light source from the left/right casting half the face in shadow.' For classic Rembrandt lighting, place the light higher and roughly 45 degrees to the side so that a small triangle of light falls on the shadowed cheek. For softer drama, use 'warm practical light from a fireplace' or 'cool blue light from a monitor screen.' Avoid generic terms like 'good lighting'; specificity is what separates cinematic results from flat outputs.
These prompts are designed to be platform-agnostic and should work across Sora 2, Runway ML Gen-3, and Pika Labs. Minor adjustments may improve results on each platform: Runway ML tends to respond well to film stock references like 'shot on 35mm film', while Pika Labs benefits from shorter, more direct descriptions. The core structure (subject, lighting, camera, mood) transfers to most AI video tools.
Aim for 60 to 100 words per prompt. Below about 40 words, the AI lacks direction and tends to default to generic output; above about 120 words, conflicting instructions can cause inconsistencies. Structure your prompt in this order: subject and action, environment and lighting, camera movement and shot type, mood and style, then technical specs (resolution, duration). This sequence mirrors how cinematographers break down a shot and keeps related details grouped together.
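The ordering and word-count guidance for prompts is mechanical enough to sketch in code. The Python below is a minimal, hypothetical helper: the `PortraitPrompt` class and `length_feedback` function are illustrative names, not part of Sora 2 or any other tool's API. It joins the five parts in the recommended order and reports where the result falls against the 40, 60, 100, and 120 word thresholds. It assumes a plain whitespace split is a good enough word count, so hyphenated terms like "rain-soaked" count as one word.

```python
from dataclasses import dataclass


# Hypothetical helper: field names are illustrative, not a Sora 2 API.
@dataclass
class PortraitPrompt:
    subject_action: str        # who the character is and what they are doing
    environment_lighting: str  # where they are and how the light falls on them
    camera: str                # shot type and camera movement
    mood_style: str            # emotional tone and color grade
    specs: str = "4K quality, 20 seconds duration"  # technical specs always go last

    def build(self) -> str:
        """Join the layers in the recommended order as one comma-separated prompt."""
        parts = [
            self.subject_action,
            self.environment_lighting,
            self.camera,
            self.mood_style,
            self.specs,
        ]
        # Drop empty layers and stray trailing commas, then end with a period.
        cleaned = [p.strip().rstrip(",") for p in parts if p.strip()]
        return ", ".join(cleaned) + "."


def length_feedback(prompt: str) -> str:
    """Rough word-count check using the thresholds from the guidance above."""
    words = len(prompt.split())  # whitespace split; "rain-soaked" counts as one word
    if words < 40:
        return f"{words} words: too short, likely to produce generic output"
    if words > 120:
        return f"{words} words: too long, risk of conflicting instructions"
    if 60 <= words <= 100:
        return f"{words} words: in the target range"
    return f"{words} words: usable, but outside the 60-100 word target"


# Usage: the rain-soaked bus stop template, split into its layers.
prompt = PortraitPrompt(
    subject_action="A young woman standing alone at a rain-soaked bus stop at night",
    environment_lighting=(
        "city lights reflecting in puddles around her, rain streaking through the frame"
    ),
    camera=(
        "medium shot with the camera slowly pulling back to reveal "
        "her isolation in the urban landscape"
    ),
    mood_style=(
        "cool blue and amber color palette, melancholic mood, "
        "cinematic anamorphic lens flares from passing headlights"
    ),
    specs="4K quality, 25 seconds duration",
)
text = prompt.build()
print(text)
print(length_feedback(text))
```

Splitting the rain-soaked bus stop template into its layers and rebuilding it reproduces the original prompt word for word and lands at 60 words, the low end of the target range. Keeping each layer as a separate field makes it easy to swap one element, such as the lighting, while leaving the rest of a tested prompt intact.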
Use environmental and physical details as storytelling shorthand. Worn hands suggest labor; a military medal on a civilian jacket implies a veteran; a half-empty coffee cup and dark circles suggest exhaustion. These details communicate backstory without narration. Combine them with a camera move that reveals context gradually—starting tight on a detail, then pulling back to show the full character in their environment.