AI videos just got a major upgrade: Transform static images into action

Hey there, creative human! 🌟
This week, I’m diving into one of the most powerful features in AI video generation: References. Imagine blending multiple images—portraits, landscapes, objects—into a single, seamless video.
Let’s dive into how this feature works and why it’s about to change the way you produce video content. 🚀

🌍 Latest AI Power Moves
China's AI momentum continues post-DeepSeek, with major tech players launching innovative tools that could reshape content creation:
🇨🇳 Alibaba releases Qwen 2.5: A new AI model family that reportedly surpasses DeepSeek-V3, capable of generating highly detailed, instruction-following images and videos that rival top-tier AI generators.
🎵 YuE drops as China's latest open-source model: This full-song music generation model creates tracks up to five minutes long from lyrics or audio clips. Most impressive? It can blend different songs, like mixing "Plastic Love" with "Harder Better Faster Stronger."
🎨 Freepik introduces custom brand style AI generation: Create your own branded illustration style by uploading 10-50 reference images and getting consistent visuals across all your content. While not part of the Chinese AI wave, this Spain-based company's innovation is worth noting.

DEEP DIVE
🎭 Combining References in AI Video Generation
One of the hottest generative AI features right now is References (also known as Elements or Ingredients), which allows for blending multiple images into a stunning single video output.
Take a look at this AI video, generated from just two static images (a portrait photo of two women and a shot of a building for the background):
[Reference image 1: a static photo of two women to be used for motion | Reference image 2: a photograph of a building intended for use as background]
By merging these shots and utilizing the Reference feature, the following video output is achieved:
Needless to say, the reference feature in AI video generation is about to change your marketing game.
🛠️ Tools That Support the Reference Feature
The feature has spread rapidly across the AI video generation landscape, and most major generators now offer it in some form.
🎬 Testing with Pika 2.1
For this newsletter's test, I chose Pika 2.1. Why? Their results on social media caught my eye, plus they offer the most intuitive interface I've seen for this kind of work.
For my test video, I decided to aim high — literally. Pika suggested a skydiving template on their main page, which presents a perfect challenge: if AI can accurately render my face while I'm supposedly falling through the sky, it can handle just about any product demo you throw at it. Plus, it's a great way to test both facial accuracy and complex motion in one go.
Using Elements in Pika is straightforward: you have a prompt field for description, plus the ability to attach multiple reference images:

The generative interface in Pika. I used two portrait photos and one landscape photo for the background settings.
[Video: Output1.mp4]
My first attempt turned me into Superman — not exactly the professional look I was going for 😀

Lessons learned:
Portrait references need clean backgrounds. My original portraits included landscape elements that confused the AI, creating unexpected blends with the scene. Solution: use background-removed images (any removal tool will do; see the short script below).
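If you'd rather script that cleanup step than click through a web tool, here's a minimal sketch using the open-source rembg package (pip install rembg); the file names are placeholders, and rembg is just one of many tools that would do the job:

```python
# Minimal background-removal sketch using the open-source rembg package.
# File names are placeholders; any removal tool would work just as well.
from rembg import remove
from PIL import Image

# Load the original portrait that still contains landscape elements
portrait = Image.open("portrait_reference.jpg")

# Strip the background; rembg returns an RGBA image with transparency
clean = remove(portrait)

# Save as PNG to preserve transparency before uploading as a reference
clean.save("portrait_reference_clean.png")
```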
Basic prompts yield basic results. To improve the output, I reverse-engineered a prompt by running imagetoprompt.com on a professional skydiving photo, which generated this detailed description (a do-it-yourself alternative is sketched right after the prompt):
A person is airborne against a bright sky, in an exhilarating pose. The scene captures a vast landscape city below, emphasizing the height and thrill of the skydive.
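If you'd prefer to do that reverse-engineering locally instead of through imagetoprompt.com, one option (my own suggestion, not part of the workflow above) is an image-captioning model such as BLIP via Hugging Face transformers; treat the caption it produces as a rough draft to expand by hand:

```python
# A local stand-in for imagetoprompt.com: caption a reference photo with
# BLIP (pip install transformers torch pillow). The model ID is real, but
# the file name is a placeholder.
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("skydiving_reference.jpg").convert("RGB")

# Generate a plain-language description to use as a prompt draft
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```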
[Video: Output2.mp4]
After tweaking the portrait images and refining the prompt, the results improved dramatically. Face and body reproduction was nearly perfect, though the sleeve movement needs work:

[Video: Output3.mp4]
While there's no direct fix for motion issues, you can use the Negative prompt section to improve specific details. Remember: each adjustment requires generating an entirely new video; you can't just tweak one aspect.

Just six months ago, generating a video with realistic faces and natural movement seemed a long shot. Now, we can create entire scenes with recognizable people, detailed environments, and smooth motion. That got me thinking—if AI can handle something as complex as skydiving, what else is possible?
To test its limits, I tried something completely different: a meme-worthy moment featuring two of the most talked-about figures in tech and politics. The results? Let’s just say AI had some interesting ideas… 😁


INSPIRED BY AI
🎨 Creative of the Week
This week's creative hack helps solve a common challenge for food brands and restaurants: creating authentic-looking food photography that feels like user-generated content.
Use the following settings in Midjourney to achieve the results:
high angle iPhone photo of {pasta marinara/juicy burger}, neatly arranged on a wooden table, a single glass of {wine/beer}, nice contrasting light and shadows, film grain and natural look --profile q1zhx29 --ar 3:4
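The {pasta marinara/juicy burger} and {wine/beer} placeholders mean "pick one per render." If you want to batch out the variants before pasting them into Midjourney, a tiny Python snippet will do; pairing pasta with wine and burger with beer is my assumption:

```python
# Expand the {a/b} placeholders from the Midjourney prompt into concrete
# prompts. Pairing each dish with a matching drink is an assumption; use
# itertools.product instead of zip if you want every combination.
template = (
    "high angle iPhone photo of {dish}, neatly arranged on a wooden table, "
    "a single glass of {drink}, nice contrasting light and shadows, "
    "film grain and natural look --profile q1zhx29 --ar 3:4"
)

for dish, drink in zip(["pasta marinara", "juicy burger"], ["wine", "beer"]):
    print(template.format(dish=dish, drink=drink))
```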
That's it for this week! I hope you'll take some time to experiment with the Reference feature and test different prompts to find what works best for your content. Remember: clean backgrounds and detailed prompts are your best friends.
Catch you next Tuesday!
Radu
