Turn an empty stadium into a packed crowd with AI VFX

Empty stadium transformed into a packed crowd scene with Beeble Canvas

Creating a convincing stadium crowd has traditionally required significant resources. Productions often rely on hundreds of extras, crowd replication techniques, green screen shoots, or extensive visual effects to create the feeling of a live event.

In this Beeble Canvas tutorial, we transform an empty stadium into a packed venue using generative AI, compositing, and SwitchX. The result is a shot that feels like it was captured during a live match.

Watch the tutorial

Practice footage provided by ActionVFX.

Starting with an empty stadium

Our goal is to add a large, active crowd to the stands while preserving the original player, camera work, and overall visual style.

The workflow is built around three stages: generating a crowd-filled background, isolating the player, and using SwitchX to match the final composite back to the original footage.

Step 1: Generate the crowd background

We begin by trimming the shot to 238 frames and extracting the first frame. This frame gives us a clean view of the empty stadium before the player enters the scene.
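For readers who prepare footage outside of Canvas, the same trim and frame grab can be approximated with ffmpeg. The sketch below only builds the command lines rather than running them. The file names are hypothetical placeholders, and it assumes ffmpeg is installed if you decide to execute the commands. It is not part of the Canvas workflow shown in the tutorial.

```python
import shlex

FRAME_COUNT = 238  # shot length used in this tutorial


def trim_command(src, dst, frames=FRAME_COUNT):
    """Build an ffmpeg command that keeps only the first `frames` video frames."""
    # -frames:v limits the number of video frames written; -an drops audio.
    return ["ffmpeg", "-i", src, "-frames:v", str(frames), "-an", dst]


def first_frame_command(src, dst):
    """Build an ffmpeg command that writes the first video frame as a still."""
    return ["ffmpeg", "-i", src, "-frames:v", "1", dst]


if __name__ == "__main__":
    # File names below are placeholders, not files from the tutorial.
    print(shlex.join(trim_command("stadium.mp4", "stadium_trimmed.mp4")))
    print(shlex.join(first_frame_command("stadium_trimmed.mp4", "first_frame.png")))
```

Passing either list to `subprocess.run` would execute it. Building the command as a list avoids shell quoting problems with file names that contain spaces.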

Clean first frame of the empty stadium before the player enters

Canvas Image Generator node configured to fill the stands with a crowd

Using Nano Banana with an Image Generator node, we generate multiple versions of the frame with the stadium filled with active crowds.

After reviewing the results, we select the strongest crowd-filled image and use it as the foundation for the next step.

Multiple Nano Banana crowd-filled variations of the stadium frame

To bring the scene to life, we extend the image into video using Image-to-Video models. We test three different options: Seedance 2.0, Happy Horse, and Kling 3.0. Across all three models, we use a prompt that asks for a static, fixed-angle tripod shot, a massive cheering crowd filling the stadium, no players introduced into the scene, and a photorealistic 4K look.

Image-to-Video model options tested: Seedance 2.0, Happy Horse, and Kling 3.0

Each model produces a slightly different interpretation. Seedance and Happy Horse both deliver strong results, but Kling 3.0 produces the most convincing crowd movement and becomes our final background plate.

Kling 3.0 crowd movement chosen as the final background plate

The empty stadium transformed into a packed, energy-filled venue

At this point, we have transformed an empty stadium into a packed, energy-filled venue.

Step 2: Isolate and composite the football player

With the crowd background ready, the next task is to bring the football player back into the shot. Because the player is not visible in the opening frame, Auto Mode in Video Matte is not the right choice. Instead, we use Select Mode and open the Smart Select tool.

Select Mode and the Smart Select tool open in Video Matte

We move to frame 99, where the player is clearly visible, and simply type "man" to select the football player. Video Matte then propagates that selection throughout the entire clip, isolating the player even though the footage was never shot on a green screen.

Selecting the football player by typing "man" at frame 99

Video Matte propagating the player selection across the entire clip

To help the composite feel more natural, we add a small amount of blur to the player layer before merging it with the generated stadium background.

A small blur added to the isolated player layer before merging

Using a Merge node, we place the isolated player over the crowd-filled stadium.
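The tutorial does not document what the Merge node computes internally. As a rough mental model, the sketch below shows the standard "over" operation used in compositing, assuming straight (unpremultiplied) color and a matte value between 0.0 and 1.0. The function names are illustrative and are not Canvas APIs.

```python
def over(fg, alpha, bg):
    """Composite one foreground pixel over one background pixel.

    fg and bg are (r, g, b) tuples in the 0.0-1.0 range with straight
    (unpremultiplied) color; alpha is the matte value in the same range.
    """
    return tuple(f * alpha + b * (1.0 - alpha) for f, b in zip(fg, bg))


def merge_frames(fg_frame, matte, bg_frame):
    """Apply `over` to every pixel of equally sized frames (lists of rows)."""
    return [
        [over(f, a, b) for f, a, b in zip(fg_row, a_row, bg_row)]
        for fg_row, a_row, bg_row in zip(fg_frame, matte, bg_frame)
    ]


if __name__ == "__main__":
    player = (1.0, 0.0, 0.0)  # placeholder foreground color
    crowd = (0.0, 0.0, 1.0)   # placeholder background color
    # A half-transparent matte edge blends the two colors evenly.
    print(over(player, 0.5, crowd))
```

Where the matte is 1.0 the player pixel wins, where it is 0.0 the crowd plate shows through, and soft matte edges blend the two. This is also why a slightly blurred player layer tends to sit more naturally in the plate.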

Merge node placing the isolated player over the crowd-filled stadium

The scene immediately feels larger in scale, and the original performance and camera movement are preserved.

Step 3: Match the final shot with SwitchX

At this stage, the composite is working, but the color and tone still feel disconnected from the original footage.

The generated stadium background appears cooler, bluer, and higher in contrast than the source video. To bring everything back into the visual language of the original shot, we render the final composite through SwitchX.

Composite where the generated background looks cooler and bluer than the source

We select a reference frame from the original footage where both the stadium and the player are visible and send it to SwitchX.

Importantly, we leave the alpha mask empty.

Reference frame sent to SwitchX with the alpha mask left empty

Without an alpha mask, SwitchX preserves everything in the frame and focuses on relighting and recoloring the image. This behaves similarly to Fill Mode in the dedicated SwitchX tab, allowing the generated crowd and original player to inherit the color and tone of the source footage.
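SwitchX is a generative tool, and the tutorial does not describe how it works internally. To illustrate the general idea of a composite "inheriting" the color and tone of a reference, here is a much simpler classical stand-in: matching the mean and spread of one color channel to a reference. This is an assumption-laden sketch for intuition only, not SwitchX's method, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def match_channel(values, reference):
    """Shift and scale `values` so their mean and spread match `reference`.

    Both arguments are flat lists of channel values in the 0.0-1.0 range,
    for example every blue value in a frame.
    """
    src_mean, ref_mean = mean(values), mean(reference)
    src_std, ref_std = pstdev(values), pstdev(reference)
    # Guard against a flat channel, where the spread is zero.
    scale = ref_std / src_std if src_std > 0 else 1.0
    matched = [(v - src_mean) * scale + ref_mean for v in values]
    # Clamp to the valid range so the result stays displayable.
    return [min(1.0, max(0.0, v)) for v in matched]


if __name__ == "__main__":
    composite_blue = [0.2, 0.4, 0.6]  # placeholder values, too contrasty
    source_blue = [0.4, 0.5, 0.6]     # placeholder reference values
    print(match_channel(composite_blue, source_blue))
```

Running the same function on the red, green, and blue channels separately pulls a cool, contrasty plate toward the reference. A statistical match like this cannot relight a scene, which is the part SwitchX handles in this workflow.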

SwitchX relighting the composite to match the original footage color and tone

The result is a crowd-filled stadium that feels naturally integrated into the original scene, rather than added in post-production.

Why this workflow matters

Crowd scenes have historically been one of the most resource-intensive elements of sports, concert, and large-scale event productions.

Canvas offers a different approach. By combining generative AI with masking, compositing, and color matching, we can dramatically increase the perceived scale of a scene while maintaining creative control over the final image.

Rather than replacing traditional VFX techniques, this workflow combines generative tools with established post-production practices to create results that remain grounded in the original footage.

For filmmakers, commercial producers, and VFX artists, it provides a practical way to create larger, more cinematic scenes without expanding the production footprint.

Credits usage breakdown

Crowd VFX

  • Video Matte = 8 Credits
  • Image Generator = 4 Credits

Video Generator

  • Seedance 2.0 = 189 Credits
  • Happy Horse = 100 Credits
  • Kling 3.0 = 60 Credits

Final Composite

  • SwitchX = 80 Credits

Total

  • 441 Credits = $14.70
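The total can be checked with a short tally. The sketch below also derives an implied dollar rate per credit and estimates a leaner run that skips the two video models that were only tested. Both derived figures assume pricing scales linearly with credits, which the breakdown above does not state.

```python
# Credit figures copied from the breakdown above.
CREDITS = {
    "Video Matte": 8,
    "Image Generator": 4,
    "Seedance 2.0": 189,
    "Happy Horse": 100,
    "Kling 3.0": 60,
    "SwitchX": 80,
}
TOTAL_USD = 14.70  # dollar figure quoted for the full tutorial

total = sum(CREDITS.values())
usd_per_credit = TOTAL_USD / total  # implied rate; assumes linear pricing

# Hypothetical leaner run: generate the plate with Kling 3.0 only.
lean = total - CREDITS["Seedance 2.0"] - CREDITS["Happy Horse"]

print(total)                                  # 441
print(round(usd_per_credit, 4))               # 0.0333
print(lean, round(lean * usd_per_credit, 2))  # 152 5.07
```

Under that assumption, roughly two thirds of the spend in this tutorial went to comparing video models rather than to the final shot.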

Ready to create larger crowd scenes in Canvas?