Copyright © 2026 CryptoGames.GG. All Rights Reserved.

Nvidia’s New AI Model Turns 1 Photo Into a 60-Second Controllable World on a Single GPU

By Staycalm4now (Owner) | Last updated: May 16, 2026 | 7 Min Read
We may include affiliate links in our content, meaning we could earn a commission—or receive blockchain-based assets—if you click a link and make a purchase or take a specific action. Additionally, we use generative AI to help draft and refine our posts for clarity and grammar. All content is fact-checked and reviewed by a human editor before publication.

Give SANA-WM one image and a camera path. Thirty-four seconds later, you have a minute-long, 720p video where the camera moves exactly where you told it to go. No multi-GPU cluster. No cloud rental. One consumer graphics card.

That’s the pitch from Nvidia’s latest open-source release, and the benchmarks back it up.

What SANA-WM does

SANA-WM is a 2.6 billion-parameter world model, a system that takes a single image, a text prompt and a six-degrees-of-freedom (6-DoF) camera trajectory, then synthesizes a photorealistic video that follows that trajectory frame by frame. The output is 720p, runs up to 60 seconds, and maintains camera precision that beats models five times its size.

This isn’t image-to-video generation where you type a prompt and hope for the best. You’re defining the camera path through a 3D-consistent scene. The model tracks camera rotation and translation with metric-scale accuracy across the full minute.

NVIDIA just unleashed SANA-WM and it’s an absolute MONSTER for the future of open source AI!

A blazing-fast 2.6B-parameter open-source world model that doesn’t just generate video… it creates controllable, physics-rich, high-fidelity worlds on demand.
Why this is insanely… pic.twitter.com/fQ4IOHGhEK

— Brian Roemmele (@BrianRoemmele) May 16, 2026

The single-GPU trick

Most competing world models require eight GPUs just for inference. LingBot-World uses 14 billion parameters across two models on eight GPUs and produces 0.6 videos per hour. HY-WorldPlay needs eight GPUs for 1.1 videos per hour at 480p.

SANA-WM generates 24.1 videos per hour on a single GPU at 720p. With the full refinement pipeline running on eight H100s, throughput hits 22.0 videos per hour, a 36x advantage over LingBot-World at comparable visual quality scores.
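
For a sense of scale, the throughput figures above can be converted into wall-clock time per video. This is a minimal sketch using only the numbers quoted in this article; the dictionary labels and helper names are illustrative.

```python
# Throughput figures quoted in the article, in videos per hour.
VIDEOS_PER_HOUR = {
    "LingBot-World (8 GPUs)": 0.6,
    "HY-WorldPlay (8 GPUs, 480p)": 1.1,
    "SANA-WM (1 GPU, 720p)": 24.1,
    "SANA-WM with refiner": 22.0,
}


def minutes_per_video(videos_per_hour: float) -> float:
    """Convert a throughput in videos/hour to minutes of compute per video."""
    return 60.0 / videos_per_hour


def speedup(fast: float, slow: float) -> float:
    """How many times more videos per hour the faster system produces."""
    return fast / slow


for name, rate in VIDEOS_PER_HOUR.items():
    print(f"{name}: {minutes_per_video(rate):.1f} min per video")

refined_vs_lingbot = speedup(
    VIDEOS_PER_HOUR["SANA-WM with refiner"],
    VIDEOS_PER_HOUR["LingBot-World (8 GPUs)"],
)
print(f"Refined SANA-WM vs LingBot-World: {refined_vs_lingbot:.1f}x")
```

At 0.6 videos per hour, LingBot-World spends about 100 minutes on each clip; at 22.0, the refined SANA-WM pipeline spends under three.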

The distilled inference variant, running on a single RTX 5090 with NVFP4 quantization, denoises a complete 60-second 720p clip in 34 seconds. Total memory footprint for the full pipeline: 74.7 GB, which fits inside an H100’s 80 GB budget.

How it handles minute-long video without melting

Standard attention in video diffusion models scales quadratically with sequence length. A 60-second video at 720p means 961 frames. On a single GPU, standard softmax attention simply runs out of memory.
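
The quadratic scaling is easy to see with a little arithmetic. The sketch below compares a five-second clip with a 60-second one; `TOKENS_PER_SECOND` is a made-up placeholder, not a SANA-WM figure, and it cancels out of the ratio.

```python
def attention_pairs(seconds: int, tokens_per_second: int) -> int:
    """Number of query-key scores full softmax attention computes for one clip."""
    n_tokens = seconds * tokens_per_second
    return n_tokens * n_tokens


TOKENS_PER_SECOND = 1_000  # hypothetical value; it cancels out of the ratio

short_clip = attention_pairs(5, TOKENS_PER_SECOND)
long_clip = attention_pairs(60, TOKENS_PER_SECOND)

# A clip 12 times longer needs 12 * 12 = 144 times as many scores.
print(long_clip // short_clip)  # 144
```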

SANA-WM solves this with a hybrid architecture. Fifteen of its 20 transformer blocks use frame-wise Gated DeltaNet (GDN), a recurrent mechanism that maintains a constant-size memory state regardless of how long the video gets. A decay gate forgets stale frames. A delta-rule correction updates only the difference between what the model predicts and what it needs, not the entire state.
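
The decay gate and delta-rule correction can be sketched in a few lines of plain Python. This follows the generic gated delta rule, S ← αS + β(v − αS·k)kᵀ, applied one key at a time; it is not Nvidia's frame-wise implementation, and the function names `gated_delta_step` and `read` are illustrative.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def gated_delta_step(state: Matrix, k: Vector, v: Vector,
                     alpha: float, beta: float) -> Matrix:
    """One gated delta-rule update on a fixed-size d_v x d_k memory.

    alpha in [0, 1] is the decay gate, beta in [0, 1] the write strength.
    """
    # Decay gate: fade everything already stored.
    decayed = [[alpha * s for s in row] for row in state]
    # What the decayed memory currently returns for this key.
    predicted = [sum(s * kj for s, kj in zip(row, k)) for row in decayed]
    # Delta rule: write only the gap between the target and the prediction.
    return [
        [s + beta * (vi - pi) * kj for s, kj in zip(row, k)]
        for row, vi, pi in zip(decayed, v, predicted)
    ]


def read(state: Matrix, q: Vector) -> Vector:
    """Query the memory: returns state @ q."""
    return [sum(s * qj for s, qj in zip(row, q)) for row in state]


state = [[0.0, 0.0], [0.0, 0.0]]  # 2 x 2 memory, never grows
key = [1.0, 0.0]                  # unit-norm key

state = gated_delta_step(state, key, [3.0, 4.0], alpha=1.0, beta=1.0)
first_read = read(state, key)     # [3.0, 4.0]

# Writing a new value under the same key overwrites instead of accumulating.
state = gated_delta_step(state, key, [5.0, 6.0], alpha=1.0, beta=1.0)
second_read = read(state, key)    # [5.0, 6.0]

# A step with alpha = 0.5 and no write (beta = 0) halves what is stored.
state = gated_delta_step(state, [0.0, 1.0], [0.0, 0.0], alpha=0.5, beta=0.0)
faded_read = read(state, key)     # [2.5, 3.0]
```

However many steps run, `state` stays a 2 x 2 matrix, which is the property that keeps memory flat as the video gets longer.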

The remaining five blocks use traditional softmax attention, placed at regular intervals to anchor long-range spatial consistency where the recurrent approach alone falls short.
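
One layout consistent with that 15/5 split is a softmax block closing every group of four. The exact positions are an assumption here; the article only says the softmax blocks sit at regular intervals.

```python
NUM_BLOCKS = 20
SOFTMAX_EVERY = 4  # assumption: one softmax block closes each group of four

layout = [
    "softmax" if (i + 1) % SOFTMAX_EVERY == 0 else "gdn"
    for i in range(NUM_BLOCKS)
]

print(layout.count("gdn"), layout.count("softmax"))  # 15 5
print(layout[:4])  # ['gdn', 'gdn', 'gdn', 'softmax']
```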

To prevent the gradient instability that killed earlier attempts, the team developed a key-scaling formula (1/√(D·S)) that keeps the transition matrix bounded. Without it, training crashes with NaN errors within the first 16 steps.
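
The article does not define D and S. One plausible reading, and it is only an assumption, is that D is the key dimension and S the number of tokens that update the state together in one frame. Under that reading, and assuming raw key entries stay within [-1, 1], the scaling caps the combined squared norm of a frame's keys at 1, which in turn bounds the eigenvalues of the summed key outer products that drive the state transition. The helper names below are illustrative.

```python
import math
import random


def scale_keys(keys, d, s):
    """Multiply every key entry by 1 / sqrt(d * s)."""
    scale = 1.0 / math.sqrt(d * s)
    return [[scale * x for x in key] for key in keys]


def total_squared_norm(keys):
    """Sum of squared entries over all keys.

    This equals the trace of sum_i k_i k_i^T, so it upper-bounds that
    matrix's largest eigenvalue.
    """
    return sum(x * x for key in keys for x in key)


random.seed(0)
D, S = 8, 16  # hypothetical sizes, chosen only for the demo

# Assumption: raw key entries are bounded in [-1, 1].
raw_keys = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(S)]
scaled_keys = scale_keys(raw_keys, D, S)

print(f"raw:    {total_squared_norm(raw_keys):.3f}")
print(f"scaled: {total_squared_norm(scaled_keys):.3f}")
```

With the scaled total at or below 1, a transition of the form I − β·Σ kᵢkᵢᵀ with β between 0 and 1 cannot amplify the state, which is the kind of boundedness the formula is said to provide.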

Camera control that tracks two timescales

Controlling a camera through a minute-long scene requires precision at two different temporal rates. SANA-WM uses a dual-branch system to handle both.

The coarse branch operates at the latent-frame rate, applying Unified Camera Positional Encoding (UCPE) to capture the global trajectory structure across the full sequence.

The fine branch addresses a compression problem: each latent token represents eight raw video frames, each with its own camera pose. The fine branch computes pixel-wise Plücker ray maps from all eight frames, packs them into a 48-channel tensor, and injects this data after each self-attention output. This recovers the intra-stride camera motion that the coarse branch can’t see.
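
The 48 channels fall out of the arithmetic: a Plücker ray has six coordinates (a direction and a moment), and there are eight frames per latent token. Below is a minimal pure-Python sketch under assumed conventions: a pinhole camera, a camera-to-world rotation, and the ordering (direction, origin × direction). The function name `plucker_map` and the toy camera values are illustrative, not taken from the SANA-WM code.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plucker_map(height, width, focal, origin, rotation):
    """Per-pixel Plücker rays for one frame, laid out as [6][height][width].

    origin is the camera centre in world space; rotation is a 3x3
    camera-to-world matrix given as rows.
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    channels = [[[0.0] * width for _ in range(height)] for _ in range(6)]
    for y in range(height):
        for x in range(width):
            cam = ((x - cx) / focal, (y - cy) / focal, 1.0)
            world = tuple(sum(rotation[r][c] * cam[c] for c in range(3))
                          for r in range(3))
            norm = math.sqrt(sum(w * w for w in world))
            direction = tuple(w / norm for w in world)
            moment = cross(origin, direction)
            for ch, value in enumerate(direction + moment):
                channels[ch][y][x] = value
    return channels


IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))

# Eight raw frames behind one latent token, camera sliding along the x axis.
frames = [plucker_map(2, 3, 2.0, (0.1 * i, 0.0, 0.0), IDENTITY)
          for i in range(8)]

# Pack along the channel axis: 8 frames x 6 channels.
packed = [channel for frame in frames for channel in frame]
print(len(packed), len(packed[0]), len(packed[0][0]))  # 48 2 3
```

Each of the eight six-channel maps encodes a slightly different camera pose, so the packed tensor carries the motion inside the stride that a single per-latent pose would miss.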

In ablation tests, UCPE alone achieved a Camera Motion Consistency (CamMC) score of 0.2453, where lower is better. Adding Plücker mixing dropped it to 0.2047, the best among all compared methods.

A second-stage refiner cleans up drift

Raw output from the first stage is spatiotemporally consistent but can develop structural artifacts over long sequences. A second-stage refiner, built on the 17 billion-parameter LTX-2 model with rank-384 LoRA adapters, corrects these issues in just three denoising steps.

The impact is measurable. On hard camera trajectories, visual quality degradation from the first 10 seconds to the last 10 seconds (ΔIQ) dropped from 3.09 to 0.31 after refinement. On simple trajectories, it went from 3.79 to 1.17.
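
Read literally, ΔIQ is the average quality over the first 10 seconds minus the average over the last 10. A minimal sketch of that reading follows; the per-frame scores are synthetic and the `delta_iq` helper is hypothetical, since the article does not give the exact formula.

```python
from statistics import mean


def delta_iq(scores, fps, window_seconds=10):
    """Mean quality of the first window minus mean quality of the last window."""
    n = int(round(fps * window_seconds))
    if len(scores) < 2 * n:
        raise ValueError("clip is too short for two non-overlapping windows")
    return mean(scores[:n]) - mean(scores[-n:])


# Synthetic one-score-per-second trace for a 60-second clip.
synthetic = [70.0] * 10 + [69.0] * 40 + [67.0] * 10
print(delta_iq(synthetic, fps=1))  # 3.0

# Share of the drift removed by the refiner, from the article's numbers.
hard_removed = (3.09 - 0.31) / 3.09
simple_removed = (3.79 - 1.17) / 3.79
print(f"hard: {hard_removed:.0%}, simple: {simple_removed:.0%}")
```

On the article's figures, the refiner removes roughly 90% of the quality drift on hard trajectories and about 69% on simple ones.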

Training took 18.5 days on 64 H100s

The entire training pipeline used 212,975 video clips drawn from seven public and synthetic sources, with metric-scale 6-DoF pose annotations generated by a modified version of the VIPE camera-pose annotation engine.

Training proceeded in four progressive stages over approximately 15 days, preceded by a 3.5-day VAE adaptation step. The process started with short five-second clips to establish the GDN architecture, then scaled to full 60-second sequences with camera control, and finished with distillation to reduce inference to four denoising steps.
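
Those durations add up to the 18.5 days in the heading, and they give a rough compute budget. The GPU-hour figure below assumes all 64 H100s were busy for the whole period, which the article does not state.

```python
VAE_ADAPTATION_DAYS = 3.5
MAIN_TRAINING_DAYS = 15.0  # "approximately 15 days" across four stages
NUM_GPUS = 64

total_days = VAE_ADAPTATION_DAYS + MAIN_TRAINING_DAYS
# Upper-bound estimate: every GPU in use around the clock.
gpu_hours = total_days * NUM_GPUS * 24

print(total_days)  # 18.5
print(gpu_hours)   # 28416.0
```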

Custom fused Triton kernels for GDN scan and gate operations contributed 1.5x to 2x throughput gains throughout training.

Benchmark results against the field

On Nvidia’s purpose-built 60-second benchmark (80 scenes across four categories with simple and hard trajectory splits), SANA-WM with the refiner achieved rotation errors of 4.50° and 8.34° on the simple and hard splits respectively, versus 10.47° and 18.99° for LingBot-World. VBench visual quality scores came in at 80.62 and 81.89, in line with LingBot-World’s 81.82 and 81.89, while outputting at 720p instead of 480p and running on a single GPU instead of eight.
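
Put as a relative improvement, the rotation-error numbers work out as follows. The sketch uses only the figures quoted above; the dictionary layout is illustrative.

```python
# Rotation error in degrees, as quoted in the article (lower is better).
ROTATION_ERROR_DEG = {
    "simple": {"sana_wm_refined": 4.50, "lingbot_world": 10.47},
    "hard": {"sana_wm_refined": 8.34, "lingbot_world": 18.99},
}


def relative_reduction(ours: float, baseline: float) -> float:
    """Fraction of the baseline error that has been removed."""
    return (baseline - ours) / baseline


reductions = {
    split: relative_reduction(v["sana_wm_refined"], v["lingbot_world"])
    for split, v in ROTATION_ERROR_DEG.items()
}

for split, value in reductions.items():
    print(f"{split}: {value:.1%} lower rotation error")
```

On both splits, SANA-WM's rotation error is a little under half of LingBot-World's.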

The authors acknowledge the model’s limitations. There’s no explicit 3D scene memory, which means it can drift in dynamic scenes or unusual viewpoints. The authors suggest using the fast stage-one model to search through trajectory options, then selectively running the refiner on promising results.
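
That search-then-refine workflow might look like the sketch below. Everything here is a stand-in: `draft`, `score` and `refine` are hypothetical callables supplied by the caller, not functions from the NVlabs/Sana repository.

```python
from typing import Callable, List, Sequence


def search_then_refine(
    trajectories: Sequence[str],
    draft: Callable[[str], str],
    score: Callable[[str], float],
    refine: Callable[[str], str],
    keep: int = 1,
) -> List[str]:
    """Draft every trajectory cheaply, then refine only the best `keep`."""
    drafts = [draft(t) for t in trajectories]
    ranked = sorted(drafts, key=score, reverse=True)
    return [refine(video) for video in ranked[:keep]]


# Toy stand-ins so the sketch runs end to end.
FAKE_SCORES = {"draft(orbit)": 0.4, "draft(dolly)": 0.9, "draft(pan)": 0.7}

results = search_then_refine(
    trajectories=["orbit", "dolly", "pan"],
    draft=lambda t: f"draft({t})",      # fast stage-one generation
    score=lambda v: FAKE_SCORES[v],     # any quality or drift metric
    refine=lambda v: f"refined({v})",   # slower second-stage refiner
    keep=2,
)
print(results)  # ['refined(draft(dolly))', 'refined(draft(pan))']
```

The expensive refiner runs only `keep` times, regardless of how many trajectories were drafted.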

Open source under Apache 2.0

SANA-WM is available through the NVlabs/Sana GitHub repository with Apache 2.0 licensing for the code. Individual dataset and model weight licenses vary. The repo also hosts SANA, SANA-1.5, SANA-Sprint, and SANA-Video.

Three inference variants ship with the release: a bidirectional generator for highest-quality offline synthesis, a chunk-causal autoregressive generator for sequential streaming, and the distilled autoregressive variant that hits the 34-second mark on a single RTX 5090.

About the author: George Tsagkarakis, known as Staycalm4now, has been a professional author in the crypto gaming industry since early 2018. He has followed the growth of blockchain gaming from the start and has helped multiple projects achieve their goals and establish a player base. He is the co-founder of egamers.io and the founder and owner of CryptoGames.gg. He is also the COO of MyStage, an AI x Crypto startup.