The problem
Filmmakers and visual storytellers think in spatial and cinematic terms (scene composition, character blocking, camera movement), not in text prompts. The customer needed a platform that could translate a spatial canvas into AI generation instructions. Nothing on the market could turn canvas-based spatial interactions into coordinated multi-model AI generation spanning depth estimation, pose detection, image generation, and Gaussian splatting.
What we shipped
A multi-agent AI Visual Storytelling Platform built from three agents and an orchestration layer: (1) a Spatial Grammar Agent, Claude Sonnet on Bedrock, interprets canvas element arrangements and decides which generative actions to take; (2) a Scene Reconstruction Agent runs depth estimation and Gaussian splatting on SageMaker to turn 2D images into navigable pseudo-3D environments; (3) a Generative Director Agent interprets OpenPose skeletons and triggers regeneration on g6e.12xlarge and p4de.24xlarge GPU instances; (4) Step Functions orchestration selects and chains models based on spatial context. ECS Fargate runs the canvas backend, and DynamoDB persists spatial state.
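For concreteness, here is a minimal sketch of how the Spatial Grammar Agent hand-off could be wired in Python with boto3: the canvas arrangement goes to Claude Sonnet through the Bedrock Converse API, the interpreted spatial state is written to DynamoDB, and a Step Functions execution chains the downstream generation models. The function name, prompt, table name, and state machine ARN (interpret_canvas, SPATIAL_STATE_TABLE, GENERATION_STATE_MACHINE_ARN) are illustrative assumptions, not the customer's implementation.

    import json
    import os
    import boto3

    bedrock = boto3.client("bedrock-runtime")
    dynamodb = boto3.client("dynamodb")
    sfn = boto3.client("stepfunctions")

    # Example Claude Sonnet model ID; the engagement's exact model version is not specified.
    MODEL_ID = "anthropic.claude-3-5-sonnet-20240620-v1:0"

    SYSTEM_PROMPT = (
        "You interpret a spatial canvas for a visual storytelling tool. Given canvas "
        "elements (type, position, size, connections), return JSON with a "
        "'generative_action' (depth_estimation, pose_regeneration, gaussian_splatting, "
        "or image_generation) and its parameters."
    )

    def interpret_canvas(scene_id: str, canvas_elements: list) -> dict:
        """Turn a canvas arrangement into a generative action and start generation."""
        response = bedrock.converse(
            modelId=MODEL_ID,
            system=[{"text": SYSTEM_PROMPT}],
            messages=[{
                "role": "user",
                "content": [{"text": json.dumps({"scene_id": scene_id,
                                                 "elements": canvas_elements})}],
            }],
            inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
        )
        # Assumes the model honors the JSON-only instruction in the system prompt.
        action = json.loads(response["output"]["message"]["content"][0]["text"])

        # Persist the interpreted spatial state so later canvas edits stay consistent.
        dynamodb.put_item(
            TableName=os.environ["SPATIAL_STATE_TABLE"],
            Item={
                "scene_id": {"S": scene_id},
                "spatial_state": {"S": json.dumps(canvas_elements)},
                "last_action": {"S": json.dumps(action)},
            },
        )

        # Step Functions selects and chains the depth / pose / splatting models.
        sfn.start_execution(
            stateMachineArn=os.environ["GENERATION_STATE_MACHINE_ARN"],
            input=json.dumps({"scene_id": scene_id, "action": action}),
        )
        return action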
The outcome
Spatial grammar interpretation accuracy moved from 78% to 94% over the engagement. Canvas interactions respond in under 200 ms via Lambda. Four agentic capabilities live in production: Magnetic Storyboard to Animatic, Image to Navigable Scene, Editable Generative Scene, and Drag-to-Direct AI filmmaking. Real-time multi-user collaboration runs over WebSocket, and artist-in-the-loop control is preserved throughout via the spatial canvas interaction model.
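To illustrate the collaboration path only (the table name, attribute names, and handler shape are assumptions, not the customer's code), a WebSocket broadcast behind an API Gateway WebSocket API could look like this: a Lambda handler fans a canvas update out to every connection registered for the scene.

    import json
    import os
    import boto3
    from boto3.dynamodb.conditions import Attr

    dynamodb = boto3.resource("dynamodb")
    connections = dynamodb.Table(os.environ["CONNECTIONS_TABLE"])

    def handler(event, context):
        """Broadcast a canvas update to every collaborator connected to the scene."""
        body = json.loads(event["body"])
        scene_id = body["scene_id"]

        # The Management API endpoint is derived from the incoming WebSocket request.
        ctx = event["requestContext"]
        apigw = boto3.client(
            "apigatewaymanagementapi",
            endpoint_url=f"https://{ctx['domainName']}/{ctx['stage']}",
        )

        # Connections are assumed to be registered in DynamoDB on $connect.
        items = connections.scan(
            FilterExpression=Attr("scene_id").eq(scene_id)
        ).get("Items", [])
        payload = json.dumps({"type": "canvas_update", "update": body["update"]}).encode()

        for item in items:
            try:
                apigw.post_to_connection(ConnectionId=item["connection_id"], Data=payload)
            except apigw.exceptions.GoneException:
                # Prune connections that have already closed.
                connections.delete_item(Key={"connection_id": item["connection_id"]})

        return {"statusCode": 200}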
Customer name redacted at the customer’s request. Numbers, services, and architecture are unchanged.