AI Research · 84% · 1 min read · Jun 26, 2026, 4:05 PM

HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration

30-second summary

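Researchers propose HAT-4D, an agentic framework that uses vision-language models (VLMs) to reconstruct the 3D geometry, temporal dynamics, and physical interactions of multiple objects from a single monocular video. It targets the severe occlusions and complex dynamics that cause existing monocular 4D reconstruction methods, which mostly handle isolated objects, to fail on multi-object interactions.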

Full story

Extracting dynamic 4D object interactions from massive, in-the-wild monocular videos offers a highly efficient data collection pathway for scaling Embodied AI and training VLAs. However, existing monocular 4D reconstruction methods primarily focus on isolated objects, often failing under the severe occlusions and complex dynamics inherent in multi-object interactions. To bridge this gap, we propose HAT-4D, the first agentic framework designed to reconstruct the 3D geometry, temporal dynamics, and physical interactions of multiple objects from a single video. By integrating VLMs with a multi-le

Source: HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration. Read the full piece at the source.


Summary and analysis generated by AI (mistral). Always verify against the original sources.
