Best Practices

Content

People perform countless precise physical tasks every day. These movements are packed with information: contact points, implicit physics, how objects behave, how motions must unfold to achieve a goal, and more.

We’ve been trained since birth to interpret motion: when we see a hand waving back and forth, we know it means hello. A robot has no idea.

That’s where Motion2Text comes in: we turn human motion into detailed textual descriptions, creating a shared medium for training Physical AI models: language itself.

With it, we can fully leverage human movement to train robots and harness the power of LLMs to make sense of it all. Here’s how it works:

  • You send raw human demonstrations: egocentric videos of people performing manipulation tasks.

  • Motion2Text extracts dense annotations, transcribing what’s happening into fine-grained language: motion semantics, relative positions, cause and effect, task success, and more (see the sketch after this list).
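To make the pipeline concrete, here is a minimal Python sketch of what sending a demonstration and receiving dense annotations could look like. The endpoint URL, the `annotate_demonstration` helper, and every field in the `DenseAnnotation` schema are illustrative assumptions based on the description above, not a documented Motion2Text API.

```python
# A minimal sketch of a Motion2Text request/response cycle, assuming a
# hypothetical HTTP endpoint. All names and fields below are illustrative.
from dataclasses import dataclass

import requests  # third-party: pip install requests


@dataclass
class DenseAnnotation:
    """One fine-grained language annotation over a span of the video (assumed schema)."""
    start_s: float           # segment start, in seconds
    end_s: float             # segment end, in seconds
    motion_semantics: str    # e.g. "right hand pinches the mug handle"
    relative_positions: str  # e.g. "mug sits left of the kettle, near the table edge"
    cause_and_effect: str    # e.g. "tilting the kettle pours water into the mug"
    task_success: bool       # did this span accomplish its sub-goal?


def annotate_demonstration(video_path: str, api_url: str, api_key: str) -> list[DenseAnnotation]:
    """Upload an egocentric demonstration video and return dense annotations."""
    with open(video_path, "rb") as f:
        resp = requests.post(
            api_url,  # hypothetical, e.g. "https://api.example.com/v1/annotate"
            headers={"Authorization": f"Bearer {api_key}"},
            files={"video": f},
            timeout=300,
        )
    resp.raise_for_status()
    # Assumed response shape: {"annotations": [{...}, ...]}
    return [DenseAnnotation(**segment) for segment in resp.json()["annotations"]]
```

Under this sketch’s assumptions, each returned segment pairs a time span with language that a downstream LLM or policy-training pipeline can consume directly.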

We’re building the semantic layer of motion, one that is interpretable and scalable, and that bridges human movement and machine learning through language. The future of robots begins with… humans.