MotionScript: Natural Language Descriptions for Expressive 3D Human Motions

Payam Jome Yazdian¹, Rachel Lagasse¹, Hamid Mohammadi², Eric Liu¹, Li Cheng², Angelica Lim¹

¹ School of Computing Science, Simon Fraser University, Burnaby, BC, Canada
² Dept. of Electrical and Computer Engineering, University of Alberta, Edmonton, Alberta, Canada

Human Annotation

A person goes from a standing position to a bended knee while gesturing.


MotionScript Generated Caption

His left elbow and his left knee are at right angle. From this stance, the left knee extends, and not long after, it bends. Just moments before, both hands are further down than his hips, near to the left ankle. With that pose, his right hand spreads significantly apart from the left ankle and right after, gets closer to the right foot. In the second right before, he moves upwards and in the meantime, moves towards the front. A second later, he shifts far to the left. Simultaneously, both hands spread away from the left foot. Meanwhile, his left hand spreads significantly apart from his left ankle. Shortly after, he shifts downwards and a moment later, he shifts downwards briskly. The right knee is unbent and from that pose, the right knee bends significantly.

teaser image showing an avatar dancing


Abstract


We introduce MotionScript, a novel framework for generating highly detailed, natural language descriptions of 3D human motions. Unlike existing motion datasets that rely on broad action labels or generic captions, MotionScript provides fine-grained, structured descriptions that capture the full complexity of human movement—including expressive actions (e.g., emotions, stylistic walking) and interactions beyond standard motion capture datasets. MotionScript serves as both a descriptive tool and a training resource for text-to-motion models, enabling the synthesis of highly realistic and diverse human motions from text. By augmenting motion datasets with MotionScript captions, we demonstrate significant improvements in out-of-distribution motion generation, allowing large language models (LLMs) to generate motions that extend beyond existing data. Additionally, MotionScript opens new applications in animation, virtual human simulation, and robotics, providing an interpretable bridge between intuitive descriptions and motion synthesis. To the best of our knowledge, this is the first attempt to systematically translate 3D motion into structured natural language without requiring training data.

Approach


image showing framework of how MotionScript works

The MotionScript framework generates textual representations of human motion directly from 3D skeleton sequences. First, posecodes, which are quantifiable representations of static pose attributes, are extracted from each frame. Next, temporal changes in the posecodes are analyzed using Algorithm 1 to segment the dynamic motion of the joints into motioncodes, a novel representation of movement patterns. Finally, a selection process filters out redundant motioncodes and aggregates the remaining ones into concise, coherent natural language sentences.
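To make the three stages above concrete, here is a minimal Python sketch for a single joint angle. It is an illustration under stated assumptions, not the authors' Algorithm 1: the bin thresholds in `ANGLE_THRESHOLDS`, the function names, the segmentation rule (group consecutive category changes in the same direction), and the sentence template are all hypothetical. The selection step that filters redundant motioncodes is omitted.

```python
import bisect
import math

# Hypothetical bin edges (degrees) that turn a joint angle into a posecode
# category: 0 = strongly bent ... 3 = nearly straight. Assumed values.
ANGLE_THRESHOLDS = [60.0, 100.0, 140.0]


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the 3D points a-b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.sqrt(sum(x * x for x in v1)) * math.sqrt(sum(x * x for x in v2))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def angle_posecode(angle):
    """Static posecode: index of the bin that the angle falls into."""
    return bisect.bisect_right(ANGLE_THRESHOLDS, angle)


def extract_motioncodes(categories):
    """Group consecutive same-direction category changes into segments.

    Frames with no category change neither extend nor close a segment,
    which is a simplification made for this sketch.
    """
    segments = []
    current = None
    for t in range(1, len(categories)):
        delta = categories[t] - categories[t - 1]
        if delta == 0:
            continue
        direction = 1 if delta > 0 else -1
        if current is not None and current["direction"] == direction:
            current["end"] = t
            current["magnitude"] += abs(delta)
        else:
            if current is not None:
                segments.append(current)
            current = {"start": t - 1, "end": t,
                       "direction": direction, "magnitude": abs(delta)}
    if current is not None:
        segments.append(current)
    return segments


def describe(segments, joint_name):
    """Turn motioncode segments into one templated sentence."""
    phrases = []
    for seg in segments:
        verb = "extends" if seg["direction"] > 0 else "bends"
        adverb = " significantly" if seg["magnitude"] >= 2 else ""
        phrases.append(verb + adverb)
    if not phrases:
        return ""
    sentence = f"The {joint_name} {phrases[0]}"
    for phrase in phrases[1:]:
        sentence += f", and not long after, it {phrase}"
    return sentence + "."


def synthetic_knee_frames(angles_deg):
    """Build (hip, knee, ankle) triples whose knee angle follows angles_deg."""
    frames = []
    for deg in angles_deg:
        rad = math.radians(deg)
        frames.append(((0.0, 1.0, 0.0), (0.0, 0.0, 0.0),
                       (math.sin(rad), math.cos(rad), 0.0)))
    return frames


# Synthetic knee that straightens and then bends again.
frames = synthetic_knee_frames([90, 120, 150, 170, 150, 120, 90])
categories = [angle_posecode(joint_angle(h, k, a)) for h, k, a in frames]
motioncodes = extract_motioncodes(categories)
print(describe(motioncodes, "left knee"))
```

On this synthetic sequence the sketch yields two motioncodes, an extension followed by a bend, which the template joins into a single sentence. A fuller version would cover other posecode types (distances, relative positions) and would attach timing phrases derived from the `start` and `end` frames.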

Results




MotionScript Captions vs. Human Annotations


Here, we present examples of dancing and exercise motions from the HumanML3D dataset, along with both the original human-annotated captions and those generated by MotionScript. These examples demonstrate MotionScript's ability to convert raw 3D motion sequences into meaningful and structured natural language descriptions.

Human Annotation

The body is in a dancing action while doing a performance.


MotionScript Caption

The left elbow advances from behind the right one to in front of the right one. At the same time, she moves a great distance to the right. Meanwhile, she moves a bit downwards. At the same time, the right hand moves from behind the left one to a position in front of the left one and comes significantly closer to the left knee. She moves slightly downwards speedily. Simultaneously, she shifts forward.

teaser image showing an avatar dancing

Human Annotation

The person is performing while in a dancing action.


MotionScript Caption

She shifts a great distance backward rapidly. The left hand lifts from below the neck ascending to above the neck. She shifts slightly downwards and a second later, she shifts to the right just a little bit. In the second right before, both knees are bent slightly and from this position, her right knee bends and speedily. Meanwhile, her left elbow is almost completely bent and from this stance, her left elbow extends.

teaser image showing an avatar dancing

Human Annotation

This person is going backwards and is dancing.


MotionScript Caption

The right hand moves nearer to the right foot. Immediately after, he moves a great distance backward quickly. Right after, the left hand raises from below his neck to above his neck.

teaser image showing an avatar dancing

Human Annotation

The subject is performing while making a dance pose.


MotionScript Caption

He moves forward and not long after, moves backward.

teaser image showing an avatar dancing

Human Annotation

Someone is in a dancing action while doing a performance.


MotionScript Caption

He shifts backward just a little bit.

teaser image showing an avatar dancing

Human Annotation

A person is doing hand exercises while making a dance pose.


MotionScript Caption

The right hand moves from the left side of the right shoulder to the right side of the right shoulder, and not long after, it moves from the right side of the right shoulder to the left side of the right shoulder. In the second right before, the right elbow is ahead of his left elbow and is partly bent, and from this stance, his right elbow extends.

teaser image showing an avatar dancing

Human Annotation

The subject is lowering a body part and is making hand exercises.


MotionScript Caption

The elbows are a bit bent, and from that pose, both elbows bend.

teaser image showing an avatar dancing
