Articulate3D: Zero-Shot Text-Driven 3D Object Posing

2025

Oishi Deb¹, Anjun Hu¹, Ashkan Khakzar¹,², Philip Torr¹, Christian Rupprecht¹
¹University of Oxford, ²Google DeepMind

We propose Articulate3D, a zero-shot method that reposes a 3D asset through language control. Despite advances in vision and language understanding, this remains a surprisingly difficult task.

Approach

To achieve this goal, we decompose the problem into two steps. First, we modify a powerful image generator to create target images conditioned on the original pose and a text instruction. Second, we align the mesh to the target images through a multi-view pose optimization step.

We introduce a self-attention rewiring mechanism, RSActrl, that decouples the source structure from pose within an image generative model, allowing it to maintain a consistent structure across varying poses.
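This page does not spell out how RSActrl rewires self-attention, so the following is only an illustrative sketch of one common form of attention rewiring: queries come from the image being generated (the target pose), while keys and values are taken from the source image, so the output can only recombine source features. The function names, shapes, and random inputs are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v):
    """Standard scaled dot-product attention over token features."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def rewired_attention(q_target, k_source, v_source):
    """Hypothetical rewiring: target queries read source keys and values."""
    return self_attention(q_target, k_source, v_source)

# Toy features standing in for one attention layer of a diffusion model.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
q_src, k_src, v_src = (rng.normal(size=(n_tokens, dim)) for _ in range(3))
q_tgt = rng.normal(size=(n_tokens, dim))

out = rewired_attention(q_tgt, k_src, v_src)
print(out.shape)
```

Because each output row is a softmax-weighted average of the source values, the target branch decides where to look (pose) while the content it retrieves comes from the source (structure). Since only the inputs to an existing attention layer change, a rewiring of this kind needs no new weights.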

We can then apply this modification to a multi-view diffusion model such as MVDream without the need for retraining. We observed that differentiable rendering is an unreliable signal for articulation optimization; instead, we use keypoints to establish correspondences between input and target images.
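As a minimal sketch of how keypoint correspondences can drive pose alignment across views, the toy example below recovers a single joint angle by minimizing the 2D distance between projected mesh keypoints and target keypoints in 8 views. Everything here is an assumption made for illustration: the single hinge joint, the orthographic cameras, the hand-picked keypoints (detection is not shown), and the brute-force search, which stands in for whatever optimizer the method actually uses.

```python
import numpy as np

def rotate_z(points, angle_deg):
    """Rotate Nx3 points about the z-axis through the origin (the joint pivot)."""
    a = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    return points @ rot.T

def project(points, azimuth_deg):
    """Orthographic projection for a camera orbiting the y-axis."""
    a = np.deg2rad(azimuth_deg)
    right = np.array([np.cos(a), 0.0, -np.sin(a)])  # camera x-axis in world space
    up = np.array([0.0, 1.0, 0.0])                  # camera y-axis in world space
    return np.stack([points @ right, points @ up], axis=1)

def keypoint_loss(angle_deg, keypoints_3d, targets_2d, azimuths):
    """Mean squared 2D keypoint error, averaged over all views."""
    posed = rotate_z(keypoints_3d, angle_deg)
    errors = [np.mean((project(posed, az) - tgt) ** 2)
              for az, tgt in zip(azimuths, targets_2d)]
    return float(np.mean(errors))

# Hypothetical keypoints on one articulated part, in the rest pose.
wing_keypoints = np.array([[0.5, 0.0, 0.1], [1.0, 0.1, 0.0], [1.5, 0.0, -0.1]])
azimuths = [0, 45, 90, 135, 180, 225, 270, 315]  # 8 viewpoints

# Synthetic "target image" keypoints, produced by a known joint angle.
true_angle = 35
targets = [project(rotate_z(wing_keypoints, true_angle), az) for az in azimuths]

# Search whole-degree joint angles for the best multi-view agreement.
candidates = np.arange(-180, 180)
losses = [keypoint_loss(c, wing_keypoints, targets, azimuths) for c in candidates]
best_angle = int(candidates[int(np.argmin(losses))])
print(best_angle)
```

The loss depends only on a handful of matched 2D points per view rather than on rendered pixels, which is the property that makes keypoints attractive as a correspondence signal.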

Our method works on a variety of 3D meshes of different shapes and sizes.

Flow and Architecture Diagram of Articulate3D

The pipeline consists of two main steps:

  1. RSActrl Module
  2. Multi-view Mesh Optimization
    1. Keypoint Detection
    2. Keypoint Alignment

Results

Below, we show the posed outputs obtained for the following user prompts.

1. User's Prompt: "A Humming bird is folding its wings."

[Figure: starting mesh position of a hummingbird (interactive 3D model), visuals from 8 viewpoints (grey-background visuals are the target 2D images), and the final result (interactive 3D model)]

2. User's Prompt: "A Tiger is lifting its front legs."

[Figure: starting mesh position of a tiger (interactive 3D model), visuals from 8 viewpoints (grey-background visuals are the target 2D images), and the final result (interactive 3D model)]

More Interactive Results

Below, we show posed outputs for user text prompts on a variety of meshes. For each result, the input mesh is shown on the left and the final output on the right.

3. User's Prompt: "A Phoenix is gliding up."

[Interactive 3D models: starting mesh position (left), final result (right)]

4. User's Prompt: "A humming bird is looking up."

[Interactive 3D models: starting mesh position (left), final result (right)]

5. User's Prompt: "A seagull is stretching its wings up."

[Interactive 3D models: starting mesh position (left), final result (right)]

6. User's Prompt: "A tiger is walking."

[Interactive 3D models: starting mesh position (left), final result (right)]

7. User's Prompt: "A Frog is jumping."

[Interactive 3D models: starting mesh position (left), final result (right)]

8. User's Prompt: "A hummingbird bending its head down."

[Interactive 3D models: starting mesh position (left), final result (right)]

9. User's Prompt: "A brown bird is raising its wings."

[Interactive 3D models: starting mesh position (left), final result (right)]

10. User's Prompt: "A tiger is bending towards sitting pose."

[Interactive 3D models: starting mesh position (left), final result (right)]

Animations

Here, we showcase the animations generated based on user prompts.

1. User's Prompt: "A humming bird is bending its head."

2. User's Prompt: "A seagull is stretching its wings up."

3. User's Prompt: "A humming bird is looking up."

4. User's Prompt: "A tiger is bending towards sitting pose."

Acknowledgements

We would like to thank Hirokatsu Kataoka, Minghao Chen, Orest Kupyn, Paul Engstler, David Fan and Zheng Xing for insightful technical discussions.

Paper

BibTeX

  @InProceedings{deb2025articulate3d,
    author    = {Oishi Deb and Anjun Hu and Ashkan Khakzar and Philip Torr and Christian Rupprecht},
    title     = {Articulate3D: Zero-Shot Text-Driven 3D Object Posing},
    booktitle = {xx},
    month     = {xx},
    year      = {2025},
    pages     = {xx}
  }