AI and Animation – the path ahead

Generative text-to-image AI is pretty amazing. If you can write, you can create an image. Generative AI for video is also making great progress. But AI is also being used in more traditional animation workflows, such as creating an animation clip for a character or creating a 3D model to place in a scene. Which approach is better?

There is so much going on in the AI and animation space. If you are not following the space, here is a taste:

  • AI Video Generation: Runway takes the generative AI approach from still images to video clips.
  • AI Actor Replacement and Dubbing: Wonder Studio will replace a human actor with a rendered character. Flawless AI can not only dub the voice track into a new language, but also adjust the actors’ mouth movements to match the new dialog.
[Embedded video: a Runway Gen-2 clip generated from a single sentence about rain in a rainforest.]

There are also tools designed to work in existing 3D animation pipelines rather than replacing them.

  • AI Mocap: Motion capture is not new, but many tools now use AI to improve its quality or to extract motion directly from ordinary video without special equipment (Google MediaPipe, Rokoko, Move.ai, DeepMotion, Plask.ai, … it’s a long list these days) – see the sketch after this list.
  • AI Assisted Motion: Cascadeur uses AI and physics to make action animations feel more realistic with less effort. Aimator is a similar new offering – pose a few key frames and let AI generate everything in between, reducing the time animators spend on manual in-betweening.
  • AI Animation Generation: There are a number of text-to-animation research papers, with the first products, such as MotionGPT, starting to emerge.
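
As a taste of how accessible video-based mocap has become, here is a minimal sketch of pulling pose landmarks out of an ordinary video with Google MediaPipe’s Pose solution (the video filename is hypothetical, and a real pipeline would retarget the landmarks onto a character rig rather than just printing them):

```python
import cv2
import mediapipe as mp

# Hypothetical input clip shot on a normal camera - no suits, no markers.
VIDEO_PATH = "performance.mp4"

mp_pose = mp.solutions.pose
cap = cv2.VideoCapture(VIDEO_PATH)

# static_image_mode=False lets MediaPipe track the person across frames.
with mp_pose.Pose(static_image_mode=False, model_complexity=1) as pose:
    frame_idx = 0
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV reads frames as BGR.
        results = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks:
            # 33 landmarks per frame, each with normalized x/y, relative z, and visibility.
            nose = results.pose_landmarks.landmark[mp_pose.PoseLandmark.NOSE]
            print(frame_idx, round(nose.x, 3), round(nose.y, 3), round(nose.visibility, 2))
        frame_idx += 1

cap.release()
```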

So if you want to create a computer-generated animation in a year’s time, which technology do I think is going to win?

Okay, it is not so much about “winning” as it is about picking the best technology for your situation. For myself, I care about repeatable quality, so I personally prefer the approaches that fit into existing animation workflows.

Why? I have been using AI for some projects outside the animation space. Two general strategies are:

  • Use AI for the end-to-end process
  • Use AI for stages in the current process

If you are creating an image using Midjourney, DALL-E, Stable Diffusion, etc., then you give it text and you end up with a final image. Don’t like the image? Tweak the text (and the other knobs made available) and try again. But you cannot say “move the character 1 foot to the left” (at least not yet). The control you have over the final image is limited. If it does what you want, then it’s great. Otherwise it can be a world of pain.
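
To make that workflow concrete, here is a rough sketch of end-to-end text-to-image generation with the Hugging Face diffusers library (the model ID, prompt, and filenames are just illustrative): everything you control goes in as the prompt plus a few knobs, and all you get back is the finished image.

```python
import torch
from diffusers import StableDiffusionPipeline

# Illustrative checkpoint - any Stable Diffusion model loads the same way.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a lone character standing in a rain-soaked rainforest, cinematic lighting"

# The only controls are the prompt and a handful of knobs like these.
image = pipe(
    prompt,
    guidance_scale=7.5,                                 # how strongly to follow the prompt
    num_inference_steps=30,                             # quality vs. speed trade-off
    generator=torch.Generator("cuda").manual_seed(42),  # fixed seed for repeatability
).images[0]

image.save("shot.png")
# There is no knob for "move the character 1 foot to the left" -
# if the result is wrong, you tweak the prompt or the seed and try again.
```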

And this is the problem I see. End-to-end AI replacement of a process means you lose control, and for some projects that control matters.

If you want repeatability and reliability, then AI automating the full process in a single step can be risky.

So for animation, rather than relying on generative AI to keep image quality consistent across an entire video, I am more bullish on generating animation clips and then using traditional rendering pipelines to produce the final images (with full control over materials, physics-based reflections, color correction, etc.). It is not that AI-generated final videos are not impressive – they are! But if you want full control to guarantee consistency, I think a better approach is to make each phase of an animation pipeline automatable. By generating files for each stage in the process, a human can step in and adjust the automatically generated content when they need to. The result is a more repeatable, predictable process you can use across multiple projects.
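
As a sketch of what I mean by “each phase automatable”, here is a hypothetical staged pipeline (the stage names, file formats, and functions are all made up for illustration). Each stage reads the previous stage’s output file and writes its own, so a human can open any intermediate file in their usual tools, tweak it, and re-run only the stages that come after it.

```python
from pathlib import Path

# Hypothetical stage functions - in a real pipeline these would call your
# mocap tool, retargeting tool, and renderer respectively.
def extract_motion(video: Path, out: Path) -> None: ...
def retarget_to_rig(motion: Path, rig: Path, out: Path) -> None: ...
def render_shot(scene: Path, animation: Path, out_dir: Path) -> None: ...

# (name, function, input files, output file) - every output is a file a human can edit.
STAGES = [
    ("mocap",    extract_motion,  [Path("take01.mp4")],                          Path("take01_motion.bvh")),
    ("retarget", retarget_to_rig, [Path("take01_motion.bvh"), Path("hero.fbx")], Path("hero_anim.fbx")),
    ("render",   render_shot,     [Path("shot01.blend"), Path("hero_anim.fbx")], Path("renders/shot01")),
]

def run(from_stage: str = "mocap") -> None:
    """Re-run the pipeline starting at `from_stage`, leaving earlier outputs
    (including any hand edits made to them) untouched."""
    started = False
    for name, fn, inputs, output in STAGES:
        started = started or name == from_stage
        if not started:
            continue
        print(f"running stage: {name} -> {output}")
        fn(*inputs, output)

# Example: after hand-tweaking hero_anim.fbx, only re-render the shot.
# run(from_stage="render")
```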

And I think this is true of AI solutions in general. The more control you need, the more you should think about applying AI to individual stages of your existing process, so you retain control over the final result.

