Dance Control
Here we start with a typical image generated with Stable Diffusion. As you might guess, the prompt involves the future, some dancers, and some painters.

We used RunwayML to extract depth data from a video sequence, and the results look like this.

Feeding that into Stable Diffusion with ControlNet set up for depth, we get a very different image. For this post, we are using the popular, and super convenient, Stable Diffusion web UI.
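If you would rather script this step than click through the web UI, here is a minimal sketch of the same idea using the diffusers library with a depth ControlNet. The prompt and filenames are placeholders, not the exact ones we used.

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline, UniPCMultistepScheduler

# Load a depth ControlNet and attach it to Stable Diffusion 1.5.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_model_cpu_offload()

# The depth map exported from RunwayML (placeholder filename).
depth_map = Image.open("dancer_depth.png").convert("RGB")

image = pipe(
    "futuristic dancers and painters",  # placeholder prompt
    image=depth_map,
    num_inference_steps=30,
).images[0]
image.save("dancer_controlled.png")
```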

The original video was shot indoors, and we can also use RunwayML to create a mask and remove the depth image background, letting the model hallucinate its own setting. Conveniently, we also acquire a third arm.
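For reference, knocking the background out of a depth map with a mask is just a pixel-wise operation. A rough sketch with NumPy and Pillow, with placeholder filenames:

```python
import numpy as np
from PIL import Image

# Placeholder filenames: the depth frame and the matching mask exported from
# RunwayML (white where the dancer is, black elsewhere).
depth = np.array(Image.open("dancer_depth.png").convert("L"))
mask = np.array(Image.open("dancer_mask.png").convert("L")) > 127

# Zero out the background so only the dancer's depth reaches ControlNet.
masked = np.where(mask, depth, 0).astype(np.uint8)
Image.fromarray(masked).save("dancer_depth_masked.png")
```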

We can also swap in a different background, which gives us more control and variety.
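Composing the depth maps works the same way: keep the dancer's depth wherever the mask is set, and fill the rest from a background depth map (again, the filenames are placeholders):

```python
import numpy as np
from PIL import Image

dancer_depth = np.array(Image.open("dancer_depth.png").convert("L"))
mask = np.array(Image.open("dancer_mask.png").convert("L")) > 127
background_depth = np.array(Image.open("background_depth.png").convert("L"))

# The dancer wins under the mask; the swapped-in background fills everything else.
composed = np.where(mask, dancer_depth, background_depth).astype(np.uint8)
Image.fromarray(composed).save("composed_depth.png")
```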

In the end, we found that simply compositing the dancer onto a background in After Effects and using the depth_midas preprocessor obviated the need to compute the depth map ourselves: instead of composing depth maps, we let the model estimate the depth of the composite image and used that as the ControlNet depth input. The result is below.
The keen-eyed readers among you will of course have noticed the source of the background: an animation created in Cinema 4D all those aeons ago.
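If you are not working in the web UI, the depth_midas preprocessor corresponds roughly to running MiDaS over the composite yourself. Here is a sketch using the controlnet_aux package; the composite filename is a placeholder.

```python
from PIL import Image
from controlnet_aux import MidasDetector

# Roughly what the web UI's depth_midas preprocessor does: run MiDaS on the
# After Effects composite and save the result for use as the ControlNet depth input.
midas = MidasDetector.from_pretrained("lllyasviel/Annotators")
composite = Image.open("composite.png")
depth = midas(composite)
depth.save("composite_depth.png")
```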