Training-Free Neural Matte Extraction for Visual Effects: Limitations and Conclusion

6 Jul 2024

Authors:

(1) Sharif Elcott, Google Japan (equal contribution);

(2) J.P. Lewis, Google Research USA (equal contribution);

(3) Nori Kanazawa, Google Research USA;

(4) Christoph Bregler, Google Research USA.

5 LIMITATIONS AND CONCLUSION

We have introduced a matte extraction approach using the deep image prior. The algorithm is simple, requiring only a few tens of lines of code modification to an existing U-net. Our approach is training-free and is thus particularly suitable for the diverse, few-of-a-kind subjects in entertainment video production. It may also be of intrinsic theoretical interest in terms of the nature and solution of the matte extraction problem. A further potential use would be to produce ground-truth mattes for training deep learning models. As is the case with many matting algorithms, it assumes coarse guidance in the form of a trimap or similar constraints. This guidance can be created by the artist using readily available semi-automatic tools.
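To make the role of the trimap concrete, the following is a minimal sketch of the kind of objective a trimap-guided matting optimization might minimize. It assumes the standard compositing model I = αF + (1 − α)B and adds a penalty that pulls alpha toward 1 in known-foreground pixels and toward 0 in known-background pixels. The label encoding, the function names, and the `constraint_weight` parameter are hypothetical choices for illustration and are not taken from the paper; in the actual method the alpha, foreground, and background values would be produced by networks rather than passed in as lists.

```python
# Hypothetical trimap label encoding used only in this sketch.
BG, UNKNOWN, FG = 0, 1, 2


def composite(alpha, fg, bg):
    """Standard compositing model, per pixel: I = alpha * F + (1 - alpha) * B."""
    return [a * f + (1.0 - a) * b for a, f, b in zip(alpha, fg, bg)]


def matting_loss(image, alpha, fg, bg, trimap, constraint_weight=1.0):
    """Reconstruction error plus a trimap penalty (illustrative, not the paper's loss).

    All arguments are flat lists of per-pixel grayscale values in [0, 1],
    except `trimap`, which holds one of BG, UNKNOWN, FG per pixel.
    """
    n = len(image)
    recon = composite(alpha, fg, bg)
    # Mean squared error between the re-composited image and the input.
    recon_loss = sum((r - i) ** 2 for r, i in zip(recon, image)) / n

    # Penalize alpha only where the trimap is certain; UNKNOWN pixels are free.
    constraint = 0.0
    for a, t in zip(alpha, trimap):
        if t == FG:
            constraint += (a - 1.0) ** 2
        elif t == BG:
            constraint += a ** 2
    return recon_loss + constraint_weight * constraint / n


if __name__ == "__main__":
    fg = [0.8, 0.8, 0.8]
    bg = [0.2, 0.2, 0.2]
    true_alpha = [1.0, 0.5, 0.0]
    trimap = [FG, UNKNOWN, BG]
    image = composite(true_alpha, fg, bg)
    print(matting_loss(image, true_alpha, fg, bg, trimap))
    print(matting_loss(image, [0.5, 0.5, 0.5], fg, bg, trimap))
```

In this toy setup the loss vanishes when the candidate alpha matches the matte that generated the image, and grows when alpha violates either the reconstruction or the trimap. Pixels labeled unknown are constrained only through the reconstruction term, which is where a prior on the solution has to do the work.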

Computational cost is the major limitation of the method, in common with classic methods [Levin et al. 2008]. Compute times for the examples shown in the paper are measured in minutes (but not hours) on a single previous-generation Nvidia Volta GPU. This restricts the use of our algorithm to high-quality offline applications where extensive non-real-time computation is the norm, primarily movies and videos. On the other hand, the computation can take advantage of the multi-GPU support provided by deep learning frameworks, and intermediate results can be visualized.

Our method can produce temporally consistent matte extractions from video by warm-starting the optimization from the previous frame (see accompanying video). In our experience, however, this requires that the trimaps move smoothly from frame to frame. A topic for future work is to consider recurrent or other network architectures that might make the trimap choice more forgiving. This paper has focused on introducing the DIP matting algorithm. There was relatively little architecture and parameter exploration, and further improvements may be possible.
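The warm-starting idea can be illustrated with a toy sketch. Here a short parameter vector stands in for the network weights and a simple quadratic objective stands in for the per-frame matting loss; both are assumptions made purely for illustration, as are the function names, the learning rate, and the tolerance. The point is only that when consecutive frames have nearby solutions, starting each optimization from the previous frame's result needs far fewer steps than starting from scratch.

```python
def optimize(params, target, lr=0.1, tol=1e-6, max_steps=10_000):
    """Gradient descent on sum((p - t)^2); returns (final_params, steps_taken)."""
    params = list(params)
    for step in range(max_steps):
        loss = sum((p - t) ** 2 for p, t in zip(params, target))
        if loss < tol:
            return params, step
        # Gradient of the squared error with respect to each parameter is 2 * (p - t).
        params = [p - lr * 2.0 * (p - t) for p, t in zip(params, target)]
    return params, max_steps


def process_sequence(targets, warm_start=True):
    """Optimize each frame in order; optionally reuse the previous frame's solution."""
    init = [0.0] * len(targets[0])
    params = init
    steps_per_frame = []
    for target in targets:
        start = params if warm_start else init
        params, steps = optimize(start, target)
        steps_per_frame.append(steps)
    return steps_per_frame


if __name__ == "__main__":
    # Slowly drifting per-frame optimum, standing in for smooth motion in a video.
    frames = [[1.0 + 0.01 * k, 2.0 + 0.01 * k] for k in range(5)]
    print("warm:", process_sequence(frames, warm_start=True))
    print("cold:", process_sequence(frames, warm_start=False))
```

The first frame costs the same either way, since there is no previous solution to reuse. The benefit depends on the per-frame optimum changing slowly, which is consistent with the observation above that the trimaps need to move smoothly between frames.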

ACKNOWLEDGMENTS

G.G. Heitmann, Peter Hillman, and Kathleen Beeler gave helpful insights and feedback.

REFERENCES

Adobe 2018. How to use Select Subject in Photoshop for One-Click Selections. https://www.photoshopessentials.com

Yagiz Aksoy, Tae-Hyun Oh, Sylvain Paris, Marc Pollefeys, and Wojciech Matusik. 2018. Semantic soft segmentation. ACM Trans. Graph. 37, 4 (2018).

Godzilla vs. Cat 2021. https://www.youtube.com/watch?v=nf7GsKFepDg.

Mikhail Erofeev, Yury Gitman, Dmitriy Vatolin, Alexey Fedorov, and Jue Wang. 2015. Perceptually Motivated Benchmark for Video Matting. In British Machine Vision Conference.

Yossi Gandelsman, Assaf Shocher, and Michal Irani. 2019. "Double-DIP": Unsupervised Image Decomposition via Coupled Deep-Image-Priors. In Comp. Vision and Pattern Recognition.

Kaiming He, Christoph Rhemann, Carsten Rother, Xiaoou Tang, and Jian Sun. 2011. A global sampling method for alpha matting. In Comp. Vision and Pattern Recognition.

Kaiming He, Jian Sun, and Xiaoou Tang. 2010. Guided Image Filtering. In European Conf. Comp. Vision.

G.G. Heitman. 2020. Communication from technical artist, Weta Digital.

Diederik P. Kingma and Jimmy Ba. 2015. Adam: A Method for Stochastic Optimization. In Int. Conf. Learning Representations.

Anat Levin, Dani Lischinski, and Yair Weiss. 2008. A Closed-Form Solution to Natural Image Matting. IEEE Trans. PAMI 30, 2 (2008).

Shanchuan Lin, Andrey Ryabtsev, Soumyadip Sengupta, Brian Curless, Steve Seitz, and Ira Kemelmacher-Shlizerman. 2020. Real-Time High-Resolution Background Matting. arXiv (2020).

Y. Mishima. 1992. A Software Chromakeyer Using Polyhedric Slice. In NICOGRAPH.

Christoph Rhemann, Carsten Rother, Jue Wang, Margrit Gelautz, Pushmeet Kohli, and Pamela Rott. 2009. A Perceptually Motivated Online Benchmark for Image Matting. In Comp. Vision and Pattern Recognition.

Olaf Ronneberger, Philipp Fischer, and Thomas Brox. 2015. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI (2015).

Mike Seymour. 2020. Art of LED Wall Virtual Production. fxguide.com.

Yanan Sun, Chi-Keung Tang, and Yu-Wing Tai. 2021. Semantic Image Matting. In Comp. Vision and Pattern Recognition.

Dmitry Ulyanov, Andrea Vedaldi, and Victor Lempitsky. 2018. Deep Image Prior. In Comp. Vision and Pattern Recognition.

Jue Wang and Michael F. Cohen. 2007. Image and Video Matting: A Survey. Found. Trends Comput. Graph. Vis. 3, 2 (2007).

Yong Xu, Baoling Liu, Yuhui Quan, and Hui Ji. 2022. Unsupervised Deep Background Matting Using Deep Matte Prior. IEEE Trans. Circuits Syst. Video Technol. (2022).

This paper is available on arXiv under the CC BY-NC-ND 4.0 DEED license.