GAN Inversion: A brief walkthrough — Part II

This series intends to explain the concept of GAN Inversion [1], which aims to provide an intuitive understanding of the latent space by inverting an image into the latent space to obtain its latent code.

In Part I, we briefly introduced the concept of GANs and their latent spaces, in addition to introducing the concept of GAN Inversion [1] and its approaches and showed a brief comparison with VAE-GANs.

In this post, which is the second part of a four-part series on GAN Inversion, we discuss the characteristics of GAN inversion as well as latent space navigation using the technique.

1. Characteristics of GAN inversion

Semantic Aware

Layer-wise

Out of distribution

We will be exploring some of these researches in more detail in the following parts of this series. But for now, let’s explore latent space navigation together with GAN inversion.

2. Latent space navigation

Interpretable directions

Fig. 1: Latent space

Numerous researches enforce disentanglement to obtain better interpretable directions in the latent space for attributes (e.g., gender, age, hair color in face images). Jahanian et al. [5] suggested an approach to steer in interpretable directions, linear and non-linear, in the GAN latent space for meaningful geometric transformation (e.g., zoom, shift, color manipulation) without enforcing disentanglement during GAN training. They observed that these interpretable directions have similar effects across all object classes generated by the GAN.

Nurit et al. [4] noticed that the output of the first layer of the generator is a coarse representation of the generated image. Hence, applying a meaningful geometric transformation on the first layer output is similar to applying it to the original image itself. They show that containing this problem to a single layer can help obtain interpretable directions in a closed-form, referring to computing interpretable directions without any training or optimization using just the generator’s weights. They investigate the unsupervised exploration of the transformations in the latent space using PCA. However, it still remains an open problem for extracting attributes such as gender and age in the latent space, requiring training or optimization.

Non-interference

Shen et al. [11] performs conditional manipulation in the sub-space to find orthogonal semantic directions to support non-interference. Conditional manipulation refers to obtaining directions in the latent space to support decoupling between coupled attributes. Their approach can be seen in Fig. 2. With hyper-planes defining separation boundaries for semantic attributes, n₁ and n₂ represent the unit normal vectors from two hyper-planes, and n₁ — (n₁ᵀn₂)n₂ represents the new orthogonal semantic direction.

Fig. 2: Conditional manipulation in subspace for non-interference by [11]

The results from their conditional manipulation can be seen in Fig. 3.

Fig. 3: Results from conditional manipulation proposed by [11]

Voynov et al. [9] and Ramesh et al. [10] use Jacobian decomposition to identify more disentangled directions within the latent space.

3. Summary

References

[2] Zhu, J., Shen, Y., Zhao, D., & Zhou, B. (2020). In-Domain GAN Inversion for Real Image Editing. ArXiv, abs/2004.00049.

[3] Gu, J., Shen, Y., & Zhou, B. (2020). Image Processing Using Multi-Code GAN Prior. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 3009–3018.

[4] Spingarn-Eliezer, N., Banner, R., & Michaeli, T. (2021). GAN Steerability without optimization. ArXiv, abs/2012.05328.

[5] Jahanian, A., Chai, L., & Isola, P. (2020). On the “steerability” of generative adversarial networks. ArXiv, abs/1907.07171.

[6] Bau, D., Zhu, J., Strobelt, H., Zhou, B., Tenenbaum, J.B., Freeman, W.T., & Torralba, A. (2019). GAN Dissection: Visualizing and Understanding Generative Adversarial Networks. ArXiv, abs/1811.10597.

[7] Bau, D., Zhu, J., Wulff, J., Peebles, W.S., Strobelt, H., Zhou, B., & Torralba, A. (2019). Seeing What a GAN Cannot Generate. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 4501–4510.

[8] Leng, G., Zhu, Y., & Xu, Z.J. (2021). Force-in-domain GAN inversion. ArXiv, abs/2107.06050.

[9] Voynov, A., & Babenko, A. (2020). Unsupervised Discovery of Interpretable Directions in the GAN Latent Space. ArXiv, abs/2002.03754.

[10] Ramesh, A., Choi, Y., & LeCun, Y. (2018). A Spectral Regularizer for Unsupervised Disentanglement. ArXiv, abs/1812.01161.

[11] Shen, Y., Gu, J., Tang, X., & Zhou, B. (2020). Interpreting the Latent Space of GANs for Semantic Face Editing. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 9240–9249.

Written by: Sanjana Jain, AI Researcher, and Sertis Computer Vision Team