#
U-GAT-IT Architecture

Paper: U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive
Layer-Instance Normalization for Image-to-Image Translation

### Architecture:

### Generator:

- Let x ∈ {Xs, Xt} represent a sample from the source and the target domain.
- Our translation model Gs→t consists of an encoder Es, a decoder Gt, and an auxiliary classifier ηs
- where ηs(x) represents the probability that x comes from Xs.
- Let E_s^k(x) be the k-th activation map of the encoder.
- Inspired by CAM. the auxiliary classifier is trained to learn the importance weights of the k-th feature map, w_s^k, by using the global average pooling and global max pooling.
- By exploiting the importance weights, we can calculate a set of domain specific attention feature map. a_s(x) = w_s * E_s(x) , where n is the number of encoded feature maps.
- Then out translation model Gs→t becomes equal to Gt(as(x)).
- At decoder Gt, the residual blocks with AdaLIN whose parameters are dynamically computer by a fully connected layer from the attention map.

### Discriminator:

- Let x ∈ {Xt, Gs→t(Xs)} represent a sample from the target domain and the translated source domain.
- The discriminator Dt consists of an encoder E_Dt, a classifier C_Dt, and an auxiliary classifier ηDt.
- Unlike the other translation models, both ηDt(x) and Dt(x) are now trained to discriminate whether x comes from Xt or Gs→t(Xs).
- Given a sample x, Dt(x) exploits the attention feature maps a_Dt(x) = w_Dt * E_Dt(x) using the importance weight w_Dt on the encoded feature maps E_Dt(x) that is trained by η_Dt(x).
- Then the discriminator Dt(x) becomes equal to C_Dt(a_Dt(x)).

### Loss Function:

**Adversarial Loss**: An adversarial loss is employed to match the distribution of the translated images to the target image distribution.

**Cycle Loss**: To alleviate the mode collapse problem, we apply a cycle consistency constraint to the generator. Given an image x ∈ Xs , after the sequential translations of x from Xs to Xt and from Xt to Xs, the image should be successfully translated back to the original domain.

**Identity Loss**: To ensure that the color distributions of the input image and output image are similar, we apply an identity consistency constraint to the generator. Given an image x ∈ Xt, after the translation of x using Gs→t, the image should not change.

**CAM Loss**: By exploiting the information from the auxiliary classifiers ηs and η_Dt, given an image x ∈ {Xs, Xt}. Gs→t and Dt get to where they need to improve or what makes the most difference between two domains in the current state.

**Full Objective**: Finally, we jointly train the encoders, de-coders, discriminators, and auxiliary classifiers to optimize the final objective:

#### Resource

Written on
August
16th,
2019
by
Karthik
Feel free to share!