LOTR: Face Landmark Localization Using Localization Transformer


Proposed methods

  1. A Transformer-based landmark localization network named Localization Transformer (LOTR).
  2. A modified loss function, namely smooth-Wing loss, which addresses gradient discontinuity and training stability issues in an existing loss function called the Wing loss [4].

LOTR: Localization Transformer

Fig. 1: Overview of the Localization Transformer (LOTR). It consists of three main modules: 1) a visual backbone, 2) a Transformer network, and 3) a landmark prediction head. This figure corresponds to Fig. 1 from the paper [1].

Smooth-Wing Loss

Fig. 2: Comparison of the Wing loss and the smooth-Wing loss (top) and their gradients (bottom), shown in the global view (left) and in close-ups (middle and right). For the Wing loss (blue dashed lines), the gradient changes abruptly at |x| = w (bottom-middle) and at x = 0 (bottom-right). The proposed smooth-Wing loss (orange solid lines) is designed to eliminate these gradient discontinuities. This figure corresponds to Fig. 2 from the paper [1].
Eq. 1: Wing loss, where w is the threshold, ϵ is a parameter controlling the steepness of the logarithmic part, and c = w − w ln(1 + w/ϵ).
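For reference, the Wing loss described by this caption can be written out as the piecewise function below. This is the standard form from the Wing loss paper [4], using the same symbols w, ϵ, and c as the caption; the name "Wing" for the function is just notation.

```latex
\mathrm{Wing}(x) =
\begin{cases}
  w \ln\!\left(1 + \dfrac{|x|}{\epsilon}\right), & \text{if } |x| < w, \\[6pt]
  |x| - c, & \text{otherwise,}
\end{cases}
\qquad
c = w - w \ln\!\left(1 + \dfrac{w}{\epsilon}\right).
```

The constant c makes the two pieces meet at |x| = w. The slope, however, jumps there from w/(w + ϵ) to 1, and it flips from −w/ϵ to +w/ϵ at x = 0, which are the two discontinuities visible in Fig. 2.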
Eq. 2: Smooth-Wing loss, a modification of the Wing loss in Eq. 1 with an additional threshold t, where 0 < t < w.
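The two losses can be sketched in plain Python to make the gradient behaviour concrete. `wing_loss` follows the definition in Eq. 1. `smooth_wing_loss` is only an illustrative construction: its constants are derived here from the requirement that both the value and the slope be continuous at |x| = t and |x| = w, and they are not copied from the paper, so the exact form in [1] may differ. The function names and the default values of w, ϵ, and t are assumptions chosen for the example.

```python
import math


def wing_loss(x, w=10.0, eps=2.0):
    """Wing loss of Eq. 1 for a scalar error x (defaults are assumed values)."""
    c = w - w * math.log(1.0 + w / eps)
    ax = abs(x)
    if ax < w:
        return w * math.log(1.0 + ax / eps)
    return ax - c


def smooth_wing_loss(x, w=10.0, eps=2.0, t=0.01):
    """Illustrative smooth variant with a quadratic region for |x| < t.

    Assumption: constants are chosen so that value and slope match at
    |x| = t and |x| = w. This is a sketch, not the paper's exact Eq. 2.
    """
    if not 0.0 < t < w:
        raise ValueError("t must satisfy 0 < t < w")
    a = w + eps                          # log scale giving slope 1 at |x| = w
    s = a / (2.0 * t * (eps + t))        # quadratic slope matches log slope at t
    c2 = a * math.log(1.0 + t / eps) - s * t * t   # value continuity at t
    c1 = w - a * math.log(1.0 + w / eps)           # value continuity at w
    ax = abs(x)
    if ax < t:
        return s * x * x
    if ax < w:
        return a * math.log(1.0 + ax / eps) - c2
    return ax - c1 - c2


def slope(f, x, h=1e-4):
    """Forward finite-difference estimate of the derivative of f at x."""
    return (f(x + h) - f(x)) / h


if __name__ == "__main__":
    w = 10.0
    # Wing loss: the slope jumps when crossing |x| = w.
    print(slope(wing_loss, w - 2e-4), slope(wing_loss, w + 1e-4))
    # Smooth variant: the slope is nearly identical on both sides.
    print(slope(smooth_wing_loss, w - 2e-4), slope(smooth_wing_loss, w + 1e-4))
```

Because the quadratic region has zero slope at x = 0, the gradient of this variant shrinks smoothly as the error vanishes instead of flipping sign abruptly, which is the training-stability issue the smooth-Wing loss is meant to address.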


Dataset & pre-processing


Table 1: The architectures of the different LOTR models

Evaluation Metrics



Table 2: Comparison with state-of-the-art methods on the WFLW dataset.
Fig. 3: Sample images of the test set of the WFLW dataset with predicted landmarks from the LOTR-HR+ model. Each column displays the images with different subsets. Each row displays images with a different range of NMEs: < 0.05 (top), 0.05–0.06 (middle), and > 0.06 (bottom). This figure corresponds to Fig. 3 from the paper [1].
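The NME values used to group the rows in Fig. 3 are normalized mean errors. A minimal sketch of the metric is given below, assuming the usual definition: the mean Euclidean distance between predicted and ground-truth landmarks, divided by a normalizing distance. The normalizer is left as a parameter because this post does not state which one is used for each dataset, and the function name `nme` is hypothetical.

```python
import math


def nme(pred, gt, norm_dist):
    """Normalized mean error for one face.

    pred, gt:   sequences of (x, y) landmark coordinates of equal length.
    norm_dist:  normalizing distance (assumption: supplied by the caller,
                e.g. a distance measured on the ground-truth face).
    """
    if len(pred) != len(gt) or len(pred) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    if norm_dist <= 0:
        raise ValueError("norm_dist must be positive")
    total = sum(
        math.hypot(px - gx, py - gy)
        for (px, py), (gx, gy) in zip(pred, gt)
    )
    return total / (len(pred) * norm_dist)


if __name__ == "__main__":
    # Two landmarks, one exact and one off by a 3-4-5 triangle.
    print(nme([(0.0, 0.0), (3.0, 4.0)], [(0.0, 0.0), (0.0, 0.0)], 10.0))
```

A dataset-level score would then be the average of this per-face value over all test images.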


Table 3: Evaluation results for the different LOTR models on the JD-landmark test set; * and ** denote the first- and second-place entries.







Leading big data and AI-powered solution company https://www.sertiscorp.com/
