LOTR: Face Landmark Localization Using Localization Transformer


Proposed Methods

LOTR: Localization Transformer

Fig. 1: An overview of the Localization Transformer (LOTR). It consists of three main modules: 1) a visual backbone, 2) a Transformer network, and 3) a landmark prediction head. This figure corresponds to Fig. 1 in the paper [1].
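To make the three-module pipeline concrete, here is a minimal PyTorch sketch. The layer sizes, the tiny CNN standing in for the backbone, and the DETR-style learned landmark queries are all assumptions for illustration; positional encodings and the paper's exact configuration are omitted.

```python
import torch
import torch.nn as nn

class LOTRSketch(nn.Module):
    """Illustrative sketch of the LOTR pipeline (not the paper's exact model):
    1) visual backbone -> 2) Transformer -> 3) landmark prediction head."""

    def __init__(self, num_landmarks=98, d_model=64):
        super().__init__()
        # 1) Visual backbone: a tiny CNN standing in for the paper's backbone.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, d_model, 3, stride=2, padding=1), nn.ReLU(),
        )
        # 2) Transformer over the flattened feature map; one learned query
        #    per landmark (a DETR-style assumption). Positional encodings
        #    are omitted for brevity.
        self.queries = nn.Parameter(torch.randn(num_landmarks, d_model))
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=4, num_encoder_layers=2,
            num_decoder_layers=2, dim_feedforward=128, batch_first=True)
        # 3) Landmark prediction head: regress (x, y) per query token.
        self.head = nn.Linear(d_model, 2)

    def forward(self, images):
        feats = self.backbone(images)              # (B, C, H', W')
        b = feats.shape[0]
        tokens = feats.flatten(2).transpose(1, 2)  # (B, H'*W', C)
        q = self.queries.unsqueeze(0).expand(b, -1, -1)
        decoded = self.transformer(tokens, q)      # (B, L, C)
        return self.head(decoded)                  # (B, L, 2) coordinates
```

A forward pass on a batch of 64x64 crops returns one (x, y) pair per landmark query, so the network regresses coordinates directly rather than producing heatmaps.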

Smooth-Wing Loss

Fig. 2: Comparison of the Wing loss and the smooth-Wing loss (top) and their gradients (bottom); the left column shows the global view. For the Wing loss (blue dashed lines), the gradient changes abruptly at |x| = w (bottom-middle) and at x = 0 (bottom-right). The proposed smooth-Wing loss (orange solid lines) is designed to eliminate these gradient discontinuities. This figure corresponds to Fig. 2 in the paper [1].
Eq. 1: Wing loss: Wing(x) = w ln(1 + |x|/ϵ) if |x| < w, and |x| − c otherwise, where w is the threshold, ϵ is a parameter controlling the steepness of the logarithmic part, and c = w − w ln(1 + w/ϵ) joins the two pieces continuously.
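Eq. 1 can be implemented in a few lines of NumPy. The defaults w = 10 and ϵ = 2 below are common choices from the Wing loss literature, not necessarily the values used in this paper.

```python
import numpy as np

def wing_loss(errors, w=10.0, epsilon=2.0):
    """Wing loss (Eq. 1) over signed coordinate errors x.

    Logarithmic for small errors (|x| < w), linear for large ones.
    The constant c joins the two pieces continuously at |x| = w.
    """
    x = np.abs(errors)
    c = w - w * np.log(1.0 + w / epsilon)
    return np.where(x < w, w * np.log(1.0 + x / epsilon), x - c)
```

Note that while the loss value is continuous at |x| = w, its gradient is not (it jumps from w/(ϵ+w) to 1), and the gradient also flips sign at x = 0; these are exactly the discontinuities Fig. 2 highlights.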
Eq. 2: Smooth-Wing loss, a modification of the Wing loss in Eq. 1 with an inner threshold t, where 0 < t < w.
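The paper's exact Eq. 2 is not reproduced here, but its stated goal, removing the gradient jumps at x = 0 and |x| = w, can be sketched as follows: a quadratic region inside |x| < t, the Wing-style log region between t and w, and a linear tail whose slope matches the log region at w, with constants chosen for C1 continuity. The specific constants a, c1, and s below are my derivation for this sketch, not the paper's.

```python
import numpy as np

def smooth_wing_loss(errors, w=10.0, epsilon=2.0, t=0.5):
    """A C1-continuous Wing-loss variant in the spirit of Eq. 2 (a sketch,
    not the paper's exact formulation).

    |x| < t      : quadratic a*x^2  (removes the gradient jump at x = 0)
    t <= |x| < w : w*ln(1 + |x|/eps) + c1
    |x| >= w     : linear tail with slope matched at |x| = w
    """
    x = np.abs(errors)
    a = w / (2.0 * t * (epsilon + t))               # slope match at |x| = t
    c1 = a * t**2 - w * np.log(1.0 + t / epsilon)   # value match at |x| = t
    s = w / (epsilon + w)                           # slope match at |x| = w
    log_part = w * np.log(1.0 + x / epsilon) + c1
    val_w = w * np.log(1.0 + w / epsilon) + c1
    return np.where(x < t, a * x**2,
                    np.where(x < w, log_part, s * (x - w) + val_w))
```

Compared with Eq. 1, the inner quadratic makes the gradient go to zero smoothly at x = 0, and the slope-matched tail removes the kink at |x| = w, matching the behavior shown by the orange curves in Fig. 2.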


Dataset & pre-processing


Table 1: The architectures of the different LOTR models

Evaluation Metrics
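The results below are reported in normalized mean error (NME), the standard metric for landmark benchmarks. A minimal sketch, assuming inter-ocular normalization (the WFLW convention); the eye-corner index arguments are placeholders for whatever landmark layout the dataset uses:

```python
import numpy as np

def nme(pred, gt, left_eye_idx, right_eye_idx):
    """Normalized mean error for one face.

    pred, gt: (L, 2) arrays of predicted / ground-truth landmark coordinates.
    The mean per-landmark L2 error is divided by the inter-ocular distance
    (distance between the two given eye-corner landmarks).
    """
    per_point = np.linalg.norm(pred - gt, axis=1)
    d = np.linalg.norm(gt[left_eye_idx] - gt[right_eye_idx])
    return per_point.mean() / d
```

Lower is better; the NME bands in Fig. 3 (< 0.05, 0.05-0.06, > 0.06) are thresholds on exactly this quantity.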



Table 2: Comparison with state-of-the-art methods on the WFLW dataset.
Fig. 3: Sample images from the WFLW test set with landmarks predicted by the LOTR-HR+ model. Each column displays images from a different subset. Each row displays images in a different NME range: < 0.05 (top), 0.05–0.06 (middle), and > 0.06 (bottom). This figure corresponds to Fig. 3 in the paper [1].


Table 3: The evaluation results for different LOTR models on the JD-landmark test set; * and ** denote the first- and second-place entries.





