Vision Transformers: A Review — Part I

  • Part I — Introduction to Transformer & ViT
  • Part II & III — Key problems of ViT and its improvement

1. What is a Transformer?

“Jane is a travel blogger and also a very talented guitarist.”

Figure 1. The architecture of the Transformer model (image from [1])

2. Vision Transformer

Figure 2. The architecture of ViT (image from [2])

3. Summary





Leading big data and AI-powered solution company
