Vision Transformers: A Review — Part I

  • Part I — Introduction to Transformer & ViT
  • Part II & III — Key problems of ViT and its improvement

1. What is a Transformer?

“Jane is a travel blogger and also a very talented guitarist.”

Figure 1. The architecture of the Transformer model (image from [1])

2. Vision Transformer

Figure 2. The architecture of ViT (image from [2])

3. Summary




