
I built a transformer module in PyTorch from scratch to understand its architecture.

Transformer From Scratch

A re-implementation of the transformer from scratch, written for practice.

Based on the paper "Attention Is All You Need" (Vaswani et al., 2017).


Implemented

  1. attention architecture
  2. transformer architecture
  3. positional encoding
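
As a rough illustration of items 1 and 3, here is a minimal sketch of scaled dot-product attention and sinusoidal positional encoding as described in the paper. It is not the code from this repository: the function names `scaled_dot_product_attention` and `sinusoidal_position_encoding`, the boolean mask convention (True = keep), and the toy shapes are all assumptions made for the example.

```python
import math
import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    """Compute softmax(q k^T / sqrt(d_k)) v.

    q, k, v: tensors of shape (..., seq_len, d_k).
    mask: optional boolean tensor broadcastable to the score shape,
          True where attention is allowed (assumed convention).
          A query row that is masked everywhere would produce NaN.
    """
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    return weights @ v, weights


def sinusoidal_position_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same).

    Assumes d_model is even.
    """
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)  # even dimensions
    pe[:, 1::2] = torch.cos(position * div_term)  # odd dimensions
    return pe


# Toy usage: self-attention over a batch of 2 sequences of length 5, width 8.
pe = sinusoidal_position_encoding(5, 8)
x = torch.randn(2, 5, 8) + pe  # pe broadcasts over the batch dimension
out, weights = scaled_dot_product_attention(x, x, x)
print(out.shape, weights.shape)
```

A full transformer layer would wrap this in multi-head projections, residual connections, and layer normalization; the sketch only covers the core attention computation and the position signal added to the inputs.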


Not implemented

  1. training code
  2. zero padding for batched training
  3. initial embeddings
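
For reference, zero padding for batched training could look roughly like the sketch below. It is only an illustration under assumptions: `pad_batch` and `pad_id` are hypothetical names, the mask is built from the sequence lengths, and the attention scores are random stand-ins rather than the output of a real model.

```python
import torch


def pad_batch(sequences, pad_id=0):
    """Right-pad variable-length token id lists to a common length.

    Returns (batch, keep): batch has shape (batch_size, max_len), and keep
    is a boolean mask of the same shape, True at real (non-padded) positions.
    Hypothetical helper, assuming every sequence is non-empty.
    """
    lengths = torch.tensor([len(s) for s in sequences])
    max_len = int(lengths.max())
    batch = torch.full((len(sequences), max_len), pad_id, dtype=torch.long)
    for i, seq in enumerate(sequences):
        batch[i, : len(seq)] = torch.tensor(seq, dtype=torch.long)
    # Build the mask from lengths so a real token equal to pad_id is not masked.
    keep = torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)
    return batch, keep


batch, keep = pad_batch([[5, 3, 9], [7, 2]])

# Stand-in attention scores of shape (batch_size, query_len, key_len).
scores = torch.randn(2, 3, 3)
# Block attention to padded key positions before the softmax.
scores = scores.masked_fill(~keep[:, None, :], float("-inf"))
weights = torch.softmax(scores, dim=-1)
print(batch)
print(weights[1])  # the last column is zero: the padded key gets no weight
```

Masking the keys this way keeps padded positions from influencing real tokens; the outputs at padded query positions still need to be ignored when computing the loss.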


The rest is very similar to other deep learning models I have written in PyTorch, so I will probably stop the project here.