I built a transformer module in PyTorch from scratch to understand its architecture.

Transformer From Scratch

Re-implementing the transformer mechanism from scratch (for practice purposes)

Based on the paper: Attention Is All You Need (Vaswani et al., 2017)

Done:

  1. attention architecture (see the first sketch after this list)
  2. transformer architecture
  3. positional encoding (see the second sketch after this list)
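
For reference, below is a minimal sketch of the scaled dot-product attention used throughout the architecture, following the formula from the paper; the function name and tensor shapes are illustrative, not necessarily what the repo uses.

```python
import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked-out positions get -inf so their softmax weight is zero.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights


# Example: 2 sequences, 8 heads, 10 tokens, 64 dims per head.
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
```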
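
Likewise, a minimal sketch of the sinusoidal positional encoding from the paper (illustrative names; assumes an even d_model):

```python
import math

import torch


def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to the token embeddings before the first layer
```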

TODO:

  1. training code
  2. zero padding for batched training (see the sketch after this list)
  3. initial embeddings
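
For item 2, here is a minimal sketch of how a zero-padding mask is typically built (again illustrative, not the repo's code); its output fits the mask argument of the attention sketch above.

```python
import torch


def padding_mask(token_ids, pad_id=0):
    # token_ids: (batch, seq_len); True where a real token is, False at padding.
    # Shape (batch, 1, 1, seq_len) broadcasts over heads and query positions.
    return (token_ids != pad_id).unsqueeze(1).unsqueeze(2)


batch = torch.tensor([[5, 7, 9, 0, 0],
                      [3, 2, 0, 0, 0]])
mask = padding_mask(batch)  # masks out the trailing pad positions
```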

SIDE NOTE:

The rest is very similar to other deep learning models I have written in PyTorch, so I will probably stop the project here.