I built a transformer module in PyTorch from scratch to understand its architecture.
Transformer From Scratch
Re-implementing the transformer architecture from scratch (for practice purposes)
Based on the paper: Attention Is All You Need (Vaswani et al., 2017)
Done:
- attention mechanism
- transformer architecture
- positional encoding (see the sketch after this list)
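By way of illustration, here is a minimal PyTorch sketch of two of the pieces above: scaled dot-product attention and sinusoidal positional encoding, re-derived from the paper. It is not the code in this repo; the tensor shapes and the mask convention (0 = blocked position) are my own assumptions.

```python
import math
import torch
import torch.nn as nn

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); mask is 0 where attention is blocked.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return scores.softmax(dim=-1) @ v

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (section 3.5 of the paper)."""
    def __init__(self, d_model, max_len=5000):
        super().__init__()
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)  # (max_len, 1)
        div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float)
                        * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)  # even dimensions
        pe[:, 1::2] = torch.cos(pos * div)  # odd dimensions
        self.register_buffer("pe", pe)      # fixed, not a learned parameter

    def forward(self, x):
        # x: (batch, seq_len, d_model); add the encoding for each position
        return x + self.pe[: x.size(1)]
```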
TODO:
- training code
- zero-padding and masking for batched training (see the sketch after this list)
- initial token embeddings
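The padding and embedding TODOs are usually small. A common approach (assumed here, not taken from this repo; PAD_ID and TokenEmbedding are hypothetical names) is a boolean pad mask that feeds the attention function sketched above, plus a learned embedding scaled by sqrt(d_model) as in the paper:

```python
import math
import torch
import torch.nn as nn

PAD_ID = 0  # hypothetical id of the padding token

def make_pad_mask(token_ids):
    # token_ids: (batch, seq_len) -> mask: (batch, 1, 1, seq_len)
    # Padding positions become 0 so their attention scores get set to -inf.
    return (token_ids != PAD_ID).unsqueeze(1).unsqueeze(2)

class TokenEmbedding(nn.Module):
    """Learned input embeddings, scaled by sqrt(d_model) as in the paper."""
    def __init__(self, vocab_size, d_model):
        super().__init__()
        self.d_model = d_model
        self.embed = nn.Embedding(vocab_size, d_model, padding_idx=PAD_ID)

    def forward(self, token_ids):
        return self.embed(token_ids) * math.sqrt(self.d_model)
```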
SIDE NOTE:
The rest is very similar to other deep learning models I have written in PyTorch, so I will probably stop the project here.