I built a transformer module in PyTorch from scratch to understand its architecture.

Transformer From Scratch

Re-implementing the transformer mechanism from scratch (for practice purposes)

Based on the paper: Attention Is All You Need (Vaswani et al., 2017)

Done:

  1. attention architecture (see the first sketch after this list)
  2. transformer architecture
  3. positional encoding (see the second sketch after this list)
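
For reference, below is a minimal sketch of the scaled dot-product attention used throughout the architecture, following the formula from the paper; the function name and tensor shapes are illustrative, not necessarily what the repo uses.

```python
import math

import torch
import torch.nn.functional as F


def scaled_dot_product_attention(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    # q, k, v: (batch, heads, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked-out positions get -inf so their softmax weight is zero.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights


# Example: 2 sequences, 8 heads, 10 tokens, 64 dims per head.
q = k = v = torch.randn(2, 8, 10, 64)
out, attn = scaled_dot_product_attention(q, k, v)
```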
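
Likewise, a minimal sketch of the sinusoidal positional encoding from the paper (illustrative names; assumes an even d_model):

```python
import math

import torch


def sinusoidal_positional_encoding(seq_len, d_model):
    # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
    position = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    div_term = torch.exp(
        torch.arange(0, d_model, 2, dtype=torch.float32)
        * (-math.log(10000.0) / d_model)
    )
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe  # added to the token embeddings before the first layer
```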

TODO:

  1. training code
  2. zero padding for batched training (see the sketch after this list)
  3. initial embeddings
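
For item 2, here is a minimal sketch of how a zero-padding mask is typically built (again illustrative, not the repo's code); its output fits the mask argument of the attention sketch above.

```python
import torch


def padding_mask(token_ids, pad_id=0):
    # token_ids: (batch, seq_len); True where a real token is, False at padding.
    # Shape (batch, 1, 1, seq_len) broadcasts over heads and query positions.
    return (token_ids != pad_id).unsqueeze(1).unsqueeze(2)


batch = torch.tensor([[5, 7, 9, 0, 0],
                      [3, 2, 0, 0, 0]])
mask = padding_mask(batch)  # masks out the trailing pad positions
```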

SIDE NOTE:

The rest is very similar to other deep learning models I have written in PyTorch, so I will probably stop the project here.