Why Transformers are Slowly Replacing CNNs in Computer Vision?

Attention Mechanism

Figure: the self-attention module used in SAGAN.
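In code, a minimal PyTorch sketch of that module might look like the following (the class name SelfAttention2d is illustrative; the 8x channel reduction for queries and keys and the zero-initialized gamma follow the SAGAN paper):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    """SAGAN-style self-attention over a feature map (sketch)."""

    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions project the feature map into query/key/value spaces.
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # gamma starts at 0, so the block begins as an identity mapping and
        # gradually learns how much attention to mix in.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h * w  # number of spatial positions
        q = self.query(x).view(b, -1, n)   # (b, c//8, n)
        k = self.key(x).view(b, -1, n)     # (b, c//8, n)
        v = self.value(x).view(b, -1, n)   # (b, c,    n)
        # Every position attends to every other position in the feature map.
        attn = F.softmax(torch.bmm(q.transpose(1, 2), k), dim=-1)  # (b, n, n)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x  # learned residual mix
```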

Self Attention

The goal of self-attention is to capture the interactions among all n entities in a sequence. Each output embedding is a weighted combination of all the input embeddings (including the entity's own), with the weights derived from pairwise similarity scores.
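Concretely, here is a minimal sketch of scaled dot-product self-attention (assuming PyTorch; the projection matrices w_q, w_k, w_v and the toy shapes are illustrative):

```python
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for a sequence x of shape (n, d)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v       # project to queries/keys/values
    scores = q @ k.T / k.shape[-1] ** 0.5     # pairwise similarity, scaled
    weights = F.softmax(scores, dim=-1)       # each row sums to 1
    return weights @ v                        # weighted combination of values

# Toy usage: 4 tokens, embedding dimension 8.
torch.manual_seed(0)
x = torch.randn(4, 8)
w_q, w_k, w_v = (torch.randn(8, 8) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)  # shape (4, 8)
```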

The Transformer Model

The Encoder
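As a reference, here is a minimal sketch of a single encoder layer, using the base hyperparameters of the original Transformer (model width 512, 8 attention heads, feed-forward width 2048); the class name and the post-norm layout are just one common arrangement:

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        a, _ = self.attn(x, x, x)          # multi-head self-attention
        x = self.norm1(x + a)              # residual connection + layer norm
        return self.norm2(x + self.ff(x))  # feed-forward + residual + norm

# Toy usage: batch of 2 sequences, 10 tokens each.
out = EncoderLayer()(torch.randn(2, 10, 512))
```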

The Decoder
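The decoder layer adds two things: its self-attention is causally masked, so each position can only attend to earlier outputs, and a cross-attention step lets it query the encoder's output (the "memory"). A sketch under the same assumptions as above:

```python
import torch
import torch.nn as nn

class DecoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1, self.norm2, self.norm3 = (nn.LayerNorm(d_model) for _ in range(3))

    def forward(self, y, memory, causal_mask):
        # Masked self-attention: each position sees only earlier outputs.
        a, _ = self.self_attn(y, y, y, attn_mask=causal_mask)
        y = self.norm1(y + a)
        # Cross-attention: queries from the decoder, keys/values from the encoder.
        a, _ = self.cross_attn(y, memory, memory)
        y = self.norm2(y + a)
        return self.norm3(y + self.ff(y))

# Toy usage: 10 target positions, encoder memory of length 7.
t = 10
mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)  # hide the future
out = DecoderLayer()(torch.randn(2, t, 512), torch.randn(2, 7, 512), mask)
```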

Vision Transformer

Without positional encodings added to the patch embeddings, any two orderings of the same patches look identical to the transformer, since self-attention is permutation-equivariant.
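This is easy to verify in code. In the sketch below (assuming PyTorch; the single-step attend function and the toy shapes are illustrative), permuting the patches merely permutes the output rows, so the two orderings are indistinguishable until positional embeddings are added:

```python
import torch

torch.manual_seed(0)
n, d = 9, 16                  # e.g. 9 patches from a 3x3 grid
patches = torch.randn(n, d)   # patch embeddings (illustrative)
pos = torch.randn(n, d)       # stand-in for ViT's learnable positional embeddings
perm = torch.randperm(n)

def attend(x):
    # A single self-attention step with identity projections, for illustration.
    w = torch.softmax(x @ x.T / d ** 0.5, dim=-1)
    return w @ x

# Without positional encoding: permuting the patches just permutes the
# output rows; the two sequences carry the same information.
out = attend(patches)
out_perm = attend(patches[perm])
print(torch.allclose(out[perm], out_perm))           # True

# With positional embeddings added, the orderings become distinguishable.
out_pos = attend(patches + pos)
out_pos_perm = attend(patches[perm] + pos)
print(torch.allclose(out_pos[perm], out_pos_perm))   # False (almost surely)
```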

ConViT

Conclusion

