
visiontransformers's Introduction

History

Paper | Method | Domain | Year
Attention Is All You Need | Transformer | Natural Language Processing | 2017
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale | Vision Transformer (ViT) | Computer Vision | 2020/2021

What are transformers in the field of NLP?

Transformers are a deep learning architecture for Natural Language Processing. They excel at handling sequences, such as sentences, by employing self-attention mechanisms to capture the relationships between words, which lets them model context without being constrained by word order. A transformer is built from components such as multi-head attention, positional encoding, and feedforward networks. These building blocks underpin models such as BERT for contextual word understanding, GPT for text generation, and T5 for a wide range of tasks. Transformers have reshaped NLP and surpassed earlier sequence models by capturing language context more effectively.

What is Embedding?

Before delving into the details of the transformer architecture, it is worth establishing a foundation on embeddings. I found Jay Alammar's video on the Word2vec algorithm, along with the accompanying article, to be a helpful starting point.
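
As a quick, concrete illustration of what an embedding is, the sketch below looks up learned vectors for a handful of token ids using PyTorch. The vocabulary size, embedding dimension, and token ids are arbitrary values chosen for the example, not taken from this repository.

```python
# Minimal illustration of a learned embedding lookup (PyTorch).
# Vocabulary size, embedding dimension, and token ids are assumed example values.
import torch
import torch.nn as nn

vocab_size = 10_000   # number of distinct tokens (assumed)
d_model = 512         # embedding dimension used throughout the transformer (assumed)

embedding = nn.Embedding(vocab_size, d_model)

# A toy "sentence" of 6 token ids; in practice these come from a tokenizer.
token_ids = torch.tensor([[12, 47, 9, 301, 5, 2]])
word_vectors = embedding(token_ids)   # shape: (1, 6, 512)
print(word_vectors.shape)
```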

The transformer architecture comprises two main components: the encoder and the decoder.

Encoder

The encoder processes the input sequence, such as a sentence, and acts as the comprehension component. It weighs the importance of each word relative to all the others; this happens in parallel across multiple attention heads, with word positions taken into account. It then refines the representation of each individual word.

Decoder

The decoder generates the output sequence, such as a translation, based on the encoder's representation. It attends both to the words it has already generated and to the input sequence, and, using the importance and positions of words, it constructs the final output.

For tasks like translation, these components collaborate. During the learning process, the model adjusts its parameters to align its output with the desired result.

To sum up, transformers excel at capturing word relationships and context, which makes them invaluable for a wide range of language tasks. They combine attention, parallel processing, and refinement steps to achieve this.

Transformer Architecture Components

Encoder

The encoder handles the input sequence, breaking it down into meaningful representations through a series of layers:

  • Self-Attention Mechanism: This core component allows the model to weigh the importance of each word relative to all the others, capturing relationships and context regardless of position (a minimal sketch follows this list).

  • Multi-Head Attention: The encoder employs multiple parallel self-attention mechanisms, or "heads," capturing diverse relationships and nuances.

  • Positional Encoding: To account for word order, positional encoding is added to input embeddings, conveying word position information.

  • Feedforward Neural Network: After attention stages, a feedforward neural network refines word representations using captured context.
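
To make the encoder-side components above concrete, here is a minimal sketch of single-head scaled dot-product self-attention and the sinusoidal positional encoding from the original paper, written in PyTorch. The tensor shapes and dimensions are illustrative assumptions and are not tied to this repository's code; multi-head attention simply runs several such attention computations in parallel and concatenates the results.

```python
# A minimal sketch of scaled dot-product self-attention and sinusoidal
# positional encoding (PyTorch). Shapes and dimensions are assumed examples.
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # q, k, v: (batch, seq_len, d_k)
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq, seq)
    weights = F.softmax(scores, dim=-1)                  # attention weights
    return weights @ v                                   # weighted sum of values

def sinusoidal_positional_encoding(seq_len, d_model):
    # Fixed sine/cosine encoding from "Attention Is All You Need".
    position = torch.arange(seq_len).unsqueeze(1).float()
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)
    pe[:, 1::2] = torch.cos(position * div_term)
    return pe   # (seq_len, d_model), added to the input embeddings

# Toy usage: one sequence of 6 tokens with model dimension 512.
x = torch.randn(1, 6, 512) + sinusoidal_positional_encoding(6, 512)
out = scaled_dot_product_attention(x, x, x)   # self-attention: q = k = v = x
print(out.shape)   # torch.Size([1, 6, 512])
```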

Decoder

The decoder generates the output sequence based on the processed input sequence:

  • Self-Attention Mechanism (Decoder-side): This masked mechanism lets the decoder attend to the output it has generated so far, while preventing each position from attending to future positions (see the masked-attention sketch after this list).

  • Multi-Head Attention (Encoder-Decoder Attention): The decoder aligns its output with relevant parts of the input by attending to the encoder's output, which is crucial for tasks like translation.

  • Positional Encoding: Similar to the encoder, the decoder employs positional encoding to understand word order in generated sequences.

  • Feedforward Neural Network: The decoder's feedforward network refines generated output using both input sequence and preceding words.
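
The sketch below illustrates the decoder-side attention pattern described above: causal (masked) self-attention over the tokens generated so far, followed by encoder-decoder (cross) attention whose queries come from the decoder and whose keys and values come from the encoder output. All dimensions and tensors here are made-up example values, not this repository's code.

```python
# A minimal sketch of decoder-side attention: causal self-attention over the
# generated tokens, then cross-attention over the encoder output (PyTorch).
import math
import torch
import torch.nn.functional as F

def attention(q, k, v, mask=None):
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

d_model, src_len, tgt_len = 512, 8, 5                 # assumed example sizes
encoder_out = torch.randn(1, src_len, d_model)        # encoder representations
decoder_in = torch.randn(1, tgt_len, d_model)         # embeddings of tokens generated so far

# Causal mask: position i may only attend to positions <= i.
causal_mask = torch.tril(torch.ones(tgt_len, tgt_len))

# 1) Masked self-attention over the decoder's own sequence.
self_attended = attention(decoder_in, decoder_in, decoder_in, mask=causal_mask)

# 2) Cross-attention: queries from the decoder, keys/values from the encoder.
cross_attended = attention(self_attended, encoder_out, encoder_out)
print(cross_attended.shape)   # torch.Size([1, 5, 512])
```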

The transformer architecture excels at capturing context and dependencies in sequences, making it highly effective for various NLP tasks. The self-attention mechanisms, multi-head attention, and feedforward networks collectively empower transformers to understand relationships, context, and nuances in data.
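
As a rough sketch of how these pieces fit together, the example below assembles a single encoder layer from multi-head self-attention, a position-wise feedforward network, residual connections, and layer normalization using standard PyTorch modules. The layer sizes are assumptions chosen for illustration, not taken from this repository.

```python
# A minimal single encoder layer assembling the components above (PyTorch).
# Layer sizes (d_model, n_heads, d_ff) are assumed example values.
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Multi-head self-attention with a residual connection and layer norm.
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Position-wise feedforward network, again with residual + norm.
        x = self.norm2(x + self.ff(x))
        return x

layer = EncoderLayer()
tokens = torch.randn(1, 6, 512)   # (batch, seq_len, d_model)
print(layer(tokens).shape)        # torch.Size([1, 6, 512])
```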
