license: creativeml-openrail-m
BART (Bidirectional and Auto-Regressive Transformer) Architecture
BART’s primary task is used to generate clean semantically coherent text from corrupted text data but it can also be used for a variety of different NLP sub-tasks like language translation, question-answering tasks, text summarization, paraphrasing, etc.
As BART is an autoencoder model, it consists of an encoder model and a decoder model. For its encoder model, BART uses a bi-directional encoder that is used in BERT, and for its decoder mode, it uses an autoregressive decoder that forms the core aspect of a GPT -1 model.
An autoregressive decoder is a neural network architecture that takes the previous input tokens as well as the current token to predict the next token at every time step. It is important to remember that the input accepted by a decoder is an embedding created by its corresponding encoder network.
Both the encoder and decoder architecture is built by the combination of multiple blocks or layers where each block processes information in a specific way.
It consists of 3 primary blocks:
Multi-head Attention block
Addition and Normalization block
Feed-forward layers
Multi-head attention block
This is one of the most important blocks as in this layer multiple levels of masking( replacing random tokens in a sentence with the
Parallel #1 thread: Entire sentence is replaced by the
Parallel #2 thread: Multiple bi-gram tokens are replaced by the
Parallel #3+ thread: Arbitrary words within the sentence are replaced by the
This masking is done in parallel instead of sequentially to avoid accumulating previous step errors for the same input sentence. Addition and Normalization block
Different parameters within the multiple blocks contain values within different ranges, hence to add those values together, we scale the values of all the parameters into a single range using a monotonic function whose value converges to a constant value k as the input closes to infinity. This is performed so that uniform weight for all parameters is ensured while concatenating multiple parameters into a single one. Feed-forward Layers
The feed-forward layers compose the basic building block of any neural network and are composed of hidden layers containing a fixed number of neurons. These layers contain the process, and store information coming from the previous layers as weights and forward the processed/ updated information to the next layer. The feed-forward neural network layers are specially designed to move information in a sequential uni-directional manner.
BART Model for Text Auto Completion in NLP
BART stands for Bidirectional and Auto-Regressive Transformer. It is a denoising autoencoder that is a pre-trained sequence-to-sequence method, that uses masked language modeling for Natural Language Generation and Translation. It is developed by Lewis et al. in 2019. BART architecture is similar to an encoder-decoder network except that it uses a combination of BERT and GPT models. The BART models can be fine-tuned over small supervised datasets to create domain-specific tasks.
Denoising autoencoder
An autoencoder is a special type of neural network that learns to encode an input sentence into lower dimensional representations and decode the embedded representations back to the corresponding original input sentences. In a general case, when the input and output sentence of an autoencoder is the same, over a large number of iterations, the autoencoder network directly maps the input token to the output tokens, and the embedded representation that is usually learned between them becomes redundant. Therefore, we modify the input sentence by randomly deleting word tokens and replacing them with a special
Implementation
Implementing a pre-trained BART model for automatic text completion:
''' from transformers import BartForConditionalGeneration, BartTokenizer
bart_model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0) # takes a while to load tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
sent = "GeekforGeeks has a
tokenized_sent = tokenizer(sent, return_tensors='pt')
generated_encoded = bart_model.generate(tokenized_sent['input_ids'])
print(tokenizer.batch_decode(generated_encoded, skip_special_tokens=True)[0])
'''