Layers
Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.
DenseEinsum implements a feedforward network using tf.einsum. This layer contains the einsum op, the associated weight, and the logic required to generate the einsum expression for the given initialization parameters.
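Core Keras ships the same pattern as tf.keras.layers.EinsumDense (under tf.keras.layers.experimental in older releases). A minimal sketch of the einsum-based projection; shapes here are illustrative:

```python
import tensorflow as tf

# A dense projection over the last axis, expressed as an einsum:
# "abc,cd->abd" maps [batch, seq, hidden] x [hidden, out] -> [batch, seq, out].
layer = tf.keras.layers.EinsumDense(
    equation="abc,cd->abd",
    output_shape=(None, 64),  # None leaves the sequence axis unspecified
    bias_axes="d",            # add a bias along the output axis
)

x = tf.random.normal([2, 8, 128])  # [batch, seq, hidden]
y = layer(x)                       # [2, 8, 64]
```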
MultiHeadAttention implements optionally masked attention between query, key, and value tensors as described in "Attention Is All You Need". If from_tensor and to_tensor are the same, this is self-attention.
CachedAttention implements an attention layer with a cache, used for auto-regressive decoding.
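Both layers build on the same attention primitive. A minimal sketch of self- versus cross-attention using the equivalent core Keras layer, tf.keras.layers.MultiHeadAttention (shapes are illustrative):

```python
import tensorflow as tf

mha = tf.keras.layers.MultiHeadAttention(num_heads=8, key_dim=64)

query = tf.random.normal([2, 16, 512])  # [batch, target_seq, hidden]
value = tf.random.normal([2, 16, 512])  # [batch, source_seq, hidden]

# Self-attention: query and value come from the same tensor.
self_out = mha(query=query, value=query)   # [2, 16, 512]

# Cross-attention: query attends to a different source sequence.
cross_out = mha(query=query, value=value)  # [2, 16, 512]
```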
MultiChannelAttention implements a variant of multi-head attention that can be used to merge multiple streams for cross-attention.
TalkingHeadsAttention implements talking-heads attention, as described in "Talking-Heads Attention".
Transformer implements an optionally masked transformer as described in "Attention Is All You Need".
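As a rough sketch of what such a layer bundles (illustrative only, not the library's implementation): a post-layer-norm encoder block assembled from standard Keras pieces.

```python
import tensorflow as tf

class EncoderBlock(tf.keras.layers.Layer):
    """Illustrative encoder block: self-attention + feedforward, with residuals."""

    def __init__(self, num_heads=8, hidden_size=512, intermediate_size=2048):
        super().__init__()
        self.attention = tf.keras.layers.MultiHeadAttention(
            num_heads=num_heads, key_dim=hidden_size // num_heads)
        self.attention_norm = tf.keras.layers.LayerNormalization()
        self.intermediate = tf.keras.layers.Dense(
            intermediate_size, activation="relu")
        self.output_dense = tf.keras.layers.Dense(hidden_size)
        self.output_norm = tf.keras.layers.LayerNormalization()

    def call(self, x, attention_mask=None):
        # Self-attention sub-layer with residual connection and layer norm.
        attn = self.attention(query=x, value=x, attention_mask=attention_mask)
        x = self.attention_norm(x + attn)
        # Feedforward sub-layer with residual connection and layer norm.
        ffn = self.output_dense(self.intermediate(x))
        return self.output_norm(x + ffn)
```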
TransformerDecoderLayer is made up of self multi-head attention, cross multi-head attention, and a feedforward network.
ReZeroTransformer implements a Transformer with ReZero, as described in "ReZero is All You Need: Fast Convergence at Large Depth".
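The core ReZero trick is small enough to show directly: each residual branch is scaled by a trainable scalar initialized to zero, so every block starts as the identity. A sketch (not the library code):

```python
import tensorflow as tf

class ReZeroResidual(tf.keras.layers.Layer):
    """Residual connection x + alpha * f(x), with alpha trained from zero."""

    def __init__(self, sublayer):
        super().__init__()
        self.sublayer = sublayer
        # alpha == 0 at initialization, so the block starts as the identity.
        self.alpha = self.add_weight(
            name="alpha", shape=(), initializer="zeros", trainable=True)

    def call(self, x):
        return x + self.alpha * self.sublayer(x)
```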
OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models.
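On TPUs, a dense one-hot matmul is often faster than a sparse gather, which is the trade-off this kind of layer targets. A sketch of the two lookup strategies (vocabulary size and ids are illustrative):

```python
import tensorflow as tf

vocab_size, width = 30522, 128
table = tf.random.normal([vocab_size, width])
ids = tf.constant([[101, 2023, 102]])  # [batch, seq]

# Standard gather-based lookup.
gathered = tf.gather(table, ids)  # [1, 3, 128]

# One-hot matmul lookup: often faster on TPUs.
one_hot = tf.one_hot(ids, depth=vocab_size)            # [1, 3, vocab_size]
matmuled = tf.einsum("bsv,vd->bsd", one_hot, table)    # [1, 3, 128]
```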
PositionalEmbedding creates a positional embedding as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
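BERT-style positional embeddings are learned per position and added to the input sequence. A minimal sketch:

```python
import tensorflow as tf

max_length, width = 512, 768
position_table = tf.keras.layers.Embedding(max_length, width)

x = tf.random.normal([2, 128, width])  # [batch, seq, width]
positions = tf.range(tf.shape(x)[1])   # [0, 1, ..., seq - 1]
x = x + position_table(positions)      # broadcast over the batch axis
```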
SelfAttentionMask creates a 3D attention mask from a 2D tensor mask.
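A sketch of that broadcast, assuming a padding mask of 1s and 0s:

```python
import tensorflow as tf

# 2D padding mask: 1 for real tokens, 0 for padding.  Shape [batch, seq].
to_mask = tf.constant([[1., 1., 1., 0.]])

# Broadcast to [batch, from_seq, to_seq]: every query position sees the
# same row of allowed key positions.
mask_3d = tf.ones_like(to_mask)[:, :, tf.newaxis] * to_mask[:, tf.newaxis, :]
print(mask_3d.shape)  # (1, 4, 4)
```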
MaskedSoftmax implements a softmax with an optional masking input. If no mask is provided to this layer, it performs a standard softmax; however, if a mask tensor is applied (which should be 1 in positions where the data should be allowed through, and 0 where the data should be masked), the output will have masked positions set to approximately zero.
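A sketch of the standard trick: add a large negative constant to the masked logits before the softmax, so those positions contribute essentially nothing:

```python
import tensorflow as tf

def masked_softmax(logits, mask):
    """Softmax that drives masked (mask == 0) positions to ~zero."""
    adder = (1.0 - tf.cast(mask, logits.dtype)) * -10000.0
    return tf.nn.softmax(logits + adder, axis=-1)

logits = tf.constant([[2.0, 1.0, 0.5, 3.0]])
mask = tf.constant([[1, 1, 1, 0]])
print(masked_softmax(logits, mask))  # last position is ~0
```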
MaskedLM implements a masked language model. It assumes that the embedding table variable is passed to it.
ClassificationHead implements a pooling head over a sequence of embeddings, commonly used by classification tasks.
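A sketch of a typical first-token pooling head; the class name, parameters, and internal structure here are illustrative, not the library's exact API:

```python
import tensorflow as tf

class PoolingHead(tf.keras.layers.Layer):
    """Illustrative head: pool the first token, then project to class logits."""

    def __init__(self, inner_dim=768, num_classes=2, dropout_rate=0.1):
        super().__init__()
        self.dense = tf.keras.layers.Dense(inner_dim, activation="tanh")
        self.dropout = tf.keras.layers.Dropout(dropout_rate)
        self.out_proj = tf.keras.layers.Dense(num_classes)

    def call(self, sequence, training=False):
        x = sequence[:, 0, :]  # first-token ("[CLS]") pooling
        x = self.dropout(self.dense(x), training=training)
        return self.out_proj(x)  # [batch, num_classes]
```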
GatedFeedforward implements a gated linear unit (GLU) feedforward layer, as described in "GLU Variants Improve Transformer".
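For instance, the GEGLU variant from that paper gates a linear branch with a GELU branch before projecting back down. A minimal sketch (class name illustrative):

```python
import tensorflow as tf

class GatedFFN(tf.keras.layers.Layer):
    """Illustrative GLU-variant feedforward: gate(x) * linear(x), then project."""

    def __init__(self, intermediate_size=2048, hidden_size=512):
        super().__init__()
        self.gate = tf.keras.layers.Dense(intermediate_size, activation="gelu")
        self.linear = tf.keras.layers.Dense(intermediate_size)
        self.out = tf.keras.layers.Dense(hidden_size)

    def call(self, x):
        # Elementwise product of a nonlinear "gate" and a linear branch
        # (the GEGLU variant), then project back to the model width.
        return self.out(self.gate(x) * self.linear(x))
```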