AWS Trainium & Inferentia documentation
BERT
BERT
Overview
BERT is a bidirectional transformer pretrained on unlabeled text to predict masked tokens in a sentence and to predict whether one sentence follows another. The main idea is that by randomly masking some tokens, the model can train on text to the left and right, giving it a more thorough understanding. BERT is also very versatile because its learned language representations can be adapted for other NLP tasks by fine-tuning an additional layer or head.
You can find all the original BERT checkpoints under the BERT collection.
Export to Neuron
To deploy 🤗 Transformers models on Neuron devices, you first need to compile the models and export them to a serialized format for inference. Below are two approaches to compile the model, you can choose the one that best suits your needs. Here we take the feature-extraction
as an example:
Option 1: CLI
You can export the model using the Optimum command-line interface as follows:
optimum-cli export neuron --model google-bert/bert-base-uncased --task feature-extraction --batch_size 1 --sequence_length 128 bert_feature_extraction_neuronx/
Execute optimum-cli export neuron --help
to display all command line options and their description.
Option 2: Python API
from optimum.neuron import NeuronModelForFeatureExtraction
input_shapes = {"batch_size": 1, "sequence_length": 128}
compiler_args = {"auto_cast": "matmul", "auto_cast_type": "bf16"}
neuron_model = NeuronModelForFeatureExtraction.from_pretrained(
"google-bert/bert-base-uncased",
export=True,
**input_shapes,
**compiler_args,
)
# Save locally
neuron_model.save_pretrained("bert_feature_extraction_neuronx")
# Upload to the HuggingFace Hub
neuron_model.push_to_hub(
"bert_feature_extraction_neuronx", repository_id="my-neuron-repo" # Replace with your HF Hub repo id
)
NeuronBertModel
class optimum.neuron.NeuronBertModel
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Bare Bert Model transformer outputting raw hidden-states without any specific head on top, used for the task “feature-extraction”.
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertModel
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertModel
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-uncased-neuronx-bs1-sq128")
>>> model = NeuronBertModel.from_pretrained("optimum/bert-base-uncased-neuronx-bs1-sq128")
>>> inputs = tokenizer("Dear Evan Hansen is the winner of six Tony Awards.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> last_hidden_state = outputs.last_hidden_state
>>> list(last_hidden_state.shape)
[1, 13, 384]
NeuronBertForMaskedLM
class optimum.neuron.NeuronBertForMaskedLM
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Masked language Bert Model with a language modeling
head on top, for masked language modeling tasks on Neuron devices.
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertForMaskedLM
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertForMaskedLM
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> model = NeuronBertForMaskedLM.from_pretrained("optimum/legal-bert-base-uncased-neuronx")
>>> inputs = tokenizer("This [MASK] Agreement is between General Motors and John Murray.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 13, 30522]
NeuronBertForSequenceClassification
class optimum.neuron.NeuronBertForSequenceClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Neuron Model with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertForSequenceClassification
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertForSequenceClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-multilingual-uncased-sentiment-neuronx")
>>> model = NeuronBertForSequenceClassification.from_pretrained("optimum/bert-base-multilingual-uncased-sentiment-neuronx")
>>> inputs = tokenizer("Hamilton is considered to be the best musical of human history.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 2]
NeuronBertForTokenClassification
class optimum.neuron.NeuronBertForTokenClassification
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Neuron Model with a token classification head on top (a linear layer on top of the hidden-states output) e.g. for Named-Entity-Recognition (NER) tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertForTokenClassification
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertForTokenClassification
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-NER-neuronx")
>>> model = NeuronBertForTokenClassification.from_pretrained("optimum/bert-base-NER-neuronx")
>>> inputs = tokenizer("Lin-Manuel Miranda is an American songwriter, actor, singer, filmmaker, and playwright.", return_tensors="pt")
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> list(logits.shape)
[1, 20, 9]
NeuronBertForQuestionAnswering
class optimum.neuron.NeuronBertForQuestionAnswering
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Bert with a span classification head on top for extractive question-answering tasks like SQuAD (a linear
layers on top of the hidden-states output to compute span start logits
and span end logits
).
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertForQuestionAnswering
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> import torch
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertForQuestionAnswering
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-cased-squad2-neuronx")
>>> model = NeuronBertForQuestionAnswering.from_pretrained("optimum/bert-base-cased-squad2-neuronx")
>>> question, text = "Are there wheelchair spaces in the theatres?", "Yes, we have reserved wheelchair spaces with a good view."
>>> inputs = tokenizer(question, text, return_tensors="pt")
>>> start_positions = torch.tensor([1])
>>> end_positions = torch.tensor([12])
>>> outputs = model(**inputs, start_positions=start_positions, end_positions=end_positions)
>>> start_scores = outputs.start_logits
>>> end_scores = outputs.end_logits
NeuronBertForMultipleChoice
class optimum.neuron.NeuronBertForMultipleChoice
< source >( model: ScriptModule config: PretrainedConfig model_save_dir: typing.Union[str, pathlib.Path, tempfile.TemporaryDirectory, NoneType] = None model_file_name: typing.Optional[str] = None preprocessors: typing.Optional[typing.List] = None neuron_config: typing.Optional[ForwardRef('NeuronDefaultConfig')] = None **kwargs )
Parameters
- config (
transformers.PretrainedConfig
) — PretrainedConfig is the Model configuration class with all the parameters of the model. Initializing with a config file does not load the weights associated with the model, only the configuration. Check out theoptimum.neuron.modeling.NeuronTracedModel.from_pretrained
method to load the model weights. - model (
torch.jit._script.ScriptModule
) — torch.jit._script.ScriptModule is the TorchScript module with embedded NEFF(Neuron Executable File Format) compiled by neuron(x) compiler.
Neuron Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.
This model inherits from ~neuron.modeling.NeuronTracedModel
. Check the superclass documentation for the generic methods the
library implements for all its model (such as downloading or saving)
forward
< source >( input_ids: Tensor attention_mask: Tensor token_type_ids: typing.Optional[torch.Tensor] = None **kwargs )
Parameters
- input_ids (
torch.Tensor
of shape(batch_size, num_choices, sequence_length)
) — Indices of input sequence tokens in the vocabulary. Indices can be obtained usingAutoTokenizer
. SeePreTrainedTokenizer.encode
andPreTrainedTokenizer.__call__
for details. What are input IDs? - attention_mask (
Union[torch.Tensor, None]
of shape(batch_size, num_choices, sequence_length)
, defaults toNone
) — Mask to avoid performing attention on padding token indices. Mask values selected in[0, 1]
:- 1 for tokens that are not masked,
- 0 for tokens that are masked. What are attention masks?
- token_type_ids (
Union[torch.Tensor, None]
of shape(batch_size, num_choices, sequence_length)
, defaults toNone
) — Segment token indices to indicate first and second portions of the inputs. Indices are selected in[0, 1]
:- 1 for tokens that are sentence A,
- 0 for tokens that are sentence B. What are token type IDs?
The NeuronBertForMultipleChoice
forward method, overrides the __call__
special method. Accepts only the inputs traced during the compilation step. Any additional inputs provided during inference will be ignored. To include extra inputs, recompile the model with those inputs specified.
Example:
>>> from transformers import AutoTokenizer
>>> from optimum.neuron import NeuronBertForMultipleChoice
>>> tokenizer = AutoTokenizer.from_pretrained("optimum/bert-base-cased-swag-neuronx")
>>> model = NeuronBertForMultipleChoice.from_pretrained("optimum/bert-base-cased-swag-neuronx")
>>> num_choices = 4
>>> first_sentence = ["Members of the procession walk down the street holding small horn brass instruments."] * num_choices
>>> second_sentence = [
... "A drum line passes by walking down the street playing their instruments.",
... "A drum line has heard approaching them.",
... "A drum line arrives and they're outside dancing and asleep.",
... "A drum line turns the lead singer watches the performance."
... ]
>>> inputs = tokenizer(first_sentence, second_sentence, truncation=True, padding=True)
# Unflatten the inputs values expanding it to the shape [batch_size, num_choices, seq_length]
>>> for k, v in inputs.items():
... inputs[k] = [v[i: i + num_choices] for i in range(0, len(v), num_choices)]
>>> inputs = dict(inputs.convert_to_tensors(tensor_type="pt"))
>>> outputs = model(**inputs)
>>> logits = outputs.logits
>>> logits.shape
[1, 4]