Optimum Neuron Distributed
The optimum.neuron.distributed module provides a set of tools to perform distributed training and inference.
Parallelization
Selecting Model-Specific Parallelizer Classes
Each model that supports parallelization in optimum-neuron has its own derived Parallelizer class. The factory class ParallelizersManager allows you to retrieve such model-specific Parallelizers easily.
get_supported_model_types
Provides the list of supported model types for parallelization.
is_model_supported
< source >( model_type_or_model: typing.Union[str, transformers.modeling_utils.PreTrainedModel, optimum.neuron.utils.peft_utils.NeuronPeftModel] )
Returns a tuple of 3 booleans where:
- The first element indicates if tensor parallelism can be used for this model,
- The second element indicates if sequence parallelism can be used on top of tensor parallelism for this model,
- The third element indicates if pipeline parallelism can be used for this model.
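For instance, a minimal sketch of querying these flags, assuming ParallelizersManager is importable from optimum.neuron.distributed and that the "llama" model type is among the supported ones:

```python
from optimum.neuron.distributed import ParallelizersManager

# Each flag corresponds to one flavor of parallelism for the given model type.
supports_tp, supports_sp, supports_pp = ParallelizersManager.is_model_supported("llama")
print(supports_tp, supports_sp, supports_pp)
```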
parallelizer_for_model
< source >( model_type_or_model: typing.Union[str, transformers.modeling_utils.PreTrainedModel, optimum.neuron.utils.peft_utils.NeuronPeftModel] )
Returns the parallelizer class associated with the model.
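As a sketch, the lookup works either from a model type string or from an already loaded model; the checkpoint name below is purely illustrative:

```python
from transformers import AutoModelForCausalLM
from optimum.neuron.distributed import ParallelizersManager

# Look up by model type string...
parallelizer_cls = ParallelizersManager.parallelizer_for_model("llama")

# ...or by passing a loaded PreTrainedModel instance.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
parallelizer_cls = ParallelizersManager.parallelizer_for_model(model)
```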
Utils
Lazy Loading
Distributed training / inference is usually needed when the model is too big to fit on a single device. Tools that allow for lazy loading of model weights and optimizer states are thus needed to avoid running out of memory before parallelization.
optimum.neuron.distributed.lazy_load_for_parallelism
< source >( tensor_parallel_size: int = 1 pipeline_parallel_size: int = 1 )
Context manager that makes the loading of a model lazy for model parallelism:
- Every torch.nn.Linear is put on the torch.device("meta") device, meaning that it takes no memory to instantiate.
- Every torch.nn.Embedding is also put on the torch.device("meta") device.
- No state dict is actually loaded; instead a weight map is created and attached to the model. For more information, read the optimum.neuron.distributed.utils.from_pretrained_for_mp docstring.
If both tensor_parallel_size and pipeline_parallel_size are set to 1, no lazy loading is performed.
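A minimal sketch of wrapping model loading in the context manager; the checkpoint name and parallel sizes are illustrative:

```python
from transformers import AutoModelForCausalLM
from optimum.neuron.distributed import lazy_load_for_parallelism

# Linear and Embedding weights land on the meta device, so instantiation is cheap.
with lazy_load_for_parallelism(tensor_parallel_size=2, pipeline_parallel_size=1):
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```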
optimum.neuron.distributed.make_optimizer_constructor_lazy
< source >( optimizer_cls: typing.Type[ForwardRef('torch.optim.Optimizer')] )
Transforms an optimizer constructor (optimizer class) to make it lazy by not initializing the parameters. This makes the optimizer lightweight, so it can be used to create a “real” optimizer once the model has been parallelized.
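A minimal sketch, assuming the model was loaded under lazy_load_for_parallelism as above; the optimizer class and learning rate are illustrative:

```python
import torch
from optimum.neuron.distributed import make_optimizer_constructor_lazy

# The returned constructor builds an optimizer whose parameter states are not initialized yet.
lazy_adamw = make_optimizer_constructor_lazy(torch.optim.AdamW)
optimizer = lazy_adamw(model.parameters(), lr=1e-4)
```

The “real” optimizer is then created once the model has been parallelized.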