
C3A: Parameter-Efficient Fine-Tuning via Circular Convolution

C3A is a parameter-efficient fine-tuning technique that leverages Circular Convolution to achieve high-rank adaptation within reasonable resource limits.

Note that you should use a much larger learning rate (LR) for C3A than for other methods. For example, an LR of 1e-1 is a good starting point. In addition, use a much smaller weight decay than usual. You can refer to the method_comparison folder for more details.
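As a minimal sketch of this advice (assuming `model` has already been wrapped with a C3A adapter via `get_peft_model`; the exact values are starting points, not tuned recommendations):

```python
import torch

# Only the C3A parameters are trainable after wrapping with get_peft_model.
trainable = [p for p in model.parameters() if p.requires_grad]

optimizer = torch.optim.AdamW(
    trainable,
    lr=1e-1,           # much larger than a typical LoRA learning rate
    weight_decay=0.0,  # keep weight decay very small or zero for C3A
)
```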

The block_size affects the number of tunable parameters and the final performance. As a starting point, choose a divisor of $\mathrm{gcd}(d_1,d_2)$ near $\frac{\sqrt{d_1\times d_2}}{r}$, where $d_1$ and $d_2$ are the input and output sizes of a target layer and $r$ is the LoRA rank you would use for this task.
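For illustration, this heuristic can be computed directly; the layer shapes and rank below are hypothetical:

```python
import math

d1, d2 = 768, 3072  # hypothetical input/output sizes of a target layer
r = 8               # the LoRA rank you would have used for this task

target = math.sqrt(d1 * d2) / r  # heuristic target value
g = math.gcd(d1, d2)

# block_size must divide both d1 and d2, i.e. it must divide gcd(d1, d2);
# pick the divisor closest to the target.
divisors = [k for k in range(1, g + 1) if g % k == 0]
block_size = min(divisors, key=lambda k: abs(k - target))
print(block_size)  # 192 for these shapes
```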

C3A currently has the following constraints:

  • Only nn.Linear layers are supported.
  • Quantized layers are not supported.
  • The block size should be a common divisor of both the input and output sizes of target layers.

If these constraints don’t work for your use case, consider other methods instead.
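A minimal end-to-end sketch, assuming a PEFT version that ships C3A (the base model and module names below are examples, not requirements):

```python
from transformers import AutoModelForSequenceClassification
from peft import C3AConfig, get_peft_model

model = AutoModelForSequenceClassification.from_pretrained("roberta-base")

# 64 divides both the input and output size (768) of the targeted nn.Linear layers.
config = C3AConfig(block_size=64, target_modules=["query", "value"])
model = get_peft_model(model, config)
model.print_trainable_parameters()
```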

The abstract from the paper is:

Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models, leveraging low-rank matrices $\mathbf{A}$ and $\mathbf{B}$ to represent weight changes (i.e., $\Delta \mathbf{W} = \mathbf{B} \mathbf{A}$). This method reduces trainable parameters and mitigates heavy memory consumption associated with full delta matrices by sequentially multiplying $\mathbf{A}$ and $\mathbf{B}$ with the activation. Despite its success, the intrinsic low-rank characteristic may limit its performance. Although several variants have been proposed to address this issue, they often overlook the crucial computational and memory efficiency brought by LoRA. In this paper, we propose Circular Convolution Adaptation (C3A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization. Extensive experiments demonstrate that C3A consistently outperforms LoRA and its variants across various fine-tuning tasks.

C3AConfig

class peft.C3AConfig


( task_type: typing.Union[str, peft.utils.peft_types.TaskType, NoneType] = None peft_type: typing.Union[str, peft.utils.peft_types.PeftType, NoneType] = None auto_mapping: typing.Optional[dict] = None base_model_name_or_path: typing.Optional[str] = None revision: typing.Optional[str] = None inference_mode: bool = False block_size: int = 256 target_modules: Optional[Union[list[str], str]] = None bias: str = 'none' modules_to_save: Optional[list[str]] = None layers_to_transform: Optional[Union[list[int], int]] = None layers_pattern: Optional[Union[list[str], str]] = None block_size_pattern: Optional[dict] = <factory> init_weights: Optional[Union[bool, Literal['gaussian', 'kaiming_uniform', 'xavier_uniform']]] = 'xavier_uniform' )

Parameters

  • block_size (int) — Block size for C3A; both the input size and the output size of the target layer must be divisible by it. If you are unsure which block_size to use, set it to the greatest common divisor of all input and output sizes of your target layers. Increasing it results in fewer trainable parameters.
  • target_modules (Union[list[str],str]) — The names of the modules to apply C3A to.
  • bias (str) — Bias type for C3A. Can be ‘none’, ‘all’ or ‘c3a_only’. If ‘all’ or ‘c3a_only’, the corresponding biases will be updated during training. Be aware that this means that, even when disabling the adapters, the model will not produce the same output as the base model would have without adaptation.
  • modules_to_save (list[str]) — List of modules apart from C3A layers to be set as trainable and saved in the final checkpoint.
  • layers_to_transform (Union[list[int],int]) — The layer indexes to transform. If specified, C3A is applied only to the layer indexes in this list. If a single integer is passed, C3A is applied to the layer at that index.
  • layers_pattern (str) — The layer pattern name, used only if layers_to_transform is not None and the pattern is not one of the common layer patterns.
  • block_size_pattern (dict) — The mapping from layer names or regular expressions to block_size values that differ from the default. For example, {"model.decoder.layers.0.encoder_attn.k_proj": 1280}.
  • init_weights (Union[bool, Literal["gaussian", "kaiming_uniform", "xavier_uniform"]]) — The initialization of the C3A weights. Set this to True to initialize the weights to zeros; set this to False to initialize them from a commonly used distribution. Alternatively, pass "gaussian", "kaiming_uniform" or "xavier_uniform" (the default) to pick a specific distribution.

This is the configuration class to store the configuration of a C3AModel.
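As an illustration of the per-layer override (the layer name is taken from the block_size_pattern example above; the other values are hypothetical):

```python
from peft import C3AConfig

config = C3AConfig(
    block_size=256,
    target_modules=["q_proj", "k_proj", "v_proj"],
    # Override the block size for one specific layer.
    block_size_pattern={"model.decoder.layers.0.encoder_attn.k_proj": 1280},
    init_weights="xavier_uniform",  # the default
)
```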

C3AModel

class peft.C3AModel


( model config adapter_name low_cpu_mem_usage: bool = False ) torch.nn.Module

Parameters

  • model (torch.nn.Module) — The model to be adapted.
  • config (C3AConfig) — The configuration of the C3A model.
  • adapter_name (str) — The name of the adapter, defaults to "default".

Returns

torch.nn.Module

The C3A model.

Creates a C3A model from a pretrained transformers model.

The method is described in detail in the C3A paper.


disable_adapter_layers


( )

Disable all adapters.

When disabling all adapters, the model output corresponds to the output of the base model.

enable_adapter_layers


( )

Enable all adapters.

Call this if you have previously disabled all adapters and want to re-enable them.
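For example (assuming `model` is a C3AModel and `inputs` is an already prepared batch):

```python
model.disable_adapter_layers()
base_output = model(**inputs)  # matches the base model (when bias="none")
model.enable_adapter_layers()
adapted_output = model(**inputs)
```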

merge_and_unload


( progressbar: bool = False safe_merge: bool = False adapter_names: Optional[list[str]] = None )

Parameters

  • progressbar (bool) — Whether to show a progress bar indicating the unload and merge process.
  • safe_merge (bool) — Whether to activate the safe merging check, which verifies that the adapter weights do not contain any NaNs.
  • adapter_names (list[str], optional) — The list of adapter names that should be merged. If None, all active adapters will be merged. Defaults to None.

This method merges the C3A layers into the base model. This is needed if someone wants to use the base model as a standalone model.
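Typical usage, with a hypothetical output path:

```python
merged_model = model.merge_and_unload(safe_merge=True)
merged_model.save_pretrained("path/to/merged-model")
```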

set_adapter


( adapter_name: str | list[str] )

Parameters

  • adapter_name (str or list[str]) — Name of the adapter(s) to be activated.

Set the active adapter(s).
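For example, assuming adapters named "default" and "other" have been added to the model:

```python
model.set_adapter("other")                # activate a single adapter
model.set_adapter(["default", "other"])   # or several at once
```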

unload


( )

Removes all the C3A modules without merging them, giving back the original base model.
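For example:

```python
base_model = model.unload()  # the original base model, adapters removed
```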
