Target modules for applying PEFT / LoRA on different models
Large language models have dramatically improved the ability of software to understand complex human-language queries. Parameter-efficient fine-tuning (PEFT) and Low-Rank Adaptation (LoRA) are among the most powerful techniques for fine-tuning large language models (LLMs) efficiently, delivering substantial savings without demanding heavy computational resources.
Parameter-Efficient Fine-Tuning
Parameter-Efficient Fine-Tuning (PEFT) techniques allow large pretrained models to be adapted efficiently to a range of downstream applications by fine-tuning a small number of model parameters rather than the entire model. This reduces computational and storage costs and makes it feasible to fine-tune large models even on limited hardware.
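As a quick illustration, here is a minimal sketch of the PEFT workflow using the Hugging Face transformers and peft libraries; the checkpoint name and the target modules are placeholders (choosing them properly is the topic of the rest of this article):
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")

peft_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # placeholder; see the solutions below
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()   # only a small fraction of weights are trainable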
Low-Rank Adaptation (LoRA)
Low-Rank Adaptation (LoRA) is one of the most common lightweight training techniques for LLMs, and it significantly reduces the number of trainable parameters. It works by inserting a small number of new weights into the model and training only those. Training with LoRA is therefore significantly faster and more memory-efficient, and it produces much smaller weight files.
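Conceptually, LoRA freezes a pretrained weight matrix W and learns a low-rank update, so the effective weight becomes W + (alpha / r) * B A. Below is a minimal, library-free sketch of the idea (not the actual PEFT implementation):
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update (conceptual sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                 # freeze pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)    # down-projection A
        self.lora_B = nn.Linear(r, base.out_features, bias=False)   # up-projection B
        nn.init.zeros_(self.lora_B.weight)                          # update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))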
Solution 1:
Let's say that you load some model of your choice:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")
Then you can see available modules by printing out this model:
print(model)
You will get something like this (SalesForce/CodeGen25):
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(51200, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=51200, bias=False)
)
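Once you can see the module names, pick the ones you want to adapt. For the attention projections printed above, a LoraConfig might look like this (a sketch; the rank, alpha, and dropout values are arbitrary examples, not recommendations):
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # names taken from the printout
)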
Solution 2:
Here is a method to get the names of all linear modules (4-bit bitsandbytes layers in this case):
import bitsandbytes as bnb

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            # model-specific: keep only the leaf module name
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
In newer PEFT releases you can directly use target_modules="all-linear" in your LoraConfig.
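For example, a short sketch of wiring either approach into a LoraConfig (it assumes the model was loaded in 4-bit so that bnb.nn.Linear4bit modules exist, and a PEFT version that supports "all-linear"):
from peft import LoraConfig

# Option A: pass the names collected by the helper above
config = LoraConfig(target_modules=find_all_linear_names(model), task_type="CAUSAL_LM")

# Option B: on newer PEFT releases, let the library target every linear layer
config = LoraConfig(target_modules="all-linear", task_type="CAUSAL_LM")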
Solution 3:
To get a list of LoRA-compatible modules programmatically, you could try
target_modules = 'all-linear',
which seems to be available in the latest PEFT versions.
However, that raised an error when applied to the google/gemma-2b model (dropout layers were for some reason added to the target modules; see below for the layer types supported by LoRA).
From the documentation of the PEFT library, LoRA supports only the following modules: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
The following function collects all LoRA-compatible module names from an arbitrary model:
import torch
from transformers import Conv1D

def get_specific_layer_names(model):
    # Create a list to store the layer names
    layer_names = []

    # Recursively visit all modules and submodules
    for name, module in model.named_modules():
        # Check if the module is an instance of the specified layers
        if isinstance(module, (torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, Conv1D)):
            # model name parsing
            layer_names.append('.'.join(name.split('.')[4:]).split('.')[0])

    return layer_names
list(set(get_specific_layer_names(model)))
Which yields on gemma-2b:
['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj']
This list was valid as a target_modules selection with peft.__version__ == '0.10.1.dev0' and transformers.__version__ == '4.39.1'.
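As an optional sanity check (a sketch assuming the gemma-2b model loaded above and the peft library), you can wrap the model with this list and confirm that only the LoRA adapter weights are trainable:
from peft import LoraConfig, get_peft_model

target_modules = ['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj']
peft_model = get_peft_model(model, LoraConfig(target_modules=target_modules, task_type="CAUSAL_LM"))

trainable = [n for n, p in peft_model.named_parameters() if p.requires_grad]
print(trainable[:4])  # expect only 'lora_A' / 'lora_B' parameters under the target modules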
By following the steps above, you can identify the right target modules for applying PEFT / LoRA to different models. These fine-tuning techniques are a powerful way to adapt large pre-trained models to specific tasks with minimal computational resources. Whether you are working on computer vision, natural language processing, or sequential data tasks, applying PEFT and LoRA can significantly enhance your model's performance and efficiency.
Thank you for reading the article.