Target modules for applying PEFT / LoRA on different models
Large language models have dramatically improved the ability of software to understand complex human-language queries. Parameter-efficient fine-tuning (PEFT) and Low-Rank Adaptation (LoRA) are among the most powerful techniques for fine-tuning large language models (LLMs) efficiently, delivering substantial savings without demanding heavy computational resources.
Parameter-Efficient Fine-Tuning
Parameter-Efficient Fine-Tuning (PEFT) techniques allow large pretrained models to be adapted efficiently to a range of downstream applications by fine-tuning a small number of model parameters rather than the entire model. This reduces computational and storage costs and makes it feasible to fine-tune large models even on limited hardware.
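As a quick illustration, here is a minimal sketch of the PEFT workflow using the Hugging Face transformers and peft libraries; the checkpoint name and the target modules are placeholders (choosing them properly is the topic of the rest of this article):
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")

peft_config = LoraConfig(
    r=8,                                  # rank of the low-rank update
    lora_alpha=16,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # placeholder; see the solutions below
    task_type="CAUSAL_LM",
)
peft_model = get_peft_model(model, peft_config)
peft_model.print_trainable_parameters()   # only a small fraction of weights are trainable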
Low-Rank Adaptation (LoRA)
Low-Rank Adaptation (LoRA) is one of the most common lightweight training techniques for LLMs, and it significantly reduces the number of trainable parameters. It works by inserting a small number of new weights into the model and training only those. Training with LoRA is therefore significantly faster and more memory-efficient, and it produces much smaller weight files.
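Conceptually, LoRA freezes a pretrained weight matrix W and learns a low-rank update, so the effective weight becomes W + (alpha / r) * B A. Below is a minimal, library-free sketch of the idea (not the actual PEFT implementation):
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base Linear plus a trainable low-rank update (conceptual sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                                 # freeze pretrained weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)    # down-projection A
        self.lora_B = nn.Linear(r, base.out_features, bias=False)   # up-projection B
        nn.init.zeros_(self.lora_B.weight)                          # update starts at zero
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * self.lora_B(self.lora_A(x))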
Solution 1:
Let's say that you load some model of your choice:
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("some-model-checkpoint")
Then you can see available modules by printing out this model:
print(model)
You will get something like this (SalesForce/CodeGen25):
LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(51200, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear(in_features=4096, out_features=4096, bias=False)
          (rotary_emb): LlamaRotaryEmbedding()
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear(in_features=11008, out_features=4096, bias=False)
          (up_proj): Linear(in_features=4096, out_features=11008, bias=False)
          (act_fn): SiLUActivation()
        )
        (input_layernorm): LlamaRMSNorm()
        (post_attention_layernorm): LlamaRMSNorm()
      )
    )
    (norm): LlamaRMSNorm()
  )
  (lm_head): Linear(in_features=4096, out_features=51200, bias=False)
)
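Once you can see the module names, pick the ones you want to adapt. For the attention projections printed above, a LoraConfig might look like this (a sketch; the rank, alpha, and dropout values are arbitrary examples, not recommendations):
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # names taken from the printout
)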
Solution 2:
Here is a method to get the names of all linear modules (4-bit bitsandbytes layers in this case):
import bitsandbytes as bnb

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            # model-specific: keep only the leaf module name
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])
    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)
In newer PEFT releases you can directly use target_modules="all-linear" in your LoraConfig.
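For example, a short sketch of wiring either approach into a LoraConfig (it assumes the model was loaded in 4-bit so that bnb.nn.Linear4bit modules exist, and a PEFT version that supports "all-linear"):
from peft import LoraConfig

# Option A: pass the names collected by the helper above
config = LoraConfig(target_modules=find_all_linear_names(model), task_type="CAUSAL_LM")

# Option B: on newer PEFT releases, let the library target every linear layer
config = LoraConfig(target_modules="all-linear", task_type="CAUSAL_LM")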
Solution 3:
To get a list of LoRA-compatible modules programmatically, you could try
target_modules = 'all-linear',
which seems to be available in the latest PEFT versions.
However, that raised an error when applied to the google/gemma-2b model (dropout layers were for some reason added to the target modules; see below for the layer types supported by LoRA).
From the documentation of the PEFT library, LoRA supports only the following modules: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
The following function collects all LoRA-compatible module names from an arbitrary model:
import torch
from transformers import Conv1D

def get_specific_layer_names(model):
    # Create a list to store the layer names
    layer_names = []

    # Recursively visit all modules and submodules
    for name, module in model.named_modules():
        # Check if the module is an instance of the specified layers
        if isinstance(module, (torch.nn.Linear, torch.nn.Embedding, torch.nn.Conv2d, Conv1D)):
            # model name parsing
            layer_names.append('.'.join(name.split('.')[4:]).split('.')[0])

    return layer_names
list(set(get_specific_layer_names(model)))
Which yields on gemma-2b:
['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj']
This list was valid as a target_modules selection with peft.__version__ == '0.10.1.dev0' and transformers.__version__ == '4.39.1'.
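As an optional sanity check (a sketch assuming the gemma-2b model loaded above and the peft library), you can wrap the model with this list and confirm that only the LoRA adapter weights are trainable:
from peft import LoraConfig, get_peft_model

target_modules = ['down_proj', 'o_proj', 'k_proj', 'q_proj', 'gate_proj', 'up_proj', 'v_proj']
peft_model = get_peft_model(model, LoraConfig(target_modules=target_modules, task_type="CAUSAL_LM"))

trainable = [n for n, p in peft_model.named_parameters() if p.requires_grad]
print(trainable[:4])  # expect only 'lora_A' / 'lora_B' parameters under the target modules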
By following the steps above, you can identify the right target modules for applying PEFT / LoRA to different models. These fine-tuning techniques are a powerful way to adapt large pre-trained models to specific tasks with minimal computational resources. Whether you are working on computer vision, natural language processing, or sequential data tasks, applying PEFT and LoRA can significantly enhance your model's performance and efficiency.
Thank you for reading the article.