✨ Prompt Enhancer Trainer + Inference Playground
Train, test, and debug your LoRA-enhanced Gemma model easily. Use ZeroGPU for training; CPU is enough for everything else.
🧩 View Trainable Parameters in Your LoRA-Enhanced Model
🧩 Code Debug – Understand What's Happening Line by Line
🧰 Step-by-Step Breakdown
1️⃣ f"[INFO] Loading base model: {base_model}"
→ Logs which model is being loaded (e.g., google/gemma-2b-it).
2️⃣ AutoModelForCausalLM.from_pretrained(base_model)
→ Downloads and loads the base Gemma model weights (the tokenizer is loaded separately via AutoTokenizer).
3️⃣ get_peft_model(model, config)
→ Wraps the model with LoRA and injects adapters into q_proj, k_proj, v_proj, etc.
4️⃣ Expected console output:
[INFO] Loading base model: google/gemma-2b-it
[INFO] Preparing dataset...
[INFO] Injecting LoRA adapters...
trainable params: 3.5M || all params: 270M || trainable%: 1.3%
5️⃣ trainer.train()
→ Starts the training loop and shows live progress.
6️⃣ upload_file(...)
→ Uploads all adapter files to your chosen Hugging Face repo.
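A compact sketch tying these steps together (the model id, output path, and repo id below are placeholders, and the real Space code adds error handling and dataset preparation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from huggingface_hub import upload_file

base_model = "google/gemma-2b-it"                      # placeholder model id
print(f"[INFO] Loading base model: {base_model}")
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

# Wrap the base model with LoRA adapters on the attention projections
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "k_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()                     # "trainable params: ... || all params: ..."

# ... trainer.train() runs here, then the adapter is saved to lora_out/ (placeholder path) ...

upload_file(path_or_fileobj="lora_out/adapter_model.safetensors",
            path_in_repo="adapter_model.safetensors",
            repo_id="your-username/prompt-enhancer-lora")   # placeholder repo id
```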
🔍 What "Adapter (90)" Means
When you initialize LoRA on Gemma, it finds 90 target layers such as:
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Each layer gets small trainable matrices (A, B).
So:
Adapter (90) → 90 modules modified by LoRA.
To list them:
for name, module in model.named_modules():
if "lora" in name.lower():
print(name)
🧩 Universal Dynamic LoRA Trainer & Inference – Code Explanation
This project provides an end-to-end LoRA fine-tuning and inference system for language models like Gemma, built with Gradio, PEFT, and Accelerate.
It supports both training new LoRAs and generating text with existing ones, all in a single interface.
1️⃣ Imports Overview
- Core libs: `os`, `torch`, `gradio`, `numpy`, `pandas`
- Training libs: `peft` (`LoraConfig`, `get_peft_model`), `accelerate` (`Accelerator`)
- Modeling: `transformers` (for the Gemma base model)
- Hub integration: `huggingface_hub` (for uploading adapters)
- Spaces: `spaces` – for execution within Hugging Face Spaces
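A plausible import block matching the list above; the exact aliases and helpers used in the actual Space may differ:

```python
import os
import numpy as np
import pandas as pd
import torch
import gradio as gr
import spaces                                   # Hugging Face Spaces helper (e.g. @spaces.GPU)
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import upload_file
```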
2️⃣ Dataset Loading
- Uses a lightweight MediaTextDataset class to load:
  - CSV / Parquet files
  - or data directly from a Hugging Face dataset repo
- Expects two columns: `short_prompt` → input text, `long_prompt` → target expanded text
- Supports batching, missing-column checks, and configurable max record limits.
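A simplified stand-in for such a loader, assuming CSV/Parquet input with the two columns named above (the real MediaTextDataset in the Space also handles Hugging Face dataset repos and batching details):

```python
import pandas as pd
from torch.utils.data import Dataset

class MediaTextDataset(Dataset):
    """Simplified stand-in: reads CSV/Parquet with short_prompt / long_prompt columns."""

    def __init__(self, path, max_records=None):
        df = pd.read_parquet(path) if path.endswith(".parquet") else pd.read_csv(path)
        missing = {"short_prompt", "long_prompt"} - set(df.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        if max_records:
            df = df.head(max_records)
        self.records = df[["short_prompt", "long_prompt"]].to_dict("records")

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        return row["short_prompt"], row["long_prompt"]
```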
3️⃣ Model Loading & Preparation
- Loads the Gemma model and tokenizer via `AutoModelForCausalLM` and `AutoTokenizer`.
- Automatically detects target modules (e.g. `q_proj`, `v_proj`) for LoRA injection.
- Supports `float16` or `bfloat16` precision with `Accelerator` for optimal memory usage.
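A hedged sketch of this loading step, with a simple name-based scan standing in for the Space's target-module detection:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "google/gemma-2b-it"   # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Detect candidate projection layers for LoRA injection by name
candidates = ("q_proj", "k_proj", "v_proj", "o_proj")
target_modules = sorted({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear) and name.split(".")[-1] in candidates
})
print(target_modules)   # e.g. ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```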
4️⃣ LoRA Training Logic
- Core formula: [ W_{eff} = W + \alpha \times (B @ A) ]
- Only the A and B matrices are trainable; the base model weights remain frozen.
- Configurable parameters: `r` (rank), `alpha` (scaling), `epochs`, `lr`, `batch_size`
- Training logs stream live in the UI, showing step-by-step loss values.
- After training, the adapter is saved locally and uploaded to the Hugging Face Hub.
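A minimal training-loop sketch using Accelerator; `model`, `tokenizer`, and `dataset` are assumed to come from the previous steps, and the real Space adds label masking, logging, and checkpointing:

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator(mixed_precision="bf16")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)   # only LoRA params have requires_grad=True
loader = DataLoader(dataset, batch_size=2, shuffle=True)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for epoch in range(3):                                        # epochs is configurable in the UI
    for step, (short, long) in enumerate(loader):
        texts = [s + "\n" + t for s, t in zip(short, long)]   # prompt and target in one sequence
        batch = tokenizer(texts, return_tensors="pt", padding=True,
                          truncation=True, max_length=512).to(accelerator.device)
        # In a real run, pad positions in the labels would usually be masked to -100
        outputs = model(**batch, labels=batch["input_ids"])
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        if step % 50 == 0:
            print(f"[TRAIN] Step {step} | Loss: {loss.item():.2f}")
```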
5️⃣ CPU Inference Mode
- Runs entirely on CPU; no GPU required.
- Loads the base Gemma model plus the trained LoRA weights (`PeftModel.from_pretrained`).
- Optionally merges the LoRA into the base model.
- Expands the short prompt → long descriptive text using standard generation parameters (e.g., top-p / top-k sampling).
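A sketch of the CPU inference path; the base model id and adapter repo id are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2b-it"                        # placeholder base model
adapter_id = "your-username/prompt-enhancer-lora"     # placeholder adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)  # CPU-friendly dtype
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

short_prompt = "a cat on a windowsill"
inputs = tokenizer(short_prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128,
                                do_sample=True, top_p=0.9, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```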
6️⃣ LoRA Internals Explained
- LoRA injects low-rank matrices (A, B) into the attention Linear layers.
- Example: [ Q_{new} = Q + \alpha \times (B @ A) ]
- Significantly reduces training cost:
  - Memory: ~1–2% of the full model
  - Compute: trains faster with minimal GPU load
- Scales to larger models like Gemma 3B / 4B with rank ≤ 16.
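A rough check of that memory claim for a single 2048×2048 projection at rank 8 (illustrative numbers only):

```python
d, r = 2048, 8
full_params = d * d            # updating W directly: 4,194,304 parameters
lora_params = r * d + d * r    # A (r×d) plus B (d×r): 32,768 parameters
print(f"LoRA trains {lora_params / full_params:.2%} of this layer's weights")
# -> LoRA trains 0.78% of this layer's weights
```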
7️⃣ Gradio UI Structure
- Train LoRA Tab: configure the model, dataset, LoRA parameters, and upload target, then press 🚀 Start Training to stream training logs live.
- Inference (CPU) Tab: type a short prompt → generates an expanded long-form version via the trained LoRA.
- Code Explain Tab: detailed breakdown of the logic plus the simulated console output below.
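Before the log simulation, here is a bare-bones Blocks skeleton showing how a three-tab layout like this can be wired up; `train_lora` and `run_inference` are hypothetical callbacks, and the actual Space exposes many more controls:

```python
import gradio as gr

def train_lora(base_model, dataset_path, rank, alpha):
    # hypothetical callback: would run the LoRA training loop and yield log lines
    yield f"[INFO] Loading base model: {base_model}"

def run_inference(short_prompt):
    # hypothetical callback: would expand the short prompt with the trained LoRA
    return f"(expanded) {short_prompt}"

with gr.Blocks() as demo:
    with gr.Tab("Train LoRA"):
        base = gr.Textbox(label="Base model", value="google/gemma-2b-it")
        data = gr.Textbox(label="Dataset path or repo")
        rank = gr.Slider(1, 64, value=8, label="r (rank)")
        alpha = gr.Slider(1, 128, value=16, label="alpha")
        logs = gr.Textbox(label="Training logs", lines=10)
        gr.Button("🚀 Start Training").click(train_lora, [base, data, rank, alpha], logs)
    with gr.Tab("Inference (CPU)"):
        short = gr.Textbox(label="Short prompt")
        long = gr.Textbox(label="Expanded prompt")
        gr.Button("Generate").click(run_inference, short, long)
    with gr.Tab("Code Explain"):
        gr.Markdown("Detailed breakdown of the training and inference logic.")

demo.launch()
```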
🧾 Example Log Simulation
print(f"[INFO] Loading base model: {base_model}")
# -> Loads Gemma base model (fp16) on CUDA
# [INFO] Base model google/gemma-3-4b-it loaded successfully
print(f"[INFO] Preparing dataset from: {dataset_path}")
# -> Loads dataset or CSV file
# [DATA] 980 samples loaded, columns: short_prompt, long_prompt
print("[INFO] Initializing LoRA configuration...")
# -> Creates LoraConfig(r=8, alpha=16, target_modules=['q_proj', 'v_proj'])
# [CONFIG] LoRA applied to 96 attention layers
print("[INFO] Starting training loop...")
# [TRAIN] Step 1 | Loss: 2.31
# [TRAIN] Step 50 | Loss: 1.42
# [TRAIN] Step 100 | Loss: 0.91
# [TRAIN] Epoch 1 complete (avg loss: 1.21)
print("[INFO] Saving LoRA adapter...")
# -> Saves safetensors and config locally
print(f"[UPLOAD] Pushing adapter to {hf_repo_id}")
# -> Uploads model to Hugging Face Hub
# [UPLOAD] adapter_model.safetensors (67.7 MB)
# [SUCCESS] LoRA uploaded successfully
🧠 What LoRA Does (A & B Injection Explained)
When you fine-tune a large model (like Gemma or Llama), you're adjusting billions of parameters in large weight matrices.
LoRA avoids this by injecting two small low-rank matrices (A and B) into selected layers instead of modifying the full weight.
Step 1: Regular Linear Layer
[ y = W x ]
Here, W is a huge matrix (e.g., 4096×4096).
Step 2: LoRA Layer Modification
Instead of updating W directly, LoRA adds a lightweight update:
[ W' = W + \Delta W ] [ \Delta W = B A ]
Where:
- A ∈ ℝ^(r × d)
- B ∈ ℝ^(d × r)
- and r ≪ d (e.g., r=8 instead of 4096)
So you're training only a tiny fraction of the parameters.
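A tiny shape check of the low-rank update, with illustrative values for d and r:

```python
import torch

d, r = 4096, 8
W = torch.zeros(d, d)            # frozen base weight
A = torch.randn(r, d) * 0.01     # trainable, r × d
B = torch.zeros(d, r)            # trainable, d × r (initialised to zero in LoRA)
delta_W = B @ A                  # d × d, same shape as W
print(delta_W.shape)             # torch.Size([4096, 4096])
print(A.numel() + B.numel())     # 65,536 trainable values vs. 16,777,216 in W
```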
Step 3: Where LoRA Gets Injected
It targets critical sub-layers such as:
- q_proj, k_proj, v_proj → Query, Key, Value projections in attention
- o_proj / out_proj → Output projection
- gate_proj, up_proj, down_proj → Feed-forward layers
When you see:
Adapter (90)
That means 90 total layers (from these modules) were wrapped with LoRA adapters.
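To verify such a count on your own wrapped model, you can tally the injected adapter modules (a rough check; internal module naming can vary across peft versions):

```python
from collections import Counter

# Each injected module exposes a lora_A sub-module keyed by the adapter name ("default")
lora_layers = [name for name, _ in model.named_modules() if name.endswith("lora_A.default")]
print(f"Adapter ({len(lora_layers)})")                       # e.g. Adapter (90)
print(Counter(name.split(".")[-3] for name in lora_layers))  # count per module type (q_proj, v_proj, ...)
```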
Step 4: Training Efficiency
- Base weights (`W`) stay frozen
- Only `(A, B)` are trainable
- Compute and memory are drastically reduced
| Metric | Full Fine-Tune | LoRA Fine-Tune |
|---|---|---|
| Trainable Params | 2B+ | ~3M |
| GPU Memory | 40GB+ | <6GB |
| Time | 10β20 hrs | <1 hr |
Step 5: Inference Equation
At inference time: [ y = (W + \alpha \times B A) x ]
Where α controls the strength of the adapter's influence.
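At inference you can keep the adapter separate or bake the update into the base weights; a sketch of the merge path with placeholder repo ids:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
model = PeftModel.from_pretrained(base, "your-username/prompt-enhancer-lora")

# Fold the alpha-scaled B @ A update into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("gemma-prompt-enhancer-merged")
```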
Step 6: Visualization
Base Layer: y = W * x
LoRA Layer: y = (W + B @ A) * x
                     │   │
                     │   └── small adapter A (trainable)
                     └────── small adapter B (trainable)
Step 7: Example in Code
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

# Describe which layers receive LoRA adapters and how large they are
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05
)

# Wrap the base model; only the injected A/B matrices are trainable
model = get_peft_model(model, config)
model.print_trainable_parameters()
Expected output:
trainable params: 3,278,848 || all params: 2,040,000,000 || trainable%: 0.16%