✨ Prompt Enhancer Trainer + Inference Playground
Train, test, and debug your LoRA-enhanced Gemma model easily. Use ZeroGPU for training; CPU is enough for everything else.
🧩 View Trainable Parameters in Your LoRA-Enhanced Model
🧩 Code Debug – Understand What's Happening Line by Line
🧰 Step-by-Step Breakdown
1️⃣ f"[INFO] Loading base model: {base_model}"
→ Logs which model is being loaded (e.g., google/gemma-2b-it).
2️⃣ AutoModelForCausalLM.from_pretrained(base_model)
→ Downloads and loads the base Gemma model weights (the tokenizer is loaded separately via AutoTokenizer).
3️⃣ get_peft_model(model, config)
→ Wraps the model with LoRA and injects adapters into q_proj, k_proj, v_proj, etc.
4️⃣ Expected console output:
[INFO] Loading base model: google/gemma-2b-it
[INFO] Preparing dataset...
[INFO] Injecting LoRA adapters...
trainable params: 3.5M || all params: 270M || trainable%: 1.3%
5️⃣ trainer.train()
→ Starts the training loop and shows live progress.
6️⃣ upload_file(...)
→ Uploads all adapter files to your chosen Hugging Face repo.
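A compact sketch tying these steps together (the model id, output path, and repo id below are placeholders, and the real Space code adds error handling and dataset preparation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model
from huggingface_hub import upload_file

base_model = "google/gemma-2b-it"                      # placeholder model id
print(f"[INFO] Loading base model: {base_model}")
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.float16)

# Wrap the base model with LoRA adapters on the attention projections
config = LoraConfig(r=8, lora_alpha=16,
                    target_modules=["q_proj", "k_proj", "v_proj"],
                    task_type="CAUSAL_LM")
model = get_peft_model(model, config)
model.print_trainable_parameters()                     # "trainable params: ... || all params: ..."

# ... trainer.train() runs here, then the adapter is saved to lora_out/ (placeholder path) ...

upload_file(path_or_fileobj="lora_out/adapter_model.safetensors",
            path_in_repo="adapter_model.safetensors",
            repo_id="your-username/prompt-enhancer-lora")   # placeholder repo id
```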
🔍 What "Adapter (90)" Means
When you initialize LoRA on Gemma, it finds 90 target layers such as:
q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Each layer gets small trainable matrices (A, B).
So:
Adapter (90) → 90 modules modified by LoRA.
To list them:
for name, module in model.named_modules():
if "lora" in name.lower():
print(name)
🧩 Universal Dynamic LoRA Trainer & Inference – Code Explanation
This project provides an end-to-end LoRA fine-tuning and inference system for language models like Gemma, built with Gradio, PEFT, and Accelerate.
It supports both training new LoRAs and generating text with existing ones, all in a single interface.
1️⃣ Imports Overview
- Core libs: `os`, `torch`, `gradio`, `numpy`, `pandas`
- Training libs: `peft` (`LoraConfig`, `get_peft_model`), `accelerate` (`Accelerator`)
- Modeling: `transformers` (for the Gemma base model)
- Hub integration: `huggingface_hub` (for uploading adapters)
- Spaces: `spaces` – for execution within Hugging Face Spaces
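A plausible import block matching the list above; the exact aliases and helpers used in the actual Space may differ:

```python
import os
import numpy as np
import pandas as pd
import torch
import gradio as gr
import spaces                                   # Hugging Face Spaces helper (e.g. @spaces.GPU)
from accelerate import Accelerator
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import upload_file
```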
2️⃣ Dataset Loading
- Uses a lightweight MediaTextDataset class to load:
  - CSV / Parquet files
  - or data directly from a Hugging Face dataset repo
- Expects two columns: `short_prompt` → input text, `long_prompt` → target expanded text
- Supports batching, missing-column checks, and configurable max record limits.
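A simplified stand-in for such a loader, assuming CSV/Parquet input with the two columns named above (the real MediaTextDataset in the Space also handles Hugging Face dataset repos and batching details):

```python
import pandas as pd
from torch.utils.data import Dataset

class MediaTextDataset(Dataset):
    """Simplified stand-in: reads CSV/Parquet with short_prompt / long_prompt columns."""

    def __init__(self, path, max_records=None):
        df = pd.read_parquet(path) if path.endswith(".parquet") else pd.read_csv(path)
        missing = {"short_prompt", "long_prompt"} - set(df.columns)
        if missing:
            raise ValueError(f"Missing required columns: {missing}")
        if max_records:
            df = df.head(max_records)
        self.records = df[["short_prompt", "long_prompt"]].to_dict("records")

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        row = self.records[idx]
        return row["short_prompt"], row["long_prompt"]
```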
3️⃣ Model Loading & Preparation
- Loads the Gemma model and tokenizer via `AutoModelForCausalLM` and `AutoTokenizer`.
- Automatically detects target modules (e.g. `q_proj`, `v_proj`) for LoRA injection.
- Supports `float16` or `bfloat16` precision with `Accelerator` for optimal memory usage.
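A hedged sketch of this loading step, with a simple name-based scan standing in for the Space's target-module detection:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "google/gemma-2b-it"   # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(base_model, torch_dtype=torch.bfloat16)

# Detect candidate projection layers for LoRA injection by name
candidates = ("q_proj", "k_proj", "v_proj", "o_proj")
target_modules = sorted({
    name.split(".")[-1]
    for name, module in model.named_modules()
    if isinstance(module, torch.nn.Linear) and name.split(".")[-1] in candidates
})
print(target_modules)   # e.g. ['k_proj', 'o_proj', 'q_proj', 'v_proj']
```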
4️⃣ LoRA Training Logic
- Core formula: [ W_{eff} = W + \alpha \times (B @ A) ]
- Only the A and B matrices are trainable; the base model weights remain frozen.
- Configurable parameters: `r` (rank), `alpha` (scaling), `epochs`, `lr`, `batch_size`
- Training logs stream live in the UI, showing step-by-step loss values.
- After training, the adapter is saved locally and uploaded to the Hugging Face Hub.
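A minimal training-loop sketch using Accelerator; `model`, `tokenizer`, and `dataset` are assumed to come from the previous steps, and the real Space adds label masking, logging, and checkpointing:

```python
import torch
from accelerate import Accelerator
from torch.utils.data import DataLoader

accelerator = Accelerator(mixed_precision="bf16")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4)   # only LoRA params have requires_grad=True
loader = DataLoader(dataset, batch_size=2, shuffle=True)
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

model.train()
for epoch in range(3):                                        # epochs is configurable in the UI
    for step, (short, long) in enumerate(loader):
        texts = [s + "\n" + t for s, t in zip(short, long)]   # prompt and target in one sequence
        batch = tokenizer(texts, return_tensors="pt", padding=True,
                          truncation=True, max_length=512).to(accelerator.device)
        # In a real run, pad positions in the labels would usually be masked to -100
        outputs = model(**batch, labels=batch["input_ids"])
        loss = outputs.loss
        accelerator.backward(loss)
        optimizer.step()
        optimizer.zero_grad()
        if step % 50 == 0:
            print(f"[TRAIN] Step {step} | Loss: {loss.item():.2f}")
```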
5️⃣ CPU Inference Mode
- Runs entirely on CPU; no GPU required.
- Loads the base Gemma model plus the trained LoRA weights (`PeftModel.from_pretrained`).
- Optionally merges the LoRA into the base model.
- Expands the short prompt → long descriptive text using standard generation parameters (e.g., top-p / top-k sampling).
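A sketch of the CPU inference path; the base model id and adapter repo id are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "google/gemma-2b-it"                        # placeholder base model
adapter_id = "your-username/prompt-enhancer-lora"     # placeholder adapter repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float32)  # CPU-friendly dtype
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

short_prompt = "a cat on a windowsill"
inputs = tokenizer(short_prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=128,
                                do_sample=True, top_p=0.9, top_k=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```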
6️⃣ LoRA Internals Explained
- LoRA injects low-rank matrices (A, B) into the attention Linear layers.
- Example: [ Q_{new} = Q + \alpha \times (B @ A) ]
- Significantly reduces training cost:
  - Memory: ~1–2% of the full model
  - Compute: trains faster with minimal GPU load
- Scales to larger models like Gemma 3B / 4B with rank ≤ 16.
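A rough check of that memory claim for a single 2048×2048 projection at rank 8 (illustrative numbers only):

```python
d, r = 2048, 8
full_params = d * d            # updating W directly: 4,194,304 parameters
lora_params = r * d + d * r    # A (r×d) plus B (d×r): 32,768 parameters
print(f"LoRA trains {lora_params / full_params:.2%} of this layer's weights")
# -> LoRA trains 0.78% of this layer's weights
```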
7️⃣ Gradio UI Structure
- Train LoRA Tab: configure the model, dataset, LoRA parameters, and upload target, then press 🚀 Start Training to stream training logs live.
- Inference (CPU) Tab: type a short prompt → generates an expanded long-form version via the trained LoRA.
- Code Explain Tab: detailed breakdown of the logic plus the simulated console output below.
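Before the log simulation, here is a bare-bones Blocks skeleton showing how a three-tab layout like this can be wired up; `train_lora` and `run_inference` are hypothetical callbacks, and the actual Space exposes many more controls:

```python
import gradio as gr

def train_lora(base_model, dataset_path, rank, alpha):
    # hypothetical callback: would run the LoRA training loop and yield log lines
    yield f"[INFO] Loading base model: {base_model}"

def run_inference(short_prompt):
    # hypothetical callback: would expand the short prompt with the trained LoRA
    return f"(expanded) {short_prompt}"

with gr.Blocks() as demo:
    with gr.Tab("Train LoRA"):
        base = gr.Textbox(label="Base model", value="google/gemma-2b-it")
        data = gr.Textbox(label="Dataset path or repo")
        rank = gr.Slider(1, 64, value=8, label="r (rank)")
        alpha = gr.Slider(1, 128, value=16, label="alpha")
        logs = gr.Textbox(label="Training logs", lines=10)
        gr.Button("🚀 Start Training").click(train_lora, [base, data, rank, alpha], logs)
    with gr.Tab("Inference (CPU)"):
        short = gr.Textbox(label="Short prompt")
        long = gr.Textbox(label="Expanded prompt")
        gr.Button("Generate").click(run_inference, short, long)
    with gr.Tab("Code Explain"):
        gr.Markdown("Detailed breakdown of the training and inference logic.")

demo.launch()
```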
🧾 Example Log Simulation
print(f"[INFO] Loading base model: {base_model}")
# -> Loads Gemma base model (fp16) on CUDA
# [INFO] Base model google/gemma-3-4b-it loaded successfully
print(f"[INFO] Preparing dataset from: {dataset_path}")
# -> Loads dataset or CSV file
# [DATA] 980 samples loaded, columns: short_prompt, long_prompt
print("[INFO] Initializing LoRA configuration...")
# -> Creates LoraConfig(r=8, alpha=16, target_modules=['q_proj', 'v_proj'])
# [CONFIG] LoRA applied to 96 attention layers
print("[INFO] Starting training loop...")
# [TRAIN] Step 1 | Loss: 2.31
# [TRAIN] Step 50 | Loss: 1.42
# [TRAIN] Step 100 | Loss: 0.91
# [TRAIN] Epoch 1 complete (avg loss: 1.21)
print("[INFO] Saving LoRA adapter...")
# -> Saves safetensors and config locally
print(f"[UPLOAD] Pushing adapter to {hf_repo_id}")
# -> Uploads model to Hugging Face Hub
# [UPLOAD] adapter_model.safetensors (67.7 MB)
# [SUCCESS] LoRA uploaded successfully
🧠 What LoRA Does (A & B Injection Explained)
When you fine-tune a large model (like Gemma or Llama), you're adjusting billions of parameters in large weight matrices.
LoRA avoids this by injecting two small low-rank matrices (A and B) into selected layers instead of modifying the full weight.
Step 1: Regular Linear Layer
[ y = W x ]
Here, W is a huge matrix (e.g., 4096×4096).
Step 2: LoRA Layer Modification
Instead of updating W directly, LoRA adds a lightweight update:
[ W' = W + \Delta W ] [ \Delta W = B A ]
Where:
- A ∈ ℝ^(r × d)
- B ∈ ℝ^(d × r)
- and r ≪ d (e.g., r=8 instead of 4096)
So you're training only a tiny fraction of the parameters.
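A tiny shape check of the low-rank update, with illustrative values for d and r:

```python
import torch

d, r = 4096, 8
W = torch.zeros(d, d)            # frozen base weight
A = torch.randn(r, d) * 0.01     # trainable, r × d
B = torch.zeros(d, r)            # trainable, d × r (initialised to zero in LoRA)
delta_W = B @ A                  # d × d, same shape as W
print(delta_W.shape)             # torch.Size([4096, 4096])
print(A.numel() + B.numel())     # 65,536 trainable values vs. 16,777,216 in W
```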
Step 3: Where LoRA Gets Injected
It targets critical sub-layers such as:
- q_proj, k_proj, v_proj → Query, Key, Value projections in attention
- o_proj / out_proj → Output projection
- gate_proj, up_proj, down_proj → Feed-forward layers
When you see:
Adapter (90)
That means 90 total layers (from these modules) were wrapped with LoRA adapters.
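To verify such a count on your own wrapped model, you can tally the injected adapter modules (a rough check; internal module naming can vary across peft versions):

```python
from collections import Counter

# Each injected module exposes a lora_A sub-module keyed by the adapter name ("default")
lora_layers = [name for name, _ in model.named_modules() if name.endswith("lora_A.default")]
print(f"Adapter ({len(lora_layers)})")                       # e.g. Adapter (90)
print(Counter(name.split(".")[-3] for name in lora_layers))  # count per module type (q_proj, v_proj, ...)
```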
Step 4: Training Efficiency
- Base weights (`W`) stay frozen
- Only `(A, B)` are trainable
- Compute and memory are drastically reduced
| Metric | Full Fine-Tune | LoRA Fine-Tune |
|---|---|---|
| Trainable Params | 2B+ | ~3M |
| GPU Memory | 40GB+ | <6GB |
| Time | 10β20 hrs | <1 hr |
Step 5: Inference Equation
At inference time: [ y = (W + \alpha \times B A) x ]
Where α controls the strength of the adapter's influence.
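At inference you can keep the adapter separate or bake the update into the base weights; a sketch of the merge path with placeholder repo ids:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")
model = PeftModel.from_pretrained(base, "your-username/prompt-enhancer-lora")

# Fold the alpha-scaled B @ A update into the base weights and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("gemma-prompt-enhancer-merged")
```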
Step 6: Visualization
Base Layer: y = W * x
LoRA Layer: y = (W + B @ A) * x
                     │   │
                     │   └── small adapter A (trainable)
                     └────── small adapter B (trainable)
Step 7: Example in Code
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b-it")

# Describe which layers receive LoRA adapters and how large they are
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
    lora_dropout=0.05
)

# Wrap the base model; only the injected A/B matrices are trainable
model = get_peft_model(model, config)
model.print_trainable_parameters()
Expected output:
trainable params: 3,278,848 || all params: 2,040,000,000 || trainable%: 0.16%