Developer Guide

This page is for contributors working on the codebase itself.

Repository Map

src/labcore_llm/
  config/      # TOML loader and defaults
  data/        # dataset abstractions
  model/       # GPT model implementation
  tokenizer/   # char + BPE tokenizers
  trainer/     # training loop, scheduler, checkpointing

scripts/       # data prep, export, quantize, fine-tune helpers
  data/        # data preparation
  export/      # HF export + GGUF quantization
  finetune/    # LoRA instruction fine-tuning
  benchmark/   # inference benchmark
  demo/        # pointer to root demo entrypoint

configs/       # TOML presets and base profiles
  base/        # base reference presets
  presets/     # model-size/tokenizer preset families
  hardware/    # hardware-oriented variants
  experimental/ # work-in-progress presets

tests/
  unit/        # isolated component tests
  integration/ # end-to-end workflow tests

Local Dev Environment

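# Create a virtual environment in .venv
python -m venv .venv
# Activate it (PowerShell)
.\.venv\Scripts\Activate.ps1
# Upgrade pip, then install the project in editable mode with the torch and dev extras
python -m pip install --upgrade pip
python -m pip install -e ".[torch,dev]"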
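
After the install, a quick sanity check can confirm that the package resolves from the active environment. The sketch below assumes the importable package name is labcore_llm (inferred from the src/labcore_llm/ directory); the helper name describe_install is hypothetical and not part of the codebase.

```python
import importlib.util


def describe_install(package: str = "labcore_llm") -> str:
    """Report where `package` would be imported from, without importing it."""
    try:
        spec = importlib.util.find_spec(package)
    except (ImportError, ValueError):
        # Raised for some broken or half-initialized module states.
        spec = None
    if spec is None:
        return f"{package}: not importable (is the venv active?)"
    # For an editable install, origin should point into the working tree.
    return f"{package}: found at {spec.origin}"


if __name__ == "__main__":
    print(describe_install())
```

If the reported path is outside the repository checkout, the editable install probably did not take effect in the active environment.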

Validation Commands

Run tests:

python -m pytest -q

Run lint rules aligned with CI:

ruff check src scripts tests train.py generate.py demo_gradio.py --select E9,F63,F7,F82
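
Both checks can be chained in one small script so a failure in either stops the run. This is a minimal sketch, not a script that ships with the repository: the names CHECKS and run_checks are hypothetical, and the commands are copied from the two invocations above.

```python
import subprocess
import sys

# Commands copied from this page. pytest runs through the current interpreter;
# ruff is invoked as a plain executable, as in the command above.
CHECKS = [
    [sys.executable, "-m", "pytest", "-q"],
    ["ruff", "check", "src", "scripts", "tests",
     "train.py", "generate.py", "demo_gradio.py",
     "--select", "E9,F63,F7,F82"],
]


def run_checks(commands, runner=subprocess.run) -> int:
    """Run each command in order; return the first non-zero exit code, else 0."""
    for cmd in commands:
        result = runner(cmd)
        if result.returncode != 0:
            return result.returncode
    return 0

# Usage from the repository root:
#     raise SystemExit(run_checks(CHECKS))
```

The runner parameter exists only so the function can be exercised without spawning real processes.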

CI Workflows

  • .github/workflows/ci.yml: lint + tests
  • .github/workflows/docs.yml: MkDocs build and deploy

Contribution Quality Bar

  • Keep commits focused and atomic.
  • Update docs for behavior/CLI changes.
  • Add tests for bug fixes and new logic.
  • Do not commit large data/model artifacts.
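
The last rule can be checked mechanically before committing. The sketch below is one possible approach, not an existing project tool: find_large_files is a hypothetical helper, and the 5 MiB threshold is an assumption, since this page does not state an exact size limit.

```python
from pathlib import Path

# Hypothetical threshold; the project does not document an exact limit.
MAX_BYTES = 5 * 1024 * 1024


def find_large_files(paths, max_bytes=MAX_BYTES):
    """Return (path, size) pairs for existing files larger than max_bytes."""
    oversized = []
    for p in map(Path, paths):
        if p.is_file():  # skip directories and paths that no longer exist
            size = p.stat().st_size
            if size > max_bytes:
                oversized.append((str(p), size))
    return oversized
```

One way to use it is to pass in the file names printed by `git diff --cached --name-only` and abort the commit if the returned list is not empty.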

Packaging Notes

  • The project uses a src/ layout with setuptools.
  • Optional dependency groups are defined in pyproject.toml.
  • Entry scripts are plain Python files, not console-script wrappers.