LabCore LLM

This documentation is the operational guide for running LabCore end to end: data preparation, training, inference, and export. English pages in docs/ are the source of truth, and French pages in docs/fr/ are concise mirrors.

Reference Preset Used Across Docs

All examples are standardized on this reference setup:

  • Dataset: tinyshakespeare
  • Tokenizer: char
  • CONFIG_EXAMPLE = configs/base/base.toml
  • Canonical training override: --max-iters 5000
  • CHECKPOINT = checkpoints/ckpt_last.pt
  • META_TXT = data/processed/meta.json
  • META_BIN = data/meta.json

Tip

Keep these values unchanged for your first full run. Most failures come from checkpoint and metadata path mismatches.
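Since path mismatches are the most common failure, a quick preflight check can catch them before a long run. A minimal sketch, using the preset paths above (the helper name `missing_paths` is illustrative, not part of LabCore):

```python
from pathlib import Path

# Paths from the reference preset above; adjust if you changed the defaults.
PRESET_PATHS = [
    "checkpoints/ckpt_last.pt",
    "data/processed/meta.json",
]

def missing_paths(paths):
    """Return the subset of paths that do not exist on disk."""
    return [p for p in paths if not Path(p).exists()]

if __name__ == "__main__":
    missing = missing_paths(PRESET_PATHS)
    if missing:
        print("Missing artifacts:", ", ".join(missing))
    else:
        print("All preset paths present.")
```

Run it from the repository root before `generate.py`; an empty result means the checkpoint and metadata paths line up with the preset.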

Quick Install

python -m pip install -e ".[torch,dev]"

For Hugging Face export and demo UI:

python -m pip install -e ".[torch,hf,demo]"
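To confirm which optional extras actually landed in your environment, you can probe for their modules. A hedged sketch: the mapping of extras to module names is an assumption (the `hf` extra is presumed to provide `transformers`, and `demo` to provide `gradio`):

```python
import importlib.util

def available(modules):
    """Map each module name to whether it can be imported here."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

# Module names are assumptions: 'torch' from the torch extra,
# 'transformers' from hf, 'gradio' from demo.
if __name__ == "__main__":
    for name, ok in available(["torch", "transformers", "gradio"]).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```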

Quick Start Commands

python scripts/data/prepare_data.py --dataset tinyshakespeare --tokenizer char --output-format txt --output-dir data/processed
python train.py --config configs/base/base.toml --tokenizer char --max-iters 5000
python generate.py --checkpoint checkpoints/ckpt_last.pt --meta data/processed/meta.json --tokenizer char --prompt "To be"

Expected artifacts:

  • checkpoints/ckpt_last.pt
  • checkpoints/train_log.json
  • data/processed/meta.json
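Once training finishes, you can inspect checkpoints/train_log.json to see where the loss ended up. The exact log schema is not documented on this page, so the snippet below is a sketch that assumes a JSON array of per-iteration records with "iter" and "loss" keys; adapt it to the real format:

```python
import json

def final_loss(log_path):
    """Return (iter, loss) from the last record of a training log.

    ASSUMPTION: the log is a JSON array of objects with 'iter' and
    'loss' keys. Check the actual schema of train_log.json first.
    """
    with open(log_path) as f:
        records = json.load(f)
    last = records[-1]
    return last["iter"], last["loss"]
```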

End-to-End Flow

scripts/data/prepare_data.py -> train.py -> generate.py/demo_gradio.py -> scripts/export/export_hf.py -> scripts/export/quantize_gguf.py
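The first three stages of this flow can be chained so a failure in one step stops the pipeline instead of producing stale downstream artifacts. A minimal sketch using only the Quick Start commands above (the export steps are omitted because their flags are not documented on this page, and the `run_pipeline` helper is illustrative):

```python
import subprocess
import sys

# Commands copied from Quick Start; the export stages are left out
# since their CLI flags are not shown here.
STEPS = [
    [sys.executable, "scripts/data/prepare_data.py", "--dataset", "tinyshakespeare",
     "--tokenizer", "char", "--output-format", "txt", "--output-dir", "data/processed"],
    [sys.executable, "train.py", "--config", "configs/base/base.toml",
     "--tokenizer", "char", "--max-iters", "5000"],
    [sys.executable, "generate.py", "--checkpoint", "checkpoints/ckpt_last.pt",
     "--meta", "data/processed/meta.json", "--tokenizer", "char", "--prompt", "To be"],
]

def run_pipeline(steps, runner=subprocess.run):
    """Run each step in order, raising on the first non-zero exit."""
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            raise RuntimeError("step failed: " + " ".join(cmd))
```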

Documentation Map

  • Guides
  • Reference
  • Development