LabCore LLM
This documentation is the operational guide for running LabCore end to end: data preparation, training, inference, and export.
English pages in docs/ are the source of truth, and French pages in docs/fr/ are concise mirrors.
Reference Preset Used Across Docs
All examples are standardized on this reference setup:
- Dataset: `tinyshakespeare`
- Tokenizer: `char`
- Config example: `configs/base/base.toml`
- Canonical training override: `--max-iters 5000`
- Checkpoint: `checkpoints/ckpt_last.pt`
- Metadata (txt format): `data/processed/meta.json`
- Metadata (bin format): `data/meta.json`
Tip
Keep these values unchanged for your first full run. Most failures come from checkpoint and metadata path mismatches.
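Because path mismatches are the most common failure, a pre-flight check before running `generate.py` can save a wasted run. The sketch below only mirrors the reference preset paths listed above; which metadata file applies depends on the `--output-format` you chose during data preparation.

```python
from pathlib import Path

# Pre-flight sketch: verify the reference-preset paths before inference.
# The txt/bin distinction follows the preset table above.
paths = {
    "checkpoint": Path("checkpoints/ckpt_last.pt"),
    "meta (txt)": Path("data/processed/meta.json"),
    "meta (bin)": Path("data/meta.json"),
}
for name, path in paths.items():
    status = "found" if path.exists() else "missing"
    print(f"{name}: {path} [{status}]")
```

If the checkpoint is found but both metadata files are missing, re-run the data preparation step before attempting generation.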
Quick Install
Optional dependencies are required for Hugging Face export and the demo UI; see the Getting Started guide for the install commands.
Quick Start Commands
python scripts/data/prepare_data.py --dataset tinyshakespeare --tokenizer char --output-format txt --output-dir data/processed
python train.py --config configs/base/base.toml --tokenizer char --max-iters 5000
python generate.py --checkpoint checkpoints/ckpt_last.pt --meta data/processed/meta.json --tokenizer char --prompt "To be"
Expected artifacts:
- `checkpoints/ckpt_last.pt`
- `checkpoints/train_log.json`
- `data/processed/meta.json`
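A quick way to confirm the run completed is to check that all three artifacts exist. This is a minimal sketch; the final line assumes `train_log.json` is plain JSON, which is an assumption about LabCore's log format rather than a documented guarantee.

```python
import json
from pathlib import Path

# Post-run sketch: confirm the quick-start artifacts were produced.
artifacts = [
    Path("checkpoints/ckpt_last.pt"),
    Path("checkpoints/train_log.json"),
    Path("data/processed/meta.json"),
]
missing = [str(p) for p in artifacts if not p.exists()]
if missing:
    print("missing artifacts:", ", ".join(missing))
else:
    # Assumption: the training log is a JSON document.
    log = json.loads(Path("checkpoints/train_log.json").read_text())
    print("training log loaded, entries:", len(log))
```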
End-to-End Flow
scripts/data/prepare_data.py -> train.py -> generate.py/demo_gradio.py -> scripts/export/export_hf.py -> scripts/export/quantize_gguf.py
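The first three stages of the flow above correspond to the quick-start commands; spelled out in order, they look like this. The export-stage invocations are omitted because their flags are covered in the Export & Deployment guide.

```python
# The documented flow, expressed as the concrete quick-start commands.
pipeline = [
    ["python", "scripts/data/prepare_data.py", "--dataset", "tinyshakespeare",
     "--tokenizer", "char", "--output-format", "txt", "--output-dir", "data/processed"],
    ["python", "train.py", "--config", "configs/base/base.toml",
     "--tokenizer", "char", "--max-iters", "5000"],
    ["python", "generate.py", "--checkpoint", "checkpoints/ckpt_last.pt",
     "--meta", "data/processed/meta.json", "--tokenizer", "char", "--prompt", "To be"],
]
for step in pipeline:
    print(" ".join(step))
```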
Documentation Map
Guides
- Getting Started: environment setup and a reproducible first run.
- Data Pipeline: build `txt` or `bin` datasets and metadata.
- Training: run training, checkpointing, and format selection.
- Inference & Demo: CLI generation and Gradio demo.
- Fine-Tuning: LoRA instruction tuning workflow.
- Export & Deployment: HF export and GGUF conversion.
Reference
- Configuration: complete TOML key reference.
- Operations: artifacts, release flow, and operational checks.
- Troubleshooting: anchored fixes for common failures.
- Benchmarks: benchmark template and reporting method.
Development
- Developer Guide: contributor workflow and validation commands.