Troubleshooting
Use this page for fast diagnosis of common setup, data, and runtime failures. All anchors below are referenced from guide pages.
Setup
Torch not installed
Symptoms: ModuleNotFoundError: No module named 'torch', or scripts fail on import.
Fix: install PyTorch into the Python environment you run the scripts from. Use the official selector if you need a specific CUDA build: https://pytorch.org/get-started/locally/
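To confirm whether the install reached the interpreter you are actually running, a small check like the following may help. It is a minimal sketch: the helper name module_available is illustrative and not part of this repository.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be found by the current interpreter."""
    # find_spec locates a top-level module without importing it.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path shows which environment is really in use.
    print("interpreter:", sys.executable)
    print("torch importable:", module_available("torch"))
```

If the interpreter path is not inside your venv, the package was probably installed into a different environment.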
CUDA not detected
Symptoms: torch.cuda.is_available() returns False, or scripts print CPU fallback warnings.
Quick check: call torch.cuda.is_available() in the same Python environment you use to run the scripts.
Checks:
- NVIDIA driver installed and up to date
- Correct PyTorch build for your CUDA runtime
- Same Python environment used for install and execution
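The checks above can be gathered into one short diagnostic. This is a sketch only: the function name cuda_report and the dictionary keys are illustrative, while torch.__version__, torch.version.cuda, and torch.cuda.is_available() are standard PyTorch attributes.

```python
import sys


def cuda_report() -> dict:
    """Collect the facts needed to work through the CUDA checks."""
    report = {"python": sys.executable, "torch_installed": False}
    try:
        import torch
    except ImportError:
        # Without torch there is nothing further to inspect.
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds of PyTorch.
    report["built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

A built_with_cuda value of None points to a CPU-only build. A CUDA version combined with cuda_available False more likely points to a driver problem.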
Data and Metadata
Meta path mismatch
Symptoms: generation/training behaves incorrectly or cannot load tokenizer metadata.
Expected mapping:
- txt pipeline -> data/processed/meta.json
- bin pipeline -> data/meta.json
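The expected mapping can be encoded in a small lookup and used to verify that the file exists before a run. This is a sketch under assumptions: EXPECTED_META and expected_meta_path are illustrative names, not LabCore APIs, and the paths are resolved relative to the current directory.

```python
from pathlib import Path

# Paths taken from the mapping above; the names here are illustrative.
EXPECTED_META = {
    "txt": Path("data/processed/meta.json"),
    "bin": Path("data/meta.json"),
}


def expected_meta_path(pipeline: str) -> Path:
    """Return the meta.json path expected for the given pipeline."""
    try:
        return EXPECTED_META[pipeline]
    except KeyError:
        raise ValueError(f"unknown pipeline: {pipeline!r}") from None


if __name__ == "__main__":
    for pipeline in EXPECTED_META:
        path = expected_meta_path(pipeline)
        status = "found" if path.is_file() else "missing"
        print(f"{pipeline}: {path} ({status})")
```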
Binary shards not found
Symptoms: the error "Binary shards not found" when training.data_format = "bin".
Fix:
python scripts/data/prepare_data.py --dataset tinyshakespeare --tokenizer char --output-format bin --output-dir data/processed
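After running the command, you may want to confirm that shard files actually landed in the output directory. The sketch below assumes shards end in .bin, which this page does not state. Adjust the pattern to the real file names. find_shards is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_shards(data_dir: str, pattern: str = "*.bin") -> List[Path]:
    """List files in data_dir that match pattern, sorted by name.

    The "*.bin" default is an assumption about shard naming.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(p for p in root.glob(pattern) if p.is_file())


if __name__ == "__main__":
    shards = find_shards("data/processed")
    print(f"{len(shards)} shard file(s) found")
```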
Char vocab missing
Symptoms: the error "Char tokenizer requires vocab in meta.json".
Fix:
python scripts/data/prepare_data.py --dataset tinyshakespeare --tokenizer char --output-format txt --output-dir data/processed
Then pass the matching --meta data/processed/meta.json to generation/export commands.
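To check whether a given meta.json carries vocabulary data before passing it to --meta, something like the following could be used. The key name "vocab" is a guess based on the error message, and meta_has_vocab is a hypothetical helper. Inspect your own meta.json to confirm its layout.

```python
import json
from pathlib import Path


def meta_has_vocab(meta_path: str, key: str = "vocab") -> bool:
    """Return True if the file exists and holds a non-empty entry under key.

    The default key "vocab" is an assumption, not a documented field name.
    """
    path = Path(meta_path)
    if not path.is_file():
        return False
    with path.open(encoding="utf-8") as handle:
        meta = json.load(handle)
    return isinstance(meta, dict) and bool(meta.get(key))


if __name__ == "__main__":
    print("vocab present:", meta_has_vocab("data/processed/meta.json"))
```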
Runtime
Out of memory
Reduce in this order:
- training.batch_size
- model.block_size
- training.gradient_accumulation_steps
- Model size / preset complexity
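The reduction order can be applied one step at a time, retrying the run after each change. The sketch below covers only the three numeric settings and assumes the config is a nested dictionary. Halving and the floor of 1 are illustrative choices, not project policy.

```python
from typing import Optional, Tuple

# Settings in the order listed above; the nested layout is an assumption.
REDUCTION_ORDER = [
    ("training", "batch_size"),
    ("model", "block_size"),
    ("training", "gradient_accumulation_steps"),
]


def reduce_once(config: dict, floor: int = 1) -> Optional[Tuple[str, str]]:
    """Halve the first listed setting that is still above floor.

    Mutates config in place and returns the (section, key) that changed,
    or None when every listed setting is already at the floor.
    """
    for section, key in REDUCTION_ORDER:
        value = config[section][key]
        if value > floor:
            config[section][key] = max(floor, value // 2)
            return (section, key)
    return None


if __name__ == "__main__":
    config = {
        "training": {"batch_size": 8, "gradient_accumulation_steps": 4},
        "model": {"block_size": 256},
    }
    print("changed:", reduce_once(config))
    print("config:", config)
```

When reduce_once returns None, the remaining option from the list is a smaller model or a simpler preset.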
FlashAttention not available
LabCore falls back automatically, trying attention backends in this order:
- FlashAttention (preferred, if available)
- PyTorch SDPA fallback
- Standard causal attention fallback
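To see which backend your environment is likely to end up with, the priority order above can be mirrored in a few lines. This is not LabCore's detection code. Probing for the flash_attn package and for torch.nn.functional.scaled_dot_product_attention (available in PyTorch 2.0 and later) is an assumption about how availability might be tested.

```python
import importlib.util


def has_flash_attn() -> bool:
    """True if the flash_attn package can be found (assumed probe)."""
    return importlib.util.find_spec("flash_attn") is not None


def has_sdpa() -> bool:
    """True if PyTorch exposes scaled_dot_product_attention."""
    try:
        import torch.nn.functional as F
    except ImportError:
        return False
    return hasattr(F, "scaled_dot_product_attention")


def choose_backend(flash_ok: bool, sdpa_ok: bool) -> str:
    """Pick the first available backend in the listed priority order."""
    if flash_ok:
        return "flash"
    if sdpa_ok:
        return "sdpa"
    return "standard"


if __name__ == "__main__":
    print("attention backend:", choose_backend(has_flash_attn(), has_sdpa()))
```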
Windows path and policy issues
- Activate your venv before commands.
- Run commands from repository root.
- Quote paths when directories contain spaces.
- If PowerShell execution policy blocks scripts, use python ... commands directly.
Windows: pytest PermissionError in %TEMP%
On Windows, pytest may fail with PermissionError when using the default temp directory. This can happen when another process (e.g., antivirus) holds a file handle. Workaround: pass the --basetemp option so that pytest uses a local base temp directory.
This repository also sets --basetemp=.pytest_tmp by default in pyproject.toml.
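If you launch tests from a script rather than the shell, the same option can be passed programmatically. A minimal sketch, assuming pytest is installed in the active environment. pytest_command and run_pytest are illustrative helpers, and .pytest_tmp is the directory named above.

```python
import subprocess
import sys
from typing import List


def pytest_command(basetemp: str = ".pytest_tmp") -> List[str]:
    """Build a pytest invocation that keeps temp files under basetemp."""
    # "python -m pytest" runs pytest with the currently active interpreter.
    return [sys.executable, "-m", "pytest", f"--basetemp={basetemp}"]


def run_pytest(basetemp: str = ".pytest_tmp") -> int:
    """Run pytest from the current directory and return its exit code."""
    return subprocess.run(pytest_command(basetemp)).returncode
```

Call run_pytest() from the repository root. Note that pytest clears the contents of the --basetemp directory at the start of each run, so do not point it at a directory that holds files you want to keep.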