# Placing a Model
RAG-DocBot uses a GGUF-format language model file to power the inference service. You must provide this file yourself.
## Steps

1. **Download a compatible GGUF model.**

   GGUF models are available on Hugging Face. Choose a model appropriate for your hardware (CPU or GPU) and available RAM.

2. **Rename the file to `modelfile.gguf`.**

   The inference service expects the file at a specific path. Rename your downloaded file:

   ```bash
   mv your-downloaded-model.gguf modelfile.gguf
   ```

3. **Move the file to the `models/` directory.**

   ```bash
   mv modelfile.gguf ./models/
   ```

4. **Start (or restart) the services.**

   ```bash
   docker compose up -d
   ```

   The inference service loads the model on startup.
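Before (re)starting the services, it can help to confirm the file is actually in place. The sketch below is a suggestion, not part of RAG-DocBot itself (the `check_model` helper name is ours); it checks that the file exists and begins with the ASCII magic bytes `GGUF`, which every valid GGUF file starts with:

```bash
#!/bin/sh
# Sanity-check the model file before (re)starting the services.
# check_model is an illustrative helper, not part of RAG-DocBot.
check_model() {
    model="$1"
    if [ ! -f "$model" ]; then
        echo "missing: $model"
        return 1
    fi
    # Every valid GGUF file begins with the ASCII magic bytes "GGUF";
    # this catches accidentally renamed non-GGUF downloads.
    magic=$(head -c 4 "$model")
    if [ "$magic" != "GGUF" ]; then
        echo "warning: $model does not look like a GGUF file"
        return 1
    fi
    echo "ok: $model"
}

# Usage (path matches the layout from the steps above):
# check_model ./models/modelfile.gguf
```

If the check prints a warning, the download was likely an HTML error page or a non-GGUF format; re-download the model before starting the services.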
## Notes

- The model file must be named exactly `modelfile.gguf` and placed in the `models/` directory created by the installer.
- Larger models (e.g. 13B+ parameter models) require more RAM. For CPU-only servers, quantised (Q4 or Q5) variants are recommended.
- If the inference service fails to start, check `docker compose logs inference` for model loading errors.
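The RAM note above can be made concrete. As a rough rule of thumb (our assumption, not a documented RAG-DocBot requirement), the model needs at least its own file size in memory, plus working space for the context. On Linux, a quick comparison looks like this (`model_fits_ram` is an illustrative helper):

```bash
# Rough check (Linux only): does total RAM exceed the model's file size?
# This is a heuristic; real usage also needs room for the KV cache and
# the rest of the stack, so treat a narrow pass as a warning sign.
model_fits_ram() {
    model="$1"
    model_kb=$(( $(wc -c < "$model") / 1024 ))
    total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    echo "model: ${model_kb} kB, total RAM: ${total_kb} kB"
    [ "$model_kb" -lt "$total_kb" ]
}

# Usage: model_fits_ram ./models/modelfile.gguf
```

If the model barely fits, prefer a smaller or more aggressively quantised variant rather than relying on swap, which makes inference extremely slow.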