Model Selection & Sizing


Memory Constraints

The Jetson Orin Nano Super Developer Kit has 8 GB of shared CPU+GPU RAM. This memory is shared between the operating system, all running Docker containers, and the inference engine (including the loaded model weights). You must choose a model that fits comfortably within this budget.

As a rule of thumb, leave at least 2 GB free for the OS and other services, giving you roughly 5–6 GB for the model and inference context.


| Model | Quantization | Approx. Size | Notes |
| --- | --- | --- | --- |
| Llama-3.2-3B | Q4_K_M | ~2.0 GB | Best fit for Jetson; fast inference |
| Llama-3.1-8B | Q4_K_M | ~4.7 GB | Fits with careful tuning; reduce N_CTX |
| Mistral-7B | Q4_K_M | ~4.1 GB | Good quality/size trade-off |
| Llama-3.2-3B | Q8_0 | ~3.5 GB | Higher quality, still fits comfortably |
| Llama-3.1-8B | Q2_K | ~2.9 GB | Reduced quality but fits easily |

Use Q4_K_M quantization as a starting point — it offers a good balance of quality and memory usage.
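
As a rough illustration of the rule of thumb above, the sketch below subtracts the 2 GB reserve and each model's approximate weight size from the 8 GB total to show how much memory would remain for the inference context. The sizes are copied from the table; the function and constant names are hypothetical and not part of the stack.

```python
TOTAL_RAM_GB = 8.0   # shared CPU+GPU memory on the board
RESERVED_GB = 2.0    # rule-of-thumb reserve for the OS and other services

# (model, quantization) -> approximate weight size in GB, from the table above
MODEL_SIZES_GB = {
    ("Llama-3.2-3B", "Q4_K_M"): 2.0,
    ("Llama-3.1-8B", "Q4_K_M"): 4.7,
    ("Mistral-7B", "Q4_K_M"): 4.1,
    ("Llama-3.2-3B", "Q8_0"): 3.5,
    ("Llama-3.1-8B", "Q2_K"): 2.9,
}


def context_headroom_gb(weights_gb, total_gb=TOTAL_RAM_GB, reserved_gb=RESERVED_GB):
    """Memory left for the KV cache and runtime buffers once the weights are loaded."""
    return total_gb - reserved_gb - weights_gb


# List the models from smallest to largest with their remaining headroom.
for (model, quant), size in sorted(MODEL_SIZES_GB.items(), key=lambda kv: kv[1]):
    print(f"{model} {quant}: ~{context_headroom_gb(size):.1f} GB left for context")
```

A model whose headroom is close to zero will likely need a smaller N_CTX or partial GPU offload to run reliably.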


Memory Budget Example

The table below shows an example memory breakdown for a 3B model with a 2048-token context:

| Component | Approx. Usage |
| --- | --- |
| Operating system + Docker | ~1.5 GB |
| Backend, UI, rag-db containers | ~0.5 GB |
| Model weights (Llama-3.2-3B Q4_K_M) | ~2.0 GB |
| KV cache (2048 tokens) | ~0.3 GB |
| Total | ~4.3 GB |

This leaves roughly 3.7 GB of headroom, which is comfortable for the Jetson Orin Nano Super's 8 GB budget.
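
The same arithmetic can be sketched in a few lines, which also makes it easy to see how the budget shifts with a larger context. The figures are the approximate ones from the breakdown above. The sketch assumes the KV cache grows in proportion to N_CTX, and the names are hypothetical.

```python
TOTAL_RAM_GB = 8.0

# Approximate usage in GB, copied from the breakdown above
BUDGET_GB = {
    "os_and_docker": 1.5,
    "service_containers": 0.5,  # backend, UI, rag-db
    "model_weights": 2.0,       # Llama-3.2-3B Q4_K_M
    "kv_cache": 0.3,            # at 2048 tokens
}


def budget_for_context(n_ctx, base_ctx=2048, budget=BUDGET_GB):
    """Return (used_gb, headroom_gb) with the KV cache rescaled to n_ctx.

    Assumption: KV cache size is proportional to the context length.
    """
    scaled = dict(budget)
    scaled["kv_cache"] = budget["kv_cache"] * n_ctx / base_ctx
    used = sum(scaled.values())
    return used, TOTAL_RAM_GB - used


for n_ctx in (2048, 4096, 8192):
    used, headroom = budget_for_context(n_ctx)
    print(f"N_CTX={n_ctx}: ~{used:.1f} GB used, ~{headroom:.1f} GB headroom")
```

Treat the results as estimates only; actual usage also depends on runtime buffers and on whatever else is running on the device.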


N_GPU_LAYERS Guidance

N_GPU_LAYERS controls how many transformer layers are offloaded to the Jetson GPU. Offloading more layers increases inference speed but uses more GPU memory (which is the same physical pool as CPU RAM on Jetson).

| Value | Effect |
| --- | --- |
| 0 | CPU-only inference; slowest, lowest memory pressure |
| 1–half | Partial GPU offload; balanced speed and memory |
| -1 (all layers) | Full GPU offload; fastest inference, highest memory use |

Start with -1 (full offload) for a 3B model. If you encounter out-of-memory errors, reduce N_GPU_LAYERS incrementally until the stack is stable. For 7B+ models on 8 GB, partial offload (e.g., 20–30 layers) is often the best trade-off.
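
One way to picture the "reduce incrementally" procedure is as a list of candidate values to try in order. The sketch below is illustrative only: the helper names, the step size of 4, and the `loads_ok` callback (a stand-in for "start the stack with this value and check for out-of-memory errors") are all assumptions, as is the 32-layer model used in the example.

```python
def offload_candidates(total_layers, step=4):
    """Yield N_GPU_LAYERS values to try, from full offload down to CPU-only."""
    yield -1  # full offload
    for n in range(total_layers - step, 0, -step):
        yield n  # progressively fewer layers on the GPU
    yield 0  # CPU-only fallback


def first_stable_setting(total_layers, loads_ok, step=4):
    """Return the first candidate for which loads_ok(n_gpu_layers) is True, else None."""
    for n in offload_candidates(total_layers, step):
        if loads_ok(n):
            return n
    return None


# Hypothetical stand-in: pretend that at most 24 layers fit on the GPU.
pretend_fits = lambda n: 0 <= n <= 24

print(list(offload_candidates(32, step=8)))    # [-1, 24, 16, 8, 0]
print(first_stable_setting(32, pretend_fits))  # 24
```

In practice each candidate would be tested manually by updating N_GPU_LAYERS, restarting the stack, and watching for out-of-memory errors.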

See the Configuration page for how to set N_GPU_LAYERS in your environment file.