Changelog
All notable changes to RAG-DocBot are documented here.
[1.8.0]
Added
- Federated login (OIDC / SSO) — multi-provider OpenID Connect login with Authorization Code + PKCE (S256). Pre-built support for Microsoft Entra ID, Google Workspace, Keycloak, and any standards-compliant OIDC IdP. Provider configuration is managed via the admin UI (`oidc_providers` Postgres table); env-var-based config (`OIDC_PROVIDERS` / `OIDC_<NAME>_*`) remains as a fallback when the table is empty. `client_secret` is encrypted at rest; a `secret_set: bool` flag is returned by the API instead of the secret value. JIT user provisioning on first login; group sync from IdP claims; `OIDC_<NAME>_ADMIN_GROUPS` auto-promotes users to admin. See the SSO / OIDC guide.
- TOTP MFA (Two-Factor Authentication) — time-based one-time passwords as a second factor for any account. Enrollment returns a QR SVG plus 10 single-use recovery codes. TOTP secrets are encrypted at rest with AES-256-GCM. See the MFA / TOTP guide.
- Groups & Resource ACL (Enterprise) — per-connector and per-integration access control enforced at retrieval time via Qdrant payload filters. Built-in `everyone` group seeded on first boot. Admins can define groups, assign users, and set ACL on connectors/integrations. Admin role bypasses all ACL checks. See the Groups & ACL guide.
- TLS / HTTPS termination in bundled nginx — opt-in via `TLS_ENABLED=1`. Modern cipher suite (TLSv1.2 + 1.3), configurable HSTS, OCSP stapling (toggleable for air-gapped environments). HTTP `:80` redirects to HTTPS and serves Let's Encrypt HTTP-01 challenges. Helper scripts provided for self-signed, internal CA, and Let's Encrypt deployments. See the TLS / HTTPS guide.
- Runtime-tunable system settings — new `system_settings` Postgres table replaces several previously env-only knobs. Settings are updated without a restart via `PATCH /api/admin/system/settings`. Env vars remain the first-boot seed; the database is authoritative thereafter. See the System Settings guide.
- Sign-out everywhere — `DELETE /api/admin/auth/sessions` revokes all currently-issued tokens globally; `DELETE /api/admin/auth/users/{username}/sessions` revokes all tokens for a single user. Available on all license tiers.
- New Docker secrets: `mfa_encryption_key` (required), `oidc_entra_client_secret`, `oidc_google_client_secret`.
- New database migrations: `0016` (OIDC/TOTP user fields), `0017` (groups & ACL tables), `0018` (`oidc_providers` table).
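For readers unfamiliar with the S256 method named in the federated-login entry, the following is a minimal sketch of how a PKCE verifier/challenge pair is derived under RFC 7636. The function names are illustrative and are not taken from the RAG-DocBot codebase.

```python
import base64
import hashlib
import secrets


def s256_challenge(verifier: str) -> str:
    """Derive the S256 code challenge for a given code verifier."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    # Base64url-encode the hash and strip "=" padding, as RFC 7636 requires.
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def make_pkce_pair() -> tuple[str, str]:
    """Return a fresh (code_verifier, code_challenge) pair."""
    # 32 random bytes yield a 43-character URL-safe verifier
    # (the RFC allows 43 to 128 characters).
    verifier = secrets.token_urlsafe(32)
    return verifier, s256_challenge(verifier)
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code cannot be redeemed without the verifier.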
Fixed
- MFA bypass (security) — `POST /api/auth/login` previously returned a full session token even when TOTP was enrolled. Now correctly returns `{"status": "mfa_required", "mfa_token": "..."}` and requires the client to complete the second step.
- OIDC role preservation (#328) — `provision_user()` no longer overwrites `role` or re-syncs groups for existing OIDC users on subsequent logins.
- Test suite (PR #349) — repaired 23 stale tests after the system-settings/RAG-tunables work; restored `nginx/conf.d/default.conf.template` to its correct HTTP-only form (it had been clobbered with the TLS template, breaking `TLS_ENABLED=0` deployments).
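Clients that assumed `POST /api/auth/login` always returns a session now need to branch on the response. The sketch below shows one way to classify the login response body; `LoginResult` and `interpret_login_response` are hypothetical names, and the second-step endpoint is not named in this changelog, so it is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoginResult:
    needs_mfa: bool
    mfa_token: Optional[str] = None   # present only when a second step is required
    session: Optional[dict] = None    # raw response body when login is complete


def interpret_login_response(body: dict) -> LoginResult:
    """Classify a parsed JSON body returned by the login endpoint."""
    if body.get("status") == "mfa_required":
        token = body.get("mfa_token")
        if not token:
            raise ValueError("mfa_required response is missing mfa_token")
        return LoginResult(needs_mfa=True, mfa_token=token)
    # Anything else is treated as a completed login in this sketch.
    return LoginResult(needs_mfa=False, session=body)
```

A caller would prompt for the TOTP code whenever `needs_mfa` is true and submit it together with `mfa_token` to the second-step endpoint described in the MFA / TOTP guide.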
Changed
- Several env vars that previously drove runtime behaviour are now first-boot seeds only. After first boot the `system_settings` database table is authoritative and changes apply without a restart: `RAG_AUDIT_RETENTION_DAYS`, `RAG_CONVERSATION_MAX_AGE_DAYS`, `RAG_CONVERSATION_MAX_TURNS`, `LOG_LEVEL`, and others. See Environment Variables.
- JWT token lifetimes (`access_token_seconds`, `refresh_token_seconds`) are now managed via `PATCH /api/admin/auth/settings` (the `auth_settings` table), not by restarting the service.
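The seed-then-authoritative behaviour can be surprising when an env var is edited after first boot and appears to have no effect. The sketch below illustrates the idea with a plain dict standing in for the `system_settings` table; the function names and the per-key seeding rule are assumptions for illustration, not the project's implementation.

```python
from typing import Iterable, Mapping, MutableMapping, Optional


def seed_settings(
    db: MutableMapping[str, str],
    env: Mapping[str, str],
    keys: Iterable[str],
) -> None:
    """Copy env values into the store only for keys it does not hold yet."""
    for key in keys:
        if key not in db and key in env:
            db[key] = env[key]


def effective_setting(
    db: Mapping[str, str], key: str, default: Optional[str] = None
) -> Optional[str]:
    """After seeding, the store is the only source consulted."""
    return db.get(key, default)
```

Under this model, changing `LOG_LEVEL` in the environment after first boot changes nothing; the value has to be updated through the settings API instead.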
[1.7.0]
Added
- SSE streaming chat — `POST /api/chat` now accepts `Accept: text/event-stream` and delivers the answer as Server-Sent Events through a fully async pipeline. Conversation history is persisted identically to non-streaming responses.
- Job Schedules — cron-based scheduler for automatic connector and integration syncs. Scheduling requires a Pro plan or higher; manual sync remains available on Free.
- Enterprise audit log — append-only Postgres audit log with admin query APIs, configurable retention, and coverage for chat, sync, and config lifecycle events. Enterprise only.
- Operational backup/restore — runbook and automation for backing up and restoring Postgres, Qdrant, branding assets, and local models.
- Bundled nginx ingress with rate limiting — nginx reverse proxy is now part of the deployment image, with rate limits on `/api/auth`, `/api/chat`, and `/api/upload`. SSE streaming is preserved end-to-end.
- Docker/Podman secrets support — deployment secrets (JWT key, DB passwords, Qdrant API key) are moved out of `.env` and managed via Docker or Podman secrets.
- `EFFECTIVE_N_CTX` reporting — the inference runtime now exposes the effective context-window size as the single source of truth. A startup warning is logged when the configured `N_CTX` exceeds actual model capacity.
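To consume the streaming chat response, a client has to split the `text/event-stream` body into events. The sketch below is a minimal parser for the standard `data:` field only; the payload format of each event is not specified in this changelog, so the function simply yields raw strings. `iter_sse_data` is an illustrative name.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream body."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The format strips a single leading space after the colon.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

With an HTTP client that exposes the response as an iterator of lines, the request would carry `Accept: text/event-stream` and the line iterator would be passed straight to `iter_sse_data`.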
Changed
- Chat streaming refactored to a native async pipeline to prevent FastAPI event-loop blocking.
- Sync execution moved from routers into the service layer to enforce the layering boundary.
- Scheduler logs the effective license tier on every scheduled fire.
[1.6.0]
Added
- Unified Metadata Rules API — new `/api/metadata/{source_id}/rules` endpoints replace the old per-connector metadata-rules endpoints. Rules can now be attached to integrations (GitHub, Slack, Google Drive) in addition to connectors. All endpoints require Pro plan or higher.
- Analytics Dashboard API — seven new endpoints under `/api/analytics/{source_id}/` provide insights into chunk distribution, metadata coverage, rule effectiveness, and more. Requires Pro plan or higher.
- Integration source support for metadata rules — integrations can now have their own metadata extraction rules via the new `integration_id` field on the metadata rules model.
- Pro license guard — new license tier gate for metadata and analytics features.
- Dynamic full-text index creation — Qdrant full-text indexes are now automatically created/updated during connector indexing.
- Automatic payload index cleanup — deleting a metadata rule removes its Qdrant payload index if no other rule uses the same field.
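The cleanup rule in the last entry amounts to a reference check across the remaining rules. A minimal sketch, assuming each rule is identified by the payload field it writes to (`should_drop_index` is a hypothetical helper, not the project's function):

```python
from typing import Iterable


def should_drop_index(deleted_field: str, remaining_fields: Iterable[str]) -> bool:
    """True when no surviving rule still extracts into `deleted_field`."""
    return deleted_field not in set(remaining_fields)
```

Only when this returns true would the corresponding Qdrant payload index be removed, so rules that share a field never lose their index while one of them still exists.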
Changed
- Metadata rule endpoints moved from `/api/connectors/{id}/metadata-rules` to `/api/metadata/{source_id}/rules`
- Metadata rule responses now include `source_id` and `source_type` instead of `connector_id`
- Metadata rules and analytics require Pro plan or higher (previously no plan restriction on metadata rules)
[1.5.0]
Added
- Hybrid query classifier with optional LLM sidecar — when enabled, ambiguous queries (e.g. matching both an article and an entity) are sent to the local LLM for intent disambiguation. Unambiguous queries still fast-path through the rule-based classifier with zero LLM overhead
- Extraction signal pipeline — all regex patterns now run simultaneously against the query, producing ranked candidate signals. This enables the hybrid classifier to compare and merge complementary intents
- Chunk boundary splitting — connector metadata rules can now act as document pre-split boundaries during indexing (`chunk_boundary: true`). Useful for documents with predictable section structure (e.g. legal statutes)
- Full-text index auto-creation on integration syncs — Slack, GitHub, and Google Drive indexers now ensure Qdrant full-text indexes exist after sync, so `hybrid_bm25` mode works immediately
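As an illustration of what pre-splitting on a rule pattern means, the sketch below cuts a document so that every regex match starts a new section. It is a simplified model under the assumption that the boundary is a line-anchored regex; `split_on_boundaries` is an illustrative name and the actual indexer may behave differently.

```python
import re


def split_on_boundaries(text: str, boundary_pattern: str) -> list[str]:
    """Split text so each match of the pattern starts a new section."""
    starts = [
        m.start() for m in re.finditer(boundary_pattern, text, flags=re.MULTILINE)
    ]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first boundary
    ends = starts[1:] + [len(text)]
    sections = [text[s:e].strip() for s, e in zip(starts, ends)]
    return [s for s in sections if s]
```

For a statute-like document, a pattern such as `^Article \d+` keeps each article in its own section before the regular chunker runs, so chunks do not straddle two articles.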
Changed
- Query classification architecture refactored: signal detection separated from classification logic
- New environment variables: `RAG_CLASSIFIER_LLM_ENABLED`, `RAG_CLASSIFIER_LLM_MAX_TOKENS`
[1.4.0]
Added
- Query Engine — new orchestration layer that coordinates the full query pipeline (classify → retrieve → rerank → budget → generate). Supports pluggable rerankers (`ScoreThresholdReranker`, `TopKReranker`, `ChainReranker`) and configurable fallback policies (`WARN`, `RETRY_SEMANTIC`, `ABSTAIN`)
- Token budget management — the system now prevents context window overflow by estimating token usage and trimming low-relevance chunks before sending to the LLM. Budget diagnostics are surfaced in chat responses via the new `token_budget` field
- Dynamic metadata-aware query classification — the query classifier now automatically uses per-connector metadata extraction rules at query time. Custom metadata fields (e.g. issue IDs, patient IDs) are detected in natural language queries and converted to metadata filters without manual configuration
- Global classifier rule loading — the classifier loads rules from all connectors automatically. Users no longer need to specify which connector their data came from
- `query_pattern` override for metadata rules — metadata rules can now specify a separate regex pattern for query-time classification, independent of the ingestion pattern. Useful when extraction patterns use anchors or structural regex that don't match mid-sentence queries
- Fuzzy matching & typo tolerance — optional fuzzy matching catches common typos and partial identifiers before falling through to semantic search. Three strategies: prefix expansion, edit-distance tolerance, and digit-count tolerance. Enabled by default; confidence is reduced for fuzzy matches to signal uncertainty
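The token budget entry above describes trimming low-relevance chunks to fit the context window. A minimal sketch of that idea follows; the 4-characters-per-token estimate, the greedy selection, and all names (`Chunk`, `estimate_tokens`, `fit_to_budget`) are assumptions for illustration rather than the Query Engine's actual logic.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score from retrieval or reranking


def estimate_tokens(text: str) -> int:
    # Rough heuristic (assumption): about 4 characters per token.
    return max(1, len(text) // 4)


def fit_to_budget(chunks: list[Chunk], budget: int) -> tuple[list[Chunk], int]:
    """Keep the highest-scoring chunks whose estimated tokens fit the budget."""
    kept: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        cost = estimate_tokens(chunk.text)
        if used + cost > budget:
            continue  # skip it; a smaller, lower-ranked chunk may still fit
        kept.append(chunk)
        used += cost
    return kept, used
```

Returning the used count alongside the kept chunks mirrors the kind of diagnostic a `token_budget` response field could carry.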
Fixed
- Query classifier now applies all connector metadata rules regardless of how the query was submitted — previously missed custom patterns in some cases
Changed
- Chat endpoint now returns additional diagnostic fields: `token_budget` (budget usage), effective retrieval settings, and timing information
- Internal query orchestration refactored for better modularity and extensibility
[1.3.0]
Added
- GPU-accelerated backend embeddings — configurable via the `EMBED_DEVICE` environment variable (`auto`, `cpu`, `cuda`), with dedicated CUDA Docker images for GPU-enabled backend deployments
- Smart chunking strategies — sentence-boundary and markdown-aware chunking with an `auto` mode that selects the best splitter based on file type
- Per-connector metadata extraction rulesets — define regex-based rules per connector to extract structured metadata from document text, filenames, or headers during ingestion. Extracted fields are attached to every chunk and used automatically by the retrieval pipeline
- Hybrid retrieval layer with 5 modes — semantic, hybrid, metadata-only, comparison/grouping, and hybrid BM25 with Reciprocal Rank Fusion
- Document-aware context builder — retrieved chunks are grouped by document, sorted by reading order, and enriched with structured metadata headers before being sent to the LLM
- Intent-based query classifier — automatically detects query intent and routes to the optimal retrieval strategy. Supports multilingual queries (DE + EN)
- Industry-specific classifiers — configurable via metadata rulesets to support domain-specific query patterns (e.g. article lookups, entity filtering, temporal queries)
- Metadata rules REST API — full CRUD plus a test endpoint for validating rules against sample text before saving
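Reciprocal Rank Fusion, named in the hybrid BM25 mode above, merges several ranked lists by summing `1 / (k + rank)` per document. The sketch below shows the general technique; `k = 60` is the value conventionally used in the literature and is an assumption here, not a documented RAG-DocBot default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

Because only ranks are used, the semantic and BM25 result lists can be fused without normalising their raw scores against each other.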
Fixed
- Metadata case-sensitivity mismatch between query classifier and stored payloads — all values are now normalized to lowercase
- Restored error messaging for unrecognized chunking strategies
Changed
- Default chat mode changed from `semantic` to `auto` — the query classifier now runs automatically on every query
- All metadata values normalized to lowercase throughout the pipeline (re-index required after upgrade)
- Context assembly rewritten to use the new document-aware context builder
[1.2.1]
Added
- Configurable backend worker count via `BACKEND_WORKERS` environment variable — no image rebuild required
- Multi-worker job polling with Redis fallback — fixes 404s when different workers serve poll requests
- PID in log lines for multi-worker debugging
Fixed
- Connector sync cancellation not working — incremental indexer now checks cancel token in all phases
- GitHub integrations without a PAT silently hitting rate limits — now rejected with HTTP 400
Changed
- Incremental indexer uses rolling batches with cancel token checks
- GitHub PAT validation is now a hard requirement for integration creation
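The cancellation fix relies on checking the cancel token between batches rather than once per sync. A minimal sketch of that pattern, using `threading.Event` as a stand-in for the cancel token (the real token type and the names below are assumptions):

```python
import threading
from itertools import islice
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def rolling_batches(items: Iterable[T], size: int) -> Iterator[list[T]]:
    """Yield consecutive batches without materialising the whole input."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def index_with_cancel(
    items: Iterable[T],
    size: int,
    cancel: threading.Event,
    process: Callable[[list[T]], None],
) -> int:
    """Process batches until done or cancelled; return the item count handled."""
    done = 0
    for batch in rolling_batches(items, size):
        if cancel.is_set():
            break  # stop before starting another batch
        process(batch)
        done += len(batch)
    return done
```

Smaller batches make cancellation more responsive at the cost of more per-batch overhead, which is the trade-off a rolling-batch design exposes.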
[1.2.0]
Added
- NVIDIA GPU acceleration for inference on x86_64 systems via CUDA, enabling significantly faster LLM responses
- Configurable GPU offloading using the `N_GPU_LAYERS` environment variable (e.g. `-1` to offload all layers)
- CUDA-enabled inference image build support
- Support for running the inference component on Jetson Orin Nano (aarch64 / JetPack). Full support is planned for a future release
- Documentation for Jetson deployments, including hardware requirements and model sizing guidance
- Platform-aware installer that detects NVIDIA GPUs and automatically selects the appropriate inference image and configuration
Improved
- Excel (`.xlsx`) extraction reliability by ensuring cell values are fully loaded before processing
- Installer experience by automatically configuring GPU settings and reducing manual setup steps
Changed
- Inference image selection is now dynamically determined based on detected hardware (CPU vs GPU)
- Generated `docker-compose.yml` conditionally includes GPU configuration and environment variables when a GPU is available
- Inference build process updated to support both CPU and CUDA variants through build-time configuration
Fixed
- Fixed an issue where `.xlsx` files could result in empty extracted content during indexing
[1.1.0]
Added
- Qdrant client factory with optional API key support
- Secure storage of integration credentials (Slack, GitHub, Google Drive)
- Generic integration sync tracking across all connectors
- Token usage logging for inference requests (optional)
- Model tuning guide for CPU-based inference
- Built-in license verification (no runtime public key required)
- Unit tests for Qdrant client and credential handling
Improved
- Consistent handling of integration sync state (Slack, GitHub, Google Drive)
- Google Drive integration using service account authentication
- Centralised Qdrant client usage across the codebase
- Docker setup with optional Qdrant API key configuration
- Environment variable naming for inference settings (`N_CTX`, `N_THREADS`)
- Inference response now includes token usage metadata
Fixed
- Incorrect or missing `last_sync` values for integrations
- Google Drive API compatibility issues
Changed
- Inference server loads environment variables automatically
[1.0.0]
Added
- Initial stable release of RAG-DocBot
- FastAPI backend with full REST API
- JWT-based authentication with RBAC (viewer, editor, admin roles)
- Document upload and indexing pipeline
- Qdrant vector database integration for semantic search
- Retrieval-Augmented Generation (RAG) chat endpoint
- Async job system for document ingestion and indexing
- PostgreSQL persistent storage with automatic migrations
- Redis for live job state
- llama-cpp-python inference service with GGUF model support
- Source connectors: file system (local directories)
- License validation (FREE / PRO / ENTERPRISE tiers)
- Privacy-preserving logging: log anonymisation and query redaction enabled by default
- Automatic conversation history purging and turn capping
- Branding customisation (logo, display name)
- Hardware and model info endpoints
- Docker Compose deployment with named volumes for data persistence
[0.9.0]
Added
- Connector framework for external document sources
- Slack and GitHub connector support
- Bulk document delete endpoint
- Integration sync endpoint
Changed
- Improved index rebuild performance
- Reduced memory usage during document extraction
[0.8.0]
Added
- Conversation history API (`GET /api/conversations`, `GET /api/conversations/{id}`)
- Conversation auto-purge based on `RAG_CONVERSATION_MAX_AGE_DAYS`
- Max turns per conversation cap (`RAG_CONVERSATION_MAX_TURNS`)
Fixed
- Race condition in job status updates
[0.7.0]
Added
- PRO and ENTERPRISE license tier support
- CSV, Excel, and HTML document type support (PRO and ENTERPRISE)
- License endpoint (`GET /api/license`, `POST /api/license`)
[0.6.0]
Added
- Role-based access control (RBAC) — viewer, editor, admin roles
- User management API (`CRUD /api/auth/users`)
- Default admin account creation on first boot
[0.5.0]
Added
- Refresh token support (`POST /api/auth/refresh`)
- Token expiry configuration via environment variables
- pgadmin service for database inspection
[0.4.0]
Added
- Job management API (`GET /api/jobs`, `GET /api/jobs/{id}`, `POST /api/jobs/{id}/cancel`)
- Async indexing pipeline
- Index stats endpoint (`GET /api/index/stats`)
[0.3.0]
Added
- Index rebuild endpoint (`POST /api/index/rebuild`)
- Branding API (logo upload, branding config)
- Hardware info endpoint
[0.2.0]
Added
- Document upload, list, and delete endpoints
- Qdrant integration for vector storage
- Basic RAG chat endpoint
[0.1.0]
Added
- Initial project structure
- FastAPI application scaffold
- PostgreSQL and Redis integration
- JWT login endpoint
- Health check endpoint