Changelog

All notable changes to RAG-DocBot are documented here.


[1.8.0]

Added

  • Federated login (OIDC / SSO) — multi-provider OpenID Connect login with Authorization Code + PKCE (S256). Pre-built support for Microsoft Entra ID, Google Workspace, Keycloak, and any standards-compliant OIDC IdP. Provider configuration is managed via the admin UI (oidc_providers Postgres table); env-var-based config (OIDC_PROVIDERS / OIDC_<NAME>_*) remains as a fallback when the table is empty. client_secret is encrypted at rest; a secret_set: bool flag is returned by the API instead of the secret value. JIT user provisioning on first login; group sync from IdP claims; OIDC_<NAME>_ADMIN_GROUPS auto-promotes users to admin. See the SSO / OIDC guide.
  • TOTP MFA (Two-Factor Authentication) — time-based one-time passwords as a second factor for any account. Enrollment returns a QR SVG plus 10 single-use recovery codes. TOTP secrets encrypted at rest with AES-256-GCM. See the MFA / TOTP guide.
  • Groups & Resource ACL (Enterprise) — per-connector and per-integration access control enforced at retrieval time via Qdrant payload filters. Built-in everyone group seeded on first boot. Admins can define groups, assign users, and set ACL on connectors/integrations. Admin role bypasses all ACL checks. See the Groups & ACL guide.
  • TLS / HTTPS termination in bundled nginx — opt-in via TLS_ENABLED=1. Modern cipher suite (TLSv1.2 + 1.3), configurable HSTS, OCSP stapling (toggleable for air-gapped environments). HTTP :80 redirects to HTTPS and serves Let's Encrypt HTTP-01 challenges. Helper scripts provided for self-signed, internal CA, and Let's Encrypt deployments. See the TLS / HTTPS guide.
  • Runtime-tunable system settings — new system_settings Postgres table replaces several previously env-only knobs. Settings are updated without a restart via PATCH /api/admin/system/settings. Env vars remain the first-boot seed; the database is authoritative thereafter. See the System Settings guide.
  • Sign-out everywhere — DELETE /api/admin/auth/sessions revokes all currently-issued tokens globally; DELETE /api/admin/auth/users/{username}/sessions revokes all tokens for a single user. Available on all license tiers.
  • New Docker secrets: mfa_encryption_key (required), oidc_entra_client_secret, oidc_google_client_secret.
  • New database migrations: 0016 (OIDC/TOTP user fields), 0017 (groups & ACL tables), 0018 (oidc_providers table).
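
The S256 code challenge used by Authorization Code + PKCE can be illustrated with a short sketch. It follows RFC 7636 rather than RAG-DocBot's own code, which is not shown here; the function names are hypothetical.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Return a random URL-safe code verifier (43 characters for 32 bytes)."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def s256_challenge(code_verifier: str) -> str:
    """Compute BASE64URL(SHA256(verifier)) without padding, as in RFC 7636."""
    digest = hashlib.sha256(code_verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code cannot be redeemed without the verifier.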

Fixed

  • MFA bypass (security) — POST /api/auth/login previously returned a full session token even when TOTP was enrolled. Now correctly returns {"status": "mfa_required", "mfa_token": "..."} and requires the client to complete the second step.
  • OIDC role preservation (#328) — provision_user() no longer overwrites role or re-syncs groups for existing OIDC users on subsequent logins.
  • Test suite (PR #349) — repaired 23 stale tests after the system-settings/RAG-tunables work; restored nginx/conf.d/default.conf.template to its correct HTTP-only form (it had been clobbered with the TLS template, breaking TLS_ENABLED=0 deployments).
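
For API clients, the MFA fix means the login response has to be inspected before it is treated as a session. The sketch below shows one way a client might branch on it. Only the status and mfa_token fields come from the entry above; the function name and return values are hypothetical, and the second-step endpoint is not covered.

```python
from typing import Optional, Tuple


def next_login_step(body: dict) -> Tuple[str, Optional[str]]:
    """Decide what the client must do after POST /api/auth/login.

    Returns ("mfa", mfa_token) when a second factor is required,
    otherwise ("done", None).
    """
    if body.get("status") == "mfa_required":
        token = body.get("mfa_token")
        if not token:
            # A second factor was requested but no token came with it.
            raise ValueError("mfa_required response without mfa_token")
        return "mfa", token
    return "done", None
```

Clients written against earlier versions that assume every successful login response carries a session token will need a branch like this.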

Changed

  • Several env vars that previously drove runtime behaviour are now first-boot seeds only. After first boot the system_settings database table is authoritative and changes apply without a restart: RAG_AUDIT_RETENTION_DAYS, RAG_CONVERSATION_MAX_AGE_DAYS, RAG_CONVERSATION_MAX_TURNS, LOG_LEVEL, and others. See Environment Variables.
  • JWT token lifetimes (access_token_seconds, refresh_token_seconds) are now managed via PATCH /api/admin/auth/settings (the auth_settings table), not by restarting the service.
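
The "first-boot seed, database authoritative thereafter" behaviour can be pictured with a small sketch. It uses a plain dict in place of the system_settings table and assumes seeding happens per missing key; the actual seeding logic and both function names are assumptions.

```python
def seed_on_first_boot(db_settings: dict, env: dict, keys: list) -> dict:
    """Copy env values into the settings store only for keys it lacks."""
    for key in keys:
        if key not in db_settings and key in env:
            db_settings[key] = env[key]
    return db_settings


def effective_value(db_settings: dict, key: str, default=None):
    """After seeding, only the stored value is consulted, never the env."""
    return db_settings.get(key, default)
```

Under this model, editing LOG_LEVEL in the environment after first boot has no effect; the value changes only when the stored setting is updated.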

[1.7.0]

Added

  • SSE streaming chat — POST /api/chat now accepts Accept: text/event-stream and delivers the answer as Server-Sent Events through a fully async pipeline. Conversation history is persisted identically to non-streaming responses.
  • Job Schedules — cron-based scheduler for automatic connector and integration syncs. Scheduling requires a Pro plan or higher; manual sync remains available on Free.
  • Enterprise audit log — append-only Postgres audit log with admin query APIs, configurable retention, and coverage for chat, sync, and config lifecycle events. Enterprise only.
  • Operational backup/restore — runbook and automation for backing up and restoring Postgres, Qdrant, branding assets, and local models.
  • Bundled nginx ingress with rate limiting — nginx reverse proxy is now part of the deployment image, with rate limits on /api/auth, /api/chat, and /api/upload. SSE streaming is preserved end-to-end.
  • Docker/Podman secrets support — deployment secrets (JWT key, DB passwords, Qdrant API key) are moved out of .env and managed via Docker or Podman secrets.
  • EFFECTIVE_N_CTX reporting — the inference runtime now exposes the effective context-window size as the single source of truth. A startup warning is logged when the configured N_CTX exceeds actual model capacity.
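
A client consuming the streaming chat response needs to split the text/event-stream body into events. The following is a minimal sketch of generic Server-Sent Events parsing; the payload carried in each data field is not specified in this changelog, so the parser treats it as an opaque string. The function name is hypothetical.

```python
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each event in a text/event-stream."""
    buffer = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # A single leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

An event split over several data lines is joined with newlines, and an event that is never terminated by a blank line is discarded.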

Changed

  • Chat streaming refactored to a native async pipeline to prevent FastAPI event-loop blocking.
  • Sync execution moved from routers into the service layer to enforce the layering boundary.
  • Scheduler logs the effective license tier on every scheduled fire.

[1.6.0]

Added

  • Unified Metadata Rules API — new /api/metadata/{source_id}/rules endpoints replace the old per-connector metadata-rules endpoints. Rules can now be attached to integrations (GitHub, Slack, Google Drive) in addition to connectors. All endpoints require Pro plan or higher.
  • Analytics Dashboard API — seven new endpoints under /api/analytics/{source_id}/ provide insights into chunk distribution, metadata coverage, rule effectiveness, and more. Requires Pro plan or higher.
  • Integration source support for metadata rules — integrations can now have their own metadata extraction rules via the new integration_id field on the metadata rules model.
  • Pro license guard — new license tier gate for metadata and analytics features.
  • Dynamic full-text index creation — Qdrant full-text indexes are now automatically created/updated during connector indexing.
  • Automatic payload index cleanup — deleting a metadata rule removes its Qdrant payload index if no other rule uses the same field.
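
The payload index cleanup rule amounts to a reference check across the remaining rules. A minimal sketch, assuming each rule is a dict with a hypothetical "field" key naming the payload field it populates:

```python
def fields_to_unindex(remaining_rules: list, deleted_rule: dict) -> set:
    """Return the payload fields whose index can be dropped after a delete."""
    field = deleted_rule["field"]
    still_used = any(rule["field"] == field for rule in remaining_rules)
    return set() if still_used else {field}
```

The index is kept whenever at least one other rule still writes to the same field.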

Changed

  • Metadata rule endpoints moved from /api/connectors/{id}/metadata-rules to /api/metadata/{source_id}/rules
  • Metadata rule responses now include source_id and source_type instead of connector_id
  • Metadata rules and analytics require Pro plan or higher (previously no plan restriction on metadata rules)

[1.5.0]

Added

  • Hybrid query classifier with optional LLM sidecar — when enabled, ambiguous queries (e.g. matching both an article and an entity) are sent to the local LLM for intent disambiguation. Unambiguous queries still fast-path through the rule-based classifier with zero LLM overhead
  • Extraction signal pipeline — all regex patterns now run simultaneously against the query, producing ranked candidate signals. This enables the hybrid classifier to compare and merge complementary intents
  • Chunk boundary splitting — connector metadata rules can now act as document pre-split boundaries during indexing (chunk_boundary: true). Useful for documents with predictable section structure (e.g. legal statutes)
  • Full-text index auto-creation on integration syncs — Slack, GitHub, and Google Drive indexers now ensure Qdrant full-text indexes exist after sync, so hybrid_bm25 mode works immediately
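
Chunk boundary splitting can be pictured as a regex pre-split in which every segment starts at a boundary match. The sketch below is an illustration under that assumption, not the indexer's actual code; the function name and the sample pattern are hypothetical.

```python
import re
from typing import List


def split_on_boundaries(text: str, boundary_pattern: str) -> List[str]:
    """Split text so that every segment starts at a boundary match."""
    starts = [m.start() for m in re.finditer(boundary_pattern, text, flags=re.MULTILINE)]
    if not starts:
        return [text]
    if starts[0] != 0:
        starts.insert(0, 0)  # keep any preamble before the first boundary
    ends = starts[1:] + [len(text)]
    segments = (text[s:e].strip() for s, e in zip(starts, ends))
    return [segment for segment in segments if segment]
```

For a statute-like document, a pattern such as ^Art\. \d+ would yield one segment per article, which the regular chunker could then split further.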

Changed

  • Query classification architecture refactored: signal detection separated from classification logic
  • New environment variables: RAG_CLASSIFIER_LLM_ENABLED, RAG_CLASSIFIER_LLM_MAX_TOKENS

[1.4.0]

Added

  • Query Engine — new orchestration layer that coordinates the full query pipeline (classify → retrieve → rerank → budget → generate). Supports pluggable rerankers (ScoreThresholdReranker, TopKReranker, ChainReranker) and configurable fallback policies (WARN, RETRY_SEMANTIC, ABSTAIN)
  • Token budget management — the system now prevents context window overflow by estimating token usage and trimming low-relevance chunks before sending to the LLM. Budget diagnostics are surfaced in chat responses via the new token_budget field
  • Dynamic metadata-aware query classification — the query classifier now automatically uses per-connector metadata extraction rules at query time. Custom metadata fields (e.g. issue IDs, patient IDs) are detected in natural language queries and converted to metadata filters without manual configuration
  • Global classifier rule loading — the classifier loads rules from all connectors automatically. Users no longer need to specify which connector their data came from
  • query_pattern override for metadata rules — metadata rules can now specify a separate regex pattern for query-time classification, independent of the ingestion pattern. Useful when extraction patterns use anchors or structural regex that don't match mid-sentence queries
  • Fuzzy matching & typo tolerance — optional fuzzy matching catches common typos and partial identifiers before falling through to semantic search. Three strategies: prefix expansion, edit-distance tolerance, and digit-count tolerance. Enabled by default; confidence is reduced for fuzzy matches to signal uncertainty
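
Token budget management can be sketched as a greedy selection over scored chunks. The example below is an illustration only: the four-characters-per-token estimate, the function names, and the shape of the diagnostics dict are assumptions, not the contents of the token_budget field.

```python
from typing import List, Tuple

Chunk = Tuple[str, float]  # (text, relevance score)


def estimate_tokens(text: str) -> int:
    """Rough heuristic: about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_to_budget(chunks: List[Chunk], budget: int) -> Tuple[List[Chunk], dict]:
    """Keep the highest-scoring chunks that fit within the token budget."""
    kept: List[Chunk] = []
    used = 0
    for text, score in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = estimate_tokens(text)
        if used + cost > budget:
            continue  # this chunk does not fit; try lower-ranked, smaller ones
        kept.append((text, score))
        used += cost
    diagnostics = {"budget": budget, "used": used, "dropped": len(chunks) - len(kept)}
    return kept, diagnostics
```

A real tokenizer would give tighter estimates; the heuristic is used here only to keep the sketch self-contained.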

Fixed

  • Query classifier now applies all connector metadata rules regardless of how the query was submitted — previously missed custom patterns in some cases

Changed

  • Chat endpoint now returns additional diagnostic fields: token_budget (budget usage), effective retrieval settings, and timing information
  • Internal query orchestration refactored for better modularity and extensibility

[1.3.0]

Added

  • GPU-accelerated backend embeddings — configurable via the EMBED_DEVICE environment variable (auto, cpu, cuda), with dedicated CUDA Docker images for GPU-enabled backend deployments
  • Smart chunking strategies — sentence-boundary and markdown-aware chunking with an auto mode that selects the best splitter based on file type
  • Per-connector metadata extraction rulesets — define regex-based rules per connector to extract structured metadata from document text, filenames, or headers during ingestion. Extracted fields are attached to every chunk and used automatically by the retrieval pipeline
  • Hybrid retrieval layer with 5 modes — semantic, hybrid, metadata-only, comparison/grouping, and hybrid BM25 with Reciprocal Rank Fusion
  • Document-aware context builder — retrieved chunks are grouped by document, sorted by reading order, and enriched with structured metadata headers before being sent to the LLM
  • Intent-based query classifier — automatically detects query intent and routes to the optimal retrieval strategy. Supports multilingual queries (DE + EN)
  • Industry-specific classifiers — configurable via metadata rulesets to support domain-specific query patterns (e.g. article lookups, entity filtering, temporal queries)
  • Metadata rules REST API — full CRUD plus a test endpoint for validating rules against sample text before saving
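
Reciprocal Rank Fusion, used by the hybrid BM25 mode, merges several ranked lists by summing 1 / (k + rank) for each document. The sketch below shows the standard formulation; the constant k = 60 is the commonly cited default and an assumption here, as is the function name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending fused score; break ties by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

Because only ranks are used, the semantic and BM25 scores never need to be normalised onto a common scale.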

Fixed

  • Metadata case-sensitivity mismatch between query classifier and stored payloads — all values are now normalized to lowercase
  • Restored error messaging for unrecognized chunking strategies

Changed

  • Default chat mode changed from semantic to auto — the query classifier now runs automatically on every query
  • All metadata values normalized to lowercase throughout the pipeline (re-index required after upgrade)
  • Context assembly rewritten to use the new document-aware context builder

[1.2.1]

Added

  • Configurable backend worker count via BACKEND_WORKERS environment variable — no image rebuild required
  • Multi-worker job polling with Redis fallback — fixes 404s when different workers serve poll requests
  • PID in log lines for multi-worker debugging

Fixed

  • Connector sync cancellation not working — incremental indexer now checks cancel token in all phases
  • GitHub integrations without a PAT silently hitting rate limits — now rejected with HTTP 400

Changed

  • Incremental indexer uses rolling batches with cancel token checks
  • GitHub PAT validation is now a hard requirement for integration creation
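
Rolling batches with cancel token checks can be sketched as follows. A threading.Event stands in for the cancel token, and index_batch is a hypothetical callback; the real indexer's phases and interfaces are not described in this changelog.

```python
import threading
from typing import Callable, Iterable, List


def index_in_batches(
    items: Iterable[str],
    batch_size: int,
    cancel: threading.Event,
    index_batch: Callable[[List[str]], None],
) -> int:
    """Index items in fixed-size batches, stopping once cancel is set."""
    indexed = 0
    batch: List[str] = []
    for item in items:
        if cancel.is_set():
            return indexed  # abandon the unflushed batch on cancellation
        batch.append(item)
        if len(batch) >= batch_size:
            index_batch(batch)
            indexed += len(batch)
            batch = []
    if batch and not cancel.is_set():
        index_batch(batch)  # flush the final partial batch
        indexed += len(batch)
    return indexed
```

Checking the token before each item bounds the work done after a cancel request to at most one in-flight batch.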

[1.2.0]

Added

  • NVIDIA GPU acceleration for inference on x86_64 systems via CUDA, enabling significantly faster LLM responses
  • Configurable GPU offloading using the N_GPU_LAYERS environment variable (e.g. -1 to offload all layers)
  • CUDA-enabled inference image build support
  • Support for running the inference component on Jetson Orin Nano (aarch64 / JetPack). Full support is planned for a future release
  • Documentation for Jetson deployments, including hardware requirements and model sizing guidance
  • Platform-aware installer that detects NVIDIA GPUs and automatically selects the appropriate inference image and configuration

Improved

  • Excel (.xlsx) extraction reliability by ensuring cell values are fully loaded before processing
  • Installer experience by automatically configuring GPU settings and reducing manual setup steps

Changed

  • Inference image selection is now dynamically determined based on detected hardware (CPU vs GPU)
  • Generated docker-compose.yml conditionally includes GPU configuration and environment variables when a GPU is available
  • Inference build process updated to support both CPU and CUDA variants through build-time configuration

Fixed

  • Fixed an issue where .xlsx files could result in empty extracted content during indexing

[1.1.0]

Added

  • Qdrant client factory with optional API key support
  • Secure storage of integration credentials (Slack, GitHub, Google Drive)
  • Generic integration sync tracking across all connectors
  • Token usage logging for inference requests (optional)
  • Model tuning guide for CPU-based inference
  • Built-in license verification (no runtime public key required)
  • Unit tests for Qdrant client and credential handling

Improved

  • Consistent handling of integration sync state (Slack, GitHub, Google Drive)
  • Google Drive integration using service account authentication
  • Centralised Qdrant client usage across the codebase
  • Docker setup with optional Qdrant API key configuration
  • Environment variable naming for inference settings (N_CTX, N_THREADS)
  • Inference response now includes token usage metadata

Fixed

  • Incorrect or missing last_sync values for integrations
  • Google Drive API compatibility issues

Changed

  • Inference server loads environment variables automatically

[1.0.0]

Added

  • Initial stable release of RAG-DocBot
  • FastAPI backend with full REST API
  • JWT-based authentication with RBAC (viewer, editor, admin roles)
  • Document upload and indexing pipeline
  • Qdrant vector database integration for semantic search
  • Retrieval-Augmented Generation (RAG) chat endpoint
  • Async job system for document ingestion and indexing
  • PostgreSQL persistent storage with automatic migrations
  • Redis for live job state
  • llama-cpp-python inference service with GGUF model support
  • Source connectors: file system (local directories)
  • License validation (FREE / PRO / ENTERPRISE tiers)
  • Privacy-preserving logging: log anonymisation and query redaction enabled by default
  • Automatic conversation history purging and turn capping
  • Branding customisation (logo, display name)
  • Hardware and model info endpoints
  • Docker Compose deployment with named volumes for data persistence

[0.9.0]

Added

  • Connector framework for external document sources
  • Slack and GitHub connector support
  • Bulk document delete endpoint
  • Integration sync endpoint

Changed

  • Improved index rebuild performance
  • Reduced memory usage during document extraction

[0.8.0]

Added

  • Conversation history API (GET /api/conversations, GET /api/conversations/{id})
  • Conversation auto-purge based on RAG_CONVERSATION_MAX_AGE_DAYS
  • Max turns per conversation cap (RAG_CONVERSATION_MAX_TURNS)

Fixed

  • Race condition in job status updates

[0.7.0]

Added

  • PRO and ENTERPRISE license tier support
  • CSV, Excel, and HTML document type support (PRO and ENTERPRISE)
  • License endpoint (GET /api/license, POST /api/license)

[0.6.0]

Added

  • Role-based access control (RBAC) — viewer, editor, admin roles
  • User management API (CRUD /api/auth/users)
  • Default admin account creation on first boot

[0.5.0]

Added

  • Refresh token support (POST /api/auth/refresh)
  • Token expiry configuration via environment variables
  • pgadmin service for database inspection

[0.4.0]

Added

  • Job management API (GET /api/jobs, GET /api/jobs/{id}, POST /api/jobs/{id}/cancel)
  • Async indexing pipeline
  • Index stats endpoint (GET /api/index/stats)

[0.3.0]

Added

  • Index rebuild endpoint (POST /api/index/rebuild)
  • Branding API (logo upload, branding config)
  • Hardware info endpoint

[0.2.0]

Added

  • Document upload, list, and delete endpoints
  • Qdrant integration for vector storage
  • Basic RAG chat endpoint

[0.1.0]

Added

  • Initial project structure
  • FastAPI application scaffold
  • PostgreSQL and Redis integration
  • JWT login endpoint
  • Health check endpoint