Changelog

All notable changes to RAG-DocBot are documented here.

[1.9.1]

Added

Multi-worker safe job state — job status, progress, and active-job tracking can now be shared across backend replicas via Redis, with a safe in-memory fallback if Redis is unavailable. Job polling now returns the correct state no matter which backend replica serves the request.
Scheduler leader lock for multi-replica deployments — scheduled connector and integration syncs are now coordinated so only one backend replica fires each schedule per tick.
Bounded-memory indexing pipeline — local file, GitHub, and Google Drive indexing now flush to Qdrant in fixed-size batches to cap peak memory usage on large corpora.
Automatic retention sweeps — daily background cleanup now prunes old job history and conversations based on configured retention settings.
Database pool sizing controls — connection pool tuning is now exposed so operators can adjust database concurrency for their deployment profile.
Optional API documentation endpoints — Swagger UI, ReDoc, and OpenAPI schema endpoints can now be enabled explicitly for environments that want interactive API docs.
Helm / GKE Redis secret alignment — Redis authentication can now be sourced from a single External Secrets-managed secret shared by Redis and backend workloads.
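
The fixed-size batching behind the bounded-memory indexing pipeline can be illustrated with a minimal Python sketch. It is not RAG-DocBot's code: the `flush` callback is a hypothetical stand-in for the Qdrant write, and the batch size of 64 is an assumed value. The point it shows is that only one batch of chunks is held in memory at a time, however large the corpus is.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` items without materialising the whole input."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def index_in_batches(
    chunks: Iterable[T],
    flush: Callable[[List[T]], None],  # hypothetical stand-in for a Qdrant upsert
    batch_size: int = 64,              # assumed value, not RAG-DocBot's setting
) -> int:
    """Stream chunks to `flush` in fixed-size batches and return how many were flushed."""
    total = 0
    for batch in batched(chunks, batch_size):
        flush(batch)  # peak memory is bounded by one batch, not by the corpus
        total += len(batch)
    return total


if __name__ == "__main__":
    sizes: List[int] = []
    n = index_in_batches((f"chunk-{i}" for i in range(150)), lambda b: sizes.append(len(b)))
    print(n, sizes)  # 150 [64, 64, 22]
```

Because the input is consumed lazily through `islice`, a generator that reads files one by one never has to be fully loaded into memory.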

Changed

Embedder lifecycle optimized — the embedding model is now loaded once per process and reused across indexing and query paths, reducing repeated model initialization and VRAM pressure.
Qdrant client reuse at startup — the app now shares one Qdrant client instance across pipeline initialization paths.

Fixed

Missing or 404 job updates across backend workers — multi-replica polling now resolves correctly from shared job state instead of per-worker memory.
Duplicate scheduled sync executions — leader locking prevents multiple replicas from triggering the same schedule in the same tick.
Conversation retention sweep regression in 1.9.x — background cleanup now correctly honors the admin-managed conversation retention setting.
Redis auth edge cases in MFA / OIDC flows — authentication paths now consistently use shared Redis client configuration, including password injection.
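
The per-tick leader lock that prevents duplicate scheduled syncs is commonly built on Redis `SET key value NX PX ttl`, which redis-py exposes as `client.set(key, value, nx=True, px=ttl_ms)`. The sketch below assumes that approach; the changelog does not say how RAG-DocBot implements it. So that the example runs without a Redis server, it uses a hypothetical in-memory store that mimics those semantics, and the key layout `scheduler:lock:{schedule_id}:{tick}` and the 55-second TTL are illustrative assumptions.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class InMemoryLockStore:
    """Mimics Redis `SET key value NX PX ttl` semantics; for local testing only."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[str, float]] = {}  # key -> (holder, expiry)
        self._mutex = threading.Lock()

    def set_nx_px(self, key: str, value: str, ttl_ms: int, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        with self._mutex:
            current = self._data.get(key)
            if current is not None and current[1] > now:
                return False  # lock still held by another replica
            self._data[key] = (value, now + ttl_ms / 1000.0)
            return True


def try_acquire_tick(store: InMemoryLockStore, schedule_id: str, tick: int,
                     replica_id: str, ttl_ms: int = 55_000) -> bool:
    """Return True if this replica should fire `schedule_id` for this tick."""
    # Hypothetical key layout: one lock per (schedule, tick) pair.
    key = f"scheduler:lock:{schedule_id}:{tick}"
    return store.set_nx_px(key, replica_id, ttl_ms)


if __name__ == "__main__":
    store = InMemoryLockStore()
    winners = [r for r in ("replica-a", "replica-b", "replica-c")
               if try_acquire_tick(store, "nightly-github-sync", 1_700_000_000, r)]
    print(winners)  # ['replica-a']
```

Keying the lock by tick rather than holding one long-lived leader lease means a crashed replica delays at most one tick, because the next tick uses a fresh key.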

Upgrade notes

All newly introduced environment variables are backward-compatible and there are no breaking changes.
Multi-replica operators can remove the previous scheduler-disable workaround on secondary backend replicas.
For GKE + External Secrets setups, provision a docbot-redis secret in GCP Secret Manager as {"password":"..."} before upgrading.

[1.9.0]

Added

Kubernetes / GKE first-class deployment — Docbot can now be deployed on GCP and on local clusters. The local-cluster setup guide is not yet available, but clients can already choose between an on-prem installation of Docbot and a managed cloud deployment. The default region is Europe; because each client's deployment is isolated, the region and other custom requirements can be agreed individually.
OCR support for scanned PDFs and image files — additive OCR extraction pipeline for scanned documents and images, with Tesseract and docTR backends, CUDA-aware auto-selection, and graceful degradation when OCR dependencies are missing. Global toggles INDEXING_OCR_ENABLED and INDEXING_OCR_LANGUAGES (plus INDEXING_TABLES_ENABLED) are exposed as Helm values. Hardened backend images now ship with the Tesseract runtime preinstalled (English + German language packs).
Additive table-row extraction and indexing for PDF, DOCX, Excel, and CSV with opt-in table metadata payloads — queries can now retrieve and filter individual table rows alongside the surrounding document text.
LLM_BACKEND=openai as a third execution mode — disables the in-cluster inference pod and routes chat requests to OpenAI's hosted API. This is useful for evaluating features without committing to GPU resources up front, or for clients who already have an OpenAI API key and want inference to run there while business logic and data stay on-prem.
Connector-aware upload, delete, and move endpoints — file operations now respect the configured connector set and route writes to the correct backing volume.
Email-based password reset for local users — new public endpoints (POST /api/auth/forgot-password, POST /api/auth/reset-password) and an authenticated email-change endpoint (PUT /api/auth/me/email). Single-use, hashed reset tokens with expiry. Pluggable email transport (EMAIL_PROVIDER=console|smtp) with SMTP and console drivers, dedicated nginx rate limits for the reset endpoints, and a DEFAULT_ADMIN_EMAIL prompt in the installer. The reset link base URL is auto-derived from the ingress host when not explicitly set.
Self-service TOTP MFA for all authenticated users — TOTP enrolment and login flows are now available on every license tier (Free, Pro, Enterprise) rather than being gated to Enterprise. Endpoints live under /api/auth/totp/* and the /mfa-login flow is wired into all tiers.
Runtime app version surfaced in the API docs landing page and via /api/version and /api/health, making rolling deployments and support diagnostics easier.
Backend license persisted across redeploys — license state survives Helm upgrades and pod restarts; Helm restarts are now migration-safe.
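
The single-use, hashed reset tokens with expiry can be sketched as follows. This is an illustration only: the in-memory `ResetTokenStore`, SHA-256 as the hash, and the 30-minute lifetime are assumptions, since the changelog does not specify RAG-DocBot's storage, hash algorithm, or expiry. Only the hash is stored. The raw token exists only in the emailed link.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


@dataclass
class ResetRecord:
    username: str
    token_hash: str
    expires_at: datetime
    used: bool = False


def _hash_token(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class ResetTokenStore:
    """Hypothetical in-memory stand-in for a persistent reset-token table."""

    def __init__(self, ttl: timedelta = timedelta(minutes=30)) -> None:  # assumed TTL
        self._ttl = ttl
        self._records: Dict[str, ResetRecord] = {}  # keyed by token hash

    def issue(self, username: str, now: Optional[datetime] = None) -> str:
        now = now or datetime.now(timezone.utc)
        token = secrets.token_urlsafe(32)  # emailed to the user, never stored
        token_hash = _hash_token(token)
        self._records[token_hash] = ResetRecord(username, token_hash, now + self._ttl)
        return token

    def consume(self, token: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return the username if the token is valid and unused, marking it used; else None."""
        now = now or datetime.now(timezone.utc)
        record = self._records.get(_hash_token(token))  # lookup by hash only
        if record is None or record.used or now >= record.expires_at:
            return None
        record.used = True  # single-use: a second consume() fails
        return record.username
```

Storing only the hash means a database leak does not expose usable reset links, and the `used` flag enforces single use even before expiry.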

Changed

The Helm chart is now the canonical Kubernetes deployment path: in-cluster Postgres (StatefulSet), Redis (Deployment + optional PVC), and Qdrant (StatefulSet, pinned to v1.11.0) are all templated. The backend pod runs wait-for-migrations and wait-for-inference init containers so it never starts ahead of its dependencies. A Helm pre-install/pre-upgrade Job runs Alembic migrations exactly once per release.
All container images now run as a real non-root appuser (uid/gid 1000), and the chart enforces runAsNonRoot, runAsUser, and runAsGroup for both the backend and inference Deployments.
The ClusterIssuer has been moved out of the Helm chart; the full cert-manager lifecycle is now documented as a separate, explicit step in the GKE runbook.

Fixed

Migrations Job missing Secret on GKE — the docbot Secret (with jwt-secret-key and default-admin-password) used by the migrations Job is now rendered by the chart via ESO when externalSecrets.enabled=true, eliminating the CreateContainerConfigError that previously required a manual kubectl create secret workaround.
/api/model-info behaviour in external LLM mode — the endpoint now returns metadata immediately when LLM_BACKEND=openai, so the UI no longer hangs in an indefinite loading state.
ErrImageNeverPull on kind — the development values file now overrides global.imageRegistry so kind-loaded images resolve without a registry prefix.
Inference model-info docs path — corrected to GET /v1/model-info (was incorrectly documented as /v1/info).

[1.8.0]

Added

Federated login (OIDC / SSO) — multi-provider OpenID Connect login with Authorization Code + PKCE (S256). Pre-built support for Microsoft Entra ID, Google Workspace, Keycloak, and any standards-compliant OIDC IdP. Provider configuration is managed via the admin UI (oidc_providers Postgres table); env-var-based config (OIDC_PROVIDERS / OIDC_<NAME>_*) remains as a fallback when the table is empty. client_secret is encrypted at rest; a secret_set: bool flag is returned by the API instead of the secret value. JIT user provisioning on first login; group sync from IdP claims; OIDC_<NAME>_ADMIN_GROUPS auto-promotes users to admin. See the SSO / OIDC guide.
TOTP MFA (Two-Factor Authentication) — time-based one-time passwords as a second factor for any account. Enrollment returns a QR SVG plus 10 single-use recovery codes. TOTP secrets encrypted at rest with AES-256-GCM. See the MFA / TOTP guide.
Groups & Resource ACL (Enterprise) — per-connector and per-integration access control enforced at retrieval time via Qdrant payload filters. Built-in everyone group seeded on first boot. Admins can define groups, assign users, and set ACL on connectors/integrations. Admin role bypasses all ACL checks. See the Groups & ACL guide.
TLS / HTTPS termination in bundled nginx — opt-in via TLS_ENABLED=1. Modern cipher suite (TLSv1.2 + 1.3), configurable HSTS, OCSP stapling (toggleable for air-gapped environments). HTTP :80 redirects to HTTPS and serves Let's Encrypt HTTP-01 challenges. Helper scripts provided for self-signed, internal CA, and Let's Encrypt deployments. See the TLS / HTTPS guide.
Runtime-tunable system settings — new system_settings Postgres table replaces several previously env-only knobs. Settings are updated without a restart via PATCH /api/admin/system/settings. Env vars remain the first-boot seed; the database is authoritative thereafter. See the System Settings guide.
Sign-out everywhere — DELETE /api/admin/auth/sessions revokes all currently-issued tokens globally; DELETE /api/admin/auth/users/{username}/sessions revokes all tokens for a single user. Available on all license tiers.
New Docker secrets: mfa_encryption_key (required), oidc_entra_client_secret, oidc_google_client_secret.
New database migrations: 0016 (OIDC/TOTP user fields), 0017 (groups & ACL tables), 0018 (oidc_providers table).
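
For reference, the PKCE S256 transform used in the Authorization Code + PKCE flow is defined by RFC 7636: `code_challenge = BASE64URL(SHA256(ASCII(code_verifier)))` with padding removed. The sketch below shows that standard computation; the function names are illustrative and are not taken from RAG-DocBot.

```python
import base64
import hashlib
import secrets


def make_code_verifier(n_bytes: int = 32) -> str:
    """Random verifier; 32 random bytes give 43 base64url characters (the RFC 7636 minimum)."""
    return secrets.token_urlsafe(n_bytes)


def s256_challenge(verifier: str) -> str:
    """code_challenge = BASE64URL(SHA256(ASCII(code_verifier))) without '=' padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
```

The client sends the challenge (with `code_challenge_method=S256`) in the authorization request and the original verifier in the token request, so an intercepted authorization code is useless without the verifier.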

Fixed

MFA bypass (security) — POST /api/auth/login previously returned a full session token even when TOTP was enrolled. Now correctly returns {"status": "mfa_required", "mfa_token": "..."} and requires the client to complete the second step.
OIDC role preservation (#328) — provision_user() no longer overwrites role or re-syncs groups for existing OIDC users on subsequent logins.
Test suite (PR #349) — repaired 23 stale tests after the system-settings/RAG-tunables work; restored nginx/conf.d/default.conf.template to its correct HTTP-only form (it had been clobbered with the TLS template, breaking TLS_ENABLED=0 deployments).
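
After `POST /api/auth/login` returns `mfa_required`, the client must prove possession of the enrolled TOTP secret. The sketch below is a standard RFC 6238 verifier (HMAC-SHA1, 30-second steps, 6 digits, plus or minus one step of clock skew). These are common authenticator-app defaults and are assumed here, not confirmed RAG-DocBot settings. The function names are illustrative.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(key: bytes, counter: int, digits: int = 6) -> str:
    """RFC 4226 HOTP with HMAC-SHA1 and dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % (10 ** digits)).zfill(digits)


def totp(key: bytes, at: Optional[float] = None, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HOTP over the number of `step`-second intervals since the epoch."""
    t = time.time() if at is None else at
    return hotp(key, int(t // step), digits)


def verify_totp(key: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept a code from the current step or up to `window` steps either side."""
    # Fixed length check: never derive the digit count from user input.
    if len(submitted) != digits or not (submitted.isascii() and submitted.isdigit()):
        return False
    t = time.time() if at is None else at
    for drift in range(-window, window + 1):
        candidate = t + drift * step
        if candidate < 0:
            continue  # no negative time steps
        if hmac.compare_digest(totp(key, candidate, step, digits), submitted):
            return True
    return False


def key_from_base32(secret_b32: str) -> bytes:
    """Authenticator apps exchange the shared secret as (often unpadded) base32."""
    padded = secret_b32.upper() + "=" * (-len(secret_b32) % 8)
    return base64.b32decode(padded)
```

`hmac.compare_digest` keeps the comparison constant-time, and the fixed-length check stops a caller from submitting a one-digit code that would otherwise have a 1-in-10 chance of matching.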

Changed

Several env vars that previously drove runtime behaviour are now first-boot seeds only. After first boot the system_settings database table is authoritative and changes apply without a restart: RAG_AUDIT_RETENTION_DAYS, RAG_CONVERSATION_MAX_AGE_DAYS, RAG_CONVERSATION_MAX_TURNS, LOG_LEVEL, and others. See Environment Variables.
JWT token lifetimes (access_token_seconds, refresh_token_seconds) are now managed via PATCH /api/admin/auth/settings (the auth_settings table), not by restarting the service.

[1.7.0]

Added

SSE streaming chat — POST /api/chat now accepts Accept: text/event-stream and delivers the answer as Server-Sent Events through a fully async pipeline. Conversation history is persisted identically to non-streaming responses.
Job Schedules — cron-based scheduler for automatic connector and integration syncs. Scheduling requires a Pro plan or higher; manual sync remains available on Free.
Enterprise audit log — append-only Postgres audit log with admin query APIs, configurable retention, and coverage for chat, sync, and config lifecycle events. Enterprise only.
Operational backup/restore — runbook and automation for backing up and restoring Postgres, Qdrant, branding assets, and local models.
Bundled nginx ingress with rate limiting — nginx reverse proxy is now part of the deployment image, with rate limits on /api/auth, /api/chat, and /api/upload. SSE streaming is preserved end-to-end.
Docker/Podman secrets support — deployment secrets (JWT key, DB passwords, Qdrant API key) are moved out of .env and managed via Docker or Podman secrets.
EFFECTIVE_N_CTX reporting — the inference runtime now exposes the effective context-window size as the single source of truth. A startup warning is logged when the configured N_CTX exceeds actual model capacity.
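
When `POST /api/chat` is called with `Accept: text/event-stream`, the answer arrives in the Server-Sent Events wire format: `event:` and `data:` lines, with each event ended by a blank line. The sketch below formats and parses that format, following a subset of the WHATWG rules (`id:` and `retry:` are ignored). The event name `token` in the usage is a hypothetical example; the changelog does not document RAG-DocBot's event names or payloads.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def format_sse(data: str, event: Optional[str] = None) -> str:
    """Serialise one event; multi-line data becomes several data: lines."""
    lines = [f"event: {event}"] if event else []
    lines += [f"data: {part}" for part in data.split("\n")]
    return "\n".join(lines) + "\n\n"


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from text/event-stream lines."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the buffered event (if it carried data).
            if data:
                yield (event or "message", "\n".join(data))
            event, data = None, []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # one leading space after the colon is dropped
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
```

For example, `format_sse("Hello", event="token")` produces `event: token\ndata: Hello\n\n`, and `parse_sse` turns that back into `("token", "Hello")`.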

Changed

Chat streaming refactored to a native async pipeline to prevent FastAPI event-loop blocking.
Sync execution moved from routers into the service layer to enforce the layering boundary.
Scheduler logs the effective license tier on every scheduled fire.

[1.6.0]

Added

Unified Metadata Rules API — new /api/metadata/{source_id}/rules endpoints replace the old per-connector metadata-rules endpoints. Rules can now be attached to integrations (GitHub, Slack, Google Drive) in addition to connectors. All endpoints require Pro plan or higher.
Analytics Dashboard API — seven new endpoints under /api/analytics/{source_id}/ provide insights into chunk distribution, metadata coverage, rule effectiveness, and more. Requires Pro plan or higher.
Integration source support for metadata rules — integrations can now have their own metadata extraction rules via the new integration_id field on the metadata rules model.
Pro license guard — new license tier gate for metadata and analytics features.
Dynamic full-text index creation — Qdrant full-text indexes are now automatically created/updated during connector indexing.
Automatic payload index cleanup — deleting a metadata rule removes its Qdrant payload index if no other rule uses the same field.

Changed

Metadata rule endpoints moved from /api/connectors/{id}/metadata-rules to /api/metadata/{source_id}/rules
Metadata rule responses now include source_id and source_type instead of connector_id
Metadata rules and analytics require Pro plan or higher (previously no plan restriction on metadata rules)

[1.5.0]

Added

Hybrid query classifier with optional LLM sidecar — when enabled, ambiguous queries (e.g. matching both an article and an entity) are sent to the local LLM for intent disambiguation. Unambiguous queries still fast-path through the rule-based classifier with zero LLM overhead
Extraction signal pipeline — all regex patterns now run simultaneously against the query, producing ranked candidate signals. This enables the hybrid classifier to compare and merge complementary intents
Chunk boundary splitting — connector metadata rules can now act as document pre-split boundaries during indexing (chunk_boundary: true). Useful for documents with predictable section structure (e.g. legal statutes)
Full-text index auto-creation on integration syncs — Slack, GitHub, and Google Drive indexers now ensure Qdrant full-text indexes exist after sync, so hybrid_bm25 mode works immediately
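
The idea behind `chunk_boundary: true` can be illustrated as a pre-split step that cuts a document before every match of a rule's pattern, so that each section is then chunked on its own. The function name and the statute-style pattern `^§\s*\d+` are hypothetical examples; they do not reflect RAG-DocBot's internal code or rule syntax.

```python
import re
from typing import List


def split_on_boundaries(text: str, boundary_pattern: str) -> List[str]:
    """Split `text` before every match of `boundary_pattern`, keeping any preamble."""
    regex = re.compile(boundary_pattern, re.MULTILINE)
    starts = [m.start() for m in regex.finditer(text)]
    if not starts:
        return [text] if text.strip() else []
    # Include the text before the first boundary as its own section.
    cuts = ([0] if starts[0] > 0 else []) + starts + [len(text)]
    sections = [text[a:b].strip() for a, b in zip(cuts, cuts[1:])]
    return [s for s in sections if s]
```

Applied to `"Preamble\n§ 1 Scope\nApplies to all.\n§ 2 Definitions\nTerms."` with `^§\s*\d+`, this yields three sections: the preamble, `§ 1` with its text, and `§ 2` with its text. No chunk then straddles two sections.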

Changed

Query classification architecture refactored: signal detection separated from classification logic
New environment variables: RAG_CLASSIFIER_LLM_ENABLED, RAG_CLASSIFIER_LLM_MAX_TOKENS

[1.4.0]

Added

Query Engine — new orchestration layer that coordinates the full query pipeline (classify → retrieve → rerank → budget → generate). Supports pluggable rerankers (ScoreThresholdReranker, TopKReranker, ChainReranker) and configurable fallback policies (WARN, RETRY_SEMANTIC, ABSTAIN)
Token budget management — the system now prevents context window overflow by estimating token usage and trimming low-relevance chunks before sending to the LLM. Budget diagnostics are surfaced in chat responses via the new token_budget field
Dynamic metadata-aware query classification — the query classifier now automatically uses per-connector metadata extraction rules at query time. Custom metadata fields (e.g. issue IDs, patient IDs) are detected in natural language queries and converted to metadata filters without manual configuration
Global classifier rule loading — the classifier loads rules from all connectors automatically. Users no longer need to specify which connector their data came from
query_pattern override for metadata rules — metadata rules can now specify a separate regex pattern for query-time classification, independent of the ingestion pattern. Useful when extraction patterns use anchors or structural regex that don't match mid-sentence queries
Fuzzy matching & typo tolerance — optional fuzzy matching catches common typos and partial identifiers before falling through to semantic search. Three strategies: prefix expansion, edit-distance tolerance, and digit-count tolerance. Enabled by default; confidence is reduced for fuzzy matches to signal uncertainty
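
Token budget management, as described above, estimates token usage and drops low-relevance chunks until the context fits. A minimal greedy sketch is shown below. The `Chunk` type, the roughly 4-characters-per-token estimate, and the `reserved` parameter (covering prompt, question, and answer) are assumptions for illustration; the changelog does not describe RAG-DocBot's estimator.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Chunk:
    text: str
    score: float  # retrieval relevance, higher is better


def estimate_tokens(text: str) -> int:
    """Rough heuristic (~4 characters per token); a real tokenizer is more accurate."""
    return max(1, (len(text) + 3) // 4)


def fit_to_budget(chunks: List[Chunk], context_window: int, reserved: int) -> Tuple[List[Chunk], int]:
    """Keep the highest-scoring chunks that fit in `context_window - reserved` tokens.

    Returns the kept chunks in their original order and the tokens they use.
    """
    budget = context_window - reserved
    if budget <= 0:
        return [], 0
    kept_ids = set()
    used = 0
    # Visit chunks from most to least relevant; skip any that would overflow.
    for idx in sorted(range(len(chunks)), key=lambda i: chunks[i].score, reverse=True):
        cost = estimate_tokens(chunks[idx].text)
        if used + cost <= budget:
            kept_ids.add(idx)
            used += cost
    return [c for i, c in enumerate(chunks) if i in kept_ids], used
```

The loop continues after a chunk is skipped, so a smaller, lower-ranked chunk can still use leftover budget. Returning the chunks in their original order preserves reading order for the context builder.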

Fixed

Query classifier now applies all connector metadata rules regardless of how the query was submitted — previously missed custom patterns in some cases

Changed

Chat endpoint now returns additional diagnostic fields: token_budget (budget usage), effective retrieval settings, and timing information
Internal query orchestration refactored for better modularity and extensibility

[1.3.0]

Added

GPU-accelerated backend embeddings — configurable via the EMBED_DEVICE environment variable (auto, cpu, cuda), with dedicated CUDA Docker images for GPU-enabled backend deployments
Smart chunking strategies — sentence-boundary and markdown-aware chunking with an auto mode that selects the best splitter based on file type
Per-connector metadata extraction rulesets — define regex-based rules per connector to extract structured metadata from document text, filenames, or headers during ingestion. Extracted fields are attached to every chunk and used automatically by the retrieval pipeline
Hybrid retrieval layer with 5 modes — semantic, hybrid, metadata-only, comparison/grouping, and hybrid BM25 with Reciprocal Rank Fusion
Document-aware context builder — retrieved chunks are grouped by document, sorted by reading order, and enriched with structured metadata headers before being sent to the LLM
Intent-based query classifier — automatically detects query intent and routes to the optimal retrieval strategy. Supports multilingual queries (DE + EN)
Industry-specific classifiers — configurable via metadata rulesets to support domain-specific query patterns (e.g. article lookups, entity filtering, temporal queries)
Metadata rules REST API — full CRUD plus a test endpoint for validating rules against sample text before saving
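
Reciprocal Rank Fusion, used by the hybrid BM25 mode, merges several ranked result lists by summing `1 / (k + rank)` for each document across the lists. The sketch below shows the standard formula with `k = 60`, the constant commonly used as a default; whether RAG-DocBot uses the same `k` is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank), rank starting at 1."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    semantic = ["d1", "d2", "d3"]
    bm25 = ["d3", "d1", "d4"]
    print(reciprocal_rank_fusion([semantic, bm25]))  # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, not raw scores, so semantic similarity and BM25 scores need no normalisation before they are combined.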

Fixed

Metadata case-sensitivity mismatch between query classifier and stored payloads — all values are now normalized to lowercase
Restored error messaging for unrecognized chunking strategies

Changed

Default chat mode changed from semantic to auto — the query classifier now runs automatically on every query
All metadata values normalized to lowercase throughout the pipeline (re-index required after upgrade)
Context assembly rewritten to use the new document-aware context builder

[1.2.1]

Added

Configurable backend worker count via BACKEND_WORKERS environment variable — no image rebuild required
Multi-worker job polling with Redis fallback — fixes 404s when different workers serve poll requests
PID in log lines for multi-worker debugging

Fixed

Connector sync cancellation not working — incremental indexer now checks cancel token in all phases
GitHub integrations without a PAT silently hitting rate limits — now rejected with HTTP 400

Changed

Incremental indexer uses rolling batches with cancel token checks
GitHub PAT validation is now a hard requirement for integration creation

[1.2.0]

Added

NVIDIA GPU acceleration for inference on x86_64 systems via CUDA, enabling significantly faster LLM responses
Configurable GPU offloading using the N_GPU_LAYERS environment variable (e.g. -1 to offload all layers)
CUDA-enabled inference image build support
Support for running the inference component on Jetson Orin Nano (aarch64 / JetPack). Full support is planned for a future release.
Documentation for Jetson deployments, including hardware requirements and model sizing guidance
Platform-aware installer that detects NVIDIA GPUs and automatically selects the appropriate inference image and configuration

Improved

Excel (.xlsx) extraction reliability by ensuring cell values are fully loaded before processing
Installer experience by automatically configuring GPU settings and reducing manual setup steps

Changed

Inference image selection is now dynamically determined based on detected hardware (CPU vs GPU)
Generated docker-compose.yml conditionally includes GPU configuration and environment variables when a GPU is available
Inference build process updated to support both CPU and CUDA variants through build-time configuration

Fixed

.xlsx files could produce empty extracted content during indexing

[1.1.0]

Added

Qdrant client factory with optional API key support
Secure storage of integration credentials (Slack, GitHub, Google Drive)
Generic integration sync tracking across all connectors
Token usage logging for inference requests (optional)
Model tuning guide for CPU-based inference
Built-in license verification (no runtime public key required)
Unit tests for Qdrant client and credential handling

Improved

Consistent handling of integration sync state (Slack, GitHub, Google Drive)
Google Drive integration using service account authentication
Centralised Qdrant client usage across the codebase
Docker setup with optional Qdrant API key configuration
Environment variable naming for inference settings (N_CTX, N_THREADS)
Inference response now includes token usage metadata

Fixed

Incorrect or missing last_sync values for integrations
Google Drive API compatibility issues

Changed

Inference server loads environment variables automatically

[1.0.0]

Added

Initial stable release of RAG-DocBot
FastAPI backend with full REST API
JWT-based authentication with RBAC (viewer, editor, admin roles)
Document upload and indexing pipeline
Qdrant vector database integration for semantic search
Retrieval-Augmented Generation (RAG) chat endpoint
Async job system for document ingestion and indexing
PostgreSQL persistent storage with automatic migrations
Redis for live job state
llama-cpp-python inference service with GGUF model support
Source connectors: file system (local directories)
License validation (FREE / PRO / ENTERPRISE tiers)
Privacy-preserving logging: log anonymisation and query redaction enabled by default
Automatic conversation history purging and turn capping
Branding customisation (logo, display name)
Hardware and model info endpoints
Docker Compose deployment with named volumes for data persistence

[0.9.0]

Added

Connector framework for external document sources
Slack and GitHub connector support
Bulk document delete endpoint
Integration sync endpoint

Changed

Improved index rebuild performance
Reduced memory usage during document extraction

[0.8.0]

Added

Conversation history API (GET /api/conversations, GET /api/conversations/{id})
Conversation auto-purge based on RAG_CONVERSATION_MAX_AGE_DAYS
Max turns per conversation cap (RAG_CONVERSATION_MAX_TURNS)

Fixed

Race condition in job status updates

[0.7.0]

Added

PRO and ENTERPRISE license tier support
CSV, Excel, and HTML document type support (PRO and ENTERPRISE)
License endpoint (GET /api/license, POST /api/license)

[0.6.0]

Added

Role-based access control (RBAC) — viewer, editor, admin roles
User management API (CRUD /api/auth/users)
Default admin account creation on first boot

[0.5.0]

Added

Refresh token support (POST /api/auth/refresh)
Token expiry configuration via environment variables
pgadmin service for database inspection

[0.4.0]

Added

Job management API (GET /api/jobs, GET /api/jobs/{id}, POST /api/jobs/{id}/cancel)
Async indexing pipeline
Index stats endpoint (GET /api/index/stats)

[0.3.0]

Added

Index rebuild endpoint (POST /api/index/rebuild)
Branding API (logo upload, branding config)
Hardware info endpoint

[0.2.0]

Added

Document upload, list, and delete endpoints
Qdrant integration for vector storage
Basic RAG chat endpoint

[0.1.0]

Added

Initial project structure
FastAPI application scaffold
PostgreSQL and Redis integration
JWT login endpoint
Health check endpoint
