
iKB by ThoughtsMachine
Enterprise-grade AI knowledgebase with governed accuracy
iKB is a self-hosted AI knowledge base platform that lets organisations build conversational interfaces over their document libraries. It combines vector-based retrieval with automatic knowledge graph construction to improve accuracy on complex queries that require synthesis across multiple documents.

Platform Overview
Core capabilities
RAG-based chat with streaming responses and source citations
Hybrid search (vector + keyword) with reranking and deduplication controls
Knowledge graph enhancement (LightRAG/GraphRAG) for multi-hop reasoning
Ingestion for PDFs, Office files, images (OCR), and text formats
Web crawling using Playwright with SSRF protections
Cloud ingestion via rclone and S3-compatible storage (S3/MinIO/R2)
Multi-topic architecture with granular access control and custom settings
Analytics: token usage, session metrics, feedback, ingestion/crawl performance
Enterprise security: encryption, rate limiting, CSRF protection, audit logs
Architecture and Stack
High-level components
Web UI (Admin + Chat), API layer, retrieval engine, citation assembly
Knowledge graph layer (LightRAG/GraphRAG) with per-topic enablement
Data layer: PostgreSQL + pgvector; Redis for caching and rate limiting
Integrations: object storage, rclone cloud drives, Chatwoot, custom AI endpoints
Retrieval modes
Vector search, hybrid (BM25-like + vector), reranking, diversity caps, deduplication
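To illustrate how a keyword ranking and a vector ranking might be merged and then diversity-capped, here is a minimal sketch using reciprocal rank fusion (RRF). This is an assumption for illustration only: the fusion method iKB actually uses is not stated here, and the function names, the constant k=60, the sample chunk IDs, and the per-source cap are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in.
    k=60 is a conventional default, not a documented iKB setting.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending; break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


def cap_per_source(chunk_ids, source_of, max_per_source=2):
    """Diversity cap: keep at most max_per_source chunks per document."""
    kept, counts = [], defaultdict(int)
    for cid in chunk_ids:
        src = source_of[cid]
        if counts[src] < max_per_source:
            kept.append(cid)
            counts[src] += 1
    return kept


keyword_hits = ["a1", "b1", "a2", "c1"]  # hypothetical BM25-like ranking
vector_hits = ["a2", "a1", "a3", "b1"]   # hypothetical vector ranking
source_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "c1": "C"}

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
top = cap_per_source(fused, source_of, max_per_source=2)
print(top)
```

With the cap set to 2, the third chunk from document A is dropped so that other documents still reach the prompt.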
Model management
Per-topic model configuration (OpenAI and OpenAI-compatible endpoints)
Token and pricing tracking (admin-configurable), temperature and response controls
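As a rough sketch of what token-based cost tracking can look like, the example below prices one exchange from prompt and completion token counts. The function name, the per-1,000-token pricing unit, and the prices themselves are hypothetical; the actual accounting model is whatever the administrator configures.

```python
from decimal import Decimal


def chat_cost(prompt_tokens, completion_tokens, price_in_per_1k, price_out_per_1k):
    """Return the cost of one exchange given per-1,000-token prices."""
    cost_in = Decimal(prompt_tokens) / 1000 * Decimal(price_in_per_1k)
    cost_out = Decimal(completion_tokens) / 1000 * Decimal(price_out_per_1k)
    return cost_in + cost_out


# Hypothetical admin-configured prices, passed as strings to avoid float error.
cost = chat_cost(1200, 300, "0.50", "1.50")
print(cost)
```

Decimal is used instead of float so that small per-message amounts add up exactly when aggregated.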
Ingestion and Knowledge Management
Inputs: PDF, DOCX, PPTX, XLSX, images (OCR), TXT/CSV/Markdown
Channels: UI upload, web crawl, S3-compatible storage, rclone cloud drives
Pipeline: validate → (optional) malware scan → extract/OCR → chunk/tokenize → embed → store in pgvector → optional GraphRAG indexing
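The chunking step of the pipeline above can be sketched as a sliding window with overlap. This is a simplified illustration, not the platform's implementation: it splits on whitespace rather than using a model tokenizer, and the function name and the chunk_size and overlap values are assumptions.

```python
def chunk_tokens(tokens, chunk_size=200, overlap=40):
    """Split a token list into overlapping windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


text = "one two three four five six seven eight nine ten"
chunks = chunk_tokens(text.split(), chunk_size=4, overlap=1)
for chunk in chunks:
    print(" ".join(chunk))
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary is still retrievable from at least one chunk.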
Security, Compliance, and Governance
Security controls
Encrypted message storage (AES-256-GCM for chat content)
Secure sessions (strict cookies) and CSRF protection
Rate limiting (platform and endpoint levels)
Admin audit logs and security headers (CORS/CSP)
SSRF-safe crawling and IP allowlisting for sensitive callbacks
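For readers unfamiliar with SSRF protection, the sketch below shows one common pre-fetch check: reject any URL whose host resolves to a non-public address. It is a minimal illustration under stated assumptions, not iKB's actual crawler logic; the function names and the injectable resolver parameter are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def resolve_host(host):
    """Resolve a hostname to the set of IP address strings it maps to."""
    infos = socket.getaddrinfo(host, None)
    return {info[4][0] for info in infos}


def is_safe_crawl_url(url, resolver=resolve_host):
    """Return True only if the URL is http(s) and every resolved IP is public."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addresses = resolver(parts.hostname)
    except OSError:
        return False  # unresolvable hosts are rejected
    if not addresses:
        return False
    for addr in addresses:
        try:
            ip = ipaddress.ip_address(addr)
        except ValueError:
            return False  # anything that is not a plain IP is rejected
        if not ip.is_global:
            return False  # loopback, private, link-local, reserved, etc.
    return True


# The resolver is injected here so the example runs without network access.
print(is_safe_crawl_url("http://intranet.example/", resolver=lambda h: {"10.0.0.5"}))
```

A check like this only covers the initial URL. A production crawler would also need to validate redirect targets and connect to the address it validated, since DNS answers can change between the check and the fetch.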
Governance
Topic-level access control (public/private/unlisted), user groups, topic groups
Admin-only configuration with auditability
Optional incognito mode for sensitive queries
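The topic visibility levels above can be pictured with a small access check. The semantics assumed here are a guess for illustration: public topics are open and listed, unlisted topics are open by direct link but hidden from listings, and private topics require membership in an allowed user group. The Topic class and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    slug: str
    visibility: str  # "public", "private", or "unlisted"
    allowed_groups: set = field(default_factory=set)


def can_open(topic, user_groups):
    """May a user with these groups open the topic by direct link?"""
    if topic.visibility in ("public", "unlisted"):
        return True
    # Private: the user must share at least one group with the topic.
    return bool(topic.allowed_groups & set(user_groups))


def is_listed(topic, user_groups):
    """Should the topic appear in this user's topic list?"""
    if topic.visibility == "unlisted":
        return False
    return can_open(topic, user_groups)


hr = Topic("hr-policies", "private", {"hr", "managers"})
beta = Topic("beta-docs", "unlisted")
print(can_open(hr, ["engineering"]), can_open(hr, ["hr"]), is_listed(beta, []))
```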
Deployment, Performance, Operations
Deployment: self-hosted, private cloud, on-premises, air-gapped
Reference concurrency: ~30–80 concurrent chats per instance (configuration-dependent)
Scaling levers: horizontal app nodes/workers, PostgreSQL tuning/pooling, batching, dedicated storage for ingestion/graph
Operations: background tasks for indexing, health checks, logging, query timeouts, configurable upload/crawl limits
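The reference concurrency range can be turned into a first-pass sizing estimate. The sketch below assumes capacity scales linearly with the number of app instances, which is an idealisation: a shared PostgreSQL or model endpoint may become the bottleneck first. The function name and the target of 200 concurrent chats are hypothetical.

```python
import math


def instances_needed(target_concurrent_chats, per_instance):
    """Round up so the target load never exceeds the combined capacity."""
    if per_instance <= 0:
        raise ValueError("per_instance must be positive")
    return math.ceil(target_concurrent_chats / per_instance)


# Bracket the estimate with both ends of the ~30-80 reference range.
pessimistic = instances_needed(200, per_instance=30)
optimistic = instances_needed(200, per_instance=80)
print(pessimistic, optimistic)
```

For 200 concurrent chats this gives between 3 and 7 instances, depending on where a given configuration falls in the reference range.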
Differentiators
Accuracy-first design: citations, hybrid retrieval, and governance-first controls
GraphRAG/LightRAG augmentation for deeper, multi-hop reasoning
Flexible deployment including air-gapped
Broad integrations (Chatwoot OMNI, rclone, S3-compatible storage)