v2.0 Β· Production Ready

Kairos AI

Elite AI assistant for developers, security engineers, and builders. Live Python, RAG indexing, multi‑model, cloud sync, and a premium chat experience.

Groq Llama 3.3-70B Β· Gemini 2.5 Flash Β· OpenRouter Qwen/Nemotron Β· Pyodide Web Worker Β· Chunked RAG + MiniLM

System Overview

Kairos is a feature‑rich, privacy‑first AI chat interface built for developers who need real‑time code execution, project‑aware context (RAG), and full control over LLM providers. It runs entirely in the browser (with optional Firebase sync) and supports local‑only workflows.

⚑

Instant Responses

Streaming markdown with low latency via Groq, Gemini, and OpenRouter.

πŸ”

Local‑First RAG

Index your codebase β†’ embeddings stored in IndexedDB, no external vector DB required.

☁️

Cloud Sync

Firestore chat history, pinned conversations, cross‑device sync (optional).

Architecture Overview

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                     Kairos AI (Browser)                      β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  UI: c.html + modular CSS (Tailwind + custom glass)         β”‚
β”‚       └── Markdown renderer (marked + highlight.js)         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Core Logic (Vanilla JS)                                     β”‚
β”‚  β”œβ”€β”€ Multi‑provider router: Groq, OpenRouter, Gemini        β”‚
β”‚  β”œβ”€β”€ Streaming response + abort controller                  β”‚
β”‚  β”œβ”€β”€ Inline edit / regenerate / voice (TTS)                 β”‚
β”‚  └── Template suggestions (/ slash)                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Python Sandbox (Pyodide Worker)                            β”‚
β”‚  β”œβ”€β”€ Lazy load (first run)                                  β”‚
β”‚  β”œβ”€β”€ Queue + abort + timeout (30s)                          β”‚
β”‚  └── Output capture, save to chat                           β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  RAG Pipeline                                                β”‚
β”‚  β”œβ”€β”€ Folder/file upload β†’ token‑aware chunker (500 tok)     β”‚
β”‚  β”œβ”€β”€ Embedding worker (Transformers.js / MiniLM)            β”‚
β”‚  β”œβ”€β”€ IndexedDB vector store (cosine + keyword hybrid)       β”‚
β”‚  └── Auto context injection into user prompts               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  Storage Layers                                              β”‚
β”‚  β”œβ”€β”€ localStorage: API keys, preferences, templates         β”‚
β”‚  β”œβ”€β”€ IndexedDB: RAG embeddings (per user)                   β”‚
β”‚  └── Firestore: chat history, title, pin, public status     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Core Capabilities

🧠 Multi‑Model Hub

Switch between 10+ models: Llama 3.3, Gemini 2.5 Flash, Qwen 3‑32B, Nemotron, GPT‑OSS, Kimi K2 – all on free tiers.

🐍 Live Python Execution

Run any Python snippet directly in chat, view output, stop execution, and save output back to conversation.

πŸ“ Project RAG

Drop folders / files, index code chunks with embeddings, and ask questions with full file context.

✏️ Inline Edit & Regenerate

Edit a previous message β†’ all later messages are removed and the AI regenerates from that point.

🎀 Speak & Listen

Text‑to‑speech (Web Speech API) for AI responses, pause/resume, copy message text.

πŸ“ Slash Templates

Type / to access saved prompt templates (customizable via Template Manager).

βš™οΈ User Preferences

Set tone, code style, security focus β†’ dynamic system prompt applied live.
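As a rough sketch of how such a dynamic system prompt might be assembled from saved preferences (the function name and preference fields here are illustrative, not Kairos's actual schema):

```javascript
// Illustrative sketch: assemble a system prompt from user preferences.
// The field names (tone, codeStyle, securityFocus) are assumptions.
function buildSystemPrompt(prefs) {
  const parts = ["You are Kairos, an AI assistant for developers."];
  if (prefs.tone) parts.push(`Respond in a ${prefs.tone} tone.`);
  if (prefs.codeStyle) parts.push(`Prefer ${prefs.codeStyle} code style.`);
  if (prefs.securityFocus) {
    parts.push("Highlight security implications in every answer.");
  }
  return parts.join(" ");
}

const prompt = buildSystemPrompt({ tone: "concise", securityFocus: true });
```

Because the prompt is rebuilt from preferences on each request, changes apply live without reloading the page.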

πŸ”‘ API Key Overrides

Replace built‑in keys via Settings popup, stored locally.

Tech Stack

Category          Technologies
Frontend          HTML5, Tailwind CSS, Vanilla JS, Marked, Highlight.js, Lottie, AOS
LLM Providers     Groq (Llama 3.3-70B, Mixtral), OpenRouter (Qwen, Nemotron), Google Gemini (2.5 Flash, Flash‑Lite)
Python Runtime    Pyodide (WebAssembly), custom worker with abort/timeout
Embedding & RAG   Transformers.js (Xenova/all-MiniLM-L6-v2), IndexedDB, hybrid search (cosine + keyword)
Database & Sync   Firebase Auth + Firestore (chat history), localStorage (prefs/keys/templates)
DevOps            Service Worker (offline‑first), Vercel/Netlify ready

Quick Setup

1. Clone & serve

git clone https://github.com/thekaifansari01/kairos.ai
cd kairos.ai
python -m http.server 8000   # or npx serve

2. Open browser β†’ http://localhost:8000/c.html

3. Optional: Configure Firebase (chat sync) β†’ edit js/modules/auth/firebase.js

4. Override API keys via Settings (βš™οΈ) – built‑in keys work but are rate‑limited.

🐍 Python Interpreter

Every code block with language python gets a Run button. Execution happens inside a Web Worker (Pyodide) – completely sandboxed, no server required.

  • Lazy loading: Pyodide loads only when first Python code runs.
  • Queue + abort: run multiple snippets, stop long‑running code with Stop.
  • Timeout protection: 30s max execution, auto‑terminate.
  • Save output: click πŸ“Ž Save output to add result as assistant message.
  • Limitations: threading, socket, multiprocessing not supported (Pyodide constraints).
# Example: run in chat
print("Hello, local AI!")
for i in range(3): print(i)
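The queue-plus-timeout behavior described above can be sketched in plain JavaScript. This is a simplified stand-in for the real Pyodide worker: `execute` is a placeholder for posting code to the worker, and the 30s limit is shortened in the usage example.

```javascript
// Sketch: serialize runs through a queue and enforce a per-run timeout.
// `execute` stands in for sending code to the Pyodide worker.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error("Execution timed out")), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

class RunQueue {
  constructor(execute, timeoutMs) {
    this.execute = execute;         // async (code) => output
    this.timeoutMs = timeoutMs;
    this.chain = Promise.resolve(); // serializes snippet runs
  }
  run(code) {
    const next = this.chain.then(() =>
      withTimeout(this.execute(code), this.timeoutMs)
    );
    this.chain = next.catch(() => {}); // keep the queue alive after failures
    return next;
  }
}
```

A real abort additionally terminates and reloads the worker, since WebAssembly execution cannot be interrupted mid-instruction from the main thread.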

πŸ“ RAG – Code Indexing

Kairos can index your entire project (source code, configs, markdown) and inject relevant chunks into the LLM context automatically.

  • Drop folder / files via the Index Manager (database icon in navbar).
  • Files are chunked in a token‑aware way (max 500 tokens, 50‑token overlap).
  • Embeddings generated using all-MiniLM-L6-v2 (Transformers.js) – runs locally in a worker.
  • Stored in IndexedDB (per‑user when logged in, else global guest DB).
  • During chat, the system performs hybrid search (cosine + keyword) and prepends relevant code snippets to your prompt.
  • Manage indexed files: delete individual files or clear entire index.
⚑ Enable/disable RAG via the toggle inside Index Manager – works even without indexing.
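The chunking and scoring steps above can be sketched roughly as follows. This is a simplified illustration, not Kairos's actual code: whitespace "tokens" stand in for real tokenizer output, the embedding vectors come from whatever the MiniLM worker returns, and the blend weight `alpha` is an assumption.

```javascript
// Sketch: token-aware chunking with overlap, plus hybrid scoring.
function chunkTokens(tokens, maxTokens = 500, overlap = 50) {
  const chunks = [];
  const step = maxTokens - overlap;
  for (let i = 0; i < tokens.length; i += step) {
    chunks.push(tokens.slice(i, i + maxTokens).join(" "));
    if (i + maxTokens >= tokens.length) break; // last chunk reached the end
  }
  return chunks;
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb) || 1);
}

// Hybrid score: cosine similarity blended with simple keyword overlap.
function hybridScore(queryVec, queryWords, chunkVec, chunkText, alpha = 0.7) {
  const words = new Set(chunkText.toLowerCase().split(/\s+/));
  const hits = queryWords.filter((w) => words.has(w.toLowerCase())).length;
  const keyword = queryWords.length ? hits / queryWords.length : 0;
  return alpha * cosine(queryVec, chunkVec) + (1 - alpha) * keyword;
}
```

The top-scoring chunks are then prepended to the user prompt as context.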

πŸ€– Multi‑Model & Context Management

Click the model indicator (top‑right navbar) or the + button near input to switch models. Supported providers:

Provider        Models (Free)
Groq            Llama 3.3-70B, Llama 3.1-8B, Mixtral 8x7B, Gemma2 9B, Qwen 3-32B, GPT‑OSS 120B
Google Gemini   Gemini 2.5 Flash, Gemini 2.5 Flash‑Lite, Gemini 3.1 Flash‑Lite Preview
OpenRouter      NVIDIA Nemotron‑3 Super, Qwen 3‑80B, many more via the same API key

Token‑aware context truncation: the system respects each model's context limit and drops the oldest messages when needed. Responses stream with markdown rendering and syntax highlighting.
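One way to sketch that truncation (the 4-chars-per-token estimate and the keep-the-system-message rule are illustrative assumptions, not Kairos's exact accounting):

```javascript
// Sketch: drop oldest messages until the history fits the token budget.
// Rough estimate; a real implementation would use a proper tokenizer.
const estimateTokens = (text) => Math.ceil(text.length / 4);

function truncateHistory(messages, maxTokens) {
  // Always keep the system message (index 0) if present.
  const system = messages[0]?.role === "system" ? [messages[0]] : [];
  const rest = messages.slice(system.length);
  let budget =
    maxTokens - system.reduce((n, m) => n + estimateTokens(m.content), 0);
  const kept = [];
  for (let i = rest.length - 1; i >= 0; i--) { // walk newest to oldest
    const cost = estimateTokens(rest[i].content);
    if (cost > budget) break;                  // oldest overflow is dropped
    kept.unshift(rest[i]);
    budget -= cost;
  }
  return [...system, ...kept];
}
```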

πŸ“ Templates & Slash Commands

Type / anywhere in the input field – a dropdown appears with your saved prompt templates. Navigate with ↑/↓, select with ↡ or Tab. Templates are stored in localStorage and sync across devices via preferences (if signed in).

Manage Templates

Click the Template Manager button in the navbar β†’ create, edit, and delete templates. Each template has a name and content (markdown supported).
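A minimal sketch of slash-template lookup: the storage key and template shape below are assumptions, and an injectable in-memory store (same `getItem`/`setItem` shape as localStorage) keeps the sketch self-contained.

```javascript
// Sketch: save templates and filter them as the user types after "/".
function saveTemplates(store, templates) {
  store.setItem("kairos_templates", JSON.stringify(templates));
}

function matchTemplates(store, query) {
  const templates = JSON.parse(store.getItem("kairos_templates") || "[]");
  const q = query.replace(/^\//, "").toLowerCase();
  return templates.filter((t) => t.name.toLowerCase().startsWith(q));
}

// In-memory stand-in for localStorage:
const memStore = {
  data: {},
  getItem(k) { return this.data[k] ?? null; },
  setItem(k, v) { this.data[k] = v; },
};

saveTemplates(memStore, [
  { name: "Explain Code", content: "Explain this code step by step:" },
  { name: "Security Review", content: "Audit this code for vulnerabilities:" },
]);
const hits = matchTemplates(memStore, "/exp");
```

In the app the matching entries populate the dropdown, and selecting one inserts its content into the input field.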

☁️ Cloud Sync & Guest Mode

  • Sign in with Google (sidebar footer) β†’ conversations saved to Firestore.
  • Chats are automatically titled (AI generated using Groq Qwen 32B), pinnable, searchable, and grouped by date.
  • Public/private toggle per conversation (anyone with link can view if public).
  • Guest mode works without login – history stored only in browser memory (not persistent across sessions).
  • Clear all chats / delete individual conversations – confirmation modals.

βš™οΈ Configuration & API Keys

All API keys are embedded by default for demo purposes (rate‑limited). To use your own:

  1. Click the Settings icon (βš™οΈ) in the floating navbar.
  2. Enter your Groq / OpenRouter / Gemini keys.
  3. Keys are saved to localStorage and applied immediately to the current session.

Advanced: Modify js/modules/core/config.js to change default model list or providers.
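Key resolution can be sketched as a simple override-then-fallback lookup. The `kairos_key_` storage prefix and the in-memory store below are illustrative assumptions, not the app's actual schema.

```javascript
// Sketch: prefer a user-supplied key from storage, else the built-in demo key.
const BUILT_IN_KEYS = { groq: "demo-groq-key", gemini: "demo-gemini-key" };

function resolveApiKey(store, provider) {
  const override = store.getItem(`kairos_key_${provider}`);
  return override && override.trim() ? override.trim() : BUILT_IN_KEYS[provider];
}

// In-memory stand-in for localStorage:
const keyStore = {
  data: {},
  getItem(k) { return this.data[k] ?? null; },
  setItem(k, v) { this.data[k] = v; },
};
```

Because the lookup happens per request, a key entered in Settings takes effect immediately without a reload.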

πŸ“Œ Example Interactions

πŸ’¬ "Write a Python script to analyze server logs" β†’ AI writes code, you can run it inline.
πŸ“ "Explain the authentication flow in my project" β†’ (RAG enabled) searches indexed code, returns context‑aware answer.
πŸ” "Find security vulnerabilities in this function" β†’ security‑focused analysis.
🎀 (click Speak button on any AI message) β†’ TTS reads the response.
/explain (type slash) β†’ insert a pre‑saved "Explain Code" template.

πŸ” Troubleshooting

❌ Python code shows "threading not supported"
Pyodide does not support native threads. Use asyncio or restructure the code to run synchronously.
πŸ€– RAG not injecting context
Make sure RAG is enabled (Index Manager toggle) and you have indexed at least one file. Check IndexedDB via browser dev tools.
☁️ Chat history not appearing after login
Ensure Firebase config is correct. Try refreshing the page. Guest mode does not sync.
πŸŽ™οΈ Voice not working
The Web Speech API requires a user interaction first (click any button). If using voice input, also ensure no other tab is using the microphone.