
From a Personal Itch to an Open Source Tool: How I Built SageDesk
I noticed a gap on my own portfolio and ended up building an open source RAG-powered support widget that any Next.js or Express app can use in a few minutes.
You are deep into building your portfolio. The design is coming together, the projects are in, the copy reads well. And then you sit back and think: if someone lands on this page with a question, what do they do? Email? Scroll forever? There is a gap between what a website shows and what a visitor actually needs. I felt that on my own site, and it was annoying enough that I could not leave it alone.
A Personal Problem That Kept Getting Bigger
The more I thought about it, the more obvious it was that this gap existed everywhere, not just on my portfolio. A product site, a docs page, a SaaS landing page: they all hold information a visitor might want but cannot easily reach. A chat widget that understands the site's own content and answers from it felt like something that should be easy to add.
Before Writing Any Code, a List
Before writing any code, I wrote a list. What does this thing actually need to do? Understand questions in natural language. Search across a knowledge base. Return answers grounded in the site's own content, not invented ones. Work without needing three services and an infrastructure team to set up. Writing it out made the scope feel manageable.
While researching how to make answers feel grounded rather than invented, one pattern kept coming up: RAG, which stands for Retrieval Augmented Generation. The idea is straightforward. Instead of asking a language model to answer from memory, you first retrieve the most relevant pieces of information from your own content, then hand those pieces to the model as context. The model synthesises an answer from what you gave it, not from what it was trained on. That distinction matters a lot for a support tool. The answer is always traceable back to something the site owner wrote.
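RAG addresses a fundamental problem with language models: they hallucinate. By grounding every response in retrieved content you own, the widget is steered to answer from what you wrote rather than from the model's memory. That sharply reduces invented facts and confident wrong answers, and it makes any answer easy to check against its source.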
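To make the "hand those pieces to the model as context" step concrete, here is a minimal sketch of how a grounded prompt could be assembled. It is not SageDesk's actual prompt; the Chunk type, the buildGroundedPrompt function, and the instruction wording are all assumptions made for illustration.

```typescript
// Hypothetical shape of a retrieved knowledge-base chunk (not SageDesk's real type).
interface Chunk {
  id: string;
  text: string;
}

// Build a prompt that asks the model to answer only from the supplied chunks.
function buildGroundedPrompt(question: string, chunks: Chunk[]): string {
  // Number each chunk and keep its id so an answer can be traced to its source.
  const context = chunks
    .map((chunk, i) => `[${i + 1}] (${chunk.id}) ${chunk.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below.",
    "If the context does not contain the answer, say you do not know.",
    "",
    "Context:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Example with two made-up knowledge entries.
const prompt = buildGroundedPrompt("Do you take freelance work?", [
  { id: "availability", text: "I am available for freelance projects from March." },
  { id: "contact", text: "The best way to reach me is the contact form." },
]);
console.log(prompt);
```

Keeping the chunk ids in the prompt is a design choice rather than a requirement: it makes it easier to check which entry an answer came from.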
How It Would Work
The architecture took shape quickly once RAG was the pattern. At build time, run a CLI command that reads your knowledge file and converts every entry into a vector embedding, a numerical representation of its meaning. Those embeddings get written to a static JSON index. At query time, the visitor's question gets embedded the same way, and the index is searched for the chunks with the closest semantic meaning. Those chunks become the context. The answer comes from them.
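The query-time search described above can be sketched in a few lines. This is an illustration under stated assumptions, not SageDesk's code: the IndexEntry shape is hypothetical, cosine similarity is one common way to measure "closest semantic meaning" and may not be the metric SageDesk uses, and the three-dimensional vectors are toy stand-ins for real embeddings.

```typescript
// Hypothetical index entry: the original text plus its embedding vector.
interface IndexEntry {
  text: string;
  embedding: number[];
}

// Cosine similarity: close to 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every entry against the query embedding and keep the topK best matches.
function searchIndex(
  index: IndexEntry[],
  queryEmbedding: number[],
  topK: number,
): IndexEntry[] {
  return index
    .map((entry) => ({ entry, score: cosineSimilarity(entry.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((scored) => scored.entry);
}

// Toy index: in practice this would be loaded from the static JSON file.
const index: IndexEntry[] = [
  { text: "I build web apps with Next.js.", embedding: [0.9, 0.1, 0.0] },
  { text: "You can reach me by email.", embedding: [0.1, 0.9, 0.1] },
  { text: "The project is MIT licensed.", embedding: [0.0, 0.2, 0.9] },
];

const results = searchIndex(index, [0.8, 0.2, 0.1], 2);
console.log(results.map((r) => r.text));
```

A linear scan like this is enough for a knowledge file with a few hundred entries, which is why a static file can stand in for a vector database at this scale.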
The part I cared most about from the start was keeping it lean. No database to spin up, no separate server just for the index. A static file that ships with the build and a library that knows how to search it. That constraint ended up shaping most of the decisions that followed.
Shipping v1: Local Mode
The first version kept everything in the browser. A small embedding model runs via WebAssembly on the visitor's own device. The index is a static JSON file served alongside your site. No extra server, no API key, no backend required. Once the model and index have loaded, a visitor asks a question and the answer comes back in under 100ms, without the question ever leaving their machine.
Shipping that first version felt surprisingly satisfying. Nothing went through an external service, there were no API keys to rotate, and the cost was zero. You run one build command, drop the output into your public folder, and the widget is live. I put it on my portfolio, watched it answer questions about my projects and background, and for a few days that felt like enough.
The Honest Problem with v1
Local mode worked, but the answers felt thin. The small models that run inside a browser are fast and lightweight by design, and that comes at a real cost. They find the right chunk most of the time but cannot synthesise across several chunks, cannot rephrase naturally, and cannot carry any follow-up context. Visitors were getting the right fragment, not a useful response.
There is a real difference between a search result and an answer. A search result points you somewhere. An answer tells you what you need to know. The local version was fast and private but it was surfacing fragments, not synthesising responses. I wanted the second thing.
Adding LLM Mode
The fix was to keep the local embedding step but route the retrieved chunks to a real language model for synthesis. The browser still embeds the visitor's question, which keeps that step fast and free. The backend handles the index search and the LLM call, which keeps the API key server-side and out of the client bundle. The visitor gets a response that reads like something a person composed, pulling from several relevant parts of the knowledge base at once.
The architecture split into two clear pieces. The client sends the query and its embedding to your server. The server searches the index, picks the top matching chunks, builds a prompt, calls the LLM, and streams the answer back. If the LLM call fails or times out, the server falls back to returning the raw chunks automatically. The experience degrades instead of breaking.
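The fallback behaviour is the part most worth seeing in code. Below is a minimal sketch of the pattern, assuming a hypothetical callLlm function that takes a prompt and resolves to the answer text. It is not SageDesk's handler, and it leaves out streaming to stay short; the names withTimeout, answerWithFallback, and Reply are mine.

```typescript
// Hypothetical signature for whatever function actually calls the LLM provider.
type CallLlm = (prompt: string) => Promise<string>;

interface Reply {
  kind: "answer" | "fallback";
  text: string;
}

// Reject if the wrapped promise does not settle within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error("LLM call timed out")), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Try the LLM first; on any error or timeout, return the raw chunks instead.
async function answerWithFallback(
  prompt: string,
  chunks: string[],
  callLlm: CallLlm,
  timeoutMs: number,
): Promise<Reply> {
  try {
    const text = await withTimeout(callLlm(prompt), timeoutMs);
    return { kind: "answer", text };
  } catch {
    const notice = "I could not generate a full answer, but these entries look relevant:";
    return { kind: "fallback", text: [notice, ...chunks].join("\n\n") };
  }
}

// Example: a stand-in LLM call that always fails, to exercise the fallback path.
const failingLlm: CallLlm = async () => {
  throw new Error("provider unavailable");
};

answerWithFallback(
  "What is SageDesk?",
  ["SageDesk is an open source support widget."],
  failingLlm,
  1000,
).then((reply) => console.log(reply.kind, reply.text));
```

Returning a tagged Reply rather than throwing lets the widget decide how to render each case, so a failed LLM call never surfaces to the visitor as an error.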
SaaS or Open Source
This was the decision I spent the most time on. A managed SaaS would have been simpler to ship and easier to monetise. But it would mean asking developers to trust a third party with their visitors' queries, their knowledge base, and their API keys. That felt like the wrong trade. The whole point of this tool is that the answers come from content you own. The infrastructure should be yours too.
I went with open source middleware. You install the package, register a server handler in your own Next.js or Express app, and point it at your own LLM provider with your own key. SageDesk handles the retrieval and the prompt construction. The deployment is yours. Nothing passes through a SageDesk-operated server; the only external call is to the LLM provider you choose.
Getting Started with LLM Mode in Next.js
The full integration is three steps. Build the index from your knowledge file, register the API route handler, and drop the widget into your root layout. Here is what that looks like in a Next.js App Router project.
    npx sagedesk build --input knowledge.json --output public/sagedesk-index.json

    import { createSageDeskHandler } from 'sagedesk/server';
    import { resolve } from 'path';

    export const POST = createSageDeskHandler({
      indexPath: resolve(process.cwd(), 'public', 'sagedesk-index.json'),
      provider: 'openai',
      apiKey: process.env.SAGEDESK_LLM_API_KEY!,
      model: 'gpt-4o-mini',
    });

    import { SageDeskNext } from 'sagedesk/next';

    export default function RootLayout({ children }: { children: React.ReactNode }) {
      return (
        <html lang="en">
          <body>
            {children}
            <SageDeskNext
              mode="llm"
              endpoint="/api/sagedesk"
              agent={{ name: 'Support', theme: 'dark' }}
            />
          </body>
        </html>
      );
    }

SageDesk supports OpenAI, Anthropic, Google Gemini, DeepSeek, and Groq out of the box. Swap the provider and model fields to switch between them. You can also pass a custom API base URL for any OpenAI-compatible endpoint.
What LLM Mode Actually Changed
The difference is not just answer quality, though the quality improvement is real. It is the character of the interaction. A visitor asking a vague or loosely worded question now gets a response that pulls from several parts of the knowledge base, connects the relevant pieces, and reads like something a person sat down and wrote. It does not feel like a search engine. It feels like talking to someone who knows the site well.
The fallback still triggers when an LLM call fails or times out, so the widget never goes silent. Visitors see the most relevant chunks alongside a short message. The experience is never broken, only occasionally simpler.
What I Would Tell Someone Starting Today
Start with local mode. Write a small knowledge file with the ten most common questions your visitors might ask. Run the build command, drop the widget in, and watch it work. You will see its limitations within a day or two of real use, and at that point you will know exactly what to put in the knowledge file and exactly why LLM mode is worth adding.
The thing that surprised me most was how much the quality of the knowledge file matters. The model is only as good as what you give it. Vague entries produce vague answers. Specific, well-written entries produce specific, useful answers. Writing the knowledge file is half the work, and it is the half that actually shapes the experience your visitors get.
SageDesk is open source under the MIT licence. The repository is on GitHub with full documentation for both modes, all supported providers, and configuration options for the widget and the server handler. The gap I noticed on my own site turned out to be more worth solving properly than I expected when I started.