I spent a lot of time searching and eventually realized that my personal knowledge was scattered everywhere: notes, PDFs, bookmarks, snippets, links, images, voice recordings, all in different places and never findable when I needed them. Notion was good, but I didn’t want to put my brain’s content into yet another SaaS service.

Then I thought: “Why don’t I build my own?”

And I did.

What is Memorito?

Memorito is a self-hosted, multimodal knowledge base that I built for myself: not a ready-made product, but something I coded for my own needs and continuously develop. It can process, index, and search text, URLs, images, audio files, and PDFs. And it goes beyond keyword search: semantic search finds relevant content based on meaning, not exact wording.

The gist: throw anything in, and later you can search for it in natural language, like asking a personal assistant.

The code is open source and available on GitHub: https://github.com/Yappito/memorito. If you’re interested, check it out, and if you have ideas or a pull request, I’d gladly accept them.

Why did I build it?

Honestly? Because it was annoying to search everywhere. I have a QNAP NAS, I have a Proxmox homelab cluster, and I thought: “Why shouldn’t I run my own knowledge base?” I wanted my own software, not a SaaS product.

The main motivations:

  • My data, my server — I don’t give my brain’s content to a SaaS company
  • Multimodal — not just text, but images, audio, PDF too
  • Semantic search — not keyword match, but actual meaning-based search
  • Automatable — API endpoints, so agents can use it
  • Self-hosted — full control, in my own infrastructure
  • My own code — if something doesn’t work, I fix it myself, no need to write a support ticket

Vibe-coding, or the Sin

Yes, I vibe-coded it. Using GLM-5.1 through OpenCode, v1 was done in one afternoon. Of course I refined it a lot afterwards, but the basic idea and most of the code came from vibe-coding. I’m not a developer at all; I’m an infrastructure engineer, lately an architect. But with today’s large language models you really can put together a working app if you have a good prompt and a clear idea.

The key is knowing what you want. If you can precisely describe what you want, and you have an AI partner who can translate it into code — then you don’t need to wait weeks for a feature. A few hours and it’s done.

How does it work?

Ingestion — feeding content

There are five endpoints for processing, all async:

  1. Text — direct text entry with title and tags
  2. URL — web page fetching and indexing
  3. Image — image upload for visual search
  4. Audio — audio file upload, automatically transcribed (with the z.ai GLM-ASR-2512 model)
  5. PDF — document upload for full-text search

Every ingestion is async: POST a job, then poll the /api/ingest/<id>/status endpoint until it returns COMPLETED. No need to wait for it to finish — just submit it and search later.
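The submit-then-poll loop can be sketched in a few lines of Python. This is a minimal sketch: `get_status` is a stand-in for whatever HTTP call reads `/api/ingest/<id>/status`, and only the COMPLETED status is confirmed above; FAILED is an assumed failure state.

```python
import time

def wait_for_completion(get_status, job_id, interval=2.0, timeout=120.0):
    """Poll get_status(job_id) until the ingestion job reports COMPLETED.

    get_status is any callable returning the job's status string,
    e.g. by GET-ing /api/ingest/<job_id>/status.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status(job_id)
        if status == "COMPLETED":
            return True
        if status == "FAILED":  # assumed failure state, not confirmed by the text
            raise RuntimeError(f"ingestion job {job_id} failed")
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not complete within {timeout}s")
```

In practice `get_status` would wrap an HTTP GET and read the status field from the JSON response; the point is that the submitter never blocks on processing itself.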

Retrieval — search and query

Two main ways to search:

  • /api/search?q=... — hybrid semantic search, filterable by source type, tags, date range, reranker toggle, pagination
  • /api/items/<id> — querying a specific item, returning metadata, chunks, duplicate links, and a presigned S3 download URL for media files
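Building a filtered search URL might look like the sketch below. Only the `q` parameter is confirmed above; the `type`, `tags`, and `rerank` names are placeholders for whatever the real API uses.

```python
from urllib.parse import urlencode

def build_search_url(base, q, source_type=None, tags=None, rerank=None):
    """Assemble a /api/search URL with optional filters.

    Parameter names other than `q` are illustrative placeholders.
    """
    params = {"q": q}
    if source_type:
        params["type"] = source_type
    if tags:
        params["tags"] = ",".join(tags)
    if rerank is not None:
        params["rerank"] = "true" if rerank else "false"
    return f"{base}/api/search?{urlencode(params)}"
```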

Modifying existing data

PATCH endpoints let you update titles and tags without re-uploading the content. Useful if you messed up the title or want to add tags later.
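As a sketch, the PATCH body could be built like this. The `title` and `tags` field names follow the description above, but the exact schema is an assumption.

```python
import json

def build_patch_body(title=None, tags=None):
    """Build a JSON body for a metadata-only PATCH (no content re-upload).

    Field names are assumed from the prose; check the API for the real schema.
    """
    body = {}
    if title is not None:
        body["title"] = title
    if tags is not None:
        body["tags"] = tags
    if not body:
        raise ValueError("nothing to update")
    return json.dumps(body)
```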

The technical background

The stack is not exactly average:

  • Next.js 15 App Router + TypeScript strict mode
  • tRPC v11 (internal UI) + REST ingestion API (external callers)
  • Prisma + PostgreSQL 17 + pgvector (hybrid search)
  • Redis + BullMQ (async job queue)
  • Jina embeddings v4 (1024-dimensional) + Jina reranker v3
  • Provider-agnostic LLM layer that supports four different providers:
    • zai — z.ai Coding Plan API (GLM-5.1, /layout_parsing, ASR)
    • local — local OpenAI-compatible server (e.g. llama.cpp)
    • custom — any OpenAI-compatible endpoint
    • ollama-cloud — Ollama Cloud (https://ollama.com/v1)

Audio processing uses a separate z.ai GLM-ASR-2512 model, independent of the LLM provider.

Storage

Files go to an S3-compatible backend — I run Garage on a QNAP NAS. Download URLs are presigned URLs, valid for about 1 hour. This means Memorito doesn’t store files directly — it sends them to the S3 backend and serves them from there.
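The idea behind a presigned URL can be shown with a toy example: the link carries an expiry timestamp and an HMAC signature, so the storage backend can validate it without any session state. Real S3/Garage presigning uses AWS Signature V4; this only illustrates the principle, with a made-up secret.

```python
import hashlib, hmac, time

SECRET = b"demo-secret"  # stand-in for the S3 secret key

def presign(path: str, expires_in: int = 3600, now=None) -> str:
    """Toy presigned URL: embed an expiry plus an HMAC so the link self-authenticates."""
    exp = int(now if now is not None else time.time()) + expires_in
    msg = f"{path}?expires={exp}".encode()
    sig = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return f"{path}?expires={exp}&sig={sig}"

def verify(url: str, now=None) -> bool:
    """Check the signature and expiry of a toy presigned URL."""
    base, _, sig = url.rpartition("&sig=")
    expected = hmac.new(SECRET, base.encode(), hashlib.sha256).hexdigest()
    exp = int(base.rpartition("expires=")[2])
    ts = now if now is not None else time.time()
    return hmac.compare_digest(sig, expected) and ts < exp
```

This is why a leaked link goes stale on its own after the expiry window, and why tampering with the path or timestamp invalidates the signature.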

How do the OpenClaw and Hermes agents connect?

This is where it gets really interesting. I have two AI agents (OpenClaw and Hermes) that use the Memorito API. I didn’t write the code — the AI generated it, but I designed the architecture, and I know exactly how every endpoint works and how to get the most out of it.

OpenClaw

OpenClaw uses a skill (openclaw-skill/SKILL.md) that sends curl-based HTTP requests to the Memorito API. Key behaviors:

  1. Ingestion with sourceName: "openclaw" tag — so I know which content came from the agent
  2. Polling until COMPLETED — never jump to search before ingestion is done
  3. Search on the /api/search endpoint
  4. Full context retrieval on the /api/items/<id> endpoint for relevant results
  5. Translation to English — non-English results are always translated to English
  6. File download and delivery — if there’s a media file in the result, it downloads from the downloadUrl and sends it to the user

Important: the downloadUrl is a presigned S3 URL, not a Memorito endpoint. Download directly with curl: curl -o file.jpg "$DOWNLOAD_URL".

Hermes

The Hermes agent works similarly, but with some differences:

  • Netcat fallback — curl stutters in the VM environment, so it uses netcat for raw HTTP
  • File delivery on messaging channels — for Telegram/WhatsApp, the MEDIA: tag isn’t enough, the file must be downloaded to ~/.openclaw/workspace/, then sent with openclaw message send --channel <channel> --target <target> --media <local-path>
  • GLM-5.1 reasoning model: generateTitle needs a minimum of 2048 tokens, otherwise the response may come back empty (reasoning tokens eat the budget)
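What the netcat fallback amounts to is composing the HTTP request by hand. A sketch (the hostname and path are placeholders):

```python
def raw_http_get(host: str, path: str) -> bytes:
    """Compose a minimal HTTP/1.1 GET request, the kind you would pipe
    into `nc <host> <port>` when curl misbehaves."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Accept: application/json\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode()
```

Piped into `nc`, this is a complete request; `Connection: close` makes the server hang up after responding, so netcat exits cleanly instead of waiting on a keep-alive socket.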

What do I use it for?

Basically I can throw anything in and search for it later:

  • Tech articles, documentation — URL ingestion, search later if I remember the topic
  • PDF documents — research materials, manuals
  • Audio recordings — meeting notes, podcasts, my own voice notes
  • Images — screenshots, diagrams, anything visual
  • Text notes — quick thoughts, tips, solutions
  • Photos of notes, pamphlets, IKEA furniture labels — this is the biggest use-case for me: I photograph something (e.g. an IKEA furniture label, a pamphlet, a note), and later search for it, or ask the AI agent to find the item URL, save the dimensions, or supplement the information based on the extracted data.

Agents don’t ingest content automatically — they only send data to Memorito when I instruct them to. When I search for something, Memorito returns the relevant results with full context.

Summary

Memorito is a solid, well-designed knowledge base that I built for myself, and it actually works. It is not just another bookmark manager but a real semantic search layer for your personal knowledge. The agent integrations (OpenClaw, Hermes) make it truly useful: instead of ingesting everything by hand, I can tell the agents to collect and categorize content for me.

And since we’re talking about my own code — if you’re interested in Memorito, check it out on GitHub: https://github.com/Yappito/memorito. It’s open source, so no need to fear vendor lock-in. If you have ideas, bug reports, or pull requests — I’d gladly accept them.

If you’re also tired of your knowledge being scattered across ten different places, and you want an AI agent to help you find what you’re looking for, it’s worth a try. Or if you feel like you could build it yourself: go build it, it’s a lot of fun. You really don’t need to be a professional developer to build your own software today.