Search and OCR

Every capture carries an on-device text layer your agent can read and search, so it can find the screenshot from an hour ago without you scrolling back to it.

A capture isn't just pixels. The moment you take one, Noru reads the text off it on your Mac and keeps that text alongside the image. That text layer is what lets your agent search your captures by what was in them, and skip the pixels entirely when it only needs the words.

The on-device text layer

Noru runs OCR on every capture using Apple's built-in vision framework. It reads the on-screen text (an error message, a URL, a stack trace, the label on a button) and stores it with the image. Neither the capture nor its text leaves your machine to do this. There's nothing to enable and nothing to wait for; the text layer is there on every shot.

The text comes through verbatim: your agent gets the same words you'd read on screen. That's the point of OCR here: it gives the model a second, lighter way to understand a capture, and it makes your history searchable.

Searching by what you saw

You don't run a search yourself. You describe the capture you mean and your agent reaches for search_captures, which looks through the on-screen text, any transcript, and your notes for the word or phrase. Talk to it the way you'd talk to a person who was looking over your shoulder:

"Pull up the screenshot with the Stripe 400 error" or "where was that stack trace I showed you earlier?"

Search returns lightweight locator rows, not images, so it's cheap. Your agent reads the hits, picks the right one, and then pulls the actual pixels with get_capture. It's all read-only: searching never consumes or disturbs the live handoff of whatever you captured most recently.
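The two-step flow above (search first, fetch pixels second) can be sketched as a small in-memory model. This is an illustration, not Noru's implementation: only the tool names search_captures and get_capture come from this page. The Capture and Locator shapes, the field names, the sample data, and the case-insensitive matching are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    # Hypothetical record shape; the real storage format is not documented here.
    id: str
    ocr_text: str    # the on-device text layer
    transcript: str  # empty when the capture has no audio
    notes: str
    image: bytes     # the heavy part that search never returns


@dataclass
class Locator:
    # Hypothetical "lightweight locator row": enough to pick a hit, no pixels.
    id: str
    snippet: str


# Stand-in for your capture history.
STORE = [
    Capture("cap_1", "POST /v1/charges 400 Bad Request (Stripe)", "", "", b"<png bytes>"),
    Capture("cap_2", "TypeError: undefined is not a function", "", "stack trace", b"<png bytes>"),
]


def search_captures(query: str) -> list[Locator]:
    """Literal match over on-screen text, transcript, and notes.

    Case-insensitive matching is an assumption made for this sketch.
    """
    q = query.lower()
    hits = []
    for c in STORE:
        haystack = "\n".join([c.ocr_text, c.transcript, c.notes]).lower()
        if q in haystack:
            hits.append(Locator(c.id, c.ocr_text[:40]))
    return hits


def get_capture(capture_id: str) -> Capture:
    """Fetch the full capture, image included, by id. Read-only."""
    for c in STORE:
        if c.id == capture_id:
            return c
    raise KeyError(capture_id)


# Step 1: cheap search returns locators. Step 2: fetch only the chosen hit.
hits = search_captures("stripe")
chosen = get_capture(hits[0].id)
```

The point of the split is cost: the agent can scan many locator rows for the price of a little text, and pays for an image only once it knows which capture it wants.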

Just the text, when that's all it needs

Sometimes the words are the whole answer and the image is wasted tokens. When your agent only needs what a capture said, it passes text_only and gets the text layer without the picture. That's good for a long error log or a wall of config, where reading beats looking. Most of the time your agent asks for the image too, because seeing the layout matters, but the option is there when it's minimizing tokens.
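As a rough sketch of what text_only changes, assume it is a boolean flag on the fetch call. This page names the flag but not which tool takes it or what the response looks like, so the get_capture signature, the payload fields, and the sample record below are hypothetical.

```python
from typing import Optional, TypedDict


class CapturePayload(TypedDict):
    # Hypothetical response shape for illustration only.
    id: str
    text: str
    image: Optional[bytes]


# Stand-in for one stored capture: a long log where the words are the answer.
_CAPTURES = {
    "cap_7": {
        "text": "ERROR db: connection refused\nERROR db: retry 1 failed",
        "image": b"<png bytes>",
    },
}


def get_capture(capture_id: str, text_only: bool = False) -> CapturePayload:
    """Return the text layer, plus the image unless text_only is set."""
    record = _CAPTURES[capture_id]
    return {
        "id": capture_id,
        "text": record["text"],
        # Skipping the image is the whole saving: the text is the same either way.
        "image": None if text_only else record["image"],
    }


words_only = get_capture("cap_7", text_only=True)
full = get_capture("cap_7")
```

Either call returns the same text; the flag only decides whether the pixels come along.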

Pro: recall by meaning and by sight

Text search on the free tier is exact: it finds captures that literally contain your words. The memory tier adds two ways to find things when you don't remember the exact wording.

semantic (Pro): Surfaces captures related by meaning, not just matching letters, so "the login bug" can find a capture that never used that phrase.

similar_to (Pro): Finds captures that look like a given one through on-device image similarity, for when you remember the picture, not the words.

Both run on your Mac, like everything else. They're part of the one-time Pro purchase, alongside audio. See pricing for what's free and what's Pro.
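To give a feel for what "looks like" can mean in similar_to, here is a toy version of one well-known technique, the average hash. This page doesn't say how Noru measures image similarity, so treat this as a stand-in, not the actual method. The tiny 4x4 grayscale grids are made-up data, and a real implementation would first shrink each image to a small grid like this.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Pack one bit per pixel: 1 if brighter than the image's mean, else 0."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count differing bits; fewer differences means the images look more alike."""
    return bin(a ^ b).count("1")


# Made-up 4x4 grayscale thumbnails (0 = black, 255 = white).
dark_left = [[0, 0, 255, 255]] * 4
dark_left_noisy = [
    [10, 5, 250, 240],
    [0, 12, 255, 245],
    [8, 0, 248, 255],
    [3, 9, 251, 250],
]
dark_top = [[0] * 4, [0] * 4, [255] * 4, [255] * 4]

# Same layout with slight noise: identical hash.
near = hamming(average_hash(dark_left), average_hash(dark_left_noisy))
# Different layout: half of the 16 bits differ.
far = hamming(average_hash(dark_left), average_hash(dark_top))
```

Here near is 0 and far is 8. Small pixel differences don't change the hash, while a different layout does, which is the behavior you want when you remember the picture but not the words.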

It's all in the tools

Search, fetch by id, text-only, and visual recall are tools your agent calls, not buttons you press. The full set, and exactly when each one fires, is in the tool reference.