At first I thought the main improvement would be that the search would be faster, but rg is already pretty freakin fast when the fs cache is warm.
The big improvement actually turned out to be token efficiency. When you structure all of the transcripts in a SQL table, the agent can retrieve exactly what it needs (such as "print me the lite transcript, without the intermediate messages").
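To make the "lite transcript" idea concrete, here is a minimal sketch using Python's built-in `sqlite3`. The `messages` table, its columns, and the `is_final` flag are assumptions made up for illustration, not the actual schema of any tool discussed here.

```python
import sqlite3

# Hypothetical schema: one row per message, with a flag marking the
# assistant's final reply in each turn (as opposed to intermediate chatter).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE messages (
        session_id TEXT,
        seq        INTEGER,
        role       TEXT,     -- 'user', 'assistant', or 'tool'
        is_final   INTEGER,  -- 1 for user prompts and final assistant replies
        text       TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?, ?)",
    [
        ("s1", 1, "user", 1, "Fix the failing test"),
        ("s1", 2, "assistant", 0, "Let me look at the test file."),
        ("s1", 3, "tool", 0, "<thousands of lines of test output>"),
        ("s1", 4, "assistant", 1, "Fixed: the fixture was stale."),
    ],
)


def lite_transcript(conn, session_id):
    """Return only user prompts and final assistant replies, in order."""
    return conn.execute(
        """
        SELECT role, text
        FROM messages
        WHERE session_id = ?
          AND (role = 'user' OR (role = 'assistant' AND is_final = 1))
        ORDER BY seq
        """,
        (session_id,),
    ).fetchall()


for role, text in lite_transcript(conn, "s1"):
    print(f"{role}: {text}")
```

The point is that the tool output and intermediate messages never reach the agent's context, whereas grepping raw session files tends to pull in everything around a match.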
Meanwhile, the other todo-list item, one I've taken on as well, is to cross-sync Claude Code sessions across every instance on all your machines.
There are multiple projects that claim to do this. None do it fully. (They particularly have blind spots to tools that embed a Claude Code, such as the Xcode 26.5 and Xcode 27 beta.)
So: roll your own, and in doing that, realize that Claude Code has first-class tools that make back-referencing transcripts routine.
Given those tools, you don't really need an extra layer.
How have you enjoyed the semantic search?
a couple of times I was certain there was a session that contained some word, but in reality it was in my personal claude.ai web account, so I needed to add import functionality for that.
my favorite piece is the `corrections` command which surfaces all my frustrations/corrections in the last week for example... and I can then figure out if missing context would improve those scenarios going forward
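I don't know how `corrections` is actually implemented, so treat this as a naive sketch of the general idea only: filter last week's user messages for phrases that tend to signal a correction. The `user_messages` table, the marker list, and the `recent_corrections` helper are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta

# Hypothetical phrases that often show up when the user corrects the agent.
CORRECTION_MARKERS = ("no,", "that's wrong", "i said", "don't")

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_messages (session_id TEXT, created_at TEXT, text TEXT)"
)
conn.executemany(
    "INSERT INTO user_messages VALUES (?, ?, ?)",
    [
        ("s1", "2025-01-14T09:00:00", "No, that's wrong, use the existing helper"),
        ("s2", "2025-01-13T10:00:00", "Looks good, ship it"),
        ("s3", "2025-01-01T08:00:00", "I said tabs, not spaces"),
    ],
)


def recent_corrections(conn, now, days=7):
    """Return (session_id, text) for user messages in the window that match a marker."""
    # ISO-8601 timestamps in one fixed format compare correctly as strings.
    cutoff = (now - timedelta(days=days)).isoformat()
    clauses = " OR ".join("lower(text) LIKE ?" for _ in CORRECTION_MARKERS)
    params = [cutoff] + [f"%{marker}%" for marker in CORRECTION_MARKERS]
    return conn.execute(
        f"SELECT session_id, text FROM user_messages "
        f"WHERE created_at >= ? AND ({clauses}) ORDER BY created_at",
        params,
    ).fetchall()


for session_id, text in recent_corrections(conn, datetime(2025, 1, 15, 12, 0, 0)):
    print(session_id, text)
```

A real version would presumably need something smarter than substring matching (a classifier or embeddings, say), but even this shape shows why a structured store helps: the time window and the role filter are just `WHERE` clauses.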
And yea on the import thing, there are quite a few instances when session records can live on other machines, like cloud agents, dev boxes, etc.
Do you have any interest in sharing some transcripts with team members? I'm trying to figure out the shape of this solution because oftentimes people I work with want to see what I did or fork one of my sessions, but I also don't necessarily want unlimited dumping, because I'm sure I have personal details in there too.
if i do want to share context i'll use something like "give me a prompt $coworker can share with their claude to continue this work"
However, I'm puzzled by pi support: https://github.com/ctxrs/ctx/issues/40
Sure they can. Just ask them. Some (like Claude Code) even have built in tools for it that work a treat. It'll happily rebuild an entire edit history diff by diff.
The bigger point is that when they do go spelunking in the old session logs, it is extremely token inefficient, and you can often fill up an entire context window and force a compaction just by trying to put together a transcript or summary.
The goal here is less about doing something previously impossible and more about doing it so efficiently and cheaply that you can have agents do it very often, like before they start on every single task.
We considered this, but the main thing you gain from this tradeoff is some disk space and cleaner retention semantics from not having to duplicate all of the searchable text.
But you still have to do the parsing and ingestion work to build the index in the first place, so CPU time does not go away.
And you still have to store the indexes and enough metadata to map results back to the raw session files, which bounds the benefit of not duplicating the data.
The main downside is flexibility (you would lose the ability to do arbitrary SQL queries, semantic search on top of a structured corpus, etc.).
But I would love to see if I can be proven wrong on this!
Creating ground truth is an orthogonal problem - I try to work hard to put it into specs and docs and regularly update those.
Searching history is closer to "super git blame" or like looking through logs. We should expect a lot of stuff went wrong in there.
https://wng.org/articles/the-high-cost-of-negligence-1617309...
Of course, it's impossible to know for sure what was LLM processed or not, but some of your posts (like this one) have been getting classified that way.
There's also a large fuzzy area these days where people are using tools to edit, "polish", etc., but do not think of it as using an LLM to write. This is particularly the case with non-native English speakers.
A few recent cases where this sort of thing came up:
https://news.ycombinator.com/item?id=48467726