▲Launch HN: Airweave (YC X25) – Let agents search any app (github.com)

117 points by lennertjansen 5 hours ago | 8 comments

▲ashu1461 33 minutes ago

Great release,

1. How do you decide whether to cache the data in a vector database or fetch it at runtime using a tool call?

2. Gradually, all the players like OpenAI / Claude are trying to provide a roughly equivalent offering: connecting your workspaces and then providing search on top of them, either via direct integrations or MCP servers. How do you see that panning out?

▲raufakdemir 24 minutes ago

Airweave always indexes everything. We do not do any direct tool calling currently.

▲btown 1 hour ago

How do you compare to Onyx? We've used it for some limited use cases, but one of the real challenges - and one I hope to see a lot of innovation on in the space - was permissioning.

I see in another comment that you encourage each user to build their own dataset with their own permissions, but often this breaks for founders. If I have a Super Secret Personnel Planning Google Doc at a founder level, how can I be the one to set up the system for our company, but ensure that only files that I've explicitly shared with the company are ingested? What if a file needs to be made anyone-with-link-can-access for sharing with a strategic partner, but that shouldn't be indexed for the entire company?

Far too much of the world relies on the security-by-obscurity of public-but-unindexed links, and communications that might look public from a metadata perspective but were carefully designed for a very specific group of people who have verbal/mental context about confidentiality expectations. Being able to categorize by likely confidentiality, and allowing an administrator to partition access on a project and sub-project basis based on that, might be crucial for growth.

My recollection is that Onyx had limited support for some security use cases, but very rudimentary. Hoping you can solve this in a thoughtful way!

Onyx links for comparison:

https://www.onyx.app/

https://docs.onyx.app/developers/guides/chat_guide

https://docs.onyx.app/admin/connectors/official/

▲raufakdemir 12 minutes ago

It’s a good point. It IS hard to map the various “off-market RBACs” onto a unified model, which is part of the reason we are delaying that and instead handling it with per-user syncs that include query parameters such as q="sharedWithMe".
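To make the q="sharedWithMe" idea concrete, here is a minimal sketch of how a per-user sync might build its listing request. It assumes the source is Google Drive (the parent comment mentions a Google Doc) and uses the public Drive v3 files.list endpoint; the helper name `shared_with_me_url` and the chosen `fields` and `pageSize` values are hypothetical, and this is not taken from Airweave's code.

```python
from urllib.parse import urlencode

# Public Google Drive v3 listing endpoint; using Drive here is an assumption.
DRIVE_FILES_URL = "https://www.googleapis.com/drive/v3/files"


def shared_with_me_url(page_token=None):
    """Build a files.list URL limited to files shared with the syncing user."""
    params = {
        "q": "sharedWithMe",  # Drive search term for the "Shared with me" collection
        "fields": "nextPageToken, files(id, name, mimeType)",
        "pageSize": 100,
    }
    if page_token:
        params["pageToken"] = page_token  # continue a paginated listing
    return f"{DRIVE_FILES_URL}?{urlencode(params)}"


print(shared_with_me_url())
```

The request would still need the syncing user's OAuth token, so each user's sync only ever lists what that user can already see.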

As for intelligently - but probabilistically - determining confidentiality (if I read that correctly), that does sound pretty interesting in scenarios where metadata is just simply insufficient. Also tricky. Sounds like you thought about these problems pretty deeply.

▲ameyamk 3 hours ago

Looks good. Curious, how is auth handled? A lot of docs have permissions, etc. Can you clarify how this is handled on both the indexing side and the searching side of things?

▲raufakdemir 2 hours ago

Great question. We usually sync per user in cases where this matters. That seems inefficient until you realize the following: for most teams, workspace data is pretty small - at least compared to other data workloads (CRMs << 1 GB).

We plan to implement unified ACL syncs to dedupe the data or even have 1 sync per org, but that’s mostly a cost optimization; Airweave will just scale horizontally until then.
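For readers wondering what a deduplicated, ACL-aware index could look like compared with one copy per user, here is a rough sketch. Everything in it is an assumption for illustration: the `SharedIndex` class, hashing content with SHA-256 as the dedupe key, and plain substring matching standing in for vector search. It is not a description of Airweave's planned design.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SharedIndex:
    """One stored copy per document, plus the set of users allowed to read it."""
    docs: dict = field(default_factory=dict)  # content hash -> text
    acl: dict = field(default_factory=dict)   # content hash -> set of user ids

    def ingest(self, user_id, text):
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        self.docs.setdefault(key, text)               # store the content once
        self.acl.setdefault(key, set()).add(user_id)  # remember who synced it

    def search(self, user_id, term):
        # Check the ACL first so a user never sees documents they did not sync.
        return sorted(
            text for key, text in self.docs.items()
            if user_id in self.acl[key] and term.lower() in text.lower()
        )


index = SharedIndex()
index.ingest("alice", "Q3 roadmap draft")
index.ingest("bob", "Q3 roadmap draft")      # same content, stored once
index.ingest("alice", "Personnel planning")  # only alice synced this

print(len(index.docs))                  # 2 stored documents for 3 syncs
print(index.search("bob", "planning"))  # []
```

The trade-off is the one described above: per-user syncs get isolation for free at the cost of duplicate storage, while a shared copy saves storage but makes the ACL check part of every query.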

▲suprnurd 4 hours ago

Looks great! It's cool how you are able to unify multiple sources into a single searchable layer. I’m curious how you chose which connectors to support first (e.g. GitHub, Notion, Slack) and how you plan to scale connector coverage? Thanks!

▲lennertjansen 4 hours ago

It's currently guided by community feedback, GitHub issues, and user talks. And we rely on private e2e test suites for maintaining quality as we scale coverage.

▲candiddevmike 2 hours ago

Seems like Google Agentspace but without the UI. Do you folks keep a persistent copy of the data being ingested? How are you planning on solving RBAC? IMO, all of these "search anything" apps are going to be leaky by design unless you're indexing/gathering on the fly using passthrough credentials...

▲raufakdemir 2 hours ago

Great question. We do index the data!

We usually sync per user. That way we make sure that no information leaks to another interface.

▲janwilmake 1 hour ago

Hey Lennert, congrats on the launch! Still open to chat about uithub

▲ripped_britches 3 hours ago

Cool deal. How is this different from Glean?

▲raufakdemir 2 hours ago

Glean is enterprise search for humans. Airweave is built for agent developers who want to access their users’ information (that is, the information of the person using the agent product).

▲andric 45 minutes ago

Your pricing currently seems prohibitive for that kind of use case. Shouldn't it be usage-based so one can build a product where users can connect their apps without having to worry about arbitrary limits on plans? There should be a PAYG option that simply charges per connection, and automatic volume discounts.

▲raufakdemir 30 minutes ago

Definitely worth looking into.

▲EGreg 3 hours ago

"Give us access to any information on your computer."

And who is "us"?

"Well, our agents, of course. We'll send the information down to our servers, because -- surprise -- we have the GPU infrastructure to run it, and you don't. Don't worry, it's secure."

"Alright, well--"

https://www.wiz.io/blog/38-terabytes-of-private-data-acciden...

"Oops! Well don't worry, it's not like we're the first ones to sell your usage data..."

https://ferrumit.com/resources/it-s-now-legal-for-isps-to-se...

"You see! Well, just send us your DNA we'll analyze it -- with science! I mean with AI..."

"Alright, here is--"

https://www.nytimes.com/2025/05/19/business/regeneron-pharma...

"Oops! Well don't worry, it's not like the company that bought us will do anything with your data, that we wouldn't have done."

Here's my question...

1) How much can we feasibly run on a consumer-grade GPU today, on-device, on either the latest MacBook or the latest iPhone? Does Apple Metal + Silicon ship with any on-device models in the latest iOS 26?

2) How can we extend the security boundary to GPU servers that are attested black boxes, that store data encrypted at rest, that are guaranteed not to train on it, and that are not owned by some corporation that can peek at the data?