The code is what it is. `cargo test --workspace` runs across 19 crates. CI runs on 5 platforms (macOS ARM/Intel, Linux x86/ARM, Windows). JSON output schemas are codegen-checked in CI so docs can't drift from the binary.
If you want to skip the marketing copy and look at engine reasoning instead: PR #240 (audit trail), #241 (column classification + masking), #270 (failed-source surfacing in discover).
I'd rather hear "the code is bad" than "the post sounds AI-written".
A lot of side projects, hobby projects, etc. are using AI tools now. Same for marketing: every sales/marketing firm is using AI. So why criticize this guy in particular?
AI is pervasive; the train has left the station. So that is not a reason to criticize this project. There might be other reasons, I'm not sure, but not that an AI was used.
Workflows knows task A runs before task B. Rocky knows `dim_customer.email` flows from `raw_users.email_address` through three CTEs in `stg_customers`. Different layer, same word.
I'll be more careful with that framing.
- dbt-style semantic-layer versions (v1/v2 of a model)
- schema migration history
- branch-based (Rocky already has branches + replay)
Different design choice for each, so it helps to know which problem you're trying to solve.
ClickHouse is tractable through the Adapter SDK without engine patching. If you can share roughly your model count and workload shape, I can put a real timeline on it. Open to community PRs too.
https://news.ycombinator.com/item?id=47340079
Not saying yours are, but those -- dashes certainly look like it ;)
On the schema-grounded AI angle: agreed. The failure mode you describe — structurally valid SQL that joins on the wrong key or aggregates at the wrong grain because the model hallucinated a relationship — is exactly what the compiler is positioned to catch. AI-generated SQL runs through the type checker before it can land, so suggestions that don't validate against the actual DAG never reach the user. NL-to-SQL tools that integrate a compile step would close exactly the gap you're pointing at.
On your two questions:
1. Branch isolation for stateful models — mixed answer, and worth being honest about:
- Incremental: isolated. The watermark `state_key` includes the resolved schema, and `rocky branch create` swaps the schema prefix. So a branch run reads/writes a different redb key than main and they don't advance each other.
- Snapshot: not yet. Today `rocky branch create` only writes a branch record; it doesn't copy warehouse tables. A snapshot model on a branch starts with an empty table (CREATE TABLE IF NOT EXISTS in the branch schema) and accumulates from the first branch run, with no inherited history from main. That's the gap. The fix is the next wave: native Delta SHALLOW CLONE / Snowflake zero-copy at branch creation, which gives point-in-time snapshot semantics without copy-on-write overhead.
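To make the incremental case concrete, here is a minimal sketch of why a branch and main end up with separate watermarks. It is not Rocky's actual code: `resolve_schema`, `state_key`, the `branch__` prefix format, and the `HashMap` standing in for the redb store are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical: resolve the schema for a run. A branch swaps in its own
/// prefix (the `branch__<name>__` format is an assumed naming scheme).
fn resolve_schema(base_schema: &str, branch: Option<&str>) -> String {
    match branch {
        Some(name) => format!("branch__{name}__{base_schema}"),
        None => base_schema.to_string(),
    }
}

/// Hypothetical: the watermark key includes the resolved schema, so the
/// same model gets a different key on a branch than on main.
fn state_key(resolved_schema: &str, model: &str) -> String {
    format!("{resolved_schema}.{model}")
}

fn main() {
    // Stand-in for the redb-backed state store.
    let mut watermarks: HashMap<String, u64> = HashMap::new();

    let main_key = state_key(&resolve_schema("analytics", None), "fct_orders");
    let branch_key = state_key(&resolve_schema("analytics", Some("feature_x")), "fct_orders");

    // Each run advances only its own watermark.
    watermarks.insert(main_key.clone(), 100);
    watermarks.insert(branch_key.clone(), 5);

    assert_ne!(main_key, branch_key);
    println!("{main_key} -> {}", watermarks[&main_key]);
    println!("{branch_key} -> {}", watermarks[&branch_key]);
}
```

Because the two keys differ, advancing the branch watermark never touches the entry for main, which is the isolation property described for incremental models.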
2. Cost attribution. Both bytes scanned and duration are captured per-model in the run record (`bytes_scanned` and duration on `RunRecord`). Budget gating today is on cost (USD) and duration — `max_usd` and `max_duration_ms` in `[budget]` blocks in `rocky.toml`, as independent thresholds. A direct bytes-scanned budget threshold isn't gateable today; the bytes are in the run record for analysis but you can't currently fail CI on "this run scanned more than N TB". Reasonable extension if there's demand. To your Snowflake point: the warehouse-size × duration credit model and the scan volume tell genuinely different stories, so they're tracked separately rather than rolled into a single number.
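A rough sketch of the independent-threshold gating described above, plus what a bytes-scanned gate could look like if it were added. Only `RunRecord`, `bytes_scanned`, `max_usd`, and `max_duration_ms` are names from this thread; the other fields, the `Budget` and `Breach` types, `check_budget`, and especially `max_bytes_scanned` are hypothetical.

```rust
/// Hypothetical shape of a per-model run record.
/// `duration_ms` and `cost_usd` are assumed field names.
struct RunRecord {
    bytes_scanned: u64,
    duration_ms: u64,
    cost_usd: f64,
}

/// Hypothetical budget thresholds. `max_bytes_scanned` is the assumed
/// extension; it is not gateable today according to the comment above.
#[derive(Default)]
struct Budget {
    max_usd: Option<f64>,
    max_duration_ms: Option<u64>,
    max_bytes_scanned: Option<u64>,
}

#[derive(Debug, PartialEq)]
enum Breach {
    Usd,
    Duration,
    BytesScanned,
}

/// Check each threshold independently; unset thresholds never trip.
fn check_budget(run: &RunRecord, budget: &Budget) -> Vec<Breach> {
    let mut breaches = Vec::new();
    if let Some(max) = budget.max_usd {
        if run.cost_usd > max {
            breaches.push(Breach::Usd);
        }
    }
    if let Some(max) = budget.max_duration_ms {
        if run.duration_ms > max {
            breaches.push(Breach::Duration);
        }
    }
    if let Some(max) = budget.max_bytes_scanned {
        if run.bytes_scanned > max {
            breaches.push(Breach::BytesScanned);
        }
    }
    breaches
}

fn main() {
    // A run that is cheap in USD but slow and scan-heavy.
    let run = RunRecord {
        bytes_scanned: 2_000_000_000_000,
        duration_ms: 90_000,
        cost_usd: 4.20,
    };
    let budget = Budget {
        max_usd: Some(10.0),
        max_duration_ms: Some(60_000),
        max_bytes_scanned: Some(1_000_000_000_000),
    };
    let breaches = check_budget(&run, &budget);
    assert_eq!(breaches, vec![Breach::Duration, Breach::BytesScanned]);
    println!("{breaches:?}");
}
```

The example run stays under the USD limit while tripping the duration and scan limits, which illustrates why cost and scan volume are worth tracking as separate signals.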