Fresh Hacker News | A real-world benchmark for AI code review

▲ A real-world benchmark for AI code review (qodo.ai)

30 points by benocodes 2 hours ago | 9 comments

▲falloutx 1 hour ago

Company creates a benchmark. Same company is best in that benchmark.

Story as old as time.

▲mattvv 38 minutes ago

Some feedback for the team: I looked at the pricing page and saw that it's more expensive ($30/dev/mo) and highly limiting (20 PRs per month per user). We have devs putting up that many PRs in a single day. With this kind of plan, there's pretty much no way we would even try this product.

▲esafak 37 minutes ago

It's true, those are some pre-AI quotas.

▲esafak 55 minutes ago

I'm not as cynical as the others here; if there are no popular code review benchmarks why should they not design one?

Apparently this is in support of their 2.0 release: https://www.qodo.ai/blog/introducing-qodo-2-0-agentic-code-r...

> We believe that code review is not a narrow task; it encompasses many distinct responsibilities that happen at once. [...]

> Qodo 2.0 addresses this with a multi-agent expert review architecture. Instead of treating code review as a single, broad task, Qodo breaks it into focused responsibilities handled by specialized agents. Each agent is optimized for a specific type of analysis and operates with its own dedicated context, rather than competing for attention in a single pass. This allows Qodo to go deeper in each area without slowing reviews down.

> To keep feedback focused, Qodo includes a judge agent that evaluates findings across agents. The judge agent resolves conflicts, removes duplicates, and filters out low-signal results. Only issues that meet a high confidence and relevance threshold make it into the final review.

> Qodo’s agentic PR review extends context beyond the codebase by incorporating pull request history as a first-class signal.
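To make the quoted judge-agent step concrete, here is a minimal sketch of one way such a filter could work. It is not Qodo's implementation; the `Finding` fields, the agent names, the "same file and line means duplicate" rule, and the 0.8 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    agent: str         # hypothetical name of the specialized agent
    file: str
    line: int
    message: str
    confidence: float  # 0.0 to 1.0, assumed to be reported by the agent


def judge(findings, threshold=0.8):
    """Deduplicate findings by location, then drop low-confidence ones."""
    best = {}
    for f in findings:
        key = (f.file, f.line)
        # Assumption: findings at the same file and line are duplicates
        # or conflicts, and the most confident one wins.
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    kept = [f for f in best.values() if f.confidence >= threshold]
    return sorted(kept, key=lambda f: (f.file, f.line))


findings = [
    Finding("security", "app.py", 10, "SQL built by string concatenation", 0.95),
    Finding("style", "app.py", 10, "Query string is hard to read", 0.60),
    Finding("perf", "db.py", 42, "Possible N+1 query", 0.55),
    Finding("tests", "app.py", 30, "New branch has no test", 0.85),
]

final_review = judge(findings)
for f in final_review:
    print(f"{f.file}:{f.line} [{f.agent}] {f.message}")
```

A real judge would presumably compare findings semantically rather than by line number, but the shape is the same: merge, resolve, threshold.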

▲mbesto 2 hours ago

Cmd+F - "Overfitting"...nothing.

Nope, no mention of how they do anything to alleviate overfitting. These benchmarks are getting tiresome.

▲CuriouslyC 2 hours ago

I don't think LLMs are the right tool for pattern enforcement in general; it's better to get them to create custom lint rules.
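As a rough illustration of what a custom lint rule looks like, here is a small sketch using Python's standard `ast` module. The rule itself (flagging bare `except:` clauses) and the names `NoBareExcept` and `lint` are just examples I picked; the point is that a rule like this is deterministic once written, whoever or whatever wrote it.

```python
import ast


class NoBareExcept(ast.NodeVisitor):
    """Flag `except:` clauses that do not name an exception type."""

    def __init__(self):
        self.violations = []

    def visit_ExceptHandler(self, node):
        # A bare `except:` has no exception type attached.
        if node.type is None:
            self.violations.append(node.lineno)
        self.generic_visit(node)


def lint(source):
    """Return the line numbers of bare except clauses in `source`."""
    checker = NoBareExcept()
    checker.visit(ast.parse(source))
    return checker.violations


sample = """
try:
    risky()
except:
    pass

try:
    risky()
except ValueError:
    pass
"""

print(lint(sample))
```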

Agents are pretty good at suggesting ways to improve a piece of code, though. If you get a bunch of agents to wear different hats and debate improvements to a piece of software, it can produce some very useful insights.

▲logicx24 50 minutes ago

Where's the code for this? I'd love to run our tool, https://tachyon.so/, against it.

▲mdeeks 1 hour ago

I feel like pricing needs to be included here. I kind of don't care about 10 percentage points if the cost is dramatically higher. Cursor Bugbot is about the same cost but gives 10x the monthly quota of Qodo.

I know this is focused solely on performance, but cost is a major factor here.
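A back-of-the-envelope sketch of the cost comparison, using only numbers mentioned in this thread: $30/dev/mo with a 20 PR quota, versus roughly the same price with 10x the quota. The "same price" figure is an assumption based on "about the same cost", and the result is the per-PR cost only if the full quota is used.

```python
def cost_per_pr(monthly_price, monthly_pr_quota):
    """Dollars per reviewed PR, assuming the whole quota is used."""
    return monthly_price / monthly_pr_quota


# $30/dev/mo and 20 PRs per month, as quoted elsewhere in the thread.
qodo = cost_per_pr(30, 20)

# Assumption: same $30 price, 10x the monthly quota.
ten_x_quota = cost_per_pr(30, 20 * 10)

print(f"20 PR quota:  ${qodo:.2f} per PR")
print(f"200 PR quota: ${ten_x_quota:.2f} per PR")
```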

▲kachapopopow 1 hour ago

CodeRabbit being the worst while (presumably) advertising the most seems to check out, at least. I wouldn't believe the recall %, though; it seems bogus.

▲aetherspawn 1 hour ago

Your pricing page has a bug on it: the annual price is higher than the monthly price.

▲zamadatix 1 hour ago

I'm seeing $30/m at annual and $38/m at monthly? (maybe already fixed, hard to tell)