Fresh Hacker News | EvanFlow – A TDD driven feedback loop for Claude Code

▲EvanFlow – A TDD driven feedback loop for Claude Code (github.com)

84 points by evanklem2004 10 hours ago | 17 comments

▲conception 36 minutes ago

If you’re just looking for the TDD part - https://github.com/nizos/tdd-guard - is the only project I’ve come across that actually enforces it with hooks and blocks edits rather than relying on a prompt that gets context rotted away.

▲Deeds67 7 hours ago

To be honest, the official superpowers/brainstorming skill already does TDD so well, I don't see that much of a need for this. TDD is definitely the way to go with agentic development.

▲synergy20 2 hours ago

How? I saw superpowers/brainstorming but never saw TDD code produced.

▲jghn 1 hour ago

It’s supposed to do this, but I’ve found it doesn’t always do it

▲s20n 9 hours ago

EvanFlow - thoughts arrive like butterflies?

▲sbseitz 9 hours ago

Oh, he don't know, so he chases them away

▲jamesbfb 9 hours ago

Oooohhhh

▲ge96 7 hours ago

Seeeethinnggg tests failing not complete... again

▲__mharrison__ 7 hours ago

Someday soon he'll begin his life again

▲shruubi 8 hours ago

Two questions

1) Do you not feel self-conscious or weird about calling this "EvanFlow"? Seems like a lot of people these days are naming their AI tools/skills/whatever after themselves which seems self-absorbed. Either that or they hope that if their thing takes off like OpenClaw did then they'll grab the fame that comes along with it.

2) Why does your TDD flow miss the refactor step of TDD?

▲phyzix5761 1 hour ago

Let the guy have something. Free and open source developers work tirelessly for free for years supporting software that billion dollar companies use to make huge profits.

We don't question when scientists name stuff after themselves so why question this? At least he gets some recognition for his work.

▲toyg 5 hours ago

I initially thought it was a pun on Pearl Jam's classic "Even Flow", then I read your comment and noticed the username... Sad.

▲mansilladev 8 minutes ago

I was really hoping this was something I could find on CPAN from the author username perlJam.

▲infecto 1 hour ago

1) Do you feel weird asking a question like this? What constructive benefit does it add to any dialogue?

Sometimes it’s helpful to ask oneself what’s the benefit of an answer. I cannot think of any for your question and the way you worded it is a bit cringe. People name things after themselves all the time. It does not matter in the slightest.

▲wenc 8 hours ago

I feel like 1 is a self correcting problem. If this goes nowhere it will soon be forgotten.

I can think of one example that did go somewhere: Linux.

▲anon_46135 58 minutes ago

TanStack was started by a guy named Tanner

Debian is a portmanteau of Debra (Ian's girlfriend) and Ian.

I don't mind it. It's just a name

▲stingraycharles 1 hour ago

ReiserFS is another one that comes to mind.

And djb (the djb) also wrote djbdns.

There are plenty of examples, usually when it coincides with someone’s first project.

▲cornyhorse 54 minutes ago

Debian is an even better example

▲globular-toast 3 hours ago

Linus did not name it Linux himself: https://en.wikipedia.org/wiki/Linux#Naming

▲u_fucking_dork 1 hour ago

He merely laundered it through a coworker.

▲EvanKnowles 3 hours ago

Feels like a bonus to me.

▲normie3000 8 hours ago

Ref 1, he should have called it Daughter.

▲reitzensteinm 8 hours ago

No Code, surely?

▲ButlerianJihad 5 hours ago

"Evenflo is a hundred year old infant feeding brand." Probably named to market its baby bottles and accessories.

Everybody who grew up to listen to Pearl Jam had seen or used an Evenflo pacifier, baby bottle, or car seat. That's one reason the song already sounded so familiar.

▲dmitry_dv 3 hours ago

The refactor step is the silent casualty in AI-assisted TDD. Once the test is green, Claude optimizes for moving to the next test, not for cleaning up the impl that just passed. An "iterate-until-clean" pass at the end is a different thing: you're refactoring cold code, not refactoring with a freshly-written test as the safety net.
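A minimal sketch of the distinction drawn above, assuming a made-up `slugify` helper (every name here is hypothetical and has nothing to do with EvanFlow's code). The "green" version is the first thing that passes; the refactor happens right away, with the same freshly written checks acting as the safety net.

```python
def slugify_green(title: str) -> str:
    """Green step: the first implementation that makes the test pass."""
    result = ""
    for ch in title.strip().lower():
        if ch == " ":
            # Avoid emitting a second hyphen for a run of spaces.
            if not result.endswith("-"):
                result += "-"
        else:
            result += ch
    return result


def slugify_refactored(title: str) -> str:
    """Refactor step: same behavior, simpler code, done while the test is fresh."""
    return "-".join(title.lower().split())


def check(slugify) -> None:
    """The test written in the red step; it guards both versions."""
    assert slugify("Even Flow") == "even-flow"
    assert slugify("  Even   Flow ") == "even-flow"


if __name__ == "__main__":
    check(slugify_green)       # green: passes
    check(slugify_refactored)  # still green after the refactor
```

Deferring the second function to an end-of-session cleanup pass would mean rewriting it without the test that was written specifically to pin down this behavior being top of mind.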

▲pydry 16 minutes ago

When I first used agentic coding I was already doing strict TDD and I just tried using it for the refactor step.

It sucked so hard I thought the idea of agentic coding was just a joke. I've tried it periodically and it literally never stopped sucking.

I figure if it can't do that part it isn't worth using it for any part.

Ever since then whenever people tell me it's gotten better I've tried it out and nope, still sucks.

I still get gaslit about how well it works by people who just discovered TDD though, who watch it power through CRUD boilerplate and get impressed, blissfully unaware that boilerplate spew is an antipattern.

▲evanklem2004 10 hours ago

Built this as an opinionated Claude Code development flow based on evidence-based practices and what has been working for me while developing professional code.

EvanFlow is a single TDD-driven loop. Say "let's evanflow this" and it walks brainstorm → plan → execute → tdd → iterate → STOP. Real checkpoints at design and plan approval. Never auto-commits, never auto-stages, never proposes integration - every git op is your call.

The three things that actually changed how I work:

1. Vertical-slice TDD. One failing test → minimal impl → next test. Watch each test fail before writing the impl that passes it. (Sounds obvious. Almost no agent does it by default. ~62% of LLM-generated test assertions are wrong per HumanEval research, so testing TDD discipline matters more than the impl discipline.)

2. Embedded grilling at decision points. Before locking a plan: what breaks if a user does X? What's the rollback? What's explicitly out of scope? Catches design flaws while they're still cheap.

3. Iterate-until-clean (hard cap of 5 rounds). Re-read the diff against dead code, naming, the deletion test, assertion correctness, and a Five Failure Modes pass (hallucinated actions, scope creep, cascading errors, context loss, tool misuse). For UI: screenshot via headless Chromium.

For bigger plans with 3+ independent units sharing types, it forks into a parallel coder/overseer orchestration. Integration tests at touchpoints ARE the cohesion contract.

Three install paths: Claude Code plugin marketplace, npx skills add, manual copy. MIT.
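For readers wondering what "iterate-until-clean (hard cap of 5 rounds)" amounts to, here is a rough sketch of a bounded review loop. It is an illustration under assumptions, not EvanFlow's implementation: `review` and `fix` are hypothetical callbacks standing in for whatever re-reads the diff and applies corrections.

```python
from typing import Callable, List, Tuple

MAX_ROUNDS = 5  # the hard cap mentioned in the post


def iterate_until_clean(
    review: Callable[[int], List[str]],
    fix: Callable[[List[str]], None],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[bool, int]:
    """Review, fix, repeat; stop when a review is clean or the cap is hit.

    Returns (clean, rounds_used).
    """
    for round_no in range(1, max_rounds + 1):
        findings = review(round_no)
        if not findings:
            return True, round_no
        fix(findings)
    # Cap reached with findings still open: stop and hand back to the human.
    return False, max_rounds


if __name__ == "__main__":
    # Hypothetical findings; this stub fixer resolves one per round.
    pending = ["dead code in parser", "misleading variable name"]

    def review(_round_no: int) -> List[str]:
        return list(pending)

    def fix(findings: List[str]) -> None:
        pending.remove(findings[0])

    print(iterate_until_clean(review, fix))
```

The cap matters because a reviewer that always finds something would otherwise loop forever; with the cap the loop returns `(False, 5)` and stops.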

▲girvo 7 hours ago

Please don’t post AI generated comments :(

Just write it yourself. I promise it’s worth it

▲deaux 3 hours ago

He's even being cheeky by intentionally replacing the em-dash with a regular dash, haha

▲girvo 1 hour ago

It's quite well done really, but the cadence...

No x. No y. No z. Just abc.

It's like nails on a chalkboard...

▲dpark 7 hours ago

I’ve thought of going down the TDD model for LLMs as a way of providing constraints on their behavior. I would think that “vertical slice” TDD would encourage the LLM to start tailoring the tests to the implementation rather than establishing the invariants up front, though. I was considering “horizontal” TDD to force the agent to implement constraints before coding to them.
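One way to picture the "horizontal" approach described above: write down the invariants for a unit before any implementation exists, then code to them. The sketch below is only an assumption about what that could look like; the `dedupe` function and its sample inputs are invented for illustration.

```python
from typing import Callable, List


def check_invariants(dedupe: Callable[[List[int]], List[int]],
                     samples: List[List[int]]) -> None:
    """Invariants fixed up front, independent of how dedupe is written."""
    for xs in samples:
        out = dedupe(xs)
        assert len(out) == len(set(out))   # no duplicates remain
        assert set(out) == set(xs)         # nothing lost, nothing invented
        # Elements appear in order of first occurrence.
        assert out == [x for i, x in enumerate(xs) if x not in xs[:i]]
        assert dedupe(out) == out          # idempotent


SAMPLES = [[], [1], [1, 1, 2], [3, 1, 3, 2, 1]]


def dedupe(xs: List[int]) -> List[int]:
    """Implementation written afterwards, constrained by the invariants."""
    seen = set()
    out = []
    for x in xs:
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out


if __name__ == "__main__":
    check_invariants(dedupe, SAMPLES)
```

Because the checks were written before `dedupe`, there is less room for the tests to drift toward whatever the implementation happens to do.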

▲lukewrites 7 hours ago

Curious: in the repo you mention

> Several rules come from 2025-2026 industry research on agentic coding failure modes

What are some of the papers you read?

▲esperent 6 hours ago

With no disrespect intended because this is also how I would do it (but I wouldn't publish and name it after myself!) - they didn't read the research. They had the AI that actually created this do that for them.

▲esperent 6 hours ago

> execute → tdd

How are these separate steps?

TDD is how you execute, not something you tack on afterwards.

▲nghnam 4 hours ago

superpowers/brainstorming is doing TDD as well.

▲jtfrench 9 hours ago

How does this handle “dumb zone” evasion while looping?

▲cratermoon 8 hours ago

https://www.evenflo.com/

▲here2learnstuff 7 hours ago

[flagged]

▲xaxfixho 3 hours ago

i'm new around here, how do i *DOWN VOTE* stuff?

▲fragmede 5 hours ago

Linus started Linux when he was 21, an undergrad at the University of Helsinki. You're entirely welcome to use whatever filtering function for products you use, but it doesn't seem like solely using this particular product's creator's age as a disqualifier comes from a place of sound reasoning, to me.

▲avyjit 6 hours ago

This is such a BS take. If you feel the product is immature or not great - that's valid criticism. This is not.

▲sdevonoes 5 hours ago

TDD in 2026? Besides, TDD's main benefit is to come up with a decent architecture for your system… LLMs can already do that if instructed. I don’t see the point of TDD.

▲myko 1 hour ago

I've always been hesitant to prescribe TDD to _everything_ until agentic coding agents came along. TDD is a great way to keep them on track.