Getting all your teammates to quit giving all their tests names like "testTheThing" is darn near impossible. It's socially painful to be the one constantly nagging people about names, but it really does take constant nagging to keep the quality high. As soon as the nagging stops, someone invariably starts cutting corners on the test names, and after that everyone who isn't a pedantic weenie about these things will start to follow suit.
Which is honestly the sensible, well-adjusted decision. I'm the pedantic weenie on my team, and even I have to agree that I'd rather my team have a frustrating test suite than frustrating social dynamics.
Personally - and this absolutely echoes the article's last point - I've been increasingly moving toward Donald Knuth's literate style of programming. It helps me organize my thoughts even better than TDD does, and it's earned me far more compliments about the readability of my code than a squeaky-clean test suite ever has. So much so that I'm beginning to hope that if you can build enough team mass around working that way, it might even settle into a stable equilibrium as people see how it really does make the job more enjoyable.
Nine times out of ten this is the only test, which is mostly there to ensure the code gets exercised in a sensible way and returns a thing, and ideally to document and enforce the contract of the function.
What I absolutely agree with you on is that being able to describe this contract alongside the function itself is far preferable. It's not quite literate programming, but tools like Python's doctest offer a close approximation to interleaving discourse with machine-readable implementation:
def double(n: int) -> int:
    """Increase by 100%.

    >>> double(7)
    14
    """
    return 2 * n
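If it helps, a doctest like this can be run with nothing but the standard library; a minimal sketch, assuming the function lives in the module being executed:

if __name__ == "__main__":
    import doctest
    doctest.testmod()  # re-runs every >>> example above and reports any mismatch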
What do test names have to do with quality? If you want to use it as some sort of name/key, just have a comment/annotation/parameter that succinctly defines that, along with any other metadata you want to add in readable English. Many testing frameworks support this. There's exactly zero benefit toTryToFitTheTestDescriptionIntoItsName.
Readability accounts for a lot of code's quality. Working code that is not maintainable will be refactored; non-working code that is maintainable will be fixed.
Also, are you a fan of nesting test classes? Any opinions? Eg:
class fibrulatatorTest {
    class highVoltages {
        void tooMuchWillNoOp() {}
        void maxVoltage() {}
    }
}

The quality of the tests.
If we go by the article, specifically their readability and quality as documentation.
It says nothing about the quality of the resulting software (though, presumably, this will also be indirectly affected).
I don't know of any test tools that work like that, though.
It claims that, in order for tests to serve as documentation, they must follow a set of best practices, one of which is descriptive test names. It says nothing about failing tests when the name of the test doesn't match the actual test case.
Note I'm not saying whether I consider this to be good advice; I'm merely clarifying what the article states.
People grab the first word they think of. And subconsciously they know that if they obsess over the name, it'll have an opportunity cost: dropping one or more of the implementation details they're juggling in short-term memory.
But if "slow" is the first word you think of, that's not very good. And if you look at the synonyms and antonyms, you can solidify your understanding of the function's purpose in your head. Maybe you meant thorough, or conservative. And maybe you meant to do one but actually did another. So now you can not just choose a name but revisit the intent.
Plus you’re not polluting the namespace by recycling a jargon word that means something else in another part of the code, complicating refactoring and self discovery later on.
If anything, in this scenario, I wouldn't even bother printing the test names, and would just give them generated identifier names instead. Otherwise, isn't it a bit like expecting git hashes to be meaningful when there's a commit message right there?
I've been wishing for a long time that the industry would move towards this, but it is tough to get developers to write more than performative documentation that checks an agile sprint box, much less to get product owners to allocate time to test the documentation (throw someone unfamiliar with the code at something small, armed only with its documentation - like coding another few necessary tests and documenting them - and correct the bumps in the consumption of the documentation). It's even tougher to move towards the kind of Knuth'ian, TeX'ish-quality and -sophistication documentation, which I consider necessary (though perhaps not sufficient) for taming increasing software complexity.
I hoped the kind of deep technical writing at large scales supported by Adobe Framemaker would make its way into open source alternatives like Scribus, but instead we're stuck with Markdown and Mermaid, which have their place but are painful when maintaining content over a long time, sprawling audience roles, and broad scopes. Unfortunate, since LLMs could support a quite rich technical writing and editing workflow sitting on top of a Framemaker-feature'ish document processing system oriented towards supporting literate programming.
Two - so any input value outside of those in the unit tests is undocumented / unspecified behavior? Documentation can contain an explanation in words, like what relation should hold between the inputs and outputs in all cases. Unit tests by their nature can only enumerate a finite number of cases.
This seems like such an obviously not great idea...
In Rust, there are two types of comments: regular ones (e.g. starting with //) and doc-comments (e.g. starting with ///). The latter land in the generated documentation when you run cargo doc.
And now the cool thing: if you have example code in these doc-comments, e.g. to explain how a feature of your library can be used, that snippet automatically becomes part of the tests by default. That means you are unlikely to forget to update these examples when your code changes, and you can use them as tests at the same time by asserting something at the end (which also communicates the outcome to the reader).
Though some replies here seem to keep arguing for my interpretation, so it's not just me.
> Note also that I’m not suggesting that unit tests should replace any form of documentation but rather that they should complement and enrich it.
def get_examples(
    source: Path,
    minimum_size: float,
    maximum_size: float,
    total_size: float,
    seed: float = 123,
) -> Iterator[Path]:
    ...
…it's pretty obvious what those float arguments are for, but the "source" is just a Path. Is there an example "source" I can look at to see what sort of thing I'm supposed to pass there?

Well, you could document that abstractly in the function ("your source must be a directory available via NFS to all devs as well as the build infra"), but you could also use the function in a test and describe it there, and let that be the "living documentation" of which the original author speaks.
Obviously if this is a top level function in some open source library with a readthedocs page then it’s good to actually document the function and have a test. If it’s just some internal thing though then doc-rot can be more harmful than no docs at all, so the best docs are therefore verified, living docs: the tests.
(…or make your source an enumeration type so you don’t even need the docs!)
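For what it's worth, a minimal sketch of that last idea in Python - the enum members and paths here are purely hypothetical:

from enum import Enum
from pathlib import Path

class ExampleSource(Enum):
    # Hypothetical well-known locations; callers can no longer pass an arbitrary path.
    SMOKE_SET = Path("/srv/examples/smoke")
    FULL_SET = Path("/srv/examples/full")

def get_examples(source: ExampleSource, minimum_size: float, maximum_size: float,
                 total_size: float, seed: float = 123):
    root = source.value  # the enum both constrains and documents the valid sources
    ...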
The benefit of explanations in tests is that running them gets you closer to knowing if any of the explanations have bit rotted.
What you appear to have in mind here is the documentation of a test. Any documentation that correctly explains why it matters that the test should pass will likely tell you something about what the purpose of the unit is, how it is supposed to work, or what preconditions must be satisfied in order for it to work correctly, but the first bullet point in the article seems to be making a much stronger claim than that.
The observation that both tests and documentation may fail to explain their subject sheds no light on the question of whether (or to what extent) tests in themselves can explain the things they test.
Obviously unit tests cannot enumerate all inputs, but as a form of programmatic specification, neither do they have to.
For the case you mention where a broad relation should hold, this is a special kind of unit test strategy, which is property testing. Though admittedly other aspects of design-by-contract are also better suited here; nobody's claiming that tests are the best or only programmatic documentation strategy.
Finally, there's another kind of unit testing, which is more appropriately called characterisation testing, as per M. Feathers' book on legacy code. The difference being: unit tests are for developing a feature and ensuring adherence to a spec, whereas characterisation tests are for exploring the actual behaviour of existing code (which may or may not be behaving according to the intended spec). These are definitely tests as programmatic documentation.
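To make the property-testing point concrete, a minimal sketch using Python's hypothesis library (the double function is just a stand-in):

from hypothesis import given, strategies as st

def double(n: int) -> int:
    return 2 * n

@given(st.integers())
def test_double_equals_adding_the_number_to_itself(n):
    # States the relation that should hold for every integer, rather than listing cases.
    assert double(n) == n + n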
Two: my intuition says that exhaustively specifying the intended input output pairs would only hold marginal utility compared to testing a few well selected input output pairs. It's more like attaching the corners of a sheet to the wall than gluing the whole sheet to the wall. And glue is potentially harder to remove. The sheet is n-dimensional though.
Of course, for + it's relatively easy to intuit what it is supposed to mean. But if I develop a "joe's interpolation operator", you think you'll understand it well enough from 5-10 unit tests, and actually giving you the formula would add nothing? Again I find myself wondering if I'm missing some English knowledge...
Can you imagine understanding the Windows API from nothing but unit tests? I really cannot. No text to explain the concepts of process, memory protection, file system? There is absolutely no way I would get it.
That's the natural habitat for code, not formally specified, but partially functioning in situ. Often the best you can do is contribute a few more test cases towards a decent spec for existing code because there just isn't time to re-architect the thing.
If you are working with code in an environment where spending time improving the specification can be made a prerequisite of whatever insane thing the stakeholders want today... Hang on to that job. For the rest of us, it's a game of which-hack-is-least-bad.
Most of the tests I write daily are about moving and transforming data in ways that are individually rather trivial, but when features pile up, keeping track of all the requirements is hard, so you want regression tests. But you also don't want a bunch of regression tests that are hard to change when you change requirements, which will happen. So you want a decent number of simple tests for individually simple requirements that make up a complex whole.
https://hackage.haskell.org/package/ghc-internal-9.1001.0/do...
Two - Ever heard of code coverage? Type systems/type checkers? Also, there's nothing precluding you from using assertions in the test that make any assumed relations explicit before you actually test anything.
I do think that tests should not explain the why - that would be leaking too much detail - but at the same time the why is somewhat the result of the test. A test is documentation of a regression, not of how the code it tests is implemented or why.
The finite number of cases is interesting. You can definitely run single tests with a high number of inputs which of course is still finite but perhaps closer to a possible way of ensuring validity.
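In Python, pytest's parametrization is one common way to run a single test over a whole table of inputs; a small sketch with a hypothetical double function:

import pytest

def double(n: int) -> int:
    return 2 * n

@pytest.mark.parametrize("n, expected", [(0, 0), (1, 2), (7, 14), (-3, -6)])
def test_double(n, expected):
    # One test function checks - and documents - many cases at once.
    assert double(n) == expected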
Do note TFA doesn't suggest replacing all other forms of documentation with just tests.
Would we prefer better docs than some comments sprinkled in strategic places in test files? Yes. Is having them with the tests maybe the best we can do for a certain level of effort? Maybe.
If the alternative is an entirely standalone repository of docs which will probably not be up to date, I'll take the comments near the tests. (Although I don't think this approach lends itself to unit tests.)
> returns a sum of reciprocals of inputs
Unit Test:
assert_eq(foo(2, 5), 1/2 + 1/5)
assert_eq(foo(4, 7), 1/4 + 1/7)
assert_eq(foo(10, 100, 10000), 0.1101)
This works REALLY well. I've even occasionally done some of my own reviewing and editing of those docs and submitted them back to the project. Here's an example: https://github.com/pydantic/jiter/pull/143 - Claude transcript here: https://gist.github.com/simonw/264d487db1a18f8585c2ca0c68e50...
When you have a codebase sitting around rotting for years and you need to go back and refactor things to add a feature or change the behavior, how do you know you aren't breaking some dependent code down the line?
What happens when you upgrade a 3rd party dependency, how do you know it isn't breaking your code? The javascript ecosystem is rife with this. You can't upgrade anything years later or you have to start over again.
Tests are especially important when you've quit your company and someone else is stuck maintaining your code. The only way they can be sure to have all your ingrained knowledge is to have some sort of reliable way of knowing when things break.
Tests are for preventing the next developer from cursing you under their breath.
I didn't know a thing about how the business operated or the rationale behind the loans and the transactions. The parts of the application that had unit and behavior tests were easy to work on. Everyone dreaded touching the old pieces that didn't have tests.
- I've never tried to understand a code base by looking at the unit tests first. They often require more in-depth understanding (due to things like monkeypatching) than just reading the code. I haven't seen anyone else attempt this either.
- Good documentation is good as far as it aids understanding. This might be a side effect of tests, but I don't think it's their goal. A good test will catch breaks in behaviour, I'd never trade completeness for readability in tests, in docs it's the reverse.
So I think maybe, unit tests are just tests? They can be part of your documentation, but calling them documentation in and of themselves I think is maybe just a category error?
Good code can be documentation, both in the way it's written and structured and obviously in the form of comments.
Good tests simply verify what the author of the test believes the behavior of what is being tested should be. That's it. It's not documentation, it rarely "explains" anything, and any time someone eschews actually writing documentation in the form of good code hygiene and actual docs in favor of just writing tests causes the codebase to suffer.
In more complex situations, good tests also show you the environmental setup - for example, all the various odd database records the code needs or expects.
It’s not everything you’d want out of a doc, but it’s a chunk of it.
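As a rough, self-contained sketch of that idea (all names here are hypothetical, not from any particular codebase):

import pytest
from dataclasses import dataclass

@dataclass
class Invoice:
    customer: str
    due_date: str
    paid: bool

def find_overdue(invoices, today="2024-06-01"):
    # Code under test: unpaid invoices past their due date are overdue.
    return [i for i in invoices if not i.paid and i.due_date < today]

@pytest.fixture
def overdue_invoice():
    # The "odd record" the feature expects: unpaid and long past its due date.
    return Invoice(customer="Acme", due_date="2020-01-01", paid=False)

def test_overdue_invoices_are_flagged(overdue_invoice):
    assert find_overdue([overdue_invoice]) == [overdue_invoice]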
New rule: if you write a function, you also have to write down what it does, and why. Deal?
If you can't find examples of how to use the code in the code then why does the code even exist?
Somehow extracting your docs from unit tests: might be ok!
Pointing people at unit tests instead of writing docs: not even remotely ok.
Is that really what this guy is advocating??
Couldn't agree more
I'm trying to integrate with a team at work that is doing this, and I'm finding it impossible to get a full picture of what their service can do.
I've brought it up with my boss, their boss, nothing happens
And then the person writing the service is angry that everyone is asking him questions about it all the time. "Just go read the tests! You'll see what it does if you read the tests!"
Incredibly frustrating to deal with when my questions are about the business rules for the service, not the functionality of the service
While documentation is someone's imprecise, natural-language expression of what (to the best of their imperfect human capacity) they expected the code to implement at the time of writing.
Otherwise there is no way to know what is expected behavior and what is just a mistake built in by accident.
I'd rather have the prose. And if it's wrong, then fix it. I'm so tired of these excuses.
Thanks, "This guy"
In reality, except for the most trivial projects or vigilant test writers, tests are too complicated to act as a stand in for docs.
They are usually abstract in an effort to DRY things up such that you don't even get to see all the API in one place.
I'd rather keep tests optimized for testing rather than nerfing them to be readable to end users.
- What is it?
- What does it do?
- Why does it do that?
- What is the API?
- What does it return?
- What are some examples of proper, real world usage (that don't involve foo/bar but instead, real world inputs/outputs I'd likely see)?
But then I realized that a lot of what makes a set of tests good documentation is comments, and those rot, maybe worse than dedicated documentation.
Keeping documentation up to date is a hard problem that I haven't yet seen solved in my career.
My favorite example is Stripe. They've never skimped on docs and you can tell they've made it a core competency requirement for their team.
At its core, a good test will take an example and do something with it to demonstrate an outcome.
That's exactly what how to docs do - often with the exact same examples.
Logically, they should be the same thing.
You just need a (non-Turing-complete) language that is dual use - it generates docs and runs tests.
For example:
https://github.com/crdoconnor/strictyaml/blob/master/hitch/s...
And:
https://hitchdev.com/strictyaml/using/alpha/scalar/email-and...
You're right, it is a matter of culture and discipline. It's much harder to maintain a consistent and legible theory of a software component than it is to wing it with your 1-2 other teammates. Naming things is hard, especially when the names and their meanings eventually change.
Including screenshots, which a lot of tech writing teams raise as a maintenance burden: https://simonwillison.net/2022/Oct/14/automating-screenshots...
Then there are tools like Doc Detective to inline tests in the docs, making them dependent on each other; if documented steps stop working, the test derived from them fails: https://doc-detective.com/
In the documentation, you can include code examples that, if written a certain way, not only look good when rendered but can also be tested for their form and documented outputs. While this doesn't help with the descriptive text of the documentation, at least it can flag when the documented examples are no longer valid... which can in turn capture your attention enough to check the descriptive elements of that same area of documentation.
This isn't to say these documentation tests are intended to replace regular unit tests: these documentation tests are really just testing what is easily testable to validate the documentation, the code examples.
Something can be better than nothing and I think that's true here.
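As one minimal way to do this in Python, the standard doctest module can check the >>> examples embedded in a documentation file (the path below is hypothetical):

import doctest

# Fails if any >>> example in the file no longer produces its documented output.
results = doctest.testfile("docs/usage.txt")
assert results.failed == 0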
Rust doctests. They unite documentation and unit tests. Basically, documentation that never gets so out of sync that its asserts fail.
This could all easily fit in the top-level comments of a main() function or the help text of a CLI app.
- What is the API?
This could be gleaned from the code, either by reading it or by generating automatic documentation from it.
- What does it return?
This is commonly documented in function code.
- What are some examples of proper, real world usage (that don't involve foo/bar but instead, real world inputs/outputs I'd likely see)?
This is typically in comments or help text if it's a CLI app.
And the "what" should be obvious, or it's still too complex.
"This function exists to generate PDFs for reports and customer documents."
"This endpoint exists to provide a means for pre-flight authorization of requests to other endpoints."
[1]: https://docs.python.org/3/library/doctest.html
[2]: https://doc.rust-lang.org/rustdoc/write-documentation/docume...
I'd suggest that the balance between Unit Test(s) and Integration Test(s) is a trade-off and depends on the architecture/shape of the System Under Test.
Example: I agree with your assertion that I can get "90%+ coverage" of units at an integration-test layer. However, the underlying system determines whether I would guide my teams to follow this pattern. In my current stack, the number of faulty service boundaries means that, while an integration test will provide good coverage, the overhead of debugging the root cause of an integration failure creates a significant burden. So, I recommend more unit testing, as the failing behaviors can be identified directly.
And, if I were working at a company with better underlying architecture and service boundaries, I'd be pointing them toward a higher rate of integration testing.
So, re: Kent Dodds "we write tests for confidence and understanding." What layer we write tests at for confidence and understanding really depends on the underlying architectures.
a common way to think about this is called the "test pyramid" - unit tests at the base, supporting integration tests that are farther up the pyramid. [0]
roughly speaking, the X-axis of the pyramid is number of test cases, the Y-axis is number of dependencies / things that can cause a test to fail.
as you travel up the Y-axis, you get more "lifelike" in your testing...but you also generally increase the time & complexity it takes to find the root-cause of a test failure.
many times I've had to troubleshoot a failure in an integration test that is trying to test subsystem A, and it turns out the failure was caused by unrelated flakiness in subsystem B. it's good to find that flakiness...but it's also important to be able to push that testing "down the pyramid" and add a unit test of subsystem B to prevent the flakiness from reoccurring, and to point directly at the problem if it does.
> Unit tests have limited benefits overall, and add a bunch of support time, slowing down development
unit tests, _when done poorly_, have limited benefits, require additional maintenance, and slow down development.
integration tests can also have limited benefits, require additional maintenance, and slow down development time, _when done poorly_.
testing in general, _when done well_, increases development velocity and improves product quality in a way that completely justifies the maintenance burden of the additional code.
0: https://martinfowler.com/articles/practical-test-pyramid.htm...
I make this part of how I filter potential companies to work with now. I can't believe how often people avoid doing unit tests.
And the method names are equivalent to the test names. Of course, only if you don't wildly throw around exceptions or return null (without indicating it clearly in the type signature).
"Tomorrow, you will receive your weekly recap on unit tests."
Please, no.
The Coder Cafe is a daily newsletter for coders; we go over different topics from Monday to Thursday, and on Friday, there's a recap ;)
Conversely, if you fail to write a unit test, there is no contract, and the code can freely diverge over time from what you think it ought to be doing.
https://dlang.org/spec/unittest.html#documented-unittests
Nice when combined with CI since you’ll know if you accidentally break your examples.
I wrote a book, and when I created my newsletter, I wanted to have a shift in terms of style because, on the Internet, people don't have time. You can't write a post the same way you write a book. So, I'm following some principles taken here and there. But happy to hear if you have some feedback about the style itself :)
I deeply hate "regression tests" that turn red when the implementation changes, so you regenerate the tests to match the new implementation and maybe glance at the diff, but the diff is thousands of lines long so really it's not telling you anything other than "something changed".
Every statement in the spec has a corresponding unit test, and it's unbelievably incredible. Hats off to everyone who worked on this.
When learning a new codebase, and I'm looking for an example of how to use feature X I would look in the tests first or shortly after a web search.
It seems to me like the second half of this article also undermines the main idea and goal of using unit tests in this way though.
> Descriptive test name, Atomic, Keep tests simple, Keep tests independent
A unit test that is good at documenting the system needs to be comprehensive, clear, and in many cases filled with complexity that a unit test would ignore or hide. A test with a bunch of mocks, helpers, overrides, and assumptions does not help anyone understand things like how to use feature X or the correct way to solve a problem with the software.
There are merits to both kinds of tests in their time and place but good integration tests are really the best ones for documenting and learning.
Hitchstory is a type-safe StrictYAML python integration testing framework exploring some interesting ideas around this.
https://hitchdev.com/hitchstory/
Example:
https://hitchdev.com/hitchstory/using/behavior/run-single-na...
Source:
https://github.com/hitchdev/hitchstory/blob/master/hitch/sto...
See also the explanation of self-rewriting tests:
For example, this recent feature was added with a unit test serving as documentation.
https://github.com/Attumm/redis-dict/blob/main/extend_types_...
It's of course not documentation in the sense of a manual covering every detail of the code it exercises, but it definitely helps if tests are properly crafted.
* ScalaSql, where the reference docs (e.g. https://github.com/com-lihaoyi/scalasql/blob/main/docs/refer...) are generated by running unit tests (e.g. https://github.com/com-lihaoyi/scalasql/blob/53cbad77f7253f3...)
* uPickle, where the documentation site (https://com-lihaoyi.github.io/upickle/#GettingStarted) is generated by the document-generator which has syntax to scrape (https://github.com/com-lihaoyi/upickle/blob/004ed7e17271635d...) the unit tests without running them (https://github.com/com-lihaoyi/upickle/blob/main/upickle/tes...)
* OS-Lib, where the documentation examples (e.g. https://github.com/com-lihaoyi/os-lib?tab=readme-ov-file#osr...) are largely manually copy-pasted from the unit tests (e.g. https://github.com/com-lihaoyi/os-lib/blob/9e7efc36355103d71...) into the readme.md/adoc
It's a good idea overall to share unit tests and documentation, but there is a lot of subtlety around how it must be done. Unit tests and documentation have many conflicting requirements, e.g.
* Unit tests prefer thoroughness to catch unintuitive edge cases whereas documentation prefers highlighting of key examples and allowing the reader to intuitively interpolate
* Unit tests examples prefer DRY conciseness whereas documentation examples prefer self-contained-ness
* Unit tests are targeted at codebase internal developers (i.e. experts) whereas documentation is often targeted at external users (i.e. non-experts)
These conflicting requirements mean that "just read the unit tests" is a poor substitute for documentation. But there is a lot of overlap, so it is still worth sharing snippets between unit tests and examples. It just needs to be done carefully, and with thought given to handling the two sets of conflicting requirements.
But they are also pricy
I am interested in how people prevent unit tests becoming a maintenance burden over time.
I have seen so many projects with legacy failing tests. Any proposal to invest time and money cleaning them up dies on the altar of investing limited resources in developing features that make money.
If it had "///" it could have tests in its docs: https://doc.rust-lang.org/stable/book/ch14-02-publishing-to-...
> Unit tests explain [expected] code behavior
Unit tests rarely evaluate performance, so can't explain why something is O(n) vs O(n^2), or if it was supposed to be one or the other.
And of course the unit tests might not cover the full range of behaviors.
> Unit tests are always in sync with the code
Until you find out that someone introduced a branch in the code, eg, for performance purposes (classic refactor step), and forgot to do coverage tests to ensure the unit tests exercised both branches.
> Unit tests cover edge cases
Notice the No True Scotsman fallacy there? 'Good unit tests should also cover these cases' means that if it didn't cover those cases, it wasn't good.
I've seen many unit tests which didn't cover all of the edge cases. My favorite example is a Java program which turned something like "filename.txt" into "filename_1.txt", where the "_1" was a sequence number to make it unique, and ".txt" was required.
Turns out, it accepted a user-defined filename from a web form, which could include a NUL character. "\x00.txt" put it in an infinite loop due to its incorrect error handling of "", which is how the Java string got interpreted as a filename.
> Descriptive test name
With some test systems, like Python's unittest, you have both the test name and the docstring. The latter can be more descriptive; the former might be less descriptive, but easier to type or select. (A small sketch follows at the end of this comment.)
> Keep tests simple
That should be 'Keep tests understandable'. Also, 'too many' doesn't contribute information as by definition it's beyond the point of being reasonable.
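To illustrate the name-plus-docstring point above, a minimal unittest sketch (the deduplicate helper is hypothetical, loosely modeled on the filename example earlier):

import unittest

def deduplicate(name, existing):
    # Hypothetical helper: insert _1, _2, ... before the extension until unique.
    stem, ext = name.rsplit(".", 1)
    candidate, n = name, 0
    while candidate in existing:
        n += 1
        candidate = f"{stem}_{n}.{ext}"
    return candidate

class TestFilenameDeduplication(unittest.TestCase):
    def test_suffix_added(self):
        """A duplicate filename gets a numeric suffix inserted before its extension."""
        # The short method name is easy to type and select; the docstring above carries
        # the fuller description and is shown by verbose runners and in failure output.
        self.assertEqual(deduplicate("filename.txt", {"filename.txt"}), "filename_1.txt")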
> Almost immediately I feel the need to rebut a common misunderstanding. Such a principle is not saying that code is the only documentation.