46 points by BerislavLopac 13 days ago | 9 comments
leni536 13 days ago
Capping at the major version level is just common sense, IMO. It almost shouldn't even be possible to leave the major version out when declaring package dependencies. A new major version can be an entirely unrelated library or application from the viewpoint of compatibility, so it might as well be part of the package's name. Not unlike how the Python major version is part of the name of the interpreter.
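For concreteness, this is what a major-version cap looks like when evaluated with the packaging library (a minimal sketch; the requirement string and version numbers are only illustrative):

    # A sketch of capping at the major version boundary, using the
    # third-party "packaging" library; versions here are illustrative.
    from packaging.specifiers import SpecifierSet
    from packaging.version import Version

    spec = SpecifierSet(">=2,<3")         # accept only the 2.x line

    print(Version("2.31.0") in spec)      # True
    print(Version("3.0.0") in spec)       # False: the next major is out of scope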

Major versions going EOL and unmaintained is unfortunate, but that's not a purely technical problem. Releasing a new major version and breaking compatibility with existing users is as much a social decision as a technical one.

I'm not sold on the "semver doesn't work anyway" angle here either, although I admit that it's not perfect.

croemer 12 days ago
Not for Python, where it screws you. This article is about Python - the title might have benefitted from mentioning that.
quotemstr 13 days ago
Right. Semver makes no sense. A new major version is just a new package. Model it as one. Want the new package to conflict with the old one? Model the conflict explicitly. Major version numbering is total nonsense that everyone buys into for some reason.
eyelidlessness 13 days ago
> Model the conflict explicitly.

If you reject relying on major versioning as the mechanism of detecting that conflict (as you declare it “nonsense”), how do you go about “modeling” the conflict? Like you say, a major new version is a new package. What is the model for that, apart from the thing designating it as that? The API? The behavior of your calls to that API? The infinite unknown scope of internal changes not exposed directly in the API?

I’m not trying to dismiss your comment, I sincerely don’t understand what you’re suggesting people do.

quotemstr 13 days ago
IMHO, the only version to which we should ascribe significance is the minor version, which we increment monotonically. systemd uses this approach.

What's the purpose of a major version? If you break compatibility, make a new library with a new name. Incorporate the major version into the library name --- not "sqlite 3.x", but "sqlite3 x". This way, your versioning scheme is less complex overall because it deals in fewer concepts.

If you publish a sqlite4, and it can't share a process with sqlite3, mark sqlite4 in package metadata as being incompatible with sqlite3. Now you get the same single-major-version invariant that some package systems support without the fuss.

eyelidlessness 12 days ago
Thank you for explaining! I don’t agree with this, but it makes more sense now, understanding that you’re talking about modeling one’s own package around the concept. It sounded like you were talking about modeling consuming packages around it.

Which is ultimately why what you’re proposing couldn’t possibly work. Breaking changes do occur, and packages do update to dependencies with breaking changes, often cascading their own breaking changes to accommodate that. It would be absurd to have a wave of umpteen “new packages”—with new names!—for effectively a single breaking change reflected through that chain.

quotemstr 12 days ago
> It would be absurd to have a wave of umpteen “new packages”—with new names!—for effectively a single breaking change reflected through that chain.

Why? That's the practical effect of cascading breaking changes through the dependency tree anyway. I can't substitute the old thing with the new thing, so why not just give the new thing a new name? But you usually don't need to cascade these things: if my library internally shifts from libfoo3 to incompatible libfoo4 and libfoo isn't part of my library's interface, why would I have to cascade the libfoo3 -> libfoo4 change as a breaking change to my consumers?

eyelidlessness 12 days ago
> Why? That's the practical effect of cascading breaking changes through the dependency tree anyway.

It isn’t. That’s my point.

> I can't substitute the old thing with the new thing, so why not just give the new thing a new name?

You can’t substitute it with zero effort. The work involved may be as minimal as a single change in a single place. Of course it might be much more than that. If that’s the case, I think this “new package” = “new name”, while still subjective, becomes a valid concept to consider. But that depends not just on the scope of changes: the nature of what changed, why, and how it manifests downstream all matter.

A great example of what you’re proposing, that seems to have gone well (despite much consternation about it at the time), was the transition from the original Angular.js to Angular 2. A wholly different project, positioned as such.

It becomes more questionable to justify the concept for a release with substantive breaking changes, but which nonetheless undeniably remains a spiritual successor. And it becomes absolutely silly when such substantive changes, while breaking, are fully in line with the intent of the original package.

An example of this I encountered yesterday, granted still in proposal, is a change to Zod’s union APIs which brings them much more in line with their original goals and with their original underlying rationale. This would be a breaking change, but if anything it would make the package more true to itself rather than another package.

Downstream dependent packages will necessarily make changes to adopt the proposal. But I’d be utterly shocked if anyone would argue, in good faith, that it makes Zod itself meaningfully different to the degree it should be renamed to make the change.

As for cascading changes, it’s entirely possible to encapsulate such that it doesn’t need to happen, just as you say. But it’s also just as reasonable that some changes, breaking though they might be, are objectively good for both the originating package and downstream users. It may be the case that cascading the change is exactly what everyone in that chain wants to happen, even as they consume the same set of packages.

Versioning can be very good at conveying changes like this! Arbitrarily declaring that any change of this kind must warrant a rename would almost certainly be more disruptive than that. Even if it’s philosophically sound (and here I will just say that it’s famously not), it’s pragmatically… goofy.

manfre 13 days ago
This idea would require every package to name-squat on the major package versions, just in case.
adw 13 days ago
I think they're suggesting "change the name", which neglects the value of identifiers as _things people talk about_. It's a fine approach if module names are equivalent to ISBNs, but they're not really.
quotemstr 13 days ago
What's the value of modeling GTK2, GTK3, and GTK4 as the "same" library with a different major version number while modeling Qt as a separate library? They're all incompatible.
adw 13 days ago
a) most libraries are not UI frameworks undergoing significant perpetual rewrites (and a lot of the distaste towards those frameworks comes from the perpetual rewrites)

b) even given that, they share a mindset, and being able to talk about that mindset has value (trivial example: all the GTKs are GObject C libraries, Qt is a C++ thing).

You're right if and only if there is no transferable knowledge communicable by saying "GTK", regardless of version (or you're just not talking to anyone).

svieira 13 days ago
If you haven't seen the "SemVer rant" in Spec-ulation you're missing out: https://youtu.be/oyLBGkS5ICk?si=ZL7MXObP7sWk7JaZ&t=1798
globular-toast 12 days ago
I was about to say this is also the opinion of Rich Hickey (and me). I do use semantic versioning but the major version number never exceeds 1.
bruce343434 13 days ago
This whole problem is because in Python you cannot specify interfaces and say "I need a function called f that takes an x of type T and returns a U", so instead you encode that in an indirect way, like "I know version x.y.z works so I'll just require that".

Any other way risks runtime errors. And to people about to mention types in Python: those are also checked at runtime.

People keep using these hyper-dynamic languages and then running into these robustness issues and scaling limitations brought on by their very dynamism. It makes me mad and sad.

burntsushi 13 days ago
No. It's a problem because you can only have one version of any given package in your dependency tree. You can't have `foo 2` and `foo 3` in your dependency tree. Without that limitation, there is a relief valve of sorts where you can incur two different semver-incompatible releases of the same package in your dependency tree in exchange for a working build. The hope is that it would be a transitory state until all of your dependencies migrate.

Rust, for example, has precisely this same problem, except that it is limited to public dependencies. For example, if `serde 2` were ever to be published, then there would likely be a period of immense pain where, effectively, everyone needs to migrate all at once. Even though `serde 1` and `serde 2` can both appear in the same dependency tree (unlike in Python), because it is a public dependency, everyone needs to be using the same version of the library or else the `Serialize` trait from `serde 1` will be considered distinct from the `Serialize` trait (or whatever) in `serde 2`.

But if I, say, published a `regex 2.0.0` tomorrow, then folks could migrate at their leisure. The only downside is that you'd have `regex 1` and `regex 2` in your dependency tree. Potentially for a long time until everyone migrated over. But your build would still work because it is uncommon for `regex` to be a public dependency.

(Rust does have the semver trick[1] available to it as another relief valve of sorts.)

This problem is definitely not because of missing interfaces or whatever.

[1]: https://github.com/dtolnay/semver-trick

Chris_Newton 13 days ago
> It’s a problem because you can only have one version of any given package in your dependency tree. You can’t have `foo 2` and `foo 3` in your dependency tree.

That does seem to be the fundamental problem with the Python model of dependency management.

If your dependencies have transitive dependencies of their own but your dependency model is a tree and everything is clearly namespaced/versioned, you might end up with multiple versions of the same package installed, but at least they won’t conflict.

If your dependency model is flat but each dependency bakes in its own transitive dependencies so they’re hidden from the rest of the system, for example via static linking, again you might end up with multiple versions of the same package (or some part of it) installed, but again they won’t conflict.

But if your dependency model is flat and each dependency can require specific versions of its transitive dependencies to be installed as peers, you fundamentally can’t avoid the potential for unresolvable conflicts.

A pragmatic improvement in the third case is, as others have suggested, to replace the SemVer-following mypackage 1.x.y and mypackage 2.x.y with separate top-level packages mypackage1 x.y and mypackage2 x.y. Now you have reintroduced namespaces and you can install mypackage1 and mypackage2 together as peers without conflict. Moreover, if increasing x and y faithfully represent minor and point releases, always using the latest versions of mypackage1 and mypackage2 should normally satisfy any other packages that depend on them, however many there are.

Of course it doesn’t always work like that in practice. However, at least the problem is now reduced to manually adjusting versions to resolve conflicts where a package didn’t match its versions to its behaviour properly and/or Hyrum’s Law is relevant, which is probably much less work than before.
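To sketch that renaming arrangement concretely (mypackage1 and mypackage2 are the illustrative, hypothetical names from above, not real packages):

    # Hypothetical: with the major version folded into the package name, the
    # two lines are unrelated packages to the resolver and install as peers.
    import mypackage1  # the frozen 1.x line, still used by older dependants
    import mypackage2  # the 2.x line, adopted by newer dependants

    legacy = mypackage1.do_thing()   # hypothetical API of the old line
    modern = mypackage2.do_thing()   # hypothetical API of the new line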

int_19h 12 days ago
As the article explains, this is precisely why the social expectations around Python package versioning are very different from JS package versioning (i.e. you can’t just break things willy-nilly even in major releases and cite semver as justification).

That aside, note the obvious problems here for any language that uses nominal typing - like, say, Python. Since types from dependencies can often surface in one's public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.

Chris_Newton 12 days ago
> social expectations around Python package versioning are very different from JS package versioning

If anything, I’d say in my experience the Python community tends to be more willing to make big changes. After all, Python itself famously did so with the 2 to 3 transition, and to some extent we’re seeing a second round of big changes even now as optional typing spreads through the ecosystem.

Admittedly, the difference could also be because so few packages in JS world seem to last long enough for multiple major versions to become an issue. The Python ecosystem seems more willing to settle on a small number of de facto standard libraries for common tasks.

> Since types from dependencies can often surface in one’s public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.

Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface, I don’t see how this is any different to any other potential naming conflict. Whatever dependency model you pick, you’re always going to have the possibility that two dependencies use the same name as part of their interface, and in Python you’re always going to have to disambiguate explicitly if you want to import both in the same place. However, once you’ve done so, there is no longer any naming clash to leak through your own interface either.

int_19h 12 days ago
> After all, Python itself famously did so with the 2 to 3 transition

That transition has been so traumatic for the whole ecosystem that, if anything, it became an object lesson in why you don’t do stuff like that. "Never again" is the current position of the PSF w.r.t. any hypothetical future Python 3 -> 4 transition.

Major Python libraries pretty much never just remove things over the course of a single major release. Things get officially announced first, then deprecated for at least one release cycle but often longer (which is communicated via DeprecationWarning etc), then finally retired.

> Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface

Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X (so that instances of Y can be passed anywhere X is expected), that is very intentionally public.

Now imagine that library C exposes type Z that also by design extends X. If B and C both get their copy of A, then there are two identical types X that are not type-compatible.

Now suppose we have an app that depends on both B and C. Its author wants to write a generic function F that accepts an instance of X (or a subtype) and does something with it. How do they write a type signature for F such that it can accept both Y and Z?
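To make the problem concrete, here is a minimal self-contained sketch that simulates the "two copies of A" situation with two structurally identical but distinct classes:

    # Simulates "two copies of library A": the two X classes are line-for-line
    # identical, but Python's nominal typing treats them as unrelated types.
    class X_from_B_copy:            # the X that library B resolved
        def describe(self) -> str:
            return "X"

    class X_from_C_copy:            # the X that library C resolved
        def describe(self) -> str:
            return "X"

    class Y(X_from_B_copy):         # library B's public subtype
        pass

    class Z(X_from_C_copy):         # library C's public subtype
        pass

    def f(value: X_from_B_copy) -> str:
        # Accepts Y, but isinstance() and any type checker reject Z, even
        # though the two base classes differ only in identity.
        return value.describe()

    print(isinstance(Y(), X_from_B_copy))   # True
    print(isinstance(Z(), X_from_B_copy))   # False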

Chris_Newton 12 days ago
> Major Python libraries pretty much never just remove things over the course of a single major release. Things get officially announced first, then deprecated for at least one release cycle but often longer (which is communicated via DeprecationWarning etc), then finally retired.

I’m not sure that’s a realistic generalisation. To pick a few concrete examples, there were some breaking changes in SQLAlchemy 2, Pydantic 2, and, as an interesting example of the “rename the package instead of bumping the major version” idea mentioned elsewhere, in the move from Psycopg2 to Psycopg (3). I think it’s fair to say all of those are significant packages within the Python ecosystem.

> Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X […] Now imagine that library C exposes type Z that also by design extends X

Yes, you can create some awkward situations with shared bases in Python, particularly when the relevant types are split across different libraries, and this isn’t a situation that Python’s object model (or those of many other OO languages) handles very gracefully.

Could you please clarify the main point you’d like to make here? The shared base/polymorphism complications seem to apply generally with Python’s object model, unless you have a set of external dependencies that are designed to share a common base type from a common transitive dependency, and to support code that is polymorphic as if each name refers to a single, consistent type, and yet the packages in question are not maintained and released in sync.

That seems like quite an unusual scenario. Even if it happens, it seems like the most that can safely be assumed by code importing from B and C — unless B and C explicitly depend on exactly the same version of A — is that Y extends (X from A v1.2.3) while Z extends (X from A v1.2.4). If B and C aren’t explicitly managed together, I’m not convinced it’s reasonable for code using them both to assume the base types that happen to share the same name X that they extend and expose through their respective interfaces are really the same type.

cardanome 13 days ago
> And to people about to mention types in python: those are also checked at runtime.

They are not checked at runtime at all. Type declarations are only used by static analysis tools and not by the runtime.

So types are checked BEFORE runtime by the tooling, just like they would be in TypeScript or any other language that offers gradual typing.

Yes, the dynamic nature of Python does make type safety and certain performance optimizations very difficult, but then again it is that dynamic nature that allows for the high productivity of the language. A static language would be far less ergonomic to use for the typical prototyping and explorative programming done in Python.

int_19h 12 days ago
> A static language would be far less ergonomic to use for the typical prototyping and explorative programming done in Python.

A static language without type inference, sure. But that's not the only option.

OCaml, for example, will infer object types for you based on what methods are called with what kinds of arguments inside the body.

flakes 12 days ago
> They are not checked at runtime at all. Type declarations are only used for static analyzing tools and not by the runtime.

This is the common use case, but types certainly are used at runtime by many libraries. Frameworks like FastAPI use the type annotations to declare dependency injection, which is resolved during application startup. In other cases, like Pydantic, they are used to determine marshalling/unmarshalling strategies.
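For instance, a minimal Pydantic sketch where the annotations drive validation and coercion at runtime:

    from pydantic import BaseModel, ValidationError

    class User(BaseModel):
        id: int
        name: str

    # Pydantic reads the annotations at runtime to validate and coerce input.
    user = User(id="42", name="Ada")
    print(user.id, type(user.id))        # 42 <class 'int'>

    try:
        User(id="not a number", name="Ada")
    except ValidationError as err:
        print(err)                       # the annotation was enforced at runtime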

oivey 13 days ago
It makes me pretty mad and sad that people think static languages solve this problem at all. If you do, I have a version of liblzma for you to install. And if you do, do you release your libraries without version numbers because the compiler will catch any mistakes?
o11c 13 days ago
Theoretically, static languages don't solve this problem, but in practice, programmers writing packages in a static language don't gratuitously break their API every release or so, which seems far too common in Python-land.
oivey 12 days ago
I’m not entirely sure that’s true, and I’m not sure it makes sense to extrapolate all dynamic languages from Python.

Huge amounts of effort are expended on Linux distros ensuring that all the packages work together. Much of and maybe most of those packages are written in static languages.

Many Python packages don’t have issues with things constantly breaking. I find NumPy, SciPy, the Scikits, and more to be rather stable. I can only think of making trivial fixes in the last few years. I have lots of exotic code using things like Numba that’s been long-lived. I’m guessing Flask and Django are pretty stable at this point, but I don’t work on that side of things.

Packages still undergoing a lot of construction are less nice. I think that might be the nature of all new things, though. The example at the beginning of this article, TensorFlow, is still a relatively new sort of package and is seeing tons of new development.

Packaging in Python in 2024 still sucks, which is a uniquely Python issue. Python’s slowness necessitating wrapping lots of platform-specific binaries doesn’t help. Seemingly even major Python projects like TensorFlow have really only just started making an attempt to version their dependencies. In one of the issues linked in the article, the problem was TF pinning things way too specifically in the main project. One of the satellite projects had the opposite issue, not even setting minimum bounds. The Wild West of unpinned deps makes it hard for upstream authors to even know they are breaking things.

Many people know Python packaging sucks, but I don’t think they know how bad it really is. The slowness is also special to Python. Other languages like Julia and Clojure seem to be much better with these difficulties, and I think in large part this is due to early investments preventing the problems from festering.

Rust vs. C++ is a good comparison, I think. Cargo is better than anything C++ has, by far. In C++, it’s common to avoid dependencies altogether because the best you’ve had historically is the OS-specific package manager. The issue isn’t static vs. dynamic. The issue is early investment in packaging and community uptake.

hnfong 12 days ago
> TensorFlow, is still a relatively new sort of package and is seeing tons of new development still.

But I thought TensorFlow was already "dead" and everyone was moving to Torch...?

Even if it's not dead, TF has been around for almost a decade by now.

The landscape of ML is changing rapidly, I'll grant you that, so I guess that might necessitate more visible changes, esp. to APIs and dependencies...

VS1999 13 days ago
It solves the issue of finding out at compile time, instead of at runtime, that a function signature changed, which is infinitely better. The real answer is that serious software developers don't just leave their packages on auto-update, and rarely, if ever, update dependencies unless there's a good reason.
llm_trw 13 days ago
It doesn't solve the problem of the function body changing though.
oivey 12 days ago
It finds the most trivial mistake. It isn’t infinitely better because static typing is far from free.
LegionMammal978 13 days ago
Since when has static typing eliminated backward-compatibility problems in interfaces? Not all visible runtime behavior can be encoded in the type system, unless you're using a proof assistant or formal-verification system.
AlotOfReading 13 days ago
Even formal verification misses visible behaviors; the line is just a little farther down. Correct runtime-visible behavior can also include side effects like the amount of time taken (e.g. constant-time cryptography), or even the amount of heat generated [0]. You're not going to encode everything you could possibly want in a type system without modeling the entire universe, so draw an opinionated line somewhere reasonable and be happy with it.

[0] https://news.ycombinator.com/item?id=39751509#39761349

eesmith 12 days ago
I need a function which, given a graph, returns the graph diameter as an integer.

I need another function which, given a graph, uses a faster but approximate method to return the graph diameter as an integer.

I need a third function which, given a graph, returns the graph radius as an integer.

All three of these functions have an identical type signature.
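Concretely (a sketch assuming a plain adjacency-list graph type; the names and stub bodies are only illustrative):

    from typing import Mapping, Sequence

    # Illustrative graph type: adjacency lists keyed by node name.
    Graph = Mapping[str, Sequence[str]]

    def diameter(g: Graph) -> int:          # exact graph diameter
        ...

    def approx_diameter(g: Graph) -> int:   # faster, approximate diameter
        ...

    def radius(g: Graph) -> int:            # graph radius
        ...

    # All three are (Graph) -> int; nothing in the type distinguishes exact
    # from approximate, or diameter from radius.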

Oh, now I need something which takes a regex pattern string and a haystack string, and returns 1 if the pattern is found in the haystack, otherwise 0.

And the regex "|" pattern must match longest first, not left-right.

And it needs to support "verbose" mode, which allows comments.

And it supports backreference matches.

How do you express that type?

Now I need to numerically integrate some arbitrary function "f" over the range 0.0 to 1.0. Which of the many numeric integration methods should I use that prevents runtime issues like failing to converge?

leni536 13 days ago
Function interfaces are better, but not necessarily sufficient. Semantics of functions can also change in a breaking way.
cqqxo4zV46cp 13 days ago
What!? A function’s signature does not completely describe its behaviour. This doesn’t remotely address the problem. Have your preferences all you want, but this is blatantly a case of being blinded by some silly programming language culture war.
hot_gril 13 days ago
Dynamic typing can make importing libraries riskier, but the benefits can outweigh the costs. Also, it's not hard to just not break APIs (unless you're node-redis), and you should have tests anyway if you really care.
317070 13 days ago
There are static type checkers, though: https://github.com/google/pytype

So you can specify interfaces (protocols) and check them in your installation process.
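For example, typing.Protocol gives you a structural interface that mypy or pytype can check ahead of time; a minimal sketch (the names are illustrative):

    from typing import Protocol, runtime_checkable

    @runtime_checkable
    class HasF(Protocol):
        def f(self, x: int) -> str: ...

    class Dependency:
        def f(self, x: int) -> str:
            return str(x * 2)

    def use(dep: HasF) -> str:
        return dep.f(21)

    print(use(Dependency()))                # "42"; mypy/pytype check this statically
    print(isinstance(Dependency(), HasF))   # True, but only verifies that the
                                            # method exists, not its signature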

teaearlgraycold 13 days ago
Just use TypeScript when possible.
jbmsf 13 days ago
The most sane thing I've found is:

- Pin all application versions.

- Don't pin or set upper bounds in libraries. Lower bounds may work.

- Use automation to continuously upgrade and test new versions of everything.

If you just pin, you fall behind and eventually it becomes expensive to catch up. If you don't pin, you lose repeatability. If you don't automate, the upgrade work doesn't happen reliably.

stouset 12 days ago
I just keep everything up to date as often as possible.

It’s rarely a pain in the ass. When it is, it’s because we’ve chosen dependencies poorly, and that’s a good signal to move on to something else. Occasionally major versions require a little more time and effort, but just eat the cost and do it.

What’s painful is updating rarely and calcifying. Do the hard thing often and it stops being hard.

ijustlovemath 13 days ago
What's your automated upgrade test setup?
globular-toast 12 days ago
This is what I do:

1. Use pip-tools to generate a pinned requirements list. This is used to build artifacts like Docker images, installer bundles, etc. All pins are upgraded periodically (manually). Any version changes are inspected (this requires "knowing" your dependencies a bit and which ones might cause problems). Finally, the resulting artifacts are manually tested.

2. An automated build of master every 24 hours from completely unpinned versions, that is, installing the package fresh from pyproject.toml. The automated test suite will reveal any upcoming breakage that would happen if we continued without upper bounds on certain packages. If there is breakage, we decide whether to accommodate the new breaking dependencies or constrain them with an upper bound (perhaps recording it as tech debt).

ijustlovemath 12 days ago
So you keep two requirements.txt files?
globular-toast 11 days ago
Not exactly. Look up how pip-tools works.
dannyz 13 days ago
I agree; I think upper bound constraints go against what is commonly accepted and used in the Python ecosystem. What I try to do on my projects now is to always have a nightly CI test step; in theory, if an updated package breaks my package, it will be caught fairly quickly.
croemer 12 days ago
This was previously posted by the same user in 2022: https://news.ycombinator.com/item?id=29507681

And by another user 54 days ago: https://news.ycombinator.com/item?id=39486552

saila 13 days ago
The general rule I use is that libraries should not specify upper bounds on dependencies, but applications should.

I use Poetry for all my projects, but I agree that it exacerbates the issue somewhat with its default npm-style version syntax.

croemer 12 days ago
I struggled to find this post again when searching on hn.algolia.com, so I'll write a comment that'll help others find it: this very long article describes the pitfalls of capping dependencies in Python libraries. It's specific to Python, and it discusses the bad precedent that Poetry is setting.
kazinator 12 days ago
GNU programs have been doing this for decades w.r.t. Autoconf and Automake versions (which you run into when you need to "bootstrap" them to work on them, rather than just build them as a downstream user).
classified 11 days ago
Of course you should. But only when necessary.