Major versions going EOL and falling unmaintained is unfortunate, but that's not a purely technical problem. Releasing a new major version and breaking compatibility with existing users is as much a social decision as a technical one.
I'm not sold on the "semver doesn't work anyway" angle here either, although I admit that it's not perfect.
If you reject relying on major versioning as the mechanism for detecting that conflict (since you declare it “nonsense”), how do you go about “modeling” the conflict? Like you say, a major new version is a new package. What is the model for that, apart from the thing designating it as such? The API? The behavior of your calls to that API? The infinite unknown scope of internal changes not exposed directly in the API?
I’m not trying to dismiss your comment, I sincerely don’t understand what you’re suggesting people do.
What's the purpose of a major version? If you break compatibility, make a new library with a new name. Incorporate the major version into the library name --- not "sqlite 3.x", but "sqlite3 x". This way, your versioning scheme is less complex overall because it deals in fewer concepts.
If you publish a sqlite4, and it can't share a process with sqlite3, mark sqlite4 in package metadata as being incompatible with sqlite3. Now you get the same single-major-version invariant that some package systems support without the fuss.
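Some packaging systems already model this directly. As a sketch, a Debian-style control stanza (package names hypothetical here) can declare exactly this kind of mutual exclusion:

```
Package: sqlite4
Conflicts: sqlite3
```

The package manager then refuses to install both at once, which gives you the single-major-version invariant without any semver interpretation.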
Which is ultimately why what you’re proposing couldn’t possibly work. Breaking changes do occur, and packages do update to dependencies with breaking changes, often cascading their own breaking changes to accommodate that. It would be absurd to have a wave of umpteen “new packages”—with new names!—for effectively a single breaking change reflected through that chain.
Why? That's the practical effect of cascading breaking changes through the dependency tree anyway. I can't substitute the old thing with the new thing, so why not just give the new thing a new name? But you usually don't need to cascade these things: if my library internally shifts from libfoo3 to incompatible libfoo4 and libfoo isn't part of my library's interface, why would I have to cascade the libfoo3 -> libfoo4 change as a breaking change to my consumers?
It isn’t. That’s my point.
> I can't substitute the old thing with the new thing, so why not just give the new thing a new name?
You can’t substitute it with zero effort. The work involved may be as minimal as a single change in a single place. Of course it might be much more than that. If that’s the case, I think the “new package” = “new name” idea, while still subjective, becomes a valid one to consider. But that depends on more than just the scope of changes: the nature of what changed, why, and how it manifests downstream all matter.
A great example of what you’re proposing, that seems to have gone well (despite much consternation about it at the time), was the transition from the original Angular.js to Angular 2. A wholly different project, positioned as such.
It becomes more questionable to justify the concept for a release with substantive breaking changes, but which nonetheless undeniably remains a spiritual successor. And it becomes outright silly when such substantive changes, while breaking, are absolutely in line with the intent of the original package.
An example of this I encountered yesterday, granted still in proposal, is a change to Zod’s union APIs which brings them much more in line with their original goals and with their original underlying rationale. This would be a breaking change, but if anything it would make the package more true to itself rather than another package.
Downstream dependent packages will necessarily make changes to adopt the proposal. But I’d be utterly shocked if anyone would argue, in good faith, that it makes Zod itself meaningfully different to the degree it should be renamed to make the change.
As for cascading changes, it’s entirely possible to encapsulate such that it doesn’t need to happen, just as you say. But it’s also just as reasonable that some changes, breaking though they might be, are objectively good for both the originating package and downstream users. It may be the case that cascading the change is exactly what everyone in that chain wants to happen, even as they consume the same set of packages.
Versioning can be very good at conveying changes like this! Arbitrarily declaring that any change of this kind must warrant a rename would almost certainly be more disruptive than that. Even if it’s philosophically sound (and here I will just say that it’s famously not), it’s pragmatically… goofy.
b) even given that, they share a mindset, and being able to talk about that mindset has value (trivial example: all the GTKs are GObject C libraries, Qt is a C++ thing).
You're right if and only if there is no transferable knowledge communicable by saying "GTK", regardless of version (or you're just not talking to anyone).
Any other way risks runtime errors. And to people about to mention types in python: those are also checked at runtime.
People keep using these hyper dynamic languages and then running into these robustness issues and scaling limitations brought on by their very dynamism. It makes me mad and sad.
Rust, for example, has precisely this same problem, except that it is limited to public dependencies. For example, if `serde 2` were ever to be published, then there would likely be a period of immense pain where, effectively, everyone needs to migrate all at once. Even though `serde 1` and `serde 2` can both appear in the same dependency tree (unlike in Python), because it is a public dependency, everyone needs to be using the same version of the library or else the `Serialize` trait from `serde 1` will be considered distinct from the `Serialize` trait (or whatever) in `serde 2`.
But if I, say, published a `regex 2.0.0` tomorrow, then folks could migrate at their leisure. The only downside is that you'd have `regex 1` and `regex 2` in your dependency tree. Potentially for a long time until everyone migrated over. But your build would still work because it is uncommon for `regex` to be a public dependency.
(Rust does have the semver trick[1] available to it as another release valve of sorts.)
This problem is definitely not because of missing interfaces or whatever.
That does seem to be the fundamental problem with the Python model of dependency management.
If your dependencies have transitive dependencies of their own but your dependency model is a tree and everything is clearly namespaced/versioned, you might end up with multiple versions of the same package installed, but at least they won’t conflict.
If your dependency model is flat but each dependency bakes in its own transitive dependencies so they’re hidden from the rest of the system, for example via static linking, again you might end up with multiple versions of the same package (or some part of it) installed, but again they won’t conflict.
But if your dependency model is flat and each dependency can require specific versions of its transitive dependencies to be installed as peers, you fundamentally can’t avoid the potential for unresolvable conflicts.
A pragmatic improvement in the third case is, as others have suggested, to replace the SemVer-following mypackage 1.x.y and mypackage 2.x.y with separate top-level packages mypackage1 x.y and mypackage2 x.y. Now you have reintroduced namespaces and you can install mypackage1 and mypackage2 together as peers without conflict. Moreover, if increasing x and y faithfully represent minor and point releases, always using the latest versions of mypackage1 and mypackage2 should normally satisfy any other packages that depend on them, however many there are.
Of course it doesn’t always work like that in practice. However, at least the problem is now reduced to manually adjusting versions to resolve conflicts where a package didn’t match its versions to its behaviour properly and/or Hyrum’s Law is relevant, which is probably much less work than before.
That aside, note the obvious problems here for any language that uses nominal typing - like, say, Python. Since types from dependencies can often surface in one's public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.
If anything, I’d say in my experience the Python community tends to be more willing to make big changes. After all, Python itself famously did so with the 2 to 3 transition, and to some extent we’re seeing a second round of big changes even now as optional typing spreads through the ecosystem.
Admittedly, the difference could also be because so few packages in JS world seem to last long enough for multiple major versions to become an issue. The Python ecosystem seems more willing to settle on a small number of de facto standard libraries for common tasks.
> Since types from dependencies can often surface in one's public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.
Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface, I don’t see how this is any different to any other potential naming conflict. Whatever dependency model you pick, you’re always going to have the possibility that two dependencies use the same name as part of their interface, and in Python you’re always going to have to disambiguate explicitly if you want to import both in the same place. However, once you’ve done so, there is no longer any naming clash to leak through your own interface either.
That transition was so traumatic for the whole ecosystem that, if anything, it became an object lesson in why you don't do stuff like that. "Never again" is the current position of the PSF wrt any hypothetical future Python 3 -> 4 transition.
Major Python libraries pretty much never just remove things over the course of a single major release. Things get officially announced first, then deprecated for at least one release cycle but often longer (which is communicated via DeprecationWarning etc), then finally retired.
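A minimal sketch of that deprecation pattern (function names hypothetical): the old entry point keeps working for at least one release cycle while emitting a `DeprecationWarning`, before being removed in a later release.

```python
import warnings

def new_api():
    """The replacement entry point."""
    return "result"

def old_api():
    # Kept working for a deprecation cycle; callers see a warning,
    # not a breakage, until the announced removal release.
    warnings.warn(
        "old_api() is deprecated and will be removed in a future "
        "release; use new_api() instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return new_api()
```

`stacklevel=2` points the warning at the caller's line rather than the library's, which is what makes these warnings actionable downstream.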
> Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface
Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X (so that instances of Y can be passed anywhere X is expected), that is very intentionally public.
Now imagine that library C exposes type Z that also by design extends X. If B and C each get their own copy of A, then there are two identically-named types X that are not compatible with each other.
Now suppose we have the app that depends on both B and C. Its author wants to write a generic function F that accepts an instance of X (or a subtype) and does something with it. How do they write a type signature for F such that it can accept both Y and Z?
I’m not sure that’s a realistic generalisation. To pick a few concrete examples, there were some breaking changes in SQLAlchemy 2, Pydantic 2, and as an interesting example of the “rename the package instead of bumping the major version” idea mentioned elsewhere, from Psycopg2 to Psycopg (3). I think it’s fair to say all of those are significant packages within the Python ecosystem.
> Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X […] Now imagine that library C exposes type Z that also by design extends X
Yes, you can create some awkward situations with shared bases in Python, and you could split all of the relevant types into different libraries, and this isn’t a situation that Python’s object model (or those of many other OO languages) handles very gracefully.
Could you please clarify the main point you’d like to make here? The shared base/polymorphism complications seem to apply generally with Python’s object model. The exception would be a set of external dependencies designed to share a common base type from a common transitive dependency, with support code that is polymorphic as if each name refers to a single, consistent type, even though the packages in question are not maintained and released in sync.
That seems like quite an unusual scenario. Even if it happens, it seems like the most that can safely be assumed by code importing from B and C — unless B and C explicitly depend on exactly the same version of A — is that Y extends (X from A v1.2.3) while Z extends (X from A v1.2.4). If B and C aren’t explicitly managed together, I’m not convinced it’s reasonable for code using them both to assume the base types that happen to share the same name X that they extend and expose through their respective interfaces are really the same type.
They are not checked at runtime at all. Type declarations are only used by static analysis tools, not by the runtime.
So types are checked BEFORE runtime by the tooling just like they would be in TypeScript or any other language that offers gradual typing.
Yes, the dynamic nature of Python does make type safety and certain performance optimizations very difficult but then again it is the dynamic nature that allows for the high productivity of the language. A static language would be far less ergonomic to use for the typical prototyping and explorative programming done in Python.
A static language without type inference, sure. But that's not the only option.
OCaml, for example, will infer object types for you based on what methods are called with what kinds of arguments inside the body.
This is the common usecase, but types certainly are used at runtime by many libraries. Frameworks like FastAPI use the type annotations to declare dependency injection which is resolved during application startup. In other cases like Pydantic, they are used to determine marshalling/unmarshalling strategies.
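A small sketch of that mechanism (the function is hypothetical, `typing.get_type_hints` is the real stdlib hook): frameworks read the annotations back at runtime and use them to drive validation or injection.

```python
from typing import get_type_hints

def create_user(user_id: int, name: str) -> dict:
    return {"id": user_id, "name": name}

# This is the kind of introspection FastAPI and Pydantic build on:
# the annotations are ordinary runtime-accessible objects.
hints = get_type_hints(create_user)
print(hints["user_id"] is int)  # True
print(hints["name"] is str)     # True
```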
Huge amounts of effort are expended on Linux distros ensuring that all the packages work together. Many, perhaps most, of those packages are written in static languages.
Many Python packages don’t have issues with things constantly breaking. I find NumPy, SciPy, the Scikits, and more to be rather stable. I can only think of making trivial fixes in the last few years. I have lots of exotic code using things like Numba that’s been long lived. I’m guessing Flask and Django are pretty stable at this point, but I don’t work on that side of things.
Packages still undergoing a lot of construction are less nice. I think that might be the nature of all new things, though. The example at the beginning of this article, TensorFlow, is still a relatively new sort of package and is seeing tons of new development.
Packaging in Python in 2024 still sucks, which is a uniquely Python issue. Python’s slowness necessitating wrapping lots of platform-specific binaries doesn’t help. Seemingly even major Python projects like TensorFlow have really only just started making an attempt to version their dependencies. In one of the issues in the article, the problem was TF pinning things way too specifically in the main project. One of the satellite projects had the opposite issue, not even setting min bounds. The Wild West of unpinned deps makes it hard for upstream authors to even know they are breaking things.
Many people know Python packaging sucks, but I don’t think they know how bad it really is. The slowness is also special to Python. Other languages like Julia and Clojure seem to be much better with these difficulties, and I think in large part this is due to early investments preventing the problems from festering.
Rust vs C++ is a good comparison I think. Cargo is better than anything C++ has by far. In C++, it’s common to completely avoid dependencies altogether because the best you’ve had historically is the OS-specific package manager. The issue isn’t static vs dynamic. The issue is early investment in packaging and community uptake.
But I thought Tensorflow is already "dead" and everyone is moving to Torch...?
Even if it's not dead, tf has been around for almost a decade by now.
The landscape of ML is changing rapidly I'll grant you that, so I guess that might necessitate more visible changes esp. on API and dependencies...
I need another function which, given a graph, uses a faster but approximate method to return the graph diameter as an integer.
I need a third function which, given a graph, returns the graph radius as an integer.
All three of these functions have an identical type signature.
Oh, now I need something which takes a regex pattern string and a haystack string, and returns 1 if the pattern is found in the haystack, otherwise 0.
And the regex "|" pattern must match longest first, not left-right.
And it needs to support "verbose" mode, which allows comments.
And it supports backreference matches.
How do you express that type?
Now I need to numerically integrate some arbitrary function "f" in the range 0.0 to 1.0. Which of the many numeric integration methods should I use which prevents runtime issues like being unable to converge?
So you can specify interfaces (protocols) and check them in your installation process.
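A sketch of that with `typing.Protocol` (names hypothetical): a structural check can verify that a candidate implementation has the right methods, though, as the examples above illustrate, it says nothing about exactness vs. approximation, matching semantics, or convergence behavior.

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class DiameterSolver(Protocol):
    def diameter(self, graph: dict[int, list[int]]) -> int: ...

class ApproxSolver:
    # No inheritance needed: matching the method name is enough
    # for the structural check.
    def diameter(self, graph: dict[int, list[int]]) -> int:
        return len(graph)  # placeholder logic, not a real algorithm

# True: the protocol check only verifies the method exists, not
# whether it is exact, approximate, fast, or correct.
print(isinstance(ApproxSolver(), DiameterSolver))  # True
```

So protocols give you an installable, checkable contract for the *shape* of an API, while the behavioral distinctions in the thread above remain out of reach of the type system.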
- Pin all application versions
- Don't pin or set upper bounds in libraries. Lower bounds may work.
- Use automation to continuously upgrade and test new versions of everything
If you just pin, you fall behind and eventually it becomes expensive to catch up. If you don't pin, you lose repeatability. If you don't automate, the upgrade work doesn't happen reliably.
It’s rarely a pain in the ass. When it is, it’s because we’ve chosen dependencies poorly and that’s a good signal to move on to something else. Occasionally major versions require a little more time and effort, but just eat the cost and do it.
What’s painful is updating rarely and calcifying. Do the hard thing often and it stops being hard.
1. Use pip-tools to generate a pinned requirements list. This is used to build artifacts like docker images, installer bundles etc. All pins are upgraded periodically (manually). Any version changes are inspected (this requires "knowing" your dependencies a bit and which ones might cause problems). Finally the resulting artifacts are manually tested.
2. An automated build of master every 24 hours from completely unpinned versions. That is installing the package fresh from pyproject.toml. The automated test suite will reveal any upcoming breakage that would happen if we continue without upper bounds on certain packages. If there is breakage we decide whether to accommodate the new breaking dependencies or constrain them with an upper bound (perhaps recording a tech debt).
And by another user 54 days ago: https://news.ycombinator.com/item?id=39486552
I use Poetry for all my projects, but I agree that it exacerbates the issue somewhat with its default npm-style version syntax.