96 points by hyperbrainer 2 days ago | 8 comments
fsckboy 1 day ago
the "lambda the ultimate" papers and the birth of scheme was a loong time ago, so it grates on my ears to hear this topic presented as "an optimization". Yes, it is sometimes an optimization a compiler can make, but the idea is much better presented as a useful semantic of a language.

in the same way that passing parameters to a subfunction "creates" a special set of local variables for the subfunction, the tail recursion semantic updates this set of local variables in an especially clean way for loop semantics, allowing "simultaneous assignment" from old values to new ones.

(yes, it would be confusing with side effected C/C++ operators like ++ because then you'd need to know order of evaluation or know not to do that, but those are already issues in those languages quite apart from tail recursion)

because it's the way I learned it, I tend to call the semantic "tail recursion" and the optimization "tail call elimination", but since other people don't do the same it's somewhat pointless; still, I do like to crusade for awareness of the semantic beyond the optimization. If it's an optimization, you can't rely on it because you could blow the stack on large loops. If it's a semantic, you can rely on it.

(the semantic is not entirely "clean" either. it's a bit of a subtle point that you need to return the value of the tail call straight away or it's not a tail call. fibonacci is the sum of the current value with the next one, so it's not a tail call unless you somewhat carefully arrange the values you pass/keep around. also worth pointing out that all "tail calls" are up for consideration, not just recursive ones)
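
For a concrete picture of the accumulator arrangement being described, here is a hypothetical Java sketch (not from the article): the recursive call becomes the last thing the method does by threading the current and next values through the parameters, so each step is effectively the simultaneous assignment (a, b) <- (b, a + b).

    // Naive version: NOT a tail call, because the addition still happens
    // after the recursive calls return.
    static long fib(long n) {
        return n < 2 ? n : fib(n - 1) + fib(n - 2);
    }

    // Tail-recursive version: the recursive call's result is returned directly.
    // Each step "simultaneously assigns" (a, b) <- (b, a + b).
    // Note: javac/the JVM do not eliminate this call, so it still consumes a
    // stack frame per step; a language with TCE would run it as a loop.
    static long fib(long n, long a, long b) {
        return n == 0 ? a : fib(n - 1, b, a + b);
    }
    // usage: fib(10, 0, 1) == 55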

ekimekim 1 day ago
In a weird way it kinda reminds me of `exec` in sh (which replaces the current process instead of creating a child process). Practically, there's little difference between these two scripts:

    #!/bin/sh
    foo
    bar
vs

    #!/bin/sh
    foo
    exec bar
And you could perhaps imagine a shell that does "tail process elimination" to automatically perform the latter when you write the former.

But the distinction can be important due to a variety of side effects, and if you could only achieve it through carefully following a pattern that the shell might or might not recognize, that would be very limiting.

nagaiaida 1 day ago
this is pretty much exactly how my "forth" handles tail call elimination, and it's the main thing that's earned the scare quotes so far, since it shifts the mental burden to being aware of this when writing code that manipulates the return stack.

as you imply towards the end, i'm not confident this is a trick you can get away with as easily without the constraints of concatenative programming to railroad you into it being an easily recognizable pattern for both the human and the interpreter.

LeFantome 1 day ago
One of the issues with Java is that it is two levels of language: you compile Java into Java bytecode, which is further compiled into native machine code. There is no concept of a tail call in Java bytecode, so it is difficult to propagate the semantics. It really has to be the programmer or the compiler implementing the tail call optimization in the generated intermediate bytecode before that is further compiled.

.NET is an interesting contrast. The equivalent of Java bytecode in .NET (CIL) does have the concept of tail calls. This allows a functional language like F# to be compiled to the intermediate form without losing the tail call concept. It is still up to the first-level compiler, though: C#, for example, does not support tail calls even though its intermediate target (CIL) does.

ghoul2 1 day ago
Sigh. I have been kicking this horse forever as well: an "optimization" implies just a performance improvement.

Tail call elimination, if it exists in a language, allows coding certain (even infinite) loops as recursion, making loop data flow explicit, easier to analyze, and, at least in theory, easier to vectorize/parallelize, etc.

But if a language/runtime doesn't do tail call elimination, then you CAN'T code up loops as recursion, as you would be destroying your stack. So the WAY you code, the way you structure it, must be different.
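
To make that concrete, a small hypothetical Java sketch (not from the article): the same loop written iteratively and as tail recursion. Without tail call elimination the recursive version eventually throws StackOverflowError, even though it is "just a loop".

    // Iterative version: works for any n.
    static long sumTo(long n) {
        long acc = 0;
        for (long i = 1; i <= n; i++) acc += i;
        return acc;
    }

    // Tail-recursive version: semantically the same loop, but without TCE
    // every step consumes a JVM stack frame, so a large n (say 10_000_000)
    // blows the stack instead of returning.
    static long sumTo(long n, long acc) {
        return n == 0 ? acc : sumTo(n - 1, acc + n);
    }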

It's NOT an optimization.

I have no idea who even came up with that expression.

ameliaquining 1 day ago
I mean, in the particular case demonstrated in this blog post it can only be an optimization, because semantically guaranteeing it would require language features that Java doesn't have.
bradley13 2 days ago
Every compiler should recognize and optimize for tail recursion. It's not any harder than most other optimizations, and some algorithms are far better expressed recursively.

Why is this not done?

SkiFire13 2 days ago
In general, tail call elimination destroys stacktrace information: e.g. if f calls g, which tail-calls h, and h crashes, you won't see g in the stacktrace, and this is bad for debuggability.

In lower level languages there are also a bunch of other issues:

- RAII can easily make functions that appear in a tail position not actually tail calls, due to destructors implicitly running after the call;

- there can be issues when reusing the stack frame of the caller, especially with caller-cleanup calling conventions;

- the compiler needs to prove that no pointers to the stack frame of the function being optimized have escaped, otherwise it would be reusing the memory of live variables which is illegal.

chowells 1 day ago
I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had. Tail recursion is equivalent to looping. People should stop complaining when it gives them the same information as looping.
roenxi 1 day ago
> I'll believe destroying stacktrace information is a valid complaint when people start complaining that for loops destroy the entire history of previous values the loop variables have had.

That is a common criticism. You're referring to the functional programmers. They would typically argue that building up state based on transient loop variables is a mistake. The body of a loop ideally should be (at the time any stack trace gets thrown) a pure function of constant values and of a range that is being iterated over and preserved. That makes debugging easier.

ameliaquining 1 day ago
I mean, if I were doing an ordinary non-recursive function call that just happened to be in tail position, and it got eliminated, and this caused me to not be able to get the full stack trace while debugging, I might be annoyed.

In a couple languages I've seen proposals to solve this problem with a syntactic opt-in for tail call elimination, though I'm not sure whether any mainstream language has actually implemented this.

chowells 1 day ago
Language designers could keep taking ideas from Haskell, and allow functions to opt in to appearing in stack traces. Give the programmer control, and all.
SamLL 1 day ago
Kotlin has a syntactic opt-in for tail call elimination (the "tailrec" modifier).
michaelmrose 1 day ago
vbezhenar 1 day ago
Some of these issues are partially alleviated by using a limited form of tail recursion optimization. You mark a function with a tailrec keyword, and the compiler verifies that this function calls itself as the last statement. You also wouldn't expect a complete stack trace from that function. At the same time, it probably helps with 90% of the recursive algorithms that would benefit from tail recursion.
LeFantome 1 day ago
That is what Clojure does I believe.
hyperbrainer 2 days ago
AFAIK Zig is the only somewhat-big and well-known low-level language with TCO. Obviously, Haskell/OCaml and the like support it and are decently fast too, but systems programming languages they are not.
vlovich123 2 days ago
For guarantee:

https://crates.io/crates/tiny_tco

https://crates.io/crates/tco

As an optimization, my understanding is that GCC and LLVM implement it, so Rust, C, and C++ also have it implicitly, as an optimization that may or may not apply to your code.

But yes, Zig does have formal language syntax for guaranteeing tail calls at the language level (which I agree is the right way to expose this optimization).

SkiFire13 2 days ago
Zig's TCO support is not much different from Clang's `[[clang::musttail]]` in C++. Both have the big restriction that the two functions involved are required to have the same signature.
hyperbrainer 2 days ago
> Both have the big restriction that the two functions involved are required to have the same signature.

I did not know that! But I am a bit confused, since I don't really program in either language. Where exactly in the documentation could I read more about this? Or see more examples?

The language reference for @call[0] was quite unhelpful to my untrained eye.

[0] https://ziglang.org/documentation/master/#call

SkiFire13 2 days ago
Generally I also find Zig's documentation pretty lacking, so instead I try looking for the relevant issues/PRs. In this case I found comments on this issue [1] which seem to still hold true. That same issue also links to the relevant LLVM/Clang issue [2], and the same restriction is also being proposed for Rust [3]. This is where I first learned about it, and it prompted me to investigate whether Zig also suffers from the same issue.

[1]: https://github.com/ziglang/zig/issues/694#issuecomment-15674... [2]: https://github.com/llvm/llvm-project/issues/54964 [3]: https://github.com/rust-lang/rfcs/pull/3407

ufo 1 day ago
This limitation is to ensure that the two functions use the exact same calling convention (input & output registers, and values passed via stack). It can depend on the particular architecture.
Thorrez 2 days ago
C++:

> All current mainstream compilers perform tail call optimisation fairly well (and have done for more than a decade)

https://stackoverflow.com/questions/34125/which-if-any-c-com... (2008)

hyperbrainer 2 days ago
I couldn't actually figure out whether this TCO being done "fairly well" is a guarantee or simply best-effort like in Rust (I am referring to the native support of the language, not what crates allow).
Thorrez 4 hours ago
When that SO answer was written, it was not a guarantee.

You can now get a guarantee by using non-standard compiler attributes:

https://clang.llvm.org/docs/AttributeReference.html#musttail

https://gcc.gnu.org/onlinedocs/gcc/Statement-Attributes.html...

johnisgood 1 day ago
Depends on what you mean by "systems programming"; you can definitely do that in OCaml.
pjmlp 1 day ago
hyperbrainer 1 day ago
I know of these. Almost added a disclaimer too -- that was not my point, as I am sure you understand. Also, OCaml has a GC, which makes it unsuitable for many applications common to systems programming.
vlovich123 2 days ago
My bigger issue with tail call optimization is that you really want it to be enforceable, since if you accidentally deoptimize it for some reason you can end up blowing up your stack at runtime. Usually failure to optimize some pattern doesn't have such a drastic effect - normally code just runs more slowly. So tail calls are one of those special optimizations you want a language annotation for, so that if it fails you get a compiler error (and similarly you may want it applied even in debug builds).
_old_dude_ 2 days ago
Parroting something I heard at a Java conference several years ago: tail recursion removes stack frames, but the security model is based on stack frames, so it has to be a JVM optimization, not a compiler optimization.

I've no idea if this still holds once the security manager is removed.

smarks 2 days ago
The security manager was removed (well, “permanently disabled”) in Java 24. As you note, the permissions available at any given point can depend on the permissions of the code on the stack, and TCO affects this. Removal of the SM thus removes one impediment to TCO.

However, there are other things still in the platform for which stack frames are significant. These are referred to as “caller sensitive” methods. An example is Class.forName(). This looks up the given name in the classloader of the class that contains the calling code. If the stack frames were shifted around by TCO, this might cause Class.forName() to use the wrong classloader.
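
A small hypothetical sketch of that caller sensitivity (the class and classloader setup are made up): the one-argument Class.forName resolves the name against the defining classloader of the class whose code makes the call, so which frame counts as "the caller" matters.

    // Suppose PluginHost was defined by a plugin classloader that can see
    // "com.example.PluginImpl", while the application classloader cannot.
    class PluginHost {
        static Class<?> load() throws ClassNotFoundException {
            // Resolved against PluginHost's own defining classloader.
            return Class.forName("com.example.PluginImpl");
        }
    }

    // If frames were shifted by TCO so the lookup were attributed to code in a
    // different classloader, the same string could resolve differently or fail.
    // The explicit overload avoids depending on the caller's frame:
    //   Class.forName("com.example.PluginImpl", true, PluginHost.class.getClassLoader());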

No doubt there are ways to overcome this — the JVM does inlining after all — but there’s work to be done and problems to be solved.

thfuran 1 day ago
Is there? As you say, there's already inlining, and I don't see how TCO presents a harder case for that.
smarks 20 hours ago
There are similarities in the problems, but there are also fundamental differences. With inlining, the JVM can always decide to deoptimize and back out the inlining without affecting the correctness of the result. But it can't do that with tail calls without exposing the program to a risk of StackOverflowError.

We've been using TCO here ("tail call optimization") but I recall Guy Steele advocating for calling this feature TCE ("elimination") because programs can rely on TCE for correctness.

javier2 1 day ago
In theory, if all you do is implement algorithms, this sounds fine. But most apps implement horrible business processes, so what would one do with missing stacktraces? Maybe in languages that can mark functions as pure.
cempaka 2 days ago
Very nice article demonstrating a neat use of ASM bytecode manipulation. The Java language devs are also working on Project Babylon (code reflection), which will bring additional techniques to manipulate the output of the Java compiler: https://openjdk.org/projects/babylon/articles/code-models
gavinray 2 days ago
This was delivered in JDK 24 as the "Class-File API"

https://openjdk.org/jeps/484

algo_trader 2 days ago
Can this improve on or replace AspectJ and similar instrumentation? We do lots of instruction-level modifications.
1932812267 2 days ago
Scala has been using this technique for years with its scala.annotation.tailrec annotation. Regardless, it's cool to see this implemented as a bytecode pass.
gavinray 2 days ago
Kotlin as well, with the "tailrec" keyword, e.g. "tailrec fun fibonacci()"

https://kotlinlang.org/docs/functions.html#tail-recursive-fu...

Kotlin also has another neat tool, "DeepRecursiveFunction<T, R>", that allows defining deep recursion that is not necessarily tail-recursive.

Really useful if you wind up with a problem that is most cleanly solved with mutual recursion or similar:

https://kotlinlang.org/api/core/kotlin-stdlib/kotlin/-deep-r...

deepsun 2 days ago
Interesting, does it depend on the Kotlin compiler or can it be implemented in Java as well?
gavinray 1 day ago
The "DeepRecursiveFunction<T,R>" could be implemented in Java. The Kotlin implementation leverages Kotlin's native coroutines and uses continuations.

It'd require a bit of engineering to get something working in native Java I'd imagine, even with the new JDK Structured Concurrency API offering you a coroutines alternative.

On the other hand, "tailrec" is a keyword and implemented as a compiler optimization.

The closest I've seen in Java is a neat IntelliJ plugin that has a transformation to convert recursive method calls into imperative loops with a stack frame.

This transformation and the resulting tool came out of someone's thesis; it's pretty cool:

https://github.com/andreisilviudragnea/remove-recursion-insp...
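
As an illustration of that kind of rewrite, here is a minimal hypothetical sketch (not the plugin's actual output): a non-tail-recursive traversal turned into an imperative loop with an explicit stack, so depth is limited by heap memory rather than by the call stack.

    import java.util.ArrayDeque;
    import java.util.Deque;

    class Node {
        int value;
        Node left, right;
    }

    class TreeSum {
        // Recursive version: depth limited by the call stack.
        static long sum(Node n) {
            return n == null ? 0 : n.value + sum(n.left) + sum(n.right);
        }

        // Mechanical rewrite: an imperative loop driving an explicit stack
        // of nodes still to visit.
        static long sumIterative(Node root) {
            long total = 0;
            Deque<Node> pending = new ArrayDeque<>();
            if (root != null) pending.push(root);
            while (!pending.isEmpty()) {
                Node n = pending.pop();
                total += n.value;
                if (n.left != null) pending.push(n.left);
                if (n.right != null) pending.push(n.right);
            }
            return total;
        }
    }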

ncruces 1 day ago
It's been a long time since I've messed with Java bytecode [1], but shouldn't the private method call use INVOKESPECIAL?

In general I don't think you can do this to INVOKEVIRTUAL (or INVOKEINTERFACE) as it covers cases where your target is not statically resolved (virtual/interface calls). This transformation should be limited to INVOKESTATIC and INVOKESPECIAL.

You also need lots more checks to make sure you can apply the transformation, like ensuring the call site is not covered by a try block, otherwise the rewrite is not semantics-preserving.
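
A rough sketch of the kind of guard being suggested, written against the ASM visitor API (the class name and surrounding wiring are hypothetical, and the try-block tracking is elided):

    import org.objectweb.asm.MethodVisitor;
    import org.objectweb.asm.Opcodes;

    final class SelfTailCallFinder extends MethodVisitor {
        private final String owner;       // internal name of the class being rewritten
        private final String methodName;  // method we look for self-calls to
        private final String methodDesc;

        SelfTailCallFinder(MethodVisitor next, String owner, String name, String desc) {
            super(Opcodes.ASM9, next);
            this.owner = owner;
            this.methodName = name;
            this.methodDesc = desc;
        }

        @Override
        public void visitMethodInsn(int opcode, String callOwner, String name,
                                    String descriptor, boolean isInterface) {
            // Only statically dispatched self-calls are candidates for the rewrite;
            // INVOKEVIRTUAL/INVOKEINTERFACE may resolve to an override at runtime.
            boolean staticDispatch =
                opcode == Opcodes.INVOKESTATIC || opcode == Opcodes.INVOKESPECIAL;
            boolean selfCall = callOwner.equals(owner)
                && name.equals(methodName)
                && descriptor.equals(methodDesc);
            if (staticDispatch && selfCall) {
                // candidate: would still need to check that it is immediately followed
                // by the matching return and not covered by any try/catch range
            }
            super.visitMethodInsn(opcode, callOwner, name, descriptor, isInterface);
        }
    }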

1: https://jauvm.blogspot.com/

lukaslalinsky 1 day ago
I never understood the need for tail recursion optimization in imperative languages. Sure, you need it in FP if you don't have loops and recursion is your only option, but what is the benefit of recursive algorithms that could benefit from tail optimization (i.e. recursive loops) in a language like Java?
droideqa 2 days ago
Cool, now ABCL can have TCO!
1932812267 2 days ago
This isn't a _general_ tail call optimization--just tail recursion. The issue is that this won't support mutual tail recursion.

e.g.:

    (defun func-a (x)
      (func-b (- x 34)))

    (defun func-b (x)
      (cond ((<= 0 x) x)
            (t (func-a (- x 3)))))

Because func-a and func-b are different (JVM) functions, you'd need an inter-procedural goto (i.e. a tail call) in order to natively implement this.

As an alternative, some implementations will use a trampoline. func-a and func-b return a _value_ which says what function to call (and what arguments) for the next step of the computation. The trampoline then calls the appropriate function. Because func-a and func-b _return_ instead of actually calling their sibling, the stack depth is always constant, and the trampoline takes care of the dispatch.
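
A minimal Java sketch of that trampoline idea (the names are made up, not from any particular library): each step returns either a final value or a thunk describing the next call, and a driver loop keeps invoking thunks, so the stack depth stays constant.

    import java.util.function.Supplier;

    final class Trampoline {
        // A step either carries a finished result or a thunk for the next call.
        static final class Step<T> {
            final T result;                // set when finished
            final Supplier<Step<T>> next;  // set when there is more work

            private Step(T result, Supplier<Step<T>> next) {
                this.result = result;
                this.next = next;
            }
            static <T> Step<T> done(T result) { return new Step<>(result, null); }
            static <T> Step<T> call(Supplier<Step<T>> next) { return new Step<>(null, next); }
        }

        // funcA and funcB return a Step instead of calling each other directly,
        // mirroring the mutually recursive func-a/func-b above.
        static Step<Long> funcA(long x) {
            return Step.call(() -> funcB(x - 34));
        }

        static Step<Long> funcB(long x) {
            return x >= 0 ? Step.done(x) : Step.call(() -> funcA(x - 3));
        }

        // The driver loop: it, not funcA/funcB, performs every "call".
        static <T> T run(Step<T> step) {
            while (step.next != null) {
                step = step.next.get();
            }
            return step.result;
        }
    }
    // usage: Trampoline.run(Trampoline.funcA(100)) evaluates to 66L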

knome 1 day ago
Sounds like a manual form of Clojure's recur.

https://clojuredocs.org/clojure.core/recur

1932812267 1 day ago
Clojure's loop/recur is specifically tail recursion, like Scala's tailrec or the optimization described in the blog post. It doesn't use trampolines to enable tail calls that aren't tail recursion.
dapperdrake 2 days ago
Finally.

The ANTLR guys went through terrible contortions for their parsers.

Never felt like working those details out for ABCL.
