53 points by luu 13 days ago | 3 comments
bmc7505 9 days ago
This is a very nice approach. Unfortunately, Piantadosi's pairing function does not work for acyclic grammars (i.e., finite CFLs). We found a variant that handles enumerating all trees of a certain width from an arbitrary CFG, i.e., L(G) ∩ Σ^n, or whose language is finite [1]:

[1]: https://github.com/breandan/galoisenne/blob/master/latex/laf...

rictic 8 days ago
Related work: Functional Enumeration of Algebraic Data Types - https://mengwangoxf.github.io/Papers/Haskell12.pdf

It's quite efficient, able to generate the 10^100th element in ~a second on 2013 hardware. It also groups the generated values by size, so you can e.g. randomly sample among ASTs with N nodes.

082349872349872 12 days ago
I prefer to enumerate via a shortlex variant, primary key being number of leaves and secondary being the index of the particular expansion.
CoastalCoder 9 days ago
Why the preference?

(For context, I'm completely new to this topic.)

082349872349872 9 days ago
Because I use them for testing, so I generally want to uniformly generate samples within a particular size of tree.

(generating random trees the naive way tends to produce the occasional huge tree amidst huge numbers of tiny trees)

gessha 9 days ago
If it’s not confidential, what type of problem would require manipulation and testing of trees?
082349872349872 9 days ago
Almost all problems (that I find interesting): anytime you can frob two foos together and get a new foo (put two parts together to make something that can again be a part) you've got the material for a tree.

Concrete examples: Docker containers, PDF parts, POSIX filesystems, Scheme sources, SVGs, x86 objects, etc.

gopher_space 9 days ago
Message board?
morgante 8 days ago
Do you have an example of doing this?