For the use-cases where Genetic Programming was popular, I would recommend looking at Bayesian Optimization (bayesopt) as an alternative today (I know I keep recommending the area - but I hope I only do so when it is relevant :-)). This is mostly because, IMHO, it has a principled foundation that has been productively developed further in the past few years. Here's a good book on the topic [1], and I have a tutorial as well [2]. Interestingly, one of the books I encountered when reading up on Genetic Algorithms years ago was by Melanie Mitchell [3]!
Bayesopt, Genetic Programming, or any other search algorithm that can operate over non-differentiable objective functions is very useful in practice. For example, when performing model selection in the space of hyperparameters for a model that is not differentiable, such as a traditional Decision Tree [4]. Or for exotic use-cases like molecule discovery [5].
You can try out bayesopt using the botorch or hyperopt libraries. The latter only implements one specific bayesopt algorithm, which was (and still is) popular, but it seems to have been bettered of late [4].
[2] Part 1 https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs
[3] Found a free copy online https://www.boente.eti.br/fuzzy/ebook-fuzzy-mitchell.pdf
[4] "... Analysis of the Black-Box Optimization Challenge 2020" https://proceedings.mlr.press/v133/turner21a.html
[5] ChemBO is an example but there are others https://proceedings.mlr.press/v108/korovina20a.html
Also don’t confuse Genetic Algorithms (GA) with GP.
Anyway, it should be close enough.
I just lump it all under "Evolutionary Computing"...
I've been working on GP architectures that use short linear program tapes as part of their genome. Executing these programs can be done very quickly. Assuming the program tape is <=10kb in length and you are constraining to ~a megacycle in the interpreter, you can execute these programs 100k~1m times per second on a chunky workstation. The problems you can solve in this time & space are non-trivial. Once you evolve the desired program, you can write the interpreter for it in any language on any platform within half an hour.
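A minimal sketch of what such an interpreter might look like, in Python for readability (the instruction set, register count, and genome encoding here are all illustrative assumptions - the parent doesn't specify the actual design):

```python
import random

# Hypothetical linear program tape: a flat list of (opcode, dst, src)
# triples operating on a small register file, with a hard cycle budget
# so that any tape halts quickly.
OPS = ("add", "sub", "mul", "mov")

def run_tape(tape, registers, max_cycles=1_000_000):
    """Execute the tape once and return the resulting register file."""
    regs = list(registers)
    cycles = 0
    for op, dst, src in tape:
        if cycles >= max_cycles:
            break
        a, b = regs[dst], regs[src]
        if op == "add":
            regs[dst] = a + b
        elif op == "sub":
            regs[dst] = a - b
        elif op == "mul":
            regs[dst] = a * b
        elif op == "mov":
            regs[dst] = b
        cycles += 1
    return regs

def random_tape(length, n_regs=4):
    """Sample a random genome for seeding the initial population."""
    return [(random.choice(OPS), random.randrange(n_regs), random.randrange(n_regs))
            for _ in range(length)]

# Example: a hand-written tape computing r0 = (r0 + r1) * r1
tape = [("add", 0, 1), ("mul", 0, 1)]
print(run_tape(tape, [2, 3, 0, 0])[0])  # (2 + 3) * 3 = 15
```

The point of the tape form is that the evaluation loop is branch-light and trivially portable, which is also why rewriting the interpreter on another platform is a half-hour job.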
Training is definitely going to be expensive but it can fit in one powerful machine. You don't need to keep a terabyte of data around in VRAM to make reasonable rates of progress. Paging out parts of the population to disk or S3 buckets is feasible with these techniques. It subdivides very nicely into however many islands you can afford to run.
Inference is essentially free and could run on the processor in your toaster oven.
These approaches can make CUDA & friends look like a Rube Goldberg machine by comparison. Being able to explain how the entire thing works to a peer in a single afternoon is perhaps another significant feature of these designs.
At the end of the day, there is a pretty good reason these techniques aren't popular - There aren't any quadratic scaling laws that constrain the architecture of GP. Only the much scarier exponential ones. I contend that the search space of all linear programs is viable, but I still don't have any proof that this is the case after chasing the rabbit for about a year. My central argument is that there are many high-quality programs out there. We just need to find one of them. I think the arguments against this tend to unfairly cast the problem as searching for a single needle in a haystack of 10^10000.
Anything where we are transforming information from one format to another on a ~1:1 basis without injecting additional external information (i.e., what foundation models do with their broad world knowledge).
Most quadratic and linear programming solvers assume a fully generic problem structure. You give them any matrix and they'll spit out an answer. This means that internally they must also use generic matrix operations.
If you know the full structure of the optimization problem and how it changes with respect to changes in parameters, you could in principle compile a unique solver that is fully specialized to the parameterized problem. If your problem is particularly simple, you could even enumerate all possible answers in a lookup table.
https://en.m.wikipedia.org/wiki/Parametric_programming
Now here is the problem: while it is not that difficult to find the sparsity pattern of the optimization problem and choose an appropriate sparse matrix library for it, actually producing a unique program that does nothing but solve your parametric optimization problem is a pipe dream. Most approaches involve a lot of human ingenuity to find exploitable structure in a specific problem, and that work becomes useless the moment the problem changes even a little.
So here is my suggestion: figure out how to use genetic programming to produce fast quadratic programming solvers to specific problems.
Note that you don't necessarily need a perfect solution since you can always run a bunch of QP steps on an initial guess to nudge it towards convergence. The goal would be to reduce the number of these steps to as close to zero as possible.
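As a toy illustration of what a "fully specialized" solver collapses to, here is a sketch (the problem family and function names are my own hypothetical choices) for a 1-D box-constrained QP, where the KKT conditions reduce to a single clamp, next to the kind of generic iterative step you'd use to polish an imperfect evolved solution:

```python
def solve_specialized(a, b, lo, hi):
    """Closed-form solver 'compiled' for this one problem family:
    minimize 0.5*a*x**2 + b*x  subject to  lo <= x <= hi  (a > 0).
    The KKT conditions reduce to clamping the unconstrained minimizer."""
    return min(max(-b / a, lo), hi)

def qp_step(x, a, b, lo, hi, lr=0.1):
    """One generic projected-gradient step - the kind of iteration
    you'd run on an initial guess to nudge it towards convergence."""
    x = x - lr * (a * x + b)        # gradient of the objective
    return min(max(x, lo), hi)      # project back into the box

a, b, lo, hi = 2.0, -10.0, 0.0, 3.0
x_star = solve_specialized(a, b, lo, hi)   # -b/a = 5.0, clamped to 3.0
print(x_star)

# The generic iteration reaches the same point, just in many steps:
x = 0.0
for _ in range(100):
    x = qp_step(x, a, b, lo, hi)
print(abs(x - x_star) < 1e-6)
```

The specialized version needs zero iterations; the evolved-solver idea is to push real problems as close to that zero-step regime as possible.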
If the problem fits into a tree form, then Genetic Programming is your best friend.
If the problem can be encoded onto a set of substrings of 0s and 1s, then Genetic Algorithms are your best friend.
Likewise, if your problem can be encoded using floating point numbers, then Evolutionary Strategies are your best friend.
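For the floating-point case, a minimal (1+1) Evolution Strategy sketch (the step size, decay, and generation count are arbitrary illustrative choices):

```python
import random

def evolve(fitness, x, sigma=0.5, decay=0.995, generations=1000):
    """Minimal (1+1) Evolution Strategy: perturb every float with Gaussian
    noise, keep the child only if it is at least as good as the parent.
    Sigma shrinks over time so the search fine-tunes near the optimum."""
    best = fitness(x)
    for _ in range(generations):
        child = [xi + random.gauss(0.0, sigma) for xi in x]
        f = fitness(child)
        if f <= best:               # minimizing; elitist acceptance
            x, best = child, f
        sigma *= decay
    return x, best

# Sphere function: the minimum is the all-zeros vector.
sphere = lambda v: sum(xi * xi for xi in v)
random.seed(1)
x, best = evolve(sphere, [5.0, -3.0, 2.0])
print(best)  # far below the starting value of 38.0
```

Note the genome here really is just a vector of floats and a mutation operator - which is why the line between this and a GA over an encoded representation is blurrier than the naming suggests.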
Now... I could ruffle some feathers by saying that Genetic Algorithms and Evolutionary Strategies aren't really all that different - the same algorithms could be used for both. But historically they emerged at roughly the same time in different places: Illinois vs. Germany.
Back to GP. The cool thing about GP is that when the solution is evolved you have THE solution to the problem. No futzing about with how or what the results mean.
A big problem in the past was that GP doesn't scale the way neural networks do. It is not embarrassingly parallel, and it has been very limited by the types of hardware/architectures available.
But GP is a great field of exploration!
Until a few years ago, the popular deep learning methods like CNNs weren't great at that kind of thing, but LLMs definitely changed that - it's safe to say that by drawing on huge amounts of data LLMs are much better at most practical programming tasks. They're not necessarily a complete replacement though, there's definitely space for hybrid approaches, eg https://arxiv.org/abs/2401.07102 or https://www.nature.com/articles/s41586-023-06924-6.
An example would be a neural network over a discrete field.
Back at university I played around with boolean neural networks and this was one way of training them directly.
EDIT: > what are the most suitable problems that Genetic Programming is to solve
Given the above, genetic algorithms are really suitable when bounded execution time is part of your problem domain. You run n iterations of the genetic algorithm, and even if the result does not fully satisfy the fitness function, you still get some result that scores closer to it than a wild guess.
No. Possibly you're confused between GAs and GP, a common confusion. In GP, the goal is to find an algorithm - in a real programming language, not as weights - to optimise an objective function. Often that is specified by input-output pairs, similar to supervised learning.
Deep learning on graphs is unfortunately still a little underwhelming.
GA (not GP) is commonly used for hyperparameter choice in deep learning.
It's an interesting showcase in how we (or at least I) assumed what the lifecycle of hyperlinks would be. (My assumption was they would perhaps 404 at some point, but not this.)
Link to Ants AI Challenge: http://ants.aichallenge.org/
Winner's post mortem: https://web.archive.org/web/20151117155824/http://xathis.com...
A "humbling experience" is definitely the right choice of words.
I did some minor work on a PoC for the one after Ants. It was multi-player (or: multi-bot) Asteroids but things fizzled out as most of the people organizing moved on to other things.
My first one just tries to make a 64-bit bitstring that is all 1's. I made random selection, roulette wheel, and tournament. Random is there as empirical proof that GAs are smarter than chance. Tournament is doing a great job.
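For anyone who wants to recreate this kind of experiment, here's a rough sketch (my own toy parameters and operators, not the actual code from above) of a tournament-selection GA on the 64-bit all-ones problem:

```python
import random

BITS = 64
fitness = lambda g: sum(g)  # count of 1 bits; the maximum is 64

def mutate(g, rate=1 / BITS):
    """Flip each bit independently with a small probability."""
    return [b ^ 1 if random.random() < rate else b for b in g]

def crossover(a, b):
    """Single-point crossover of two parent bitstrings."""
    cut = random.randrange(1, BITS)
    return a[:cut] + b[cut:]

def tournament(pop, k=3):
    """Pick the fittest of k randomly sampled individuals."""
    return max(random.sample(pop, k), key=fitness)

def evolve(generations=200, size=100):
    pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(size)]
    for _ in range(generations):
        pop = [mutate(crossover(tournament(pop), tournament(pop)))
               for _ in range(size)]
    return max(fitness(g) for g in pop)

random.seed(0)
print(evolve())  # tournament-selection GA

# Baseline: the best of the same number of purely random guesses.
print(max(sum(random.randint(0, 1) for _ in range(BITS))
          for _ in range(200 * 100)))
```

The GA should drive the population into the all-ones region, while pure random guessing over the same evaluation budget plateaus well short of 64 - which is the "smarter than chance" comparison in a nutshell.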
Next idea is student loans. For maybe 15 loans with different rates, what payment plan knocks them all out the fastest? I'm fixing the payment size, with a payment plan being a list of numbers representing which loan to apply the payment to (or 0 for no payment). Each step is (a) apply the payment, (b) advance all loans with interest, and (c) check whether the total is zero. Fitness will be interesting given I need to optimize for minimal principal and fewest payments.
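A sketch of how that simulation step might look (the loan values, function names, and payoff threshold below are my own illustrative choices, not the actual implementation):

```python
def simulate(plan, balances, monthly_rates, payment):
    """Run a payment plan: each step (a) applies the fixed payment to the
    chosen loan (0 = skip this month), (b) accrues interest on every
    remaining balance, (c) checks whether everything is paid off.
    Returns (months_elapsed, total_paid, remaining_debt) - the raw
    material for a fitness function over months and total paid."""
    balances = list(balances)
    total_paid = 0.0
    for month, choice in enumerate(plan, start=1):
        if choice:                                  # loans are 1-indexed
            i = choice - 1
            pay = min(payment, balances[i])         # never overpay a loan
            balances[i] -= pay
            total_paid += pay
        balances = [b * (1 + r) for b, r in zip(balances, monthly_rates)]
        if sum(balances) < 1e-9:                    # everything paid off
            return month, total_paid, 0.0
    return len(plan), total_paid, sum(balances)

# Toy example: two loans, pay the higher-rate loan first.
months, paid, left = simulate(
    plan=[2, 2, 1, 1, 1],
    balances=[300.0, 200.0],
    monthly_rates=[0.01, 0.02],
    payment=150.0,
)
print(months, round(paid, 2), round(left, 2))
```

A GA over the `plan` list then just needs a fitness that trades off `months` against `total_paid` (plus a penalty for any `remaining_debt`).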
I found some other easy examples people did online that beginners might want to recreate. They include trying to get from point A to B with bananas at random places (avoid falling), best combo of items in your backpack for travel, and ants finding the best path to get the most food in a 2D map. I'll try to dig up the links later if someone wants them. Otherwise, those descriptions alone might be a starting point.
Evolutionary computation > History: https://en.wikipedia.org/wiki/Evolutionary_computation#History
- "The sad state of property-based testing libraries" re coverage-guided fuzzing and other Hilbert spaces: https://news.ycombinator.com/item?id=40884466
- re: MOSES and Combo (a Lisp) and now Python too in re: "Show HN: Codemodder – A new codemod library for Java and Python" and libCST and AST and FST; https://news.ycombinator.com/item?id=39139198
- opencog/asmoses implements MOSES on AtomSpace, a hypergraph for algorithmic graph rewriting: https://github.com/opencog/asmoses
OTOH that was similar to what people were saying about hidden layers, so YMMV