That's not to say that LISP was faster than Prolog in general; it's just that this particular program was slow.
Nowadays, of course, nobody writes parsers or grammars by hand like that. Which makes me sad, because it was a lot of fun :).
LLMs brought this new revolution where it's not immediately obvious you're chatting with a machine, but, just like most humans, they still severely lack the ability to decompose unstructured data into logic statements and prove anything out. It would be amazing if they could write some Datalog or Prolog to approximate a more complex neural-network-based understanding of a problem, since logic-based systems are more explainable.
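As a rough illustration of what I mean (purely hypothetical facts and rules, not output from any actual system), the idea is that the model would emit something like plain Prolog that a human can then audit and query:

    % Hypothetical Prolog an LLM might extract from the sentence
    % "Alice manages the team that shipped the parser".
    manages(alice, team_a).
    shipped(team_a, parser).

    % A rule that makes the downstream inference explicit and auditable.
    responsible_for(Person, Artifact) :-
        manages(Person, Team),
        shipped(Team, Artifact).

    % ?- responsible_for(alice, What).
    % What = parser.

The point being that every conclusion can be traced back to the facts and rules that produced it, which you don't get from raw model output.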
Sentences that are incorrect but still understandable.
If you then include leetspeak, acronyms, and short-form writing (SMS / tweets), it quickly becomes unmanageable.
From what I understand, the modern view is that these point to the failure of grammar as a prescriptive exercise ("This is how thou shalt speak"). Human speech is too complex for simple grammar rules to fully capture its variety. Strict grammar and lexical rules were always fantasies of the grammar teacher anyway.
See, for example, the following article on double negatives and African American Vernacular English: https://daily.jstor.org/black-english-matters/.
For the record, the parser I worked on ended up having the "interesting" rules removed, leaving it as a tool for finding sentences that didn't conform to a Basic English grammar with a controlled vocabulary--and used to QC aircraft repair manuals, which need to be read by non-native English speakers.
I see it as a complete waste of my youth, BTW. Today I speak English that I learned through listening, reading and watching, and all of this mother tongue grammar nonsense that used to stress me out daily at school and during homework is absolutely useless to me.
* Conditional random fields: Probabilistic models for segmenting and labeling sequence data (2001) - Central paper of structured supervised learning in the 2000s era
* Weighted finite-state transducers in speech recognition (2002) - This work and OpenFST are so clean
* Non-projective dependency parsing using spanning tree algorithms (2005) - Influential work connecting graph algorithms to syntax. Less relevant now, but still such a nice paper.
* Distributional clustering of English words (1994) - Proto word embeddings.
* The Unreasonable Effectiveness of Data (2009) - More high-level, but certainly explains the last 15 years
I nearly cried---this is how a great institution crumbles---this is how great libraries are destroyed.
Future generations are really going to be scratching their heads, wondering why we disbanded the institution that brought us the transistor and Unix and instead poured billions of dollars into research on how to get us to click on buttons and doomscroll.
However, AT&T was broken up in 1984 as the result of yet another lawsuit involving AT&T’s monopoly. Bell Labs remained, but it no longer had the same level of resources. Thus, the lab’s unfettered research culture gradually gave way to shorter-term research that showed promise of more immediate business impact. A similar thing happened to Xerox PARC when the federal government forced Xerox to license its xerography patents in the mid-1970s; this, combined with the end of a five-year agreement under which Xerox’s executives had promised not to meddle in the operations of Xerox PARC, led to increased pressure on the researchers (ironically, Xerox still infamously failed to take full advantage of the research PARC produced, but that’s another story).
Combine this with a business culture that emerged in the 1990s, one that disdains long-term, unfettered research and emphasizes short-term work with promises of immediate business impact, and the result has been the transformation of industrial research. There are some labs like Microsoft Research that still give their researchers a great deal of freedom, but such labs are rare these days. It’s amazing that well-resourced companies like Apple don’t have labs like Bell Labs and Xerox PARC, but if businesses are beholden to quarterly results, why would they invest in long-term, risky research projects?
This leaves government and academia. Unfortunately, government, too, is often subject to ROI demands from politicians (which is nothing new; check out how the Mansfield Amendment changed ARPA into DARPA), and academia is subject to “publish or perish” demands.
The running theme is that unfettered research with a proper amount of resources can result in world-changing discoveries and inventions. However, funding such research requires a large amount of resources as well as patience, since research takes time and results don’t always come in neat quarterly or even annual periods. Our business culture lacks this type of patience, and many businesses lack the resources to maintain labs at the level of Bell Labs or Xerox PARC. Even academia and government lack this type of patience.
The question is how we can encourage unfettered research in a world that is unwilling to fund it. I’ve been thinking of ideas for quite some time, but I haven’t fully fleshed them out yet.
The CUE evaluator is a really interesting codebase for anyone interested in algos.
The NLP stuff (Definite Clause Grammars) is also interesting, in the sense that it's surprising that a notation that can be used to parse language (definite clauses) can also represent any program a Universal Turing Machine can compute.
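For anyone who hasn't run into DCGs, here's a minimal toy sketch (my own made-up grammar, runnable with phrase/2 in SWI-Prolog) of what that notation looks like:

    % Toy DCG for a tiny fragment of English.
    sentence    --> noun_phrase, verb_phrase.
    noun_phrase --> determiner, noun.
    verb_phrase --> verb, noun_phrase.
    determiner  --> [the].
    determiner  --> [a].
    noun        --> [cat].
    noun        --> [dog].
    verb        --> [chases].

    % ?- phrase(sentence, [the, cat, chases, a, dog]).
    % true.

Each rule is just sugar for an ordinary definite clause over difference lists, which is why the same machinery that parses sentences can express arbitrary computation.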