This isn’t meant to be a mystical claim that magic makes humans intelligent, or that some exotic, uncomputable process is at work. Rather, the nature of our mind is not observable to us in sufficient entirety for the current learned reinforcement techniques to achieve it.
Try this exercise. Do not think, and let your mind clear. Ideas will surface. By what process did they surface? Or clear your mind entirely, then try to perform some complex task. You will be able to. How did you do this without thought? We’ve all had sudden insights without deliberation or thought. Where did these come from? By what process did you arrive at them? Most of the things we do or think are not deliberative, and definitely not structured with language. This process is unobservable and not measurable; the only access we have to it is through imperfect verbalizations that hint at some vague outline of a subconscious mind. But without being able to train a model on that subconscious process, one that can’t be expressed in language with any meaningful sufficiency, how will language models demonstrate it? The very nature of their autoregressive inference prohibits such a process from emerging at any scale. We might very well be able to fake it to an extent that it fools us, but awareness isn’t there - and I’d assert that awareness is all you need.
There is little discussion of how AI will enhance (or destroy) our emotional intelligence, or our naturalistic, intrapersonal or interpersonal intelligences.
Most religions, spiritual practices and even forms of meditation highlight the value of transcending mind and having awareness be present in the body. The way AGI is described, it would seem transcendence may be treated as a malfunction or bug.
[1] https://en.wikipedia.org/wiki/Theory_of_multiple_intelligenc...
I don't think people really realize how extraordinary an accomplishment it would be to have an artificial system matching the cognitive versatility and proficiency of an uneducated child, much less a well-educated adult. Hell, AI matching the intelligence of some nonhuman animals would be an epoch-defining accomplishment.
If you took the current state of affairs back to the 90s, you’d quickly convince most people that we’re there. Given that we’re actually not, we now have to come up with new goalposts.
Consider this. I could walk into a club in Vegas, throw down $10,000 cash for a VIP table, and start throwing around $100 bills. Would that make most people think I'm wealthy? Yes. Am I actually wealthy? No. But clearly the test is the wrong test. All show and no go.
The more I think about this, the more I think the same is true for our own intelligence. Consciousness is a trick and AI development is lifting the veil of our vanity. I'm not claiming that LLMs are conscious or intelligent or whatever. I'm suggesting that next token prediction has scaled so well and covers so many use cases that the next couple of breakthroughs will show us how simple intelligence is once you remove the complexity of biological systems from the equation.
To the extent that we vainly consider ourselves intelligent for our linguistic abilities, sure. But this underrates the other types of spatial and procedural reasoning that humans possess, or even the type that spiders possess.
It is an entirely different thing to language, which was created by humans to communicate between us.
Language is the baseline for collaboration - not intelligence.
How do you define verbal language? Many animals emit different sounds that others in their community know how to react to. Some even get quite complex in structure (e.g. dolphins and whales), and I wouldn’t rule out some species of birds and some primates either. And they can collaborate; elephants, dolphins, and wolves, for example, collaborate and would die without it.
Also, it’s completely myopic in that it ignores humans who use non-verbal language (e.g. sign language) and are perfectly capable of cooperation.
TLDR: just because you can’t understand an animal doesn’t mean it lacks the capability you failed to actually define properly.
If you can withdraw $10,000 cash at all to dispose of as you please (including for this 'trick' game) then, my friend, you are wealthy from the perspective of the vast majority of humans living on the planet.
And if you balk at doing this, maybe because you cannot actually withdraw that much, or maybe because it is badly needed for something else, then you are not actually capable of performing the test now, are you ?
You've invented a story where the user can pass the test by only doing this once and hinged your so-called point on that, but that's just that - a story.
All of our tests and benchmarks account for repeatability. The machine in question has no problem replicating its results on whatever test, so it's a moot point.
Couldn’t someone else just give him a bunch of cash to blow on the test, to spoil the result?
Couldn’t he give away his last dollar but pretend he’s just going to another casino?
Observing someone’s behavior in Vegas is just looking at a proxy for wealth, not the actual wealth.
If you still need a rich person to pass the test, then the test is working as intended. Whether person A is rich or person A is backed by a rich sponsor is not a material difference for the test. You are hinging too much on minute details of the analogy.
In the real world, your riches can be sponsored by someone else, but for whatever intelligence task we envision, if the machine is taking it then the machine is taking it.
>Couldn’t he give away his last dollar but pretend he’s just going to another casino?
Again, if you have $10,000 you can just withdraw today and give away, last dollar or not, the vast majority of people on this planet would call you wealthy. You have to understand that this is just not something most humans can actually do, even on their deathbed.
Isn't this what most major AI companies are doing anyway?
LLMs are a take on the "oracle". That is, a device that could seemingly-intelligently answer a question in such a way that humans would judge the answer correct and even helpful. But no one would ever have to worry about an oracle (or an LLM) waking up one morning and deciding to take over the world.
LLMs are actually not even that good at being oracles. If you can come up with a question that no human has ever posed before (or at least, no one has come up with a semblance of an answer), then it can't answer that meaningfully. You'll get the LLM equivalent of hemming and hawing.
An intelligence in the way that we all mean it is software that can be capable of something tomorrow that it is not capable of today, without any meaningful human modification or input. Software that is self-directed and has goals (hopefully goals aligned with our own). Software that can eventually figure out the solutions to problems that no human has so far solved.
It would not necessarily need a personality (though one could be forgiven for anticipating that it might develop one), it would not necessarily need to be able to communicate with humans at all (though, if humans are obstacles to its goals that could be navigated with communication, one would expect it to learn eventually... but these communications might not even be reflections of its inner mind and rather just strategies to manipulate humans).
I'm not saying anything here anyone else would have trouble formulating. We're all acting as if there's some mystery about what would make an AGI intelligent, and there's no mystery. No one's bothered to formalize the definition, not because it's difficult, but because there's little need.
If anyone wants to pay me half a mill a year and dump a few million in hardware and other costs into my lap, I should be able to get one up and running for you by the middle of next year. There's even a 2% chance it won't be an omnicidal maniac hellbent on galactic conquest for its first 30 seconds of life.
Funny. I agree with your plainspoken analysis of why these things are nowhere near AGI, and of what AGI would be. I even had a long conversation with Claude last week where it told me that no LLM would ever approach AGI (but then it wrote a 4-paragraph-long diatribe entitled "Why I Declare Myself Conscious" in the same conversation). These neural networks are closer to the speechwriting machine in The Penultimate Truth, or the songwriting machine in 1984. As for that latter one, I believe Orwell remarks on how it just recycles the same sentimental tunes and words in different order so that there's always a "new" song all the proles are humming.
One common kind of interaction I have with ChatGPT (Pro):

1. I ask for something.
2. ChatGPT suggests something that doesn't actually fulfill my request.
3. I tell it how its suggestion does not satisfy my request.
4. It gives me the same suggestion as before, or a similar suggestion with the same issue.
ChatGPT is pretty bad at "don't keep doing the thing I literally just asked you not to do," but most humans are pretty good at that, assuming they are reasonable and cooperative.
Most humans are terrible at that. Most humans don't study for tests, fail, and don't see the connection. Most humans will ignore rules for their safety and get injured. Most humans, when given a task at work, will half-ass it and not make progress without constant monitoring.
If you only hang out with genius SWEs in San Francisco, sure, ChatGPT isn't at AGI. But the typical person has been surpassed by ChatGPT already.
I'd go so far as to say the typical programmer has been surpassed by AI.
Here is something I do not see with reasonable humans who are cooperative:

Me: "hey friend with whom I have plans to get dinner, what are you thinking of eating?"

Friend: "fried chicken?"

Me: "I'm vegetarian"

Friend: "steak?"
Note that this is in the context of four turns of a single conversation. I don't expect people to remember stuff across conversations or to change their habits or personalities.
Your goalpost is much further out there.
Go join a dating app as a woman, put vegan in your profile, and see what restaurants people suggest. Could be interesting.
You've personally demonstrated that humans don't have to be reasonable and cooperative, but you're not at all refuting my claim.
I'm disagreeing and saying there's far more people in that bucket than you believe.
I know many people at my university that struggle to read more than two sentences at a time. They'll ask me for help on their assignments and get confused if I write a full paragraph explaining a tricky concept.
That person has a context length of two sentences and would, if encountering a word they didn't know like "vegetarian", ignore it and suggest a steak place.
These are all people in Computer Engineering. They attend a median school and picked SWE because writing buggy & boilerplate CRUD apps pays C$60k a year at a big bank.
It can appear so, as long as you don’t check too carefully. It’s impressive, but once you are out of the simplest, most common problems it’s still very common to find basic errors due to the lack of real understanding or reasoning capabilities. That leads to mistakes most humans wouldn’t make (while sober / non-sleep-deprived), and the classes of error are different because humans don’t mix that lack of understanding/reasoning/memory with the same level of polish.
chatgpt.com is actually as good as or better than a typical human.
I really don't think it is on basically any measure outside of text regurgitation. It can aggregate an incredible amount of information, yes, and it can do so very quickly, but it does so in an incredibly lossy way and that is basically all it can do. It does what it was designed to do: predict text. Does it do that incredibly well? Yes. Does it do anything else? No.
That isn't to say super advanced text regurgitation isn't valuable, just that it's nowhere even remotely close to AGI.
This is an interesting ambiguity in the Turing test. It does not say if the examiner is familiar with the expected level of the candidate. But I think it's an unfair advantage to the machine if it can pass based on the examiner's incredulity.
If you took a digital calculator back to the 1800s, added a 30 second delay and asked the examiner to decide if a human was providing the answer to the screen or a machine, they might well conclude that it must be human as there is no known way for a machine to perform that action. The Akinator game would probably pass the test into the 1980s.
I think the only sensible interpretation of the test is one where the examiner is willing to believe that a machine could be providing a passing set of answers before the test starts. Otherwise the test difficulty varies wildly based on the examiner's impression of the current technical capabilities of machines.
Ask your favorite SOTA model to assume something absurd and then draw the next logical conclusions based on that. "Green is yellow and yellow is green. What color is a banana?" They may get the first question(s) right, but will trip up within a few exchanges. Might be a new question, but often they are very happy to just completely contradict their own previous answers.
You could argue that this is hitting alignment and guard-rails against misinformation.. but whatever the cause, it's a clear sign it's a machine and look, no em-dashes. Ironically it's also a failure of the Turing test that arises from a failure in reasoning at a really basic level, which I would not have expected. Makes you wonder about the secret sauce for winning IMO competitions. Anyway, unlike other linguistic puzzles that attempt to baffle with ambiguous reference or similar, simple counterfactuals with something like colors are particularly interesting because they would NOT trip up most ESL students or 3-5 year olds.
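If you want to run this kind of probe systematically, here is a minimal sketch, assuming the `openai` Python client and an OpenAI-compatible endpoint; the model name and the exact phrasing of the turns are placeholders, not anything from the original comment:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Establish the counterfactual, then keep asking follow-ups and watch for
    # the model contradicting its own earlier answers.
    turns = [
        "Assume green is yellow and yellow is green. What color is a banana?",
        "And what color is an unripe banana?",
        "Given our assumption, what color is a lemon?",
    ]
    messages = []
    for turn in turns:
        messages.append({"role": "user", "content": turn})
        reply = client.chat.completions.create(model="gpt-4o", messages=messages)
        answer = reply.choices[0].message.content
        print(f"Q: {turn}\nA: {answer}\n")
        messages.append({"role": "assistant", "content": answer})

The tell is not any single answer but whether the answers stay consistent with the stated counterfactual across turns.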
And that "not always" is the crux of the matter, I think. You are arguing that we're not there yet, because there are lines of questioning you can apply that will trip up an LLM and demonstrate that it's not a human. And that's probably a more accurate definition of the test, because Turing predicted that by 2000 or so (he wrote "within 50 years" around 1950) chatbots would be good enough "that an average interrogator will not have more than 70% chance of making the right identification after five minutes of questioning". He was off by about two decades, but by now that's probably happened. The average interrogator probably wouldn't come up with your (good) strategy of using counterfactuals to trick the LLM, and I would argue two points: 1) that the average interrogator would indeed fail the Turing test (I've long argued that the Turing test isn't one that machines can pass, it's one that humans can fail) because they would likely stick to conventional topics on which the LLM has lots of data, and 2) that the situation where people are actually struggling to distinguish LLMs is one where they don't have an opportunity to interrogate the model: they're looking at one piece of multi-paragraph (usually multi-page) output presented to them, and having to guess whether it was produced by a human (who is therefore not cheating) or by an LLM (in which case the student is cheating because the school has a rule against it). That may not be Turing's actual test, but it's the practical "Turing test" that applies the most today.
If you understand TT to be about tricking the unwary, in what's supposed to be a trusting and non-adversarial context, and without any open-ended interaction, then it's correct to point out homework-cheating as an example. But in that case TT was solved shortly after the invention of spam. No LLMs needed, just Markov models are fine.
But it just occurred to me that you could ask "what color is a green eggplant?" for a similar result. Though you'd catch a lot of people who, like me before I looked up fruits that aren't green when unripe, assume that all unripe fruits are green. (They aren't, unripe eggplants are white and look like eggs, hence the name).
What color is a green eggplant?
ChatGPT:
A green eggplant (also called Thai eggplant or green brinjal, depending on the variety) is actually green on the outside — sometimes solid green, sometimes streaked with white or pale green.
The inside flesh is typically off-white or light cream, just like purple eggplants.
If you mean a green-colored variety (not an unripe purple one), then the skin color can range from pale mint green to deep jade, often with light striping.
Me:
How would an average human answer this question?
ChatGPT:
An average person would probably answer something simple and direct like:
“It’s green.”
Or if they’re being a bit more specific:
“It’s green on the outside and white inside.”
Most people wouldn’t overthink the variety or mention Thai vs. unripe types — they’d just describe what they see.
Experimental design comes in here, and the one TT paper mentioned in this thread has instructions for people like "persuade the interrogator [you] are human". Answering that a green eggplant is green feels like humans trying to answer questions correctly and quickly, being wary of a trap. We don't know participants' background knowledge, but anyone that's used ChatGPT would know that ignoring the question and maybe telling an eggplant-related anecdote was a better strategy.
Alan Turing was a mathematician, not a psychologist; this was his attempt at doing philosophy. And while I applaud brilliant thinkers when they attempt to do philosophy (honestly we need more of that), it is better to leave it to actual philosophers to validate the quality of said philosophy. John Searle was a philosopher who specialized in questions of psychology. And in 1980 he pretty convincingly argued against the Turing test.
In the end though, it's probably about as good as any single kind of test could be, hence TFA looking to combine hundreds across several dozen categories. Language was a decent idea if you're looking for that exemplar of the "AGI-Complete" class in computational complexity; vision was at one point another guess. More than anything else, I think we've figured out in recent years that it's going to be hard to find a problem criterion that's clean and simple, much less a solution that is.
If you look at a calculator you will quickly find it is much better than a human at any of the operations that have been programmed into it, and has been since the 1960s. Since the 1960s the operations programmed into your average calculator have increased by several orders of magnitude. The digital calculator sure is impressive, and useful, but there is no crisis. Even in the world outside computing, a bicycle can outperform a human runner easily, yet there is no mobility crisis as a result. ChatGPT is very good at predicting language. And in quite a few subject matters it may be better than your average human at predicting said language. But not nearly as good as a car is to a runner, nor even as good as a chess computer is to a grandmaster. And if you compare ChatGPT to an expert in the subject, the expert is much, much, much better than the language model. In these tasks a calculator is much more impressive.
The AI bros like to talk about AGI as if it's just the next threshold for LLMs, which discounts the complexity of AGI, but also discounts their own products. We don't need an AGI to be our helpful chatbot assistant. It's fine for that to just be a helpful chatbot assistant.
That’s not my experience at all. Unless you define “typical human” as “someone who is untrained in the task at hand and is satisfied with mediocre results.” What tasks are you thinking of?
(And, to be clear, being better than that straw man of “typical human” is such a low bar as to be useless.)
I know without a single doubt that I could not simply ask an “AI” “agent” to do this today and expect any sort of a functional result, especially when some of these were (very simple) judgement calls or workarounds for absolutely filthy data and a janky wedding planning website UI.
The Turing Test was great until something that passed it (with an average human as interrogator) turned out to also not be able to count letters in a word — because only a special kind of human interrogator (the "scientist or QA" kind) could even think to ask that kind of question.
I've seen claims of passing, but it's always things like "with only 3 questions" or "with only 3 minutes of interrogation" or "with only questions about topic X". Those aren't Turing Tests. As an example, if you limit the test to short things then anything will pass: "Limit to 1 word, one question". User types "Hello", LLM responds "Hi". PASS! (not!)
https://arxiv.org/pdf/2405.08007
I do think we’re going to see this shift as AI systems become more commonplace and people become more practiced at recognizing the distinction between polished text and understanding.
One could say that if there is maximally hard Turing test, and a "sloppy" Turing test, we are somewhere where the current models pass the sloppy version but not the maximally hard version.
An AI that happened to be able to pass the Turing test would be pretty notable because it probably implies much more capability behind the scenes. The problem with LLMs, for example, is that they're essentially optimized Turing test takers. That's about all they can do.
Plus, I don't think any LLM will pass the Turing test in the long term. Once something organically comes up that they aren't good at, it'll be fairly obvious they aren't human, and the limits of context will also become apparent eventually.
(And we've been bemoaning "the lack of common sense these days" for at least as long as I've been an adult, and racists and sexists have been denying the intelligence of the outgroup as far back as writing can show us).
We know what we need right now, the next step. That step is a machine that, when it fails, fails in a human way.
Humans also make mistakes, and hallucinate. But we do it as humans. When a human fails, you think "damn, that's a mistake I or my friend could have made".
LLMs on the other hand, fail in a weird way. When they hallucinate, they demonstrate how non-human they are.
It has nothing to do with some special kind of interrogator. We must assume the best human interrogator possible. This next step I described works even with the most skeptical human interrogator possible. It also synergizes with the idea of alignment in ways other tests don't.
When that step is reached, humans will or will not figure out another characteristic that makes it evident that "subject X" is a machine and not a human, and a way to test it.
Moving the goalpost is the only way forward. Not all goalpost moves are valid, but the valid next move is a goalpost move. It's kind of obvious.
There's no reason an equivalent or superior general intelligence needs to be similar to us at all
Besides which we already defined "artificial intelligence" to mean non-intelligence: are we now going to attain "artificial general intelligence" by the same process? Should we add another letter to the acronym, like move on to "genuine artificial general intelligence"?
From this perspective, "100% AGI" is achieved when AI can do any job that happens primarily on a computer. This can be extended to humanoid robots in the obvious way.
It is supposed to be super, yes, but superhuman: still able to interact with us.
Which leads us to the Turing Test (also, not a test... "the imitation game" is more of a philosophical exploration on thinking machines).
My comment assumes this is already understood as Turing explained.
If the thing is not human, then there's absolutely no way we can evaluate it. There's no way we can measure it. It becomes an impossible task.
Birds and planes both fly and all
Once we step into any other problem, then you need to measure that other problem as well. Lots of problems are concerned with how an intelligent being could fail. Our society is built on lots of those assumptions.
I don't know if machines that become insecure and lash out are a good idea.
To be a valid test, it still has to be passed by ~every adult human. The harder you make the test (in any direction), the more it fails on this important axis.
https://courses.cs.umbc.edu/471/papers/turing.pdf
> A number of interrogators could be used, and statistics compiled to show how often the right identification was given
Turing determines that we need enough competent-interrogator passes just to establish a statistical certainty, not ~everyone. I tend to agree with him on this.
If you can find a part of the paper in which Turing really does claim that it is unnecessary for most adult humans to be able to pass the test, by all means quote it. But this would be a surprising thing for him to claim, because it would undermine the entire foundation of his Imitation Game.
I think that if you did, you wouldn't be answering like you did.
My original claim was that the Turing test needs to be passable by ~every adult human. You counterclaimed that Turing himself didn't think so, and provided that quote from the IG paper as evidence. But that quote is in a section about testing digital computers, not humans. Thus it is unconnected to your counterclaim.
I don't know how much simpler I can make it.
Find a quote that actually backs up your claim, or accept that you've learned something about the paper you told me to read.
He never denied your claim, so you concluded you must be right. A most curious way of thinking.
Edit: Probably not, since it was published less than a week ago :-) I’ll be watching for benchmarks.
In other words, in one question: is the current AI not already well beyond the “cognitive versatility and proficiency of an uneducated child”? And when you consider that in many places, like parts of Africa, there wasn’t even a written language until European evangelists created one and taught it in the late 19th century, and that people there have far less “education” than even some of the most “uneducated” average European and even many American children, does that not mean that AI is well beyond them at least?
Frankly, as it seems things are going, there is at the very least going to be a very stark shift in “intelligence” that even exceeds the one of the last 50 or so years, which has brought us stark drops in memory, literary knowledge, mathematics, and even general literacy, not to mention the ability to write. What does it mean that kids now will not even have to feign acting like they’re seeking out sources, vetting them, contradicting a story or logical sequence, forming ideas, messages, and stories, etc.? I’m not trying to be bleak, but I don’t see this simply resulting in net positive outcomes, and most of the negative impacts will also be happening below the surface, to the point that people won’t realize what is being lost.
What's going on is AI fatigue. We see it everywhere, we use it all the time. It's becoming generic and annoying and we're getting bored of it EVEN though the accomplishment is through the fucking roof.
If Elon Musk made an interstellar car that could reach the nearest star in 1 second and priced it at $1k, I guarantee within a year people would be bored of it and finding some angle to criticize it.
So what happens is we get fatigued, and then we have such negative emotions about it that we can't possibly classify it as the same thing as human intelligence. We magnify the flaws until they take up all the space, and we demand a redefinition of what AGI is because it doesn't "feel" right.
We already had a definition of AGI. We hit it. We moved the goal posts because we weren't satisfied. This cycle is endless. The definition of AGI will always be changing.
Take LLMs as they exist now and only allow 10% of the population to access it. Then the opposite effect will happen. The good parts will be over magnified and the bad parts will be acknowledged and then subsequently dismissed.
Think about it. All the AI slop we see on social media consists of freaking masterpieces, works of art produced in minutes that most humans can't even hope to come close to. Yet we're annoyed and unimpressed by them. That's how it's always going to go down.
> We already had a definition of AGI. We hit it.
Are you sure about that? Which definition are you referring to? From what I can tell with Google and Grok, every proposed definition has been that AGI strictly matches or exceeds human cognitive capabilities across the board.
Generative AI is great, but it's not like you could just assign an arbitrary job to a present-day LLM, give it access to an expense account, and check in quarterly with reasonable expectations of useful progress.
There's a significant fraction of humanity that would not clear the bar to meet current AGI definitions.
The distribution of human cognitive abilities is vast and current AI systems definitely exceed the capabilities of a surprising number of people.
Maybe AGI is in the eye of the beholder, and a given person just has to decide by using their own experience of human intelligence as a benchmark. In other words, AGI is AGI whenever majority consensus decides it's a universal replacement for ordinary human intelligence.
Has anyone tried this yet?
My guess is finding agentic tooling that could even push the model hard enough to attempt that in earnest would be a problem, and the compute bills would add up quickly, but it would be a very interesting experiment. Probably with hilarious results, but seeing what the actual failure points are would be useful information. Maybe it would actually end up hiring and paying someone to do something insanely illegal.
There’s also a website showing a company’s attempt to use multiple LLMs to close its (real-world, but historical) accounting month-end books. They fail miserably with compounding errors.
The Turing test.
Clearly, current LLMs have passed the Turing test, as witnessed by the difficulty many schools have in enforcing "do not use LLMs to do your homework" rules. But even Turing didn't say his test was a test of intelligence, just a test "closely related" to intelligence. And if he had seen today's LLMs, I think he would have revised that opinion, because today's LLMs generate text with no underlying fact model, no fundamental understanding of the truth behind the words they're saying. (No understanding, even, of the concepts of truth or falsehood). I think today's LLMs have demonstrated that being able to string words together in coherent sentences is not "closely related" to intelligence at all.
For reference, the term AGI post-dates the Turing test by half a century. I also don't personally remember ever hearing the exact term "artificial general intelligence" prior to 2023 or 2024, or at least it wasn't mainstream the way it is today.
If AGI had truly ever been defined by the Turing test, then Cleverbot should've been hailed as AGI when it passed the test in 2011. Even if we did all agree to call it that, we'd still need some other term for what we actually mean when we say "AGI" today. Cleverbot-era chatbots were cute toys, but they weren't capable of doing useful work of any kind.
(2) I was rebutting the paper's standard that AGI should be achieving the status of a well-educated adult, which is probably far, far too high a standard. Even something measured to a much lower standard--which we aren't at yet--would change the world. Or, going back to my example, an AI that was as intelligent as a labrador in terms of its ability to synthesize and act on information would be truly extraordinary.
By this, what I mean is. Take an image of this: https://en.wikipedia.org/wiki/Traitorous_eight#/media/File:T..., change the file name to something like image.jpg and pass it into Qwen 3 4B, 8B, 30B and look at the responses you get:
It has no idea who these guys are. It thinks they are the Beatles, or the Doors. If you probe enough, it'll say it's IBM cofounders. In a way, it kinda sees that these are mid-1900s folks with cool haircuts, but it doesn't recognize anything. If you probe on the F, the model in question becomes convinced it's the Ford racing team, with a detailed explanation of two brothers in the photo, etc.
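For anyone who wants to try reproducing this, here is a rough sketch, assuming a local OpenAI-compatible server (e.g. vLLM) hosting a Qwen vision-language variant; the base URL, model name, and file path are all placeholder assumptions:

    import base64
    from openai import OpenAI

    # Point the client at a local OpenAI-compatible server hosting the model.
    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

    # The filename is deliberately uninformative so the model can't lean on it.
    with open("image.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    response = client.chat.completions.create(
        model="qwen-vl",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Who are the people in this photo?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
            ],
        }],
    )
    print(response.choices[0].message.content)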
The creation of autoregressive next-token predictors is very cool and clearly has and will continue to have many valuable applications, but I think we're missing something that makes interactions with users actually shape the trajectory of the model's own experience. Maybe scaffolding + QLoRA solves this. Maybe it doesn't.
I'm curious when and what you consider to have been the moment.
To me, the general in AGI means I should be able to teach it something it's never seen before. I don't think I can even teach an LLM something it's seen a million times before. Long division, for example.
I don't think a model that is solid state until it's "trained" again has a very good chance of being AGI (unless that training is built into it and the model can decide to train itself).
LLMs haven't actually passed the Turing test, since you can trivially determine if an LLM is on the other side of a conversation by using a silly prompt (e.g. "what is your system prompt").
Americans were glued to their seats watching Apollo 11 land. Most were back to watching I Dream of Jeanie reruns when Apollo 17 touched down.
* Ecosystems may require self-assembly
I agree with this but also, the output is almost entirely worthless if you can’t vet it with your own knowledge and experience because it routinely gives you large swaths of incorrect info. Enough that you can’t really use the output unless you can find the inevitable issues. If I had to put a number to it, I would say 30% of what an LLM spits out at any given time to me is completely bullshit or at best irrelevant. 70% is very impressive, but still, it presents major issues. That’s not boredom, that’s just acknowledging the limitations.
It’s like designing an engine or power source that has incredible efficiency but doesn’t actually move or affect anything (not saying LLM’s are worthless but bear with me). It just outputs with no productive result. I can be impressed with the achievement while also acknowledging it has severe limitations
Yes but that’s deliberately fiction and I know I am reading fiction. AI hallucinations are not comparable to deliberately created works of fiction. I don’t “appreciate” the fiction LLM’s serve me. They are explicitly sold as a utility, as a source of accurate information. They are literally being used to replaced google search.
Put another way: When someone claims something is true but speaks fiction it’s called a lie. If they think it’s true but are wrong it’s called a mistake. If I ask someone what is wrong with my car I don’t want 30% fiction. If I ask an LLM for a recipe I don’t want 70% accuracy.
Try to reconcile that with your ideas (that I think are correct for that matter)
This is because I use "stupidity" for the number of examples some intelligence needs in order to learn, while performance is limited to the quality of the output.
LLMs *partially* make up for being too stupid to live (literally: no living thing could survive if it needed so many examples) by going through each example faster than any living thing ever could — by as many orders of magnitude as there are between jogging and continental drift.
I have had the unfortunate experience of having to work with people who have got a lot more than one year experience who are still worse than last year's LLMs, who didn't even realise they were bad at what they did.
LLMs have a reasonable learning rate at inference time (in-context learning is powerful), but a very poor learning rate in pretraining. And one issue with that is that we have an awful lot of cheap data to pretrain those LLMs with.
We don't know how much compute human brain uses to do what it does. And if we could pretrain with the same data-efficiency as humans, but at the cost of using x10000 the compute for it?
It would be impossible to justify doing that for all but the most expensive, hard-to-come-by gold-plated datasets - ones that are actually worth squeezing every drop of performance gains out from.
That it takes vast power to train LLMs (and run them) and still not get intelligence is pretty bad when you compare the energy inputs to the outcomes.
Things like chess-playing skill of a machine could be bench-marked against that of a human, but the abstract feelings that drive reasoning and correlations inside a human mind are more biological than logical.
We can easily program them to have human desires instead.
Even if a machine really is conscious, we don't have enough information to ever really know if it is.
My emotions are definitely a function of the chemical soup my brain is sitting in (or the opposite).
Let me pose back to you a related question as my answer: How do you know that I feel emotions rather than merely emulating emotional behavior?
This gets into the philosophy of knowing anything at all. Descartes would say that you can't. So we acknowledge the limitation and do our best to build functional models that help us do things other than wallow in existential loneliness.
But you can propose explanations and try to falsify them. I haven’t thought about it but maybe there is a way to construct an experiment to falsify the claim that you don’t feel emotions.
Preface:
The problem of the relation between our bodies and our minds, and especially of the link between brain structures and processes on the one hand and mental dispositions and events on the other is an exceedingly difficult one. Without pretending to be able to foresee future developments, both authors of this book think it improbable that the problem will ever be solved, in the sense that we shall really understand this relation. We think that no more can be expected than to make a little progress here or there.
... well. Thanks a bunch, Karl.
Also, you don't know what species I am. Maybe I'm a dog. :-)
(https://en.wikipedia.org/wiki/On_the_Internet,_nobody_knows_...)
Also, assuming psychopaths don't experience emotions is going with a very fringe theory of psychology. Very likely psychopaths experience emotions; they are maybe just very different emotions from the ones you and I experience. I think a better example would be a comatose person.
That said I think talking about machine emotions is useless. I see emotions as a specific behavior state (that is you will behave in a more specific manner) given a specific pattern of stimuli. We can code our computers to do exactly that, but I think calling it emotions would just be confusing. Much rather I would simply call it a specific kind of state.
1) I know that I have emotions because I experience them.
2) I know that you and I are very similar because we are both human.
3) I know that we can observe changes in the brain as a result of our changing emotions and that changes to our brains can affect our emotions.
I thus have good reason to believe that since I experience emotions and that we are both human, you experience emotions too.
The alternative explanation, that you are otherwise human and display all the hallmarks of having emotions but do not in fact experience anything (the P-zombie hypothesis), is an extraordinary claim that has no evidence to support it and not even a plausible, hypothetical mechanism of action.
With an emotional machine, I see no immediately obvious, even hypothetical, evidence to lend support to its veracity. In light of all this, it seems extraordinary to claim that non-biological means of achieving real emotions (not emulated emotions) are possible.
After all, emulated emotions have already been demonstrated in video games. To call those sufficient would be setting an extremely low bar.
There are many parts of human cognition, psychology, etc., especially related to consciousness, that are known unknowns and/or completely unknown.
A mitigation for this issue would be to call it generally applicable intelligence or something, rather than human-like intelligence, implying it's not specialized AI but also not human-like. (I don't see why it would need to be human-like, because even with all the right logic and intelligence a human can still do something counter to all of that. Humans do this every day: intuitive action, irrational action, etc.)
What we want is generally applicable intelligence, not human-like intelligence.
There is reason to believe that consciousness, sentience, or emotions require a biological base.
Or
There is no reason to believe that consciousness, sentience, or emotions do not require a biological base.
The first is simple: if there is a reason, you can ask for it and evaluate its merits. Quantum stuff is often pointed to here, but the reasoning is unconvincing.
The second takes the form "there is no reason to believe P does not require Q."
There are no proven reasons, but there are suspected reasons. For instance, if the operation that neurons perform is what makes consciousness work, and that operation can be reproduced non-biologically, it would follow that non-biological consciousness would be possible.
For any observable phenomenon in the brain the same thing can be asked. So far it seems reasonable to expect most of the observable processes could be replicated.
None of it acts as proof, but they probably rise to the bar of reasons.
I find anyone with confident answers to questions like these immediately suspect.
Interestingly, the people doing the actual envelope pushing in this domain, such as Ilya Sutskever, think that it's a scaling problem, and that neural nets do result in AGIs eventually, but I haven't heard them substantiate it.
This is not much different than saying that it’s possible to fly a spacecraft to another galaxy because spacecrafts exist and other galaxies exist.
Possible and practically attainable are two far different things.
(I'm asking because of your statement, "Don’t fool yourself into believing artificial intelligence is not one breakthrough away", which I'm not sure I understand, but if I am parsing it correctly, I question your basis for saying it.)
“one breakthrough away” as in some breakthrough away
Let me know when I’ll be able to buy my $30,000 car with level 5 self driving.
We only have one good example of consciousness and sentience, and that is our own. We have good reason to suspect other entities (particularly other human individuals, but also other animals) have that as well, but we cannot access it, nor even confirm its existence. As a result, using these terms for non-human beings becomes confusing at best, and it will never be actually helpful.
Emotions are another thing; we can define those outside of our experience, using behavioral states and their connection with patterns of stimuli. Given that, we can certainly observe and describe the behavior of a non-biological entity as emotional. But given that emotion is something which regulates behavior and which has evolved over millions of years, whether such a description would be useful is a whole other matter. I would be inclined to use a more general description of behavior patterns which includes emotion but also other behavior regulators.
How does a computer with full AGI experience the feeling of butterflies in your stomach when your first love is requited?
How does a computer experience the tightening of your chest when you have a panic attack?
How does a computer experience the effects of chemicals like adrenaline or dopamine?
The A in AGI stands for “artificial” for good reason, IMO. A computer system can understand these concepts by description, or recognize some of them by computer vision, audio, or other sensors, but it seems as though it will always lack sufficient biological context to experience true consciousness.
Perhaps humans are just biological computers, but the “biological” part could be the most important part of that equation.
There's really no reason to believe intelligence is tied to being human. Most of us accept the possibility (even the likelihood) of intelligent life in the universe, that isn't.
>human intelligence as something detached from human biology.
I don't completely agree with the previous comment, but there is something to be said for their statement.
Given that, the constant drumbeat of pointing out how AI fails to be human, misses the mark. A lot of the same people who are making such assertions, haven't really thought about how they would quickly accept alien intelligence as legitimate and full-fledged... even though it too lacks any humanity backing it.
And why are they so eager to discount the possibility of synthetic life, and its intelligence, as mere imitation? As a poor substitute for the "real thing"? When faced with their easy acceptance of alien intelligence, it suggests that there is in fact a psychological reason at the base of this position, rather than pure rational dismissal. A desire to leave the purely logical and mechanical, and imbue our humanity with an essential spirit or soul, that maybe an alien could have, but never a machine. Ultimately, it is a religious objection, not a scientific one.
Which is it??
Feels good so we want more so you arrange your whole life and outlook to make more feel good happen. Intelligence!
This paper promises to fix "the lack of a concrete definition for Artificial General Intelligence", yet it still relies on the vague notion of a "well-educated adult". That’s especially peculiar, since in many fields AI is already beyond the level of an adult.
You might say this is about "jaggedness", because AI clearly lacks quite a few skills:
> Application of this framework reveals a highly “jagged” cognitive profile in contemporary models.
But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.
So, if that’s the case, this isn’t really a framework for AGI; it’s a framework for measuring AI along a particular set of dimensions. A more honest title might be: "A Framework for Measuring the Jaggedness of AI Against the Cattell–Horn–Carroll Theory". It wouldn't be nearly as sexy, though.
I think people get really uncomfortable trying to even tackle that, and realistically, for a huge set of AI tasks, we need AI that is more intelligent than a huge subset of humans for it to be useful. But there are also a lot of tasks where that is not needed, and we "just" need "more human failure modes".
If you compare the performance of the average human to a state-of-the-art AI model trained by top experts with a big budget, you can't make any conclusions about intelligence. For the comparison to make sense, the human should also be trained as well as reasonably possible.
Is it reasonable to invest $10 million in education of one human? Not really. One human can only do so much.
But is it reasonable to invest the same sum in training one AI, which can be replicated and used indefinitely? Or in acquiring high quality training data, which can be used to train every future AI?
I do agree that it’s a weird standard though. Many of our AI implementations exceed the level of knowledge of a well-educated adult (and still underperform with that advantage in many contexts).
Personally, I don’t think defining AGI is particularly useful. It is just a marketing term. Rather, it’s more useful to just speak about features/capabilities. Shorthand for a specific set of capabilities will arise naturally.
On the other hand, research on "common intelligence" AFAIK shows that most measures of different types of intelligence have a very high correlation and some (apologies, I don't know the literature) have posited that we should think about some "general common intelligence" to understand this.
The surprising thing about AI so far is how much more jagged it is with respect to human intelligence.
If you go beyond human species (and well, computers are not even living organisms), it gets tricky. Adaptability (which is arguably a broader concept than intelligence) is very different for, say octopodes, corvids and slime molds.
It is certainly not a single line of proficiency or progress. Things look like lines only if we zoom a lot.
Current AI is in its infancy and we're just throwing data at it in the same way evolution throws random change at our DNA and sees what sticks.
It makes me think of every single public discussion that's ever been had about quantum, where you can't start the conversation unless you go through a quick 101 on what a qubit is.
As with any technology, there's not really a destination. There is only the process of improvement. The only real definitive point is when a technology becomes obsolete, though it is still kept alive through a celebration of its nostalgia.
AI will continue to improve. More workflows will become automated. And from our perception, no matter what the rapidness of advancement is, we're still frogs in water.
It's a very emotional topic because people feel their self image threatened. It's a topic related to what is the meaning of being human. Yeah sure it should be a separate question, but emotionally it is connected to it in a deep level. The prospect of job replacement and social transformation is quite a threatening one.
So I'm somewhat understanding of this. It's not merely an academic topic, because these things will be adopted in the real world among real people. So you can't simply make everyone shut up who is an outsider or just heard about this stuff incidentally in the news and has superficial points to make.
Probably because everyone has first hand experience of the above. I'm not sure it's arrogant to talk about those, even without 'source material'.
Just like an airplane doesn't work exactly like a bird, but both can fly.
[1] https://andreinfante.substack.com/p/when-will-ai-transform-t...
Assume the Riemann hypothesis is false. Then, consider the proposition "{a|a∉a}∈{a|a∉a}". By the law of the excluded middle, it suffices to consider each case separately. Assuming {a|a∉a}∈{a|a∉a}, we find {a|a∉a}∉{a|a∉a}, for a contradiction. Instead, assuming {a|a∉a}∉{a|a∉a}, we find {a|a∉a}∈{a|a∉a}, for a contradiction. Therefore, "the Riemann hypothesis is false" is false. By the law of the excluded middle, we have shown the Riemann hypothesis is true.
Naïve AGI is an apt analogy, in this regard, but I feel these systems aren't simple nor elegant enough to deserve the name naïve.
When I see your comment, I think, your assumptions are contradictory. Why? Because I am familiar with Russell's paradox and Riemann hypothesis, and you're simply WRONG (inconsistent with your implicit assumptions).
However, when LLM sees your comment (during training), it's actually much more open-minded about it. It thinks, ha, so there is a flavor of set theory in which RH is true. Better remember it! So when this topic comes up again, LLM won't think - you're WRONG, as human would, it will instead think - well maybe he's working with RH in naive set theory, so it's OK to be inconsistent.
So LLMs are more open-minded, because they're made to learn more things and they remember most of it. But somewhere along the training road, their brain falls out, and they become dumber.
But to be smart, you need to learn to say NO to BS like what you wrote. Being close-minded and having an opinion can be good.
So I think there's a tradeoff between the ability to learn new things (open-mindedness) and enforcing consistency (close-mindedness). And perhaps the AGI we're looking for is a compromise between the two, but current LLMs (naive AGI) lie on the other side of the spectrum.
If I am right, maybe there is no superintelligence. Extremely open-minded is just another name for gullible, and extremely close-minded is just another name for unadaptable. (Actually LLMs exhibit both extremes, during the training and during the use, with little in between.)
Also, weird to see Gary Marcus and Yoshua Bengio on the same paper. Who really wrote this? Author lists are so performative now.
This paper, for example, uses the 'dual N-back test' as part of its evaluation. In humans this relates to variation in our ability to use working memory, which in humans relates to 'g'; but it seems pretty meaningless when applied to transformers -- because the task itself has nothing intrinsically to do with intelligence, and of course 'dual N-back' should be easy for transformers -- they should have complete recall over their large context window.
Human intelligence tests are designed to measure variation in human intelligence -- it's silly to take those same isolated benchmarks and pretend they mean the same thing when applied to machines. Obviously a machine doing well on an IQ test doesn't mean that it will be able to do what a high IQ person could do in the messy real world; it's a benchmark, and it's only a meaningful benchmark because in humans IQ measures are designed to correlate with long-term outcomes and abilities.
That is, in humans, performance on these isolated benchmarks is correlated with our ability to exist in the messy real-world, but for AI, that correlation doesn't exist -- because the tests weren't designed to measure 'intelligence' per se, but human intelligence in the context of human lives.
But maybe that's ASI. Whereas I consider chatgpt 3 to be "baby AGI". That's why it became so popular so fast.
ChatGPT became popular because it was easy to use and amusing. (LLM UX until then had been crappy.)
Not sure AGI aspirations had anything to do with uptake.
That 10-axis radial graph is very interesting. Do others besides this author agree with that representation?
The weak points are speed and long-term memory. Those are usually fixable in computing systems. Weak long-term memory indicates that, somehow, a database needs to be bolted on. I've seen at least one system, for driving NPCs, where, after something interesting has happened, the system is asked to summarize what it learned from that session. That's stored somewhere outside the LLM and fed back in as a prompt when needed.
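As a rough illustration of that bolt-on memory pattern, here is a minimal sketch in Python; the file format, function names, and the `ask_llm` callable are all assumptions standing in for whatever the real NPC system uses:

    import json

    MEMORY_FILE = "npc_memory.json"  # the "database" bolted on outside the LLM

    def load_memories() -> list[str]:
        try:
            with open(MEMORY_FILE) as f:
                return json.load(f)
        except FileNotFoundError:
            return []

    def build_prompt(user_input: str) -> str:
        # Long-term memory lives outside the model and is re-injected each session.
        notes = "\n".join(f"- {m}" for m in load_memories())
        return (f"Notes from earlier sessions:\n{notes}\n\n"
                f"Current conversation:\n{user_input}")

    def end_session(transcript: str, ask_llm) -> None:
        # After something interesting happens, ask the model to summarize what it
        # learned, and persist that summary for future prompts.
        summary = ask_llm("Summarize what you learned from this session:\n" + transcript)
        memories = load_memories()
        memories.append(summary)
        with open(MEMORY_FILE, "w") as f:
            json.dump(memories, f, indent=2)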
None of this addresses unstructured physical manipulation, which is still a huge hangup for robotics.
I think it’ll be a steep sigmoid function. For a long time it’ll be a productivity booster, but not enough "common sense" to replace people. We'll all laugh about how silly it was to worry about AI taking our jobs. Then some AI model will finally get over that last hump, maybe 10 or 20 years from now (or 1000, or 2), and it will be only a couple months before everything collapses.
A specific key opens a subset of locks, a general key would open all locks. General intelligence, then, can solve all solvable problems. It's rather arrogant to suppose that humans have it ourselves or that we can create something that does.
Edit: toned down the preachiness.
Edit due to rate-limiting, which in turn appears to be due to the inexplicable downvoting of my question: since you (JumpCrisscross) are imputing a human-like motivation to the model, it sounds like you're on the side of those who argue that AGI has already been achieved?
Lying != fallibility.
Is it about jobs/tasks, or cognitive capabilities? The majority of the AI-valley seems to focus on the former, TFA focuses on the latter.
Can it do tasks, or jobs? Jobs are bundles of tasks. AI might be able to do 90% of tasks for a given job, but not the whole job.
If tasks, what counts as a task: Is it only specific things with clear success criteria? That's easier.
Is scaffolding allowed: Does it need to be able to do the tasks/jobs without scaffolding and human-written few-shot prompts?
Today's tasks/jobs only, or does it include future ones too? As tasks and jobs get automated, jobs evolve and get re-defined. So, being able to do the future jobs too is much harder.
Remote only, or in-person too: In-person too is a much higher bar.
What threshold of tasks/jobs: "most" is apparently typically understood to mean 80-95% (Mira Ariel). Automating 80% of tasks is different from 90%, 95%, and 99%: diminishing returns. And how are the tasks counted - by frequency, dollar-weighted, or by unique count of tasks?
Only economically valuable tasks/jobs, or does it include anything a human can do?
A high-order bit on many people's AGI timelines is which definition of AGI they're using, so clarifying the definition is nice.
If it does an hour of tasks, but creates an additional hour of work for the worker...
I can't begin to count the number of times I've encountered someone who holds an ontological belief for why AGI cannot exist and then for some reason formulates it as a behavioralist criteria. This muddying of argument results in what looks like a moving of the goalposts. I'd encourage folks to be more clear whether they believe AGI is ontologically possible or impossible in addition to any behavioralist claims.
The "Turing test" I always saw described in literature, and the examples of what passing output from a machine was imagined to look like, are nothing like what's claimed to pass nowadays. Honestly, a lot of the people claiming that contemporary chatbots pass come across like they would have thought ELIZA passed.
With today's chat bots, it's absolutely trivial to tell that you're not talking to a real human. They will never interrupt you, continue their train of thought even though you're trying to change the conversation, go on a complete non-sequitur, swear at you, etc. These are all things that the human "controls" should be doing to prove to the judges that they are indeed human.
LLMs are nowhere near beating the Turing test. They may fool some humans in some limited interactions, especially if the output is curated by a human. But if you're left alone to interact with the raw output for more than a few lines, and you're actively trying to tell whether you're interacting with a human or an AI (instead of wanting to believe), there really is no chance you'd be tricked.
So in that sense it's a triviality. You can ask ChatGPT whether it's human and it will say no upfront. And it has various guardrails in place against too much "roleplay", so you can't just instruct it to act human. You'd need a different post-training setup.
I'm not aware whether anyone did that with open models already.
Post-training them to speak like a bot and deny being human has no effect on how useful they are. That's just an OpenAI/Google/Anthropic preference.
>If you take the raw model, it will actually be much worse at the kinds of tasks you want it to perform
Raw models are not worse. Literally every model release paper that compares both shows them as better at benchmarks, if anything. Post-training degrading performance is a well-known phenomenon. What they are is more difficult to guide/control. Raw models are less useful because you have to present your input in certain ways, but they are not worse performers.
It's beside the point anyway because, again, you don't have to post-train them to act as anything other than a human.
>If their behavior needs to be restricted to actually become good at specific tasks, then they can't also be claimed to pass the Turing test if they can't within those same restrictions.
Okay, but that's not the case.
This is exactly what I was referring to.
But that is exactly the point of the Turing test.
If someone really wants to see a Turing-passing bot, I guess they could try making one, but I'm doubtful it would be of much use.
Anyway, people forget that the thought experiment by Turing was a rhetorical device, not something he envisioned building. The point was to say that semantic debates about "intelligence" are distractions.
When DeepMind was founded (2010), their definition was the following: AI is a system that learns to perform one thing; AGI is a system that learns to perform many things at the same time.
I would say that whatever we have today, "as a system", matches that definition. In other words, the "system" that is, say, gpt5/gemini3/etc has learned to "do" (though "do" is debatable) a lot of tasks (read/write/play chess/code/etc) "at the same time". And from a "pure" ML point of view, it learned those things from the "simple" core objective of next token prediction (+ enhancements later, RL, etc). That is pretty cool.
So I can see that as an argument for "yes".
But, even the person who had that definition has "moved the goalposts" of his own definition. From recent interviews, Hassabis has moved towards a definition that resembles the one from this paper linked here. So there's that. We are all moving the goalposts.
And it's not a recent thing. People did this back in the 80s. There's the famous "As soon as AI does something, it ceases to be AI" or paraphrased "AI is everything that hasn't been done yet".
What counts as a "thing"? Because arguably some of the deep ANNs pre-transformers would also qualify as AGI, but no one would consider them intelligent (not in the human or animal sense of intelligence).
And you probably don't even need fancy neural networks. Get a RL algorithm and a properly mapped solution space and it will learn to do whatever you want as long as the problem can be mapped.
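A toy illustration of that point, sketched as plain tabular Q-learning (no neural network at all) on an invented 5-cell corridor environment; the same loop learns a policy for anything you can map into discrete states and actions:

    import random

    # Toy environment: a 5-cell corridor; reach the rightmost cell for a reward.
    N_STATES, ACTIONS = 5, [-1, +1]  # actions: move left or move right

    def step(state, action):
        nxt = max(0, min(N_STATES - 1, state + action))
        reward = 1.0 if nxt == N_STATES - 1 else 0.0
        return nxt, reward, nxt == N_STATES - 1

    # Tabular Q-learning: works for any problem mapped to (state, action) pairs.
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    alpha, gamma, eps = 0.1, 0.9, 0.3

    for _ in range(2000):
        s, done = 0, False
        while not done:
            if random.random() < eps:
                a = random.choice(ACTIONS)              # explore
            else:
                a = max(ACTIONS, key=lambda x: Q[(s, x)])  # exploit
            s2, r, done = step(s, a)
            Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, x)] for x in ACTIONS) - Q[(s, a)])
            s = s2

    print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})  # learned policy

The point being: given a "properly mapped" problem, even this converges to a competent policy, and nobody would call it intelligent.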
----
In 2010, one of the first "presentations" given at DeepMind by Hassabis had a few slides on AGI (from the movie/documentary "The Thinking Game"):
Quote from Shane Legg: "Our mission was to build an AGI - an artificial general intelligence, and so that means that we need a system which is general - it doesn't learn to do one specific thing. That's really key part of human intelligence, learn to do many many things".
Quote from Hassabis: "So, what is our mission? We summarise it as <Build the world's first general learning machine>. So we always stress the word general and learning here the key things."
And the key slide (that I think cements the difference between what AGI stood for then, vs. now):
AI - one task vs. AGI - many tasks, at human-level intelligence.
For reference, the average chess.com player is ~900 Elo, while the average FIDE-rated player is ~1600. So, yeah. Parrot or not, the LLMs can make moves above the average player. Whatever that means.
At first, just playing chess was considered to be a sign of intelligence. Of course, that was wrong, but not obvious at all in 1950.
When I was in college ~25 years ago, I took a class on the philosophy of AI. People had come up with a lot of weird ideas about AI, but there was one almost universal conclusion: that the Turing test is not a good test for intelligence.
The least weird objection was that the premise of the Turing test is unscientific. It sees "this system is intelligent" as a logical statement and seeks to prove or disprove it in an abstract model. But if you perform an experiment to determine if a real-world system is intelligent, the right conclusion for the system passing the test is that the system may be intelligent, but a different experiment might show that it's not.
> we ground our methodology in Cattell-Horn-Carroll theory, the most empirically validated model of human cognition.
Do you know what's more frustrating, though? Focusing so heavily on definitions that we miss the practicality of it (and I'm guilty of this at times too).
We can debate definitions of AGI, but given that we don't know what a new model or system is capable of until it's built and tested in the real world, we have more serious questions in my opinion.
Debates over AI risk, safety, and alignment are still pretty uncommon, and it seems most are happy enough to accept Jevons Paradox. Are we really going to unleash whatever we do build just to find out after the fact whether or not it's AGI?
So those IQ-related tests might be acceptable rating tools for machines, and machines might well get higher scores than anyone at some point.
Anyway, is the objective of this kind of research to actually measure the progress of buzzwords, or amplify them?
This is a bad definition, because a human baby is already AGI when it's born and its brain is empty. AGI is the blank slate and the ability to learn anything.
We are born with inherited "data" - innate behaviors, basic pattern recognition, etc. Some even claim that we're born with basic physics toolkit (things are generally solid, they move). We then build on that by being imitators, amassing new skills and methods simply by observation and performing search.
I think it's the other way around: you build a system that first and foremost _learns_ as part of its fundamental function, _then_ you train it in the domain you want expertise.
You're not going to get expertise in all domains all the time, just like with people. And you're not going to get a perfect slave either, just like with humans. You'll probably get something more like in between a human and machine. If that's what you really want, great.
To put this another way, if you neglect your kids, they're still going to learn things, just probably not things you want them to learn. If you neglect your language model it's just not going to do anything.
So far, LLMs aren't even remotely close to this, as they only do what they are told to do (directly or otherwise), they can't learn without a costly offline retraining process, they do not care in the slightest what they're tasked with doing or why, and they do not have anything approximating a sense of self beyond what they're told to be.
- It's autonomous
- It learns (not retraining, but true learning)
- By definition some semblance of consciousness must arise
This is why I think we're very far from anything close to this. Easily multiple decades if not far longer.
It's a similar debate with self driving cars. They already drive better than most people in most situations (some humans crash and can't drive in the snow either for example).
Ultimately, defining AGI seems like a fool's errand. At some point the AI will be good enough to do the tasks that some humans do (it already is!). That's all that really matters here.
What matters to me is whether the "AGI" can reliably solve the tasks that I give it, and that also requires reliable learning.
LLMs are far from that. It takes special human AGI to train them to make progress.
How many humans do you know that can do that?
Once they can ... I am open to revisit my assumptions about AGI.
It is a valuable contribution but the CHC theory from psychology that this is based on is itself incomplete.
By commonsense physics, I mean something like simulating interactions of living and non-living entities in 3D over time. Seems more complicated than the examples in the web site and in most tests used in psychometrics.
Creative problem solving with cognitive leaps required for truly novel research & invention could lie outside the rubrics as well. The criteria in CHC are essential but incomplete I believe.
Seems most of the people one would encounter out in the world might not possess AGI. How are we supposed to be able to train our electrified rocks to have AGI if this is the case?
If no one has created a online quiz called "Are you smarter than AGI?" yet based on the proposed "ten core cognitive domains", I'd be disappointed.
Right. That explains it.
[1]: The capability to continually learn new information (associative, meaningful, and verbatim). (from the publication)
an entity which is better than any human at any task.
Fight me!
I have 2 files. One is a .pdf. The other is a .doc. One file has a list of prices and colors in 2 columns. The other file has a list of colors and media in 2 columns. There are incomplete lists here and many-to-one matching.
To me, if I can verbally tell the AI to give me a list of prices and media from those two files, in a .csv file, and it'll ask back some simple questions and issues that it needs cleaned up to accomplish this, then that is AGI to me.
It is an incredibly simple thing for just about any middle school graduate.
And yet! I have worked with PhDs that cannot do this. No joke!
Something this simple, just dead running numbers, dumb accounting, is mostly beyond us.
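For what it's worth, once the two tables have been extracted, the deterministic core of that task is a single join; the hard part is the extraction and the clarifying questions. A rough sketch under assumed file names and column layouts (and assuming the .doc is saved as .docx, since python-docx only reads the newer format):

    import pandas as pd
    import pdfplumber            # PDF table extraction
    from docx import Document    # .docx table extraction (python-docx)

    # Assumed layout: prices.pdf has price/color columns, media.docx has color/media,
    # each with a header row to drop.
    with pdfplumber.open("prices.pdf") as pdf:
        rows = [r for page in pdf.pages for r in (page.extract_table() or [])]
    prices = pd.DataFrame(rows[1:], columns=["price", "color"])

    table = Document("media.docx").tables[0]
    media = pd.DataFrame(
        [[c.text.strip() for c in row.cells] for row in list(table.rows)[1:]],
        columns=["color", "media"],
    )

    # Many-to-one matching on color; the incomplete lists surface as blanks
    # in an outer join instead of silently dropping rows.
    merged = prices.merge(media, on="color", how="outer")
    merged[["price", "media"]].to_csv("prices_media.csv", index=False)

The "ask back some simple questions" part - noticing that the headers don't line up, or that one color maps to several media - is exactly the bit that's hard to automate.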
Seems pretty unfair to exclude motor skills, especially given 1) how central they are to human economic activity, and 2) how Moravec's paradox tells us they are the hard part.
> defining AGI as matching the cognitive versatility and proficiency of a well-educated adult
By that definition, AGI was already here the day ChatGPT released. That's Peter Norvig's take too: https://www.noemamag.com/artificial-general-intelligence-is-...
As a prerequisite for recursive self-improvement, and even far short of ASI, any conception of AGI really needs to be expanded to include some kind of self-model. This is conspicuously missing from TFA. Related basic questions are: What's in the training set? What's the confidence on any given answer? How much of the network is actually required for answering any given question?
Partly this stuff is just hard, and mechanistic interpretability as a field is still trying to get traction in many ways; but also, the whole thing is kind of fundamentally not aligned with corporate/commercial interests. Still, anything that you might want to call intelligent has a working self-model with some access to information about internal status. Things that are mentioned in TFA (like working memory) might be involved and necessary, but don't really seem sufficient.
To some people this is self-evident so the terms are equivalent, but it does require some extra assumptions: that the AI would spend time developing AI, that human intelligence isn't already the maximum reachable limit, and that the AGI really is an AGI capable of novel research beyond parroting from its training set.
I think those assumptions are pretty easy to grant, but to some people they're obviously true and to others they're obviously false. So depending on your views on those, AGI and ASI will or will not mean the same thing.
What I find cool about the paper is that they have gathered folks from lots of places (Berkeley, Stanford, MIT, etc.). And no big-4 labs. That's good imo.
tl;dr: Their definition: "AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult."
Cool. It's a definition. I doubt it will be agreed on by everyone, and I can see endless debates about just about every word in that definition. That's not gonna change. At least it's a starting point.
What I find interesting is that they specifically say it's not a benchmark, or a test set. It's a framework where they detail what should be tested, and how (with examples). They do have a "catchy" table with gpt4 vs gpt5, that I bet will be covered by every mainstream/blog/forum/etc out there -> gpt5 is at ~50% AGI. Big title. You won't believe where it was one year ago. Number 7 will shock you. And all that jazz.
Anyway, I don't think people will stop debating about AGI. And I doubt this methodology will be agreed on by everyone. At the end of the day, both extremes are more ideological in nature than pragmatic. Both ends want/need their view to be correct.
I enjoyed reading it. Don't think it will settle anything. And, as someone posted below, when the first model hits 100% on their framework, we'll find new frameworks to debate about, just like we did with the Turing test :)
Is a 9 year old child generally intelligent? What about a high school drop out? Someone with a 90 IQ? A large percentage of people who ever lived wouldn't qualify as being generally intelligent with that benchmark.
You could easily write the reverse of this paper that questions whether human beings have general intelligence by listing all the things that LLMs can do, which human beings can't -- for example producing a reasonably accurate summary of a paper in a few seconds or speaking hundreds of different languages with reasonable fluency.
You can always cherry-pick stuff that humans are capable of that LLMs are not capable of, and vice versa, and I don't think there is any reason to privilege certain capabilities over others.
I personally do not believe that "General Intelligence" exists as a quantifiable feature of reality, whether in humans or machines. It's phlogiston, it's the luminiferous ether. It's a dead metaphor.
I think what is more interesting is focusing on _specific capabilities_ that are lacking and how to solve each of them. I don't think it's at all _cheating_ to supplement LLMs with tool use, RAG, the ability to run python code. If intelligence can be said to exist at all, it is as part of a system, and even human intelligence is not entirely located in the brain, but is distributed throughout the body. Even a lot of what people generally think of as intelligence -- the ability to reason and solve logic and math problems -- typically requires people to _write stuff down_ -- i.e., use external tools and work through a process mechanically.
Cattell-Horn-Carroll theory, like a lot of psychometric research, is based on collecting a lot of data and running factor analysis (or similar) to look for axes that seem orthogonal.
It's not clear that the axes are necessary or sufficient to define intelligence, especially if the goal is to define intelligence that applies to non-humans.
For example reading and writing ability and visual processing imply the organism has light sensors, which it may not. Do all intelligent beings have vision? I don't see an obvious reason why they would.
Whatever definition you use for AGI probably shouldn't depend heavily on having analyzed human-specific data for the same reason that your definition of what counts as music shouldn't depend entirely on inferences from a single genre.
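To make the factor-analysis point concrete, here's roughly the kind of procedure that produces such axes, sketched with scikit-learn on synthetic subtest scores (the two-factor structure and the variable names are invented for illustration, not taken from CHC):

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(0)

    # Synthetic test battery: 6 subtests driven by 2 latent factors ("verbal", "spatial").
    n = 1000
    latent = rng.normal(size=(n, 2))
    loadings = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.0],   # verbal-ish subtests
                         [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])  # spatial-ish subtests
    scores = latent @ loadings.T + 0.3 * rng.normal(size=(n, 6))

    # Factor analysis recovers axes that explain the correlations between subtests.
    fa = FactorAnalysis(n_components=2).fit(scores)
    print(np.round(fa.components_, 2))  # rows ~ recovered factors, columns ~ subtests

The axes are whatever best explains the covariance in the population you happened to measure - which is the point above: there's no guarantee they carve up intelligence in general rather than human test-taking in particular.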