ChatGPT, Claude, Mistral.
The well-known names of this AI boom (apart from the unrelated Mistral) are different from those of the “previous one.”
“Previous one” is in quotes because it’s actually somewhat difficult to define when it occurred. Yes, indeed, we’re going on a history lesson!
The interesting part is that we start from… kind of where we end up (though even this characterization isn’t quite right, as we’ll discuss).
1960-1970s: Proto-Neural Networks—the biology era
Perceptrons, which, when layered and given a lot of mathematical improvements, become modern neural networks, were invented all the way back in 1943. People really got excited after Rosenblatt’s 1958 paper expanded the concept into hardware and basically talked it up to the degree that people expected it to suddenly become Skynet (had The Terminator existed then).
The Navy revealed the embryo of an electronic computer today that it expects will be able to walk, talk, see, write, reproduce itself and be conscious of its own existence.
Obviously, they were set up for disappointment, and any staff writers who penned that article and are still around have yet to witness the birth of that embryo.
During this period, Rosenblatt was essentially the charismatic founder type (as an academic) who sustained interest despite the lack of any meaningful results. His core pitch was that the perceptron was a good model for how brains worked, so if we kept pushing it, we would eventually realize human-level artificial intelligence.
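To ground the history a bit, the perceptron itself is a tiny algorithm: a weighted sum of the inputs, a hard threshold, and a simple update applied whenever it makes a mistake. Here is a minimal Python sketch of that idea (the data, learning rate, and epoch count are made up for illustration; this is obviously not Rosenblatt’s hardware implementation).

```python
# A minimal perceptron: weighted sum, hard threshold, and an update on each
# mistake. The data, learning rate, and epoch count are invented for illustration.

def train_perceptron(samples, epochs=10, lr=0.1):
    """samples: list of (feature_vector, label) pairs with labels in {0, 1}."""
    n = len(samples[0][0])
    w = [0.0] * n       # one weight per input feature
    b = 0.0             # bias term (the threshold)
    for _ in range(epochs):
        for x, y in samples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            pred = 1 if activation > 0 else 0
            error = y - pred            # -1, 0, or +1
            # Nudge weights toward the right answer only when wrong.
            w = [wi + lr * error * xi for wi, xi in zip(w, x)]
            b += lr * error
    return w, b

# Learn logical OR, which is linearly separable (unlike XOR, the famous
# failure case highlighted in the Perceptrons book discussed below).
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
print(train_perceptron(data))
```

Stack layers of units like this, swap the hard threshold for smooth functions that can be differentiated, and add backpropagation, and you are most of the way to the mathematical improvements that turn perceptrons into modern neural networks.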
That ended with the next generation of AI (the source of the original Mistral name I alluded to at the beginning). Though I normally don’t love citing Wikipedia, whoever wrote this passage about a specific book, Perceptrons: An Introduction to Computational Geometry, captured the sentiment quite well:
This book is the center of a long-standing controversy in the study of artificial intelligence. It is claimed that pessimistic predictions made by the authors were responsible for a change in the direction of research in AI, concentrating efforts on so-called "symbolic" systems, a line of research that petered out and contributed to the so-called AI winter of the 1980s, when AI's promise was not realized.
Basically, the potential of AI was “knifed in the back” by this rather boring-sounding book. Of course, the computational power and tools (like backpropagation) that would make this approach work still didn’t exist, so no one could train networks of any substantial size; it’s not like the perceptron method was right on the cusp of working either. Additionally, the philosophical underpinnings were different: much of the motivation was about modeling biology, which made it difficult to entertain a notion like backpropagation, which is purely a mathematical contrivance.
1980-1990s: Symbolic Systems/Knowledge-Based AI
Strictly speaking, the “symbolic AI” of this era and the “biological AI” of perceptrons did not separate neatly into decades. John McCarthy published his paper on LISP in 1960, though it had been in development through the 1950s and was based on the even older λ-calculus from the 1930s, as LISP-fanboy programmers know well.
However, the big idea here is that lists (the “LIS” in LISt Processor) of symbolic expressions (S-expressions), manipulated by recursive functions (S-functions), are enough to express anything a universal Turing machine can compute. This was a symbolic direction for creating an intelligent system.
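To make that concrete, here is a toy sketch in Python rather than Lisp: S-expressions represented as nested lists, evaluated by one small recursive function. The mini-language, its operators, and the example expression are invented for illustration; the point is only that lists plus recursion are enough for general computation.

```python
# Toy S-expression evaluator: nested Python lists stand in for Lisp lists,
# and one recursive function stands in for evaluation. The operators and
# the example expression are invented for illustration.

def evaluate(expr, env):
    if isinstance(expr, str):            # a symbol: look up its value
        return env[expr]
    if isinstance(expr, (int, float)):   # a literal number
        return expr
    op, *args = expr                     # a list: (operator arg1 arg2 ...)
    if op == "if":                       # special form: (if test then else)
        test, then_branch, else_branch = args
        chosen = then_branch if evaluate(test, env) else else_branch
        return evaluate(chosen, env)
    vals = [evaluate(a, env) for a in args]
    if op == "+":
        return vals[0] + vals[1]
    if op == "*":
        return vals[0] * vals[1]
    if op == "<":
        return vals[0] < vals[1]
    raise ValueError(f"unknown operator: {op}")

# (if (< x 10) (* x 2) (+ x 1)) with x = 3 evaluates to 6
print(evaluate(["if", ["<", "x", 10], ["*", "x", 2], ["+", "x", 1]], {"x": 3}))
```

McCarthy’s actual construction went further, with an eval written in Lisp itself, which is what made the system a universal symbolic machine.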
Along with the failures of the perceptron direction, this was what drove the explosion of Lisp machines in the 1980s, evolving into expert systems, which carry names like Dendral, MYCIN, and the original Mistral I alluded to at the beginning.
There were many ways of representing these symbolic, knowledge-based AIs, from just a giant tree of rules, to frames, objects, (symbolic) reasoning, etc.
One of the interesting aspects is that AIs of this era were, like today’s, still separated into “training” (inputting knowledge) and inference/control.
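As a toy illustration of that split, here is a minimal Python sketch (obviously not what these systems actually ran on, with rules invented for illustration): the “training” side is a hand-entered list of if-then rules, and the “inference” side is just a loop that keeps firing rules until nothing new can be concluded.

```python
# Toy forward-chaining "expert system": the knowledge base is a hand-entered
# list of if-then rules, and inference fires rules until no new facts appear.
# The rules and facts are invented for illustration.

RULES = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "shortness_of_breath"}, "refer_to_specialist"),
]

def forward_chain(initial_facts, rules):
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            # Fire a rule when all of its conditions are already known facts.
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

print(forward_chain({"fever", "cough", "shortness_of_breath"}, RULES))
# includes 'flu_suspected' and 'refer_to_specialist'
```

Every entry in RULES is something a human expert had to articulate and type in by hand, which is exactly the bottleneck described next.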
The big difference is that, as the name indicates, knowledge was at the center. All of these systems required a lot of manual input from experts (… which is the “expert” in expert systems) and were generally poor at handling variability. This inflexibility, and the sheer expense of constantly calling up experts to put their knowledge in, is what ultimately doomed this generation of expert systems/knowledge-based AI to the dustbin of history by the time the Internet rose in the mid-1990s.
1990s-2000s: “AI Winter”
If you talk to engineers at Google, Amazon, and the like, AI never really “died.” Recommender engines, search ranking, and the rest were all essentially statistical learning, which became machine learning, which became modern AI.
There isn’t much I have to say here, because this is probably the clearest version of an “AI winter,” when AI was almost an eye-roll-worthy way of describing what people did. Given this is when I came of age, it is still somewhat my default mode of thinking.
2015+: AI Viability with “Whatever Works”
Those who know me well know that this benchmark chart is the one I constantly pound the table about. This is from the EFF’s now-archived AI benchmarks.
Part of what I’ve said is that around 2015-2016, AI finally became commercially viable. This… isn’t strictly true. As noted above, it’s not as if AI wasn’t already being used commercially by the large Internet companies. However, what we would today call AI, like computer vision (the domain of ImageNet), started to hit human levels of capability for restricted problem domains around this timeframe.
This was the real turning point, and it was a philosophical one as well.
For 40 years, Hinton has seen artificial neural networks as a poor attempt to mimic biological ones. Now he thinks that’s changed: in trying to mimic what biological brains do, he thinks, we’ve come up with something better. “It’s scary when you see that,” he says. “It’s a sudden flip.” (MIT Technology Review)
One of the key changes at this time is that the field went with “whatever works.” There used to be a greater emphasis on having a sound theoretical basis, whether in biology or in symbolic systems/knowledge engineering. Now we just throw stuff into our models, see what works, and use whatever mathematical tools are necessary to do so.
In a way, the key is that we went from biological systems → knowledge systems → whatever works (while still borrowing a lot from the original biological systems).
We finally had the computational power and mathematical tools to make the original artificial neural network concept actually work. The form these networks take no longer looks much like the biological systems they were modeled on in the 1960s and 1970s (what the heck is a Transformer in biological terms?), but, again, whatever works has become the way we’ve gone.
What comes next?
More data, more computation, better representations/tools. That’s what has continued to drive AI forward.
We were also driven forward by the abandonment of knowledge-based AI. Those systems were inflexible and brittle. However, they did have something that our modern AI doesn’t have: knowledge.
Hallucinations in LLMs have been roundly criticized (and, even before the term existed, other kinds of neural networks sometimes produced dumb and unpredictable results). However, they happen because of the strengths of LLMs and modern AI. Modern AI can generalize to data it has never seen because it holds its knowledge quite loosely: it is merely trained on that knowledge, but it never “memorized” it per se.
This is in direct contrast with knowledge-based AI, which does hold knowledge at the center, and that centrality is part of what makes it so inflexible.
Different startups and researchers have been able to “reduce hallucinations,” but it’s important to remember, within this overall framework, that hallucinations are a byproduct of what makes modern AI actually powerful. It’s not a knowledge or memorization machine; it merely “models” existing knowledge, an approach that is often powerful in “novel” situations but can sometimes go awry.
Gary Marcus thinks that we’ll have to go back to symbolic/knowledge-based AI, at least to some degree. (He has a Substack, by the way!) Others obviously think that just more data and more compute will have miraculous results (which I’ve expressed my skepticism about). Who knows, maybe we will have yet another swing around the circle on my diagram above that returns to some level of symbolic systems and knowledge engineering.
Either way, we’ll have to wait and see. It’s remarkable how far we’ve come on the journey we’ve taken in AI, and the field is still fast-evolving.
Why the skepticism? Since 2017, it’s been all about more data and more compute.