History of Deep Learning and Neural Nets

Most engineers know stories about the neural nets of old. Some of them are even true!

AI Only Means One Thing Anymore

Experts in the field just love it when you use the blanket term “AI,” because there’s a big difference in technique between machine learning and deep learning. Both count as AI, but nobody gives a shit anymore because we’re in the age of the LLM. Henceforth, “AI” will only ever mean that…even though they’re very different technologies.

The First Deep Learning Networks

There’s a very, very common story about the military (which military isn’t always clear) testing early neural networks by training them to identify tanks. The training appears to work, until a change of environment ruins the algorithm; the punchline is that they had merely “trained” the network to recognize a sunny sky, or something like that.

It’s an urban legend. It’s also a great example of just how long neural nets have been around and how long they’ve been a fixture of interest (and skepticism).

From the 1950s onward, deep learning networks were mostly academic concepts: synthetic neurons that did interesting things for reasons we didn’t fully understand.

Now that we’re in the era of the LLM…they are still interesting things that do stuff for reasons that we do not fully understand, only we’ve managed to do it at a bigger scale.
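To get a sense of how simple those early synthetic neurons were, here’s a toy sketch of a Rosenblatt-style perceptron (the 1950s version of the idea) learning an AND gate. The data and learning rate below are made-up illustrative choices, not anything from a historical source:

```python
# A minimal sketch of a single "synthetic neuron": a perceptron learning AND.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])  # inputs
y = np.array([0, 0, 0, 1])                      # AND of the two inputs

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (arbitrary choice for illustration)

for _ in range(20):  # a few passes over the data is enough for AND
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0    # weighted sum + step threshold
        # Perceptron rule: nudge weights toward the correct answer
        w += lr * (target - pred) * xi
        b += lr * (target - pred)

print([1 if xi @ w + b > 0 else 0 for xi in X])  # [0, 0, 0, 1]
```

One weighted sum and a threshold: that’s the entire neuron. Everything since has largely been a question of how many of them you can stack and train.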

Transformers and Attention

In 2017, researchers at Google released a paper (“Attention Is All You Need”) describing the “transformer.” The transformer builds on vector-space representations of words, but adds context and “attention.” To predict the next word in a sentence, it considers every other word in the sentence, not just the previous one. On top of that, each word gets an “attention” score telling the model how much that word matters when computing the next one. And because it doesn’t process text strictly word by word, it can run in a fixed number of steps and take far greater advantage of the parallelization available in modern compute environments.

So not only does it yield more accurate results, it’s more performant, too.
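If you want to see the core trick without the ceremony, here’s a toy sketch of the scaled dot-product attention the paper describes. The dimensions and data below are made up purely for illustration, and this leaves out the multi-head and positional-encoding machinery:

```python
# A minimal sketch of scaled dot-product attention (the core of the transformer).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weigh every token against every other token, all at once."""
    d_k = Q.shape[-1]
    # Relevance scores: how much should each token attend to every other token?
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax turns each row of scores into weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output vector is a weighted blend of every token's value vector.
    return weights @ V

# Toy "sentence" of 4 tokens, each an 8-dimensional vector (self-attention:
# the same matrix plays the roles of query, key, and value).
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(tokens, tokens, tokens)
print(out.shape)  # (4, 8): every token now carries context from all the others
```

Every token’s score against every other token comes out of a single matrix multiply, which is exactly why this parallelizes so well on modern hardware.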

If you were to boil all the complexity down, they made a way to shove internet-scale data into fake neurons. Like a Pachinko machine, your prompt falls across hundreds of millions of points and ends up someplace inherently random.

But Why Does It Work…?

Nobody knows. What does “work” even mean? Researchers have barely started to peek into the “minds” of LLMs, and the way they “think” is…sketchy.

For example, when they do math? They basically guess, then lie about their work.

Gee, go figure: a bunch of randomly weighted synthetic neurons doing stuff we don’t fully understand yields unpredictable results. Consider the fact that they basically pushed the entire Internet into its “brain,” and it’s a miracle it says anything useful at all. Knowing how the technology works, it isn’t a surprise that “vibe coding” quickly falls apart.