“You cannot make a machine to think for you”.
This was the beginning of a talk given in Manchester, England on one, presumably rainy, afternoon in 1951. The talk, delivered by one Alan Turing, presented his “Heretical Theory”, as he himself called it.
And heretical it was, as the father of modern computing went on to explain why that very first sentence of his speech was completely wrong. Turing presented a paper he had written a few years earlier, in which he explained that computers could be used to mimic the human brain.
Now, an interesting thing is that he was not alone in his heresy. Throughout the 1940s, mathematicians and neuroscientists joined hands in trying to create models and theories of how the human brain works.
The whole concept was best described in a theory put forward by the psychologist Donald Hebb. The Hebbian theory claims that connections between neurons grow stronger as we use them together, and weaker when we don’t. In simpler terms, “Cells that fire together, wire together.”
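Hebb’s rule can be sketched in a few lines of code. This is a minimal illustration, not Hebb’s actual formalism: the learning rate and the decay factor are arbitrary values chosen purely for demonstration.

```python
# A minimal sketch of Hebb's rule: when two connected neurons are
# active at the same time, the connection between them strengthens;
# otherwise it slowly weakens. The rates here are arbitrary choices.

def hebbian_update(weight, pre_active, post_active, rate=0.1):
    """Strengthen the connection when both neurons fire together."""
    if pre_active and post_active:
        weight += rate          # fire together -> wire together
    else:
        weight -= rate * 0.1    # otherwise, decay slowly
    return weight

w = 0.5
for _ in range(10):             # ten co-activations in a row
    w = hebbian_update(w, pre_active=True, post_active=True)
print(round(w, 2))  # 1.5 -- the connection has grown stronger
```

Ten co-activations triple the connection strength; a neuron pair that rarely fires together would instead drift back toward zero.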
This led to the creation of the fascinating concept of so-called “artificial neural networks”: a highly theoretical idea at first, but one we nowadays use more and more routinely. Naturally, this leads us to a fairly simple question with quite a complex answer:
What are neural networks?
This question is so complex, in fact, that it would be best if we took a bit of a roundabout way. Let’s first discuss the organ that Hebb, Turing and the others proposed we try to emulate:
The incredible human brain
Modern computers are amazing machines that can crunch huge numbers and process an astounding amount of data. Scientists who work in the most advanced computer labs have access to huge supercomputers that can perform trillions of machine operations in a single second.
While these numbers might be mind-boggling, the capabilities of even the most advanced of these beasts are nowhere near what an ordinary human brain can achieve. This might sound crazy, but if you think about it, there is ONLY one area where computers are better than humans: pure math.
Everything else they do has to be derived from math, which makes some of the stuff we take for granted nearly impossible for machines: things like image and voice recognition, moving our limbs, and of course, our sentience and imagination.
Let’s take a simple example to show what actually goes on in our brains: Catching a ball.
First, our eyes have to take in the image and send it to the brain, which then analyzes it (did you know that our eyes actually see upside down? The image is corrected in post-processing). The part of our brain dedicated to vision recognizes the shapes and colors and says: that’s a ball.
Now, our brain has to track that object in real time while it simultaneously sends signals to our arms, which means that thousands of neurons throughout our body fire up: specific muscles extend, others contract, and our arms move.
During this whole process, the brain makes a series of quick, real-time approximations of our own position, speed, and the position of our limbs, while at the same time tracking the speed and position of the ball, and where our arms are in relation to it.
Brain or a computer?
It would take us hours to calculate catching a ball using math, an event that takes us maybe a second or two in real time. So how does our brain do it? Is it anything like a computer?
Paul King, a computational neuroscientist, has this to say on this subject:
“Unlike a digital computer, the brain does not use binary logic or binary addressable memory, and it does not perform binary arithmetic. Information in the brain is represented in terms of statistical approximations and estimations rather than exact values. The brain is also non-deterministic and cannot replay instruction sequences with error-free precision. So in all these ways, the brain is definitely not “digital”.”
This means that our brain does not deal in absolutes. We estimate things, and as we approach the target, these estimates get closer and closer to the true value. This is why we have to train hard to be able to repeat the same movement, and even then, it’s never quite the same. Fortunately for our future computing, it turns out that our brains are digital, in a way, as the second part of King’s quote shows:
“At the same time, the signals sent around the brain are “either-or” states that are similar to binary. A neuron fires or it does not. These all-or-nothing pulses are the basic language of the brain. So in this sense, the brain is computing using something like binary signals. Instead of 1s and 0s, or “on” and “off”, the brain uses “spike” or “no spike” (referring to the firing of a neuron).”
This is what neural networks are. They simulate our brains with a network of artificial, digital neurons (essentially little programs modeled after organic neurons), which can adapt by making certain connections stronger or weaker.
We even managed to create a simulation of one second of brain activity. The simulation took 40 minutes and used a neural network of 1.73 billion digital neurons.
The brain has around 100 billion neurons (a number eerily similar to the estimated number of stars in our galaxy).
So, in essence, one of the most advanced supercomputers in the world took over 40 minutes to simulate one second of only 1% of the human brain. If that is not crazy enough, consider the power requirements:
The K computer used for this simulation draws 9.89 megawatts. That is 9.89 million watts, or what a town of around 20,000 people would consume. The brain, on the other hand, consumes less than 20 watts. Not millions, not thousands, just 20 watts.
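A quick back-of-the-envelope calculation, using only the figures above, makes the gap concrete:

```python
# How many 20-watt human brains could run on the K computer's
# 9.89-megawatt power budget? (Figures taken from the text above.)
k_computer_watts = 9.89e6   # 9.89 megawatts
brain_watts = 20            # a brain runs on roughly 20 watts

print(int(k_computer_watts / brain_watts))  # 494500
```

Nearly half a million brains for the power budget of one supercomputer that simulated 1% of a single brain at 1/2400th of real-time speed.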
We are nowhere near unlocking the secrets of how exactly our brains work, or why they are so incredibly efficient. However, we have still learned a lot over the years and have been able to apply it to the way we program our computers.
How do artificial neural networks work?
In a way, computers are inherently stupid, so if we want to do anything really complex, like image and sound recognition, or real-time adaptation, we are better off using neural networks instead of standard algorithms.
When a programmer writes code, they give very specific instructions and a very specific order in which the program has to execute them. Let’s look at a great example where this turns out to be far too difficult, and where neural networks step in as a wonderful solution:
Michael Nielsen has written an amazing book about neural networks, and the handwriting example is shamelessly stolen from it. Here is what he has to say on the difficulty of identifying handwriting, and other visual cues:
“Most people effortlessly recognize those (written) digits. That ease is deceptive. In each hemisphere of our brain, humans have a primary visual cortex, also known as V1, containing 140 million neurons, with tens of billions of connections between them. And yet human vision involves not just V1, but an entire series of visual cortices – V2, V3, V4, and V5 – doing progressively more complex image processing. We carry in our heads a supercomputer, tuned by evolution over hundreds of millions of years, and superbly adapted to understand the visual world.
Recognizing handwritten digits isn’t easy. Rather, we humans are stupendously, astoundingly good at making sense of what our eyes show us. But nearly all that work is done unconsciously. And so we don’t usually appreciate how tough a problem our visual systems solve.”
Why is this so complicated? Why do computers have a problem with it?
The answer lies in how difficult it would be to define the shape of any digit in an algorithm. For example, a four has three lines, but the vertical one can either go all the way to the top, or stop where it meets the line perpendicular to it. The third line is at an angle, and we would have to set up an impossible number of definitions and exclusions to cover all the ways people write it.
How long is each line? What angle is between them, where do they intersect, how curved are they, and so on, and so on.
Simple math behind the neural networks
Much like the real brain, a neural network needs artificial neurons to operate. And, just as in evolution, the neuron came first. The Threshold Logic Unit (TLU) was created in 1943 by the neuroscientist Warren McCulloch and the logician Walter Pitts. To further illustrate how intertwined the early research in this field was: the main point of their work was that a neuron can be described as a simple digital processor, and the whole brain can be treated as a Turing machine (an abstract model of computation that Turing devised in 1936).
Since this preceded Turing’s “intelligent machinery”, their work was, unsurprisingly, also strictly theoretical. One of the most popular early neuron models was the so-called perceptron (created in 1958 by the psychologist Frank Rosenblatt), and it is probably best to explain the whole concept by examining how it works.
The perceptron is a fairly simple mathematical function that takes several binary inputs and converts them to a single binary output. The whole trick lies in the rules it uses to decide whether the output is a one or a zero (binary means we have two values, usually ones and zeros). It decides based on weights and a threshold.
Suppose we have a perceptron with three binary inputs. We can assign different levels of importance to these inputs, or “weigh” them. The threshold, in turn, gives us a value that the weighted sum of the inputs has to reach in order for the output to be a one instead of a zero.
Say you wanted to order a pizza, and there were three factors that come into play:
- How recently you’ve eaten it.
- How hungry you are.
- How much money you have.
These are the inputs we can weigh. Let’s say that money is generally not an issue for you, and that you don’t mind eating the same food often. In that case we could assign weight values of w1=2 and w2=2 respectively. However, hunger really plays a huge part in whether you want to order pizza, so let’s assign it a weight of w3=5. This means that hunger is more important than the other two factors combined.
Now, suppose you’re not a huge fan of pizza and set the threshold at 7. This means you would order pizza ONLY if you were hungry and at least one other factor was true. However, in the more probable case that you love pizza, we can set the threshold to two. Then you don’t have to be hungry at all to order pizza; having enough money, or simply not having eaten it recently, is reason enough to order a pie.
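The pizza perceptron can be written out in a few lines of code. This is a minimal sketch using the weights and thresholds from the example above (with w1 for not having eaten pizza recently, w2 for money, and w3 for hunger); the input ordering is just a convention chosen here.

```python
# A sketch of the pizza perceptron: three binary inputs, three weights,
# and a threshold the weighted sum must reach for the neuron to fire.

def perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of inputs reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

weights = [2, 2, 5]  # [not eaten recently, have money, hungry]

# Not a huge fan of pizza (threshold 7): hunger alone is not enough...
print(perceptron([0, 0, 1], weights, 7))  # 0 -> no pizza
# ...but hunger plus any one other factor tips the scale.
print(perceptron([0, 1, 1], weights, 7))  # 1 -> order pizza

# Pizza lover (threshold 2): any single factor will do.
print(perceptron([0, 1, 0], weights, 2))  # 1 -> order pizza
```

Changing nothing but the threshold turns the same neuron from a picky eater into an enthusiast, which is exactly the knob a network tunes during training.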
While this is not a perfectly accurate example, it illustrates how a simple artificial neuron functions. In short, these neurons are models, or functions, that assign certain values to inputs and then decide whether the output is one or a zero.
How do artificial neurons interact?
Obviously, human decision making is far too complex and subtle to be simulated by a single mathematical function. The inputs and outputs of artificial neurons are only ones and zeroes, and we need a huge number of them to represent even the simplest of these real-world decisions. So, how did we get from pure theory and papers to actual models running on computers?
In a bit of an ironic twist, Turing himself never had the chance to see his theory applied in the real world. In 1954, only a few months after he died, two computer scientists created the first working neural network at the MIT computation center, one of the first computer labs in the world. They used a huge, clunky IBM machine to build a network of 128 neurons.
Belmont Farley and Wesley Clark trained their network, meaning they allowed the neurons to interact randomly and form connections of differing strength depending on how often individual neurons interacted. While 128 neurons is a pitiful number compared to the billions of brain cells in a typical human, it did prove that the concept was sound, and their feat was almost unimaginable at the time.
Perhaps an interesting side note that sheds a bit more light on the character and importance of these early computer pioneers is the story of Wesley Clark. He was a physicist who played a major role in the development of the first computers in the ’50s and ’60s. He was also a creator of some of the first personal computers, and bears a very special distinction. In his own words, he was “the only person to have been fired three times from MIT for insubordination.”
With this slight detour over, we should address an important question:
How do these neurons connect and interact between themselves?
Once again, it would probably be best to show this with an example of a network built from perceptrons.
Let’s say we have three different inputs that we send to the first layer of our network. Each neuron in that layer gathers all the inputs and sends an output signal based on its threshold and the weights it has assigned to each input.
These neurons are not connected to one another, and each sends a single output, but it sends that same value to every neuron in the second layer. So if the first layer has four neurons, each neuron in the second layer receives four inputs, one from each first-layer neuron.
Once again, each neuron converts these inputs to an output based on its own rules and sends it on to a single neuron in the output layer. This last neuron takes its inputs and converts them, one final time, into the definitive output we receive.
This way, we can take any number of inputs and create a very complex decision making process that gives us a single output as a final result.
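The whole flow just described can be sketched as code. This is an illustrative toy, not a trained network: the layer sizes match the description above, but every weight and threshold is an arbitrary value chosen for demonstration.

```python
# A sketch of the layered network described above: three binary inputs
# feed a first layer of four perceptrons, whose outputs feed a second
# layer of four, which feed one output neuron. All weights and
# thresholds are arbitrary illustrative values, not learned ones.

def perceptron(inputs, weights, threshold):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) >= threshold else 0

def layer(inputs, neurons):
    """Each neuron is a (weights, threshold) pair; all see the same inputs."""
    return [perceptron(inputs, w, t) for w, t in neurons]

layer1 = [([1, 2, 1], 2), ([2, 1, 0], 1), ([0, 1, 3], 3), ([1, 1, 1], 2)]
layer2 = [([1, 1, 1, 1], 2), ([2, 0, 1, 1], 2),
          ([1, 2, 0, 1], 3), ([0, 1, 1, 2], 2)]
output_neuron = [([1, 1, 1, 1], 3)]

signals = [1, 0, 1]                  # the three raw inputs
for neurons in (layer1, layer2, output_neuron):
    signals = layer(signals, neurons)  # each layer feeds the next
print(signals[0])  # 1 -- the network's single final answer
```

Swapping in more layers, or more neurons per layer, is just a matter of extending the lists, which is exactly the “any number of layers” freedom described above.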
The beauty of neural networks is that we can have any number of layers we desire, and any number of neurons in each of these, in theory. We are limited solely by the available processing power of our computers.
There is a fantastic site that has a mini-neural network that you can play around with. The network doesn’t do much on its own, but it is a great example of how you can adjust various parameters, and see how good a result you can get.
Backpropagation and deep learning
After a very promising start full of interesting theories and propositions, research into neural networks hit a bit of a dead end at the end of the ’60s. After all, there was only so much we could do with the limited computing power of the time and simple neuron models such as the perceptron.
It took more than a decade to break that lull, and the renaissance during the 80’s was brought by three factors:
- Greatly increased computing power
- The use of backward propagation of errors, or backpropagation
- Much more data available for training, thanks to the internet
Backpropagation lets a neural network adjust the weights of the connections between its neurons in order to achieve a better, more accurate output. Remember how we said that neural networks can only give an approximation of the solution you need?
Well, for that estimate to be as accurate as we need, we must train the network first. The way a modern neural network learns to recognize data is a bit similar to how a child learns. For example, if we want to teach our network to recognize dogs in pictures, we first have to supply data. A lot of data.
First we create a so-called training set of photos. These photos come with tags, or descriptions of what is on them, provided to the network. The network then tries to guess what is in each picture and compares its answer to what the tag says. If it is correct, it moves on to the next picture; if it is not, it uses backpropagation to adjust some of its connections and tries again until it gets it right. With large enough quantities of training photos, patterns emerge, and the network creates its own set of features that it uses to recognize the object. Of course, these features get more specific as the network is fed more photos, so the process is never really over.
So, if the neural network says it sees a truck, and it’s actually a picture of a dog, it goes backwards through its layers, tweaks the thresholds, weights, and other parameters, and tries again until it gets it right. The same process works for sounds, mathematical functions, anything really.
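Real backpropagation pushes error gradients backwards through many layers using calculus; for a single perceptron, the same “guess, compare, adjust” idea is the classic perceptron learning rule, which we can sketch directly. Here the toy task of learning the logical AND of two inputs, the learning rate, and the epoch count are all illustrative choices.

```python
# The "guess, compare, adjust" training loop, shown with the classic
# perceptron learning rule: teach one neuron the logical AND function.

def predict(inputs, weights, bias):
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias >= 0 else 0

training_set = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias, rate = [0.0, 0.0], 0.0, 0.1

for _ in range(20):                       # repeat until the guesses settle
    for inputs, target in training_set:
        error = target - predict(inputs, weights, bias)
        # A wrong guess nudges each weight towards a better answer.
        weights = [w + rate * error * x for w, x in zip(weights, inputs)]
        bias += rate * error

print([predict(x, weights, bias) for x, _ in training_set])  # [0, 0, 0, 1]
```

The network starts out guessing wrong on most examples, and after a handful of passes over the data it answers every case correctly, exactly the slow-then-accurate trajectory described above.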
This process of adjusting through iterations is how training works, and when it is applied to networks with many layers it is commonly known as deep learning. It starts slow, with a lot of mistakes, but becomes surprisingly accurate over time. In essence, the intelligence of a neural network depends almost solely on the amount of data we can feed it. Luckily for us, the data sets we can use have grown exponentially over the years. Think about it:
Google Images results have tags and descriptions that almost always go with them. YouTube videos and movies have subtitles that can be used to train speech recognition. In short, the internet is the biggest training ground for neural networks.
What can we do with this technology?
The math behind neural networks is actually quite complex, and nowadays there are numerous neuron models far more advanced than the simple, outdated perceptron (most commonly used are various forms of the so-called sigmoid neurons). Rather than delve into that complexity, we will focus on what we can do with this technology.
To borrow once again from Michael Nielsen’s book (which is a really great resource if you want to know more about neural networks):
”Deep networks have a hierarchical structure which makes them particularly well adapted to learn the hierarchies of knowledge that seem to be useful in solving real-world problems. Put more concretely, when attacking problems such as image recognition, it helps to use a system that understands not just individual pixels, but also increasingly more complex concepts: from edges to simple geometric shapes, all the way up through complex, multi-object scenes.”
For example, Google’s speech recognition software, Google Now, had an error rate of 23% in 2013. Two years later, that number had dropped to 8%. Both Apple’s Siri and Microsoft’s Cortana use neural networks for their speech recognition, with over 95% accuracy.
The second important neural network Google uses sits behind Google Photos. This app can organize, sort, and find your photos based on pretty much any criterion imaginable. You can search the photos on your phone for animals, cars, buildings, and so on. It can even recognize individual people, if it is given enough photos of that person!
New image recognition technologies allow Facebook and Cortana to recognize extremely minute detail. Facebook’s DeepFace can recognize a human face with 97% accuracy, which, in all honesty, is better than most humans can do.
Skype, on the other hand, does not want to be left behind in this new craze for neural networks either. Its new translate-on-the-fly feature is also made possible by neural networks. Beyond these internet giants, neural networks have found their way into thousands of different applications.
You have probably heard of the self-driving cars that Google has been developing recently? Guess what is used for recognition and control.
Factories and industrial robots all over the world have been using artificial neural networks for decades now, and even some air travel and medical companies use them in their booking and appointment systems.
It is believed that neural networks are a stepping stone to artificial intelligence, and trained neural networks can already defeat even the best human chess and Go players. The newest generation of online chatbots also uses neural networks, and who knows, maybe one of them will be able to pass the Turing test, in which a human has to recognize whether they are talking to a machine or to another human.
The programming and math software MATLAB has a neural network toolbox that can be used to identify various types of visual patterns. The same goes for Wolfram, which published a similar software package that you can use either stand-alone or embed in a program of your own.
Slowly but surely, neural networks are truly making our human and natural world familiar to computers. It might not even be that long before we have to ask ourselves: “Do androids dream of electric sheep?”
Still a long way to go
Of course, the road ahead is still quite long and bumpy for neural networks, and like children, they also make mistakes. When Microsoft launched Tay, a chatbot that anyone could teach via Twitter, it had to shut it down almost immediately due to the crass language and often disgusting verbal imagery people shared with it. And both Google Photos’ and Flickr’s recognition systems recently identified several Black people as “gorillas” or “apes”.
Now, it is important to note that this was an innocent mistake made by software that is still far from perfect, but the outrage was so great that Google dropped the gorilla tag completely to avoid future embarrassment. If anything, this is not so much a knock on neural networks as a commentary on possible inherent bias in the data sets used, and in the content of the internet in general.
Some experts theorize that internet content is skewed towards white audiences, so there are far fewer pictures of Black people, which in turn means the data set is much smaller and the margin of error much bigger.
But there is a far simpler and less polarizing explanation: neural networks are far from perfect. They focus on a much smaller number of specific characteristics to identify an object than humans do. For example, the following picture shows patterns that we recognize as abstract nonsense, but that fooled neural networks into finding objects that aren’t there:
What’s the next step?
One of the first impressive applications will obviously be autonomous vehicles and robots. There are already very promising examples in both fields, in the form of Google’s self-driving cars, various UAVs, and even the robots built for the DARPA Robotics Challenge.
Besides building bigger and better networks and feeding them more data, we can also improve their capabilities by combining several independent neural networks in one application. Another piece of image recognition software, created through the combined efforts of Google and Stanford University, can identify not only the objects in pictures, but the context of the photo, too.
They used two neural networks together: one responsible for semantics and context, the other in charge of object recognition. That way, the software can understand even complex concepts, like actions and emotions.
At the same time, the medical applications are almost limitless. It might not be long before we see software that can identify tumors and other diseases in images, greatly cutting the time and effort doctors need to establish a reliable diagnosis.
Marketing campaigns could be led by artificial intelligence alone. Several books have already been written by computers, and some people even found them enjoyable. The time when humans are free from chores and hard labor, free to pursue the matters that are more important to us and to spend more time with our families and loved ones, may come sooner than we think.
To quell any rising fears: a time when these machines gain true sentience is extremely far away, if we ever manage to attain it. And even then, the chance of our robots and computers rising against us is extremely small. Well, probably.
In the end, it would only be fitting to finish this article the same way it started, with an Alan Turing quote:
“Those who can imagine anything, can create the impossible.”