"Learning is not attained by chance, it must be sought for with ardor and attended to with diligence."
Abigail Adams, Letter to John Quincy Adams, May 8, 1780
"The nervous system is organized (or organizes itself) so that it computes a stable reality. This postulate stipulates 'autonomy', i.e. 'self-regulation', for every living organism"
Heinz von Foerster, On Constructing a Reality, 1973
If we desire to build an intelligent machine, then what better way to start than by imitating the human mind, the product of evolution's most intelligent species? Is this possible? Well, to determine whether it is we must first understand what it is we are trying to imitate.
The human brain contains on the order of 100 billion neurons (recent estimates put the figure near 86 billion). Each of these has perhaps 10,000 connections on average to other neurons, both incoming (via dendrites) and outgoing (via axons). The connections can be local (adjacent neurons) or distant (either in adjacent layers of the brain or further afield). We have what is surely the most massively connected network yet known.
Traditionally the neuron has been regarded as simply a switch of some sort, giving an output for a particular combination of inputs - very like a computer logic gate. This view is, however, a great oversimplification. Research has shown that neurons perform considerable processing in both space and time; the neuron's output is the result of a substantial computation. A neuron is itself a cell, and some researchers (notably Penrose and Hameroff) have speculated that each contains microtubule computers (thousands per cell, each operating at perhaps 10 million cycles per second).
It is from a background of such complexity that Neural Networks have been derived. Can we abstract any useful features from the brain that we can perhaps employ within our limited mechanical systems? Well, the general idea of connecting units together such that the end result depends on the unidirectional interactions of the units, and not on their internal construction, is certainly one. We also notice that all parts of our neural network operate simultaneously (parallelism).
The most important aspect of the human brain, however, is its ability to learn. How is that possible? We do not have the full answer to that question, but we do know that the neural connections in our own brains change as a result of learning, both in strength and in connectivity - and this we can imitate. To make progress, however, we must simplify the problem, so we usually treat the artificial neuron as a more or less simple switch, with interconnections whose strength can be varied at will (a sort of fuzzy logic).
In principle, given enough complexity, it is possible to create a Universal Computer with a neural network architecture, but this is well beyond our current abilities, so let us start with something simple. Any computation requires an input, a process and an output. This three-stage design can be emulated by having a set of input neurons (connected to a sensing device), these in turn connected to a set or sets of (hidden) neurons that process the inputs, which are themselves connected to a set of output neurons (driving a display device). Each set of neurons is called a layer. The optimum number of neurons per layer, their interconnections, and the number of layers for any particular task are all subject to much debate, both theoretical and practical.
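The input-hidden-output arrangement above can be sketched in a few lines. This is only an illustrative toy, not a prescription: the layer sizes, the random weights and the sigmoid squashing function are all assumptions.

```python
import numpy as np

def sigmoid(z):
    # Squashing function: maps any activation into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, W_out):
    """One pass through the three stages: input -> hidden processing -> output."""
    hidden = sigmoid(W_hidden @ x)   # hidden layer processes the sensed inputs
    return sigmoid(W_out @ hidden)   # output layer drives the "display device"

# Illustrative sizes: 3 input neurons, 4 hidden, 2 output; random weights
rng = np.random.default_rng(0)
W_hidden = rng.normal(size=(4, 3))
W_out = rng.normal(size=(2, 4))
y = forward(np.array([1.0, 0.5, -0.2]), W_hidden, W_out)
```

With random, untrained weights the two outputs are of course meaningless; the point is only the shape of the computation, layer by layer.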
Given a suitable arrangement of artificial neurons we then have to teach the network to perform a task. How can we do that? Several techniques have been suggested, broadly grouped into two classes. The first assumes that we know what the result should be (like a teacher instructing a pupil). In this case we can present the input, check what the output shows and then adjust the strengths/connections until the correct output is given. This can be repeated with all available test inputs until the network gets as close to error-free as possible. Because we are propagating corrections from the output back through the network, this training technique is known as back-propagation (in its normal recognition mode the same network runs feedforward: signals flow only from input to output). Many other forms of the technique are also used, with varying degrees of support and success. These supervised learning methods use external reinforcement (strengthening of correct connections, weakening of poor ones) but are slow to train and have many other drawbacks, including an inability to innovate (to go beyond what is known).
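A minimal sketch of this present-check-adjust loop: a single artificial neuron taught the logical AND function by error correction (the perceptron rule, the simplest member of this class; the learning rate, epoch count and the AND task itself are illustrative assumptions).

```python
import numpy as np

def train(inputs, targets, epochs=20, lr=0.1):
    """Present each input, compare the output with the known target, and
    adjust the connection strengths to reduce the error (the perceptron rule)."""
    X = np.hstack([inputs, np.ones((len(inputs), 1))])   # extra column = bias
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, targets):
            y = 1.0 if w @ x > 0 else 0.0   # the neuron as a simple switch
            w += lr * (t - y) * x           # strengthen/weaken connections
    return w, X

# Teach the network the logical AND function from labelled examples
inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
targets = np.array([0, 0, 0, 1], dtype=float)
w, X = train(inputs, targets)
outputs = (X @ w > 0).astype(float)   # all four cases now answered correctly
```

Note how the teacher is built into the loop: without the known targets there would be nothing to correct against.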
An intermediate type of Neural Network is the Hopfield (or recurrent) network. In these systems the output is connected back to the input in a loop, and this automatic feedback allows the network to iterate, or hunt, for a solution. In other words we have convergence to an attractor (usually one of many) for any particular input value. The available output values are, however, a fixed feature of the network, and determining them is a difficult configuration problem.
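The feedback iteration can be sketched as follows, assuming the classic Hopfield construction: one stored ±1 pattern, Hebbian outer-product weights and synchronous sign updates (standard choices, but choices nonetheless).

```python
import numpy as np

def recall(W, state, max_steps=10):
    """Feed the output back as the next input until the state stops changing,
    i.e. until the network settles into an attractor."""
    for _ in range(max_steps):
        new = np.sign(W @ state)
        new[new == 0] = 1
        if np.array_equal(new, state):
            break
        state = new
    return state

# Store one +/-1 pattern with the Hebbian outer-product rule
pattern = np.array([1, -1, 1, -1, 1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)          # no neuron connects to itself

noisy = pattern.copy()
noisy[0] = -noisy[0]            # corrupt the input by flipping one bit
recovered = recall(W, noisy)    # the loop hunts back to the stored pattern
```

The stored pattern is the attractor here; the corrupted input sits inside its basin, so the feedback loop pulls it home.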
The complexity of our own brains means that we can achieve multiple categorisation: we recognise many aspects of any object at the same time. As yet Neural Network systems are very limited in comparison, but simple network structures are known to have the ability to self-organise. The second class of techniques makes use of this idea. This type of unsupervised learning mimics the more interesting aspects of human behaviour: our ability to learn for ourselves, to add one and one and make three. In these cases we need the network to recognise features of the input data itself (to categorise it) and to display its findings in some way that is of use (which may include movement or other actions). This is a much more demanding task.
Kohonen developed an algorithm (the Self-Organizing Map or SOM) to mimic the brain's ability to self-organise, and this forms the basis of most types of self-learning Neural Network. In this method arrays of weight vectors (initially random) are compared to the input signal, and the closest match found is adjusted slightly to improve the fit. This is repeated for all input options, gradually leading the network weights to converge upon the set of input options encountered. The network learns what things are out there by experience alone. The features recognised may, however, not be those expected by humans...
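The closest-match-adjusted-slightly step can be sketched as below. This is a bare competitive-learning toy in the spirit of the SOM, with assumptions: the full SOM also drags the winner's map neighbours along (omitted here), and the starting weights are fixed arbitrary guesses rather than random, so the run is reproducible.

```python
import numpy as np

def self_organise(data, weights, epochs=50, lr=0.2):
    """For each input, find the closest weight vector (the 'winner') and
    nudge it slightly toward the input. The full SOM also updates the
    winner's map neighbours; that neighbourhood step is omitted for brevity."""
    for _ in range(epochs):
        for x in data:
            winner = np.argmin(np.linalg.norm(weights - x, axis=1))
            weights[winner] += lr * (x - weights[winner])   # improve the fit
    return weights

# Two obvious clusters that the network is never told about
data = np.array([[0.0, 0.0], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]])
weights = self_organise(data, np.array([[0.4, 0.5], [0.6, 0.5]]))
# each weight vector has drifted to sit on one of the two clusters
```

No teacher ever names the clusters; the weights simply converge on whatever structure the inputs happen to contain, which is exactly the experience-alone learning described above.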
For all these techniques (and variants that add concepts from Genetic Algorithms or Fuzzy Logic) we have systems relying on probabilistic matching. That is, we cannot be certain of the results we obtain; each result is merely more probable than the alternatives, and the system simply chooses the result with the highest likelihood. That may seem a drawback compared to the more mathematically exact computers with which we are familiar, yet it almost certainly relates more closely to the actual workings of our own brains. Data in real life is noisy; it does not come in fixed categories. This means that we can make multiple generalisations about the same data - we can find several alternative patterns. Which, then, does our network discover?
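The choose-the-most-likely step can be made concrete. One common way (an assumption here, not the only option) is to turn raw network scores into probabilities with the softmax function and then pick the largest:

```python
import numpy as np

def choose(scores):
    """Convert raw output scores into probabilities (here via softmax) and
    pick the single most probable category."""
    p = np.exp(scores - np.max(scores))   # shift for numerical stability
    p = p / p.sum()
    return int(np.argmax(p)), p

# Three competing patterns; none is certain, one is merely most probable
label, p = choose(np.array([1.2, 0.3, 0.9]))
```

The runner-up category keeps a substantial probability of its own - the system has not ruled the alternatives out, it has merely ranked them.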
It is here that we face a difficulty with the simplified systems we are able to build. Our assumptions about where to start bias the network architecture considerably. We generally desire a particular result, so the network is built for a reason - perhaps to recognise speech, say - and we structure it in a relatively fixed way that we feel is appropriate for the task. Within a limited timescale we are unable to try out many patterns, so we lose the advantages of the evolved architecture that characterises our own brains.
Any real solution space is likely to be divided into many solutions (attractors), surrounded by attractor basins leading towards them. The solutions themselves will vary in effectiveness; many will be relatively poor (what we call local minima), yet our attempts at a problem may well converge on such a solution. The solution we actually want (the global minimum) may be unreachable from our starting position; unfortunately, that is something we cannot know at the time.
One solution to this conundrum is to merge the techniques of Genetic Algorithms (GAs) and Neural Networks. By perturbing our solution with random changes (mimicking the GA processes of crossover and mutation) we can perhaps jump to a different attractor basin - maybe locating the one that leads to the globally optimum solution.
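A toy illustration of the idea: plain gradient descent gets stuck in the nearer, shallower basin of a two-minimum landscape, while perturb-and-test jumps find the deeper one. Everything here is an assumption for illustration - the landscape is invented, and a real GA would use random mutations where this sketch uses fixed jump sizes so that the run is reproducible.

```python
def f(x):
    # Toy solution space: a shallow local minimum near x = +0.96 and a
    # deeper global one near x = -1.04
    return (x**2 - 1)**2 + 0.3 * x

def descend(x, lr=0.01, steps=2000):
    """Plain gradient descent: rolls to the bottom of whichever basin x starts in."""
    for _ in range(steps):
        x -= lr * (4 * x * (x**2 - 1) + 0.3)   # derivative of f
    return x

x_local = descend(0.5)     # converges on the nearer, poorer solution

# Perturb-and-test: jump, let the solution settle, keep it only if better
best = x_local
for jump in (-2.0, -1.0, 1.0, 2.0):
    candidate = descend(best + jump)
    if f(candidate) < f(best):
        best = candidate   # a jump landed in a deeper attractor basin
```

Descent alone can never cross the ridge between the basins; the jumps can, and the keep-only-if-better test plays the role of GA selection.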
Do our brains operate in that way? Perhaps. Certainly at birth the wiring of our neural network is very fluid. We seem, however, to operate in a somewhat different way, starting with overconnected networks and pruning (dissolving) connections that prove ineffective - a natural selection of advantageous connections...