What is a Neural Network?
A neural network, at its most basic level, can be thought of as a function that consists of an input, a transformation process, and an output. One helpful analogy for this is the Plinko game on The Price Is Right. The game consists of a contestant dropping a chip at the top of a board, which then bounces, seemingly at random, through a field of pegs before landing in one of several slots.
What makes this analogy especially appropriate is that the relationship between the input (starting position of the chip) and the output (the slot at the bottom of the board) is completely obscured by the complexity of the pathway; there is no way to tell the exact path the chip will take based on the starting position. This is what is typically meant when a neural network is referred to as a black box.
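To make the "function with an input, a transformation, and an output" idea concrete, here is a minimal sketch of a single transformation step. The sizes (three inputs, two outputs), the random weights, and the tanh nonlinearity are all illustrative assumptions, not anything dictated by the analogy.

```python
import numpy as np

# One transformation: multiply the input by a weight matrix, add a bias,
# and squash the result with a nonlinearity. W and b are the adjustable parts.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))        # 3 inputs -> 2 outputs (illustrative sizes)
b = np.zeros(2)

def network(x):
    """Input -> transformation -> output."""
    return np.tanh(W @ x + b)

x = np.array([0.5, -1.0, 2.0])     # the "starting position of the chip"
print(network(x))                  # the "slot" it ends up in
```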
While a process that produces random outputs is useful in a game of chance, it is less useful when trying to use machines for decision-making. The power of neural networks comes from the training process, which consists of small adjustments to the transformation process based on repeated evaluations of inputs. To continue the Plinko analogy, this could be thought of as allowing the contestant to move the pegs. After dropping chips potentially millions of times and looking at the output distribution, the contestant would be allowed to make slight changes to the location of each of the pegs and repeat the process. If the overall winnings increased, they would move each of the pegs further in the same direction; if the winnings decreased, they would move them in the opposite direction. This adjustment process, known as backpropagation, is repeated over and over until an optimized peg configuration is found. The distance the pegs are allowed to move on each update is called the learning rate, the number of chips used for each evaluation is called the batch size, and each full pass through the training data is called an epoch.
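As a concrete illustration, here is a minimal sketch of such a training loop, assuming a toy one-parameter model fit with mini-batch gradient descent rather than a full network. The names learning_rate, batch_size, and epochs mirror the terms above; the specific values and the toy data are made up for illustration, and in a real network backpropagation is what computes the equivalent of grad for every weight.

```python
import numpy as np

# Toy data the model should learn: y is roughly 3 times x, plus noise.
rng = np.random.default_rng(0)
xs = rng.uniform(-1, 1, size=1000)
ys = 3.0 * xs + rng.normal(scale=0.1, size=1000)

w = 0.0                 # the single adjustable "peg"
learning_rate = 0.1     # how far the peg may move on each update
batch_size = 32         # how many chips we evaluate per update
epochs = 20             # how many full passes over the data

for epoch in range(epochs):
    order = rng.permutation(len(xs))
    for start in range(0, len(xs), batch_size):
        batch = order[start:start + batch_size]
        x, y = xs[batch], ys[batch]
        pred = w * x
        # Gradient of the mean squared error with respect to w; this tells us
        # which direction (and how strongly) to nudge the peg.
        grad = np.mean(2 * (pred - y) * x)
        w -= learning_rate * grad
    print(f"epoch {epoch}: w = {w:.3f}")   # w drifts toward 3.0
```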
While it would be tempting simply to arrange the pegs into channels that force the outcome you want, this is what is known in the machine-learning world as overfitting. Because the data we train on (the starting positions of the chips we use to optimize the pegs) is rarely identical to the positions we would see when actually playing the game, we want the final configuration to work universally rather than only for the positions we used to train. What we sacrifice in training accuracy we make up for with a generalized model that will be more successful over time in different situations.
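One way to see this in practice is to hold some data out of training and compare performance on the two sets. Below is a minimal sketch of that check, assuming a deliberately over-flexible model (a high-degree polynomial) standing in for a network with too much freedom; the data and sizes are invented for illustration.

```python
import numpy as np

# A model that scores far better on its training chips than on new ones has
# carved channels for specific chips rather than learned a general rule.
rng = np.random.default_rng(1)
train_x = rng.uniform(-1, 1, size=15)
train_y = 3.0 * train_x + rng.normal(scale=0.2, size=15)
val_x = rng.uniform(-1, 1, size=100)          # "real game" positions
val_y = 3.0 * val_x + rng.normal(scale=0.2, size=100)

coeffs = np.polyfit(train_x, train_y, deg=9)  # enough freedom to chase noise
train_err = np.mean((np.polyval(coeffs, train_x) - train_y) ** 2)
val_err = np.mean((np.polyval(coeffs, val_x) - val_y) ** 2)
print(f"training error {train_err:.4f} vs held-out error {val_err:.4f}")
```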
Up to this point we have only discussed the transformation process as an abstract concept. What it actually consists of is a specified number of layers (rows of pegs), each with a specified size (number of pegs per row). The number and size of the layers are chosen ahead of time, with only general guidelines as to how many or how large they should be; what the training process determines is the values inside each layer.
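As a rough sketch of what "a specified number of layers, each with a specified size" looks like in code, here is a fully connected stack built from a made-up layer_sizes list. The sizes, the initialization scale, and the tanh nonlinearity are illustrative assumptions; only the weight values inside each layer would change during training.

```python
import numpy as np

layer_sizes = [4, 16, 16, 3]   # input size, two hidden "rows of pegs", output size
rng = np.random.default_rng(0)
weights = [rng.normal(size=(m, n)) * 0.1
           for n, m in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(m) for m in layer_sizes[1:]]

def forward(x):
    """Pass an input through every layer in turn."""
    for W, b in zip(weights, biases):
        x = np.tanh(W @ x + b)
    return x

print(forward(np.ones(4)))     # a 4-value input produces a 3-value output
```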
Continuing to describe neural networks in more detail is beyond the scope of this overview, in part because the technicalities of their implementation extend beyond the Plinko analogy. Nevertheless, the analogy is a useful tool for describing how the overall process works.