Introduction to Operant Conditioning

(This was originally published on NOTALAND in 2009. Because the site is no longer operational, I have reposted it here.)

Thorndike, Edward, (1911) Animal Intelligence, New York, The MacMillan Company

Thorndike presents his findings about intelligence with the results from his studies involving cats, dogs and chicks. Thorndike studies imitation, discovery and association using different methods for each. The first, and perhaps the most striking, of Thorndike’s experiments involved a specialized box. The subject (either a cat or a dog) is placed into a box that requires the animal to execute a specific action in order to be rewarded with food. These actions ranged from pushing buttons and levers to biting and rubbing against strings. While the animals are able to accomplish the various tests, Thorndike does not attribute this to intelligence. Instead he simply states that these actions are due to associations. Thorndike’s other experiments include instructing animals on how to accomplish a task (by physically moving their paw), seeing if animals learned to imitate an act after observing others, and finally evaluating if animals make connections between specific commands and rewards without first seeing the reward. Thorndike reached the conclusion that animals do not possess higher forms of reasoning. Thorndike then assumes that an animal’s “consciousness” is similar to the sensation we feel when we swim. We can feel the water but, unless we make an effort, we do not “think” about water and its properties.

B. F. Skinner, Science and Human Behavior (New York: The Free Press, 1953).

Skinner gives us a brief history of the deconstruction of mystic processes into what he now titles Behaviorism. First he cites how historically there have been two entities, the mind and the body. This is followed by a reflection on how the mind is considered so complex that we feel complex methods must be used to explain how it works. Skinner goes on to illustrate how the works of Pavlov and Thorndike seem to indicate that the mind needs no special method of inquiry. Instead Skinner suggests that these earlier studies break every action down into a stimulus, some unseen process, and the resulting response. Skinner then argues that since the second part, the unseen process, does not provide us with any advantage in terms of manipulating actions, it can be ignored. Thus Skinner postulates that a further study of stimuli and responses, with the aid of some motivator (a reinforcer), will yield great insights about the human mind.

The Artificially Intelligent Mouse

Following the theory of behaviorism we can now teach the artificial mouse to get to a certain spot on our board by rewarding it when it has reached the desired goal. Because we cannot give an artificial mouse a physical reward we must think of a similar way to motivate our mouse. In his famous puzzle box experiments Thorndike uses an animal’s hunger as motivation and provides food as a reward. We can model this by programming our mouse to seek out high numbers in much the same way that animals seek food. Keeping this same approach we can then reward the mouse with the number 100 once it has done what we wanted. In the same way that animals narrowed down their actions, or as Thorndike said the associations get “stamped in”, our mouse will give a discounted association to the spaces that are close, but not directly next, to the goal.
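To make the idea of a discounted association concrete, here is a minimal sketch in Python. It assumes that each extra move away from the goal multiplies the association by a fixed discount factor. The factor of 0.5 and the function name are assumptions chosen for illustration; they are not taken from the original example.

```python
def discounted_values(reward, discount, max_steps):
    """Association for squares 0..max_steps moves away from the goal.

    Assumption: each additional move multiplies the association by `discount`.
    """
    return [reward * discount ** steps for steps in range(max_steps + 1)]


# Reward of 100 at the goal, fading by an assumed factor of 0.5 per move.
fading = discounted_values(reward=100, discount=0.5, max_steps=3)
print(fading)  # [100.0, 50.0, 25.0, 12.5]
```

A factor closer to 1 lets the reward carry further across the board, while a smaller factor keeps the association concentrated near the goal.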

In our example we condition our mouse to reach the top left position of the board. The mouse will make random movements until it reaches the desired spot, at which point we will give it a reward of 100. The mouse will be allowed as many trials as the user wants and will continue to refine its path with every iteration.
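The procedure described here can be sketched roughly as follows. This is not the code behind the original example. The 4x4 board, the bottom right starting square, the discount of 0.9 and the rule for breaking ties at random are all assumptions made for illustration. The mouse always moves toward the highest number it can see, and each move passes a discounted share of the new square's number back to the square it just left.

```python
import random

SIZE = 4                      # assumed 4x4 board
GOAL = (0, 0)                 # top left square, as in the example
START = (SIZE - 1, SIZE - 1)  # assumed starting square (bottom right)
REWARD = 100.0                # the reward given at the goal
DISCOUNT = 0.9                # assumed; the original gives no value


def neighbours(square):
    """Squares one move away (up, down, left, right) that stay on the board."""
    row, col = square
    steps = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return [(r, c) for r, c in steps if 0 <= r < SIZE and 0 <= c < SIZE]


def run_trial(values, rng):
    """Run one trial from START to GOAL and return the number of moves."""
    square, moves = START, 0
    while square != GOAL:
        options = neighbours(square)
        best = max(values[s] for s in options)
        # Seek the highest number; while nothing is learned yet, every
        # option ties at zero, so the move is effectively random.
        nxt = rng.choice([s for s in options if values[s] == best])
        # Associate a discounted share of the new square with the old one.
        values[square] = max(values[square], DISCOUNT * values[nxt])
        square, moves = nxt, moves + 1
    return moves


def train(trials, seed=0):
    """Run several trials on a fresh board; return values and move counts."""
    rng = random.Random(seed)
    values = {(r, c): 0.0 for r in range(SIZE) for c in range(SIZE)}
    values[GOAL] = REWARD  # the reward of 100 sits on the goal square
    return values, [run_trial(values, rng) for _ in range(trials)]


values, moves_per_trial = train(trials=30)
print("first trial:", moves_per_trial[0], "moves")
print("last trial:", moves_per_trial[-1], "moves")
```

Early trials are long random walks, and later trials follow the learned numbers almost directly. Because this sketch never explores once it has a route, it can settle on a path slightly longer than the shortest one. Adding occasional random moves is one common way to address that.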

Try the example!

Where it embodies the theory of behaviorism:

The AI Mouse example illustrates how ideas from behaviorism are used in the field of artificial intelligence. In the same way that animals are thought to adopt behaviors that benefit them directly, the artificial mouse is programmed to seek out the optimal square on a given board. Following the same analogy, just as animals are thought to associate rewards with the actions that preceded them, the artificial mouse associates the reward from a new square with the square that preceded it. After a few runs we can then see how the mouse converges to an optimal solution.


Criticisms of reinforcement being used as a way to replicate intelligence include Skinner’s own argument for the use of his method. Skinner stated that because we could not see the inner workings of the mind we could not successfully evaluate the processes that take place there. Skinner then argues that because it would all be speculation, we should avoid these internal workings and focus on what we can evaluate. In much the same way, this example successfully produces the desired results given the proper reinforcement but may not give us any insight into true, human, intelligence. It could be argued that even a real mouse has motivations other than hunger that drive it to look for food, and that because we ignore the inner workings of the mouse’s mind we are not accurately interpreting the scenario.

Another critique of this approach is that if we fail to give a proper reinforcement, our artificial mouse will not stay in the square we desire. In the same way, some have argued that actions learned using reinforcement do not persist once the reinforcement is no longer present.

Chomsky, Noam, (1959), Literary review of Verbal Behavior, Language, 35(1), 26-58

In his review of B.F. Skinner’s “Verbal Behavior”, Chomsky analyzes Skinner’s claim that language is a system of stimulus/response pairs. Chomsky criticizes Skinner’s definition of reinforcement as one that jumbles together anything remotely related to acquisition and retention. In the same way, Chomsky argues that Skinner’s terms are so ill defined in this context that they lose any objective meaning they might ever have had. Skinner’s definition of stimulus is also considered too wide, encompassing whoever is talking, the subject of the discussion, and background information. Chomsky then goes on to challenge the way that Skinner measures the strength of responses. He gives the example of how the phrase “it’s beautiful” uttered in a low tone may carry just as much weight as, if not more than, the same response said in a high pitch. Chomsky also suggests that humans do things at random without any conceivable reinforcement. He then argues that, because of this randomness, the precise care and setup that learning by reinforcement is said to need cannot be arranged. Because of these ill-defined terms and seemingly unsound experiments, Chomsky concludes that it is difficult not only to falsify Skinner’s claims, but also to validate them.