Intelligent Systems And Their Societies Walter Fritz

Acting and Reinforcement


The GL decomposes all parts of the response into concrete concepts. If a concept has a link to its concrete examples, the GL replaces it with any one of those examples. If a concept has a link to its parts, the GL replaces it with all of its parts. After this, the response is composed only of elementary concepts; in the GL these are characters, straight lines, and curves. Finally, the GL performs the response, changing the screen window and the text window.
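The decomposition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the GL's actual code: the tables EXAMPLES and PARTS, the concept names, and the functions decompose and decompose_response are all hypothetical, and concepts are shown as readable strings rather than the GL's numeric codes.

```python
import random

# Hypothetical concept store (not taken from the GL itself).
# EXAMPLES links an abstract concept to its concrete examples;
# PARTS links a composite concept to its parts.
# A concept with neither link is treated as elementary.
EXAMPLES = {"shape": ["cross", "circle"]}
PARTS = {
    "cross": ["line_a", "line_b"],
    "circle": ["curve_a"],
    "greeting": ["h", "i"],
}


def decompose(concept, choose=random.choice):
    """Return the list of elementary concepts that make up `concept`."""
    if concept in EXAMPLES:
        # Replace the concept by any one of its concrete examples.
        return decompose(choose(EXAMPLES[concept]), choose)
    if concept in PARTS:
        # Replace the concept by all of its parts, in order.
        result = []
        for part in PARTS[concept]:
            result.extend(decompose(part, choose))
        return result
    # No links: the concept is already elementary.
    return [concept]


def decompose_response(response, choose=random.choice):
    """Decompose every concept of a response into elementary concepts."""
    elementary = []
    for concept in response:
        elementary.extend(decompose(concept, choose))
    return elementary
```

Passing a deterministic chooser, for instance choose=lambda options: options[0], makes the result repeatable; with it, decompose_response(["greeting", "shape"]) yields ["h", "i", "line_a", "line_b"].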


Once the GL has finished performing a response, its human operator may wish to express approval or disapproval of GL's action. A strong disapproval, which is coded and transmitted to GL by entering several down arrows on the keyboard, will result in a reversal of the action.

During a period of external inactivity, the sleep period, the GL creates new response rules. If a new rule later gets used correctly, it may receive an approval from GL's operator. This approval results in the increase of the value of each of the concepts on the situation side of the response rule and also of any rule patterns that GL used.

However, this new rule may cause an action that the operator considers inappropriate. The operator may then express disapproval by typing the "down arrow". In this case, the GL assumes that the response rule is incorrect or incorrectly applied. This results in the reduction of the rule's positive values. The GL then updates the response rule by adding, with a negative value, the concept(s) for which the rule is not suited. (It finds these by looking for concepts that exist in the present_situation but not in the situation side of the response rule.)


Corrective Reinforcement Example
Suppose in a game of tic-tac-toe the GL program decides to draw a cross and that this action was learned in a previous situation where this response was appropriate. To the understanding of GL, the present situation is similar to the one where it learned to draw a cross, so its decision to draw another cross seems reasonable. To us, however, this is an inappropriate choice because a circle already exists where GL wants to draw its cross. GL draws the cross over the circle and its opponent indicates disapproval.

In the terms and representations of GL:

Suppose the present situation was:
10235, 10122, 10544, 11333
the response rule (situation -> response) was:
10235, 10122, 10544  ->  10111
and the corresponding values were:
   15     15     15

After disapproval the response rule would be changed to look like this:
10235, 10122, 10544, 11333  ->  10111
   15     15     15    -45

This has the effect that in a future "present situation" where the concept "11333" (a circle at a certain place) is present, this response rule would have an additional value disadvantage of 45. It is, therefore, far less likely that the IS (GL) will choose it when it has any other options. In this way the mechanisms of GL make future inappropriate use of this response rule, in situations like the one above, less likely.
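The tic-tac-toe example can be replayed with a small Python sketch. Several points here are assumptions and not statements about the GL's real code: the function names apply_disapproval and rule_score are hypothetical, the penalty of 45 is simply passed in because the text does not say how it is derived, and scoring a rule by summing the values of its matching concepts is only one plausible reading of "value disadvantage". The sketch follows the example in leaving the three positive values at 15.

```python
def apply_disapproval(rule_situation, rule_values, present_situation, penalty):
    """Return the updated (situation, values) of a rule after a disapproval.

    Every concept that is in the present situation but missing from the
    situation side of the rule is appended with a negative value.
    """
    new_situation = list(rule_situation)
    new_values = list(rule_values)
    for concept in present_situation:
        if concept not in rule_situation:
            new_situation.append(concept)
            new_values.append(-abs(penalty))
    return new_situation, new_values


def rule_score(situation, values, present_situation):
    """Assumed scoring: sum the values of rule concepts that are present."""
    present = set(present_situation)
    return sum(v for c, v in zip(situation, values) if c in present)


# Values from the example above.
present = [10235, 10122, 10544, 11333]
situation, values = apply_disapproval([10235, 10122, 10544], [15, 15, 15],
                                      present, penalty=45)
print(situation)  # [10235, 10122, 10544, 11333]
print(values)     # [15, 15, 15, -45]
```

Under this assumed scoring, the updated rule scores 45 in a situation without the circle (11333) and 0 in a situation with it, so any competing rule with a positive score would be preferred when the circle is present.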


Reinforcement Propagation
The GL uses reinforcement for more than just increasing the accuracy with which its response rules are applied. It also increases or reduces, as appropriate, the values of any rule patterns that it used in creating the now reinforced "new rule". Further, the GL propagates this approval or disapproval back up the ladder of previously used response rules, with a much smaller effect on the values at each step. This propagation continues until the magnitude of these value adjustments becomes insignificant.
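A minimal sketch of such a propagation follows. The text gives neither the factor by which the effect shrinks nor the point at which an adjustment counts as insignificant, so the decay of 0.5 and the threshold of 1.0 are illustrative assumptions, as are the function name propagate and the representation of a rule as a dictionary with a "values" list.

```python
def propagate(chain, delta, decay=0.5, threshold=1.0):
    """Apply a reinforcement back through a chain of used response rules.

    chain: rules ordered from the most recently used to the oldest.
    delta: positive for approval, negative for disapproval.
    decay and threshold are assumed parameters, not values from the GL.
    Returns the list of adjustments that were actually applied.
    """
    applied = []
    for rule in chain:
        if abs(delta) < threshold:
            # The adjustment has become insignificant: stop propagating.
            break
        rule["values"] = [value + delta for value in rule["values"]]
        applied.append(delta)
        # Each earlier rule receives a smaller effect.
        delta *= decay
    return applied


# Five previously used rules, each with one value of 10.
chain = [{"values": [10.0]} for _ in range(5)]
print(propagate(chain, delta=4.0))          # [4.0, 2.0, 1.0]
print([rule["values"] for rule in chain])   # [[14.0], [12.0], [11.0], [10.0], [10.0]]
```

With these assumed settings an approval of 4 reaches only the three most recent rules; the fourth adjustment would be 0.5, below the threshold, so the older rules are left unchanged.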


Last Edited 11 April 2013 / Walter Fritz
Copyright © New Horizons Press