Intelligent Systems And Their Societies Walter Fritz

Selecting the Action

 

Finding Response Rules In Memory
The GL (General Learner) chooses a response appropriate to the present situation by selecting a response rule from its memory. At the start, no response rules exist in memory. A previous program used random responses and curiosity (an instinctive, preprogrammed response rule) when it could not find an applicable response rule. But the GL, in this case, does nothing and advises that it could not find a response.

Let's look at the case when response rules do exist in memory. Not all of them are applicable to the present situation. So the GL first compiles a short list of all those response rules whose situation part contains some concept that also exists in the present situation. The response rules on this short list are not all applicable to the present situation to the same degree.
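This shortlisting step could be sketched as follows. The rule representation here (concepts as strings, one value per situation concept) is a hypothetical stand-in for illustration, not GL's actual data structure:

```python
# A minimal sketch of the shortlisting step described above.
# The ResponseRule structure is an assumption, not GL's actual format.

from dataclasses import dataclass

@dataclass
class ResponseRule:
    situation: list[str]   # concepts in the rule's situation part
    response: str          # the action GL would perform
    values: list[int]      # one value per situation concept

def shortlist(rules, present_situation):
    """Keep only rules sharing at least one concept with the present situation."""
    present = set(present_situation)
    return [r for r in rules if present & set(r.situation)]

rules = [
    ResponseRule(["draw", "a", "long", "line"], "GL draws a long line",
                 [20, 20, 20, 20]),
    ResponseRule(["erase", "screen"], "GL erases the screen", [20, 20]),
]
situation = ["draw", "a", "long", "line"]
print([r.response for r in shortlist(rules, situation)])
# → ['GL draws a long line']
```

Only the first rule shares concepts with the present situation, so only it survives the shortlisting.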

 

Evaluating a Response Rule
Let's assume that the present situation is a sentence typed by the person: "Draw a long line". If there are response rules whose situation part is identical to the present situation, then they receive the highest value: the sum of all the positive values of the situation part of the response rule. For instance, we may have a response rule like this: "draw a long line" -> (GL draws a long line), with values 20, 20, 20, 20 (one value for each concept). This response rule covers all the concepts of the present situation and thus is very good. Its total value is the sum of all the values, or 80.

However, finding such a perfect response rule is not a common case. The present situation may have many parts which were not met before and which may be unimportant. So a response rule whose situation-part concepts all exist in the present situation is still eminently applicable and gets a high value: the same as in the previous case, but reduced by some amount for each concept of the present situation that the response rule does not cover.

Consider, for instance, if GL also found the following response rule: "draw a line" -> (GL draws a vertical line), with values 20, 20, 20. If the GL finds no better response rule, this response rule is taken as applicable -- it does cover the present situation. Naturally, it is not as good as the response rule that started with "draw a long line". GL recognizes this and creates an adjusted total value for this almost perfect response rule. This value is obtained by summing the values of the response rule, 60, and then reducing this number by some programmed factor, for instance 10, because the rule did not have "long" on its situation side. This results in an adjusted total value of 50.
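The adjusted-total-value calculation just described can be sketched as follows. The penalty of 10 per uncovered concept follows the text's example; the function name and signature are assumptions:

```python
# Sketch of the adjusted total value described above: sum the rule's
# positive concept values, then subtract a fixed penalty (the text's
# "programmed factor", 10 in its example) for each present-situation
# concept the rule does not cover.

PENALTY = 10  # the programmed factor from the text's example

def total_value(rule_concepts, rule_values, present_situation):
    total = sum(v for v in rule_values if v > 0)
    uncovered = set(present_situation) - set(rule_concepts)
    return total - PENALTY * len(uncovered)

situation = ["draw", "a", "long", "line"]

# Perfect match: covers every concept of the present situation.
print(total_value(["draw", "a", "long", "line"], [20, 20, 20, 20], situation))
# → 80

# "draw a line" leaves "long" uncovered: 60 - 10 = 50.
print(total_value(["draw", "a", "line"], [20, 20, 20], situation))
# → 50
```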

 

Response Rules Containing Negative Concept Values
In all cases, if there is a concept in the response rule that has a negative value, and that concept also exists in the present situation, then the total value for that response rule is calculated as the sum of the positive values minus the amount of the negative value. Thus, GL subtracts the value of that concept from the total value. Such a negative concept value usually means that the GL would not use the response rule if this concept exists in the present situation. However, it is still possible that this response rule will be chosen, especially if the rest of the rule turns out to be significantly better than what any other response rule can provide.

This is important if GL finds a third response rule: "draw a long inclined line" -> (GL draws a short horizontal line), with values 20, 20, -30, -20, 20 for each concept. (Such a response rule can arise from the application of "draw a line" to the situation "draw a long inclined line".) The negative values indicate that the GL should not use this rule for a present situation in which "long" and/or "inclined" occur. Thus, the total value for this response rule is the sum of the positive values existing in the present situation, namely 60, less the negative value of the concept "long", giving a total of 30. We do NOT include the -20 in this calculation (and would not even if it were positive) because our present situation -- a sentence typed by the person: "Draw a long line" -- does not include the concept "inclined".
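The full calculation, including negative concept values, might look like the sketch below. Following the text, a concept's value (positive or negative) counts only when that concept occurs in the present situation; the function name and penalty parameter are assumptions:

```python
# Sketch of the total-value calculation with negative concept values.
# A value counts only if its concept occurs in the present situation,
# so the -20 for "inclined" is ignored here, exactly as in the text.

def total_value_with_negatives(rule_concepts, rule_values,
                               present_situation, penalty=10):
    present = set(present_situation)
    # Sum the values whose concepts occur in the present situation;
    # a negative value thus reduces the total only when its concept
    # is actually present.
    total = sum(v for c, v in zip(rule_concepts, rule_values) if c in present)
    # Subtract the programmed factor for each uncovered present concept.
    uncovered = present - set(rule_concepts)
    return total - penalty * len(uncovered)

situation = ["draw", "a", "long", "line"]
concepts = ["draw", "a", "long", "inclined", "line"]
values   = [20, 20, -30, -20, 20]
print(total_value_with_negatives(concepts, values, situation))
# → 30  (60 from the positives present, minus 30 for "long")
```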

Note: In the examples above we show present situations made up of words. We selected these examples because they are easy to represent in writing. For response rules used in playing games, both the situation and the response of the response rule would instead consist of concepts that referred to drawings.

 

Choosing From a List of Applicable Response Rules
GL now has a short list of applicable response rules that it could use for acting. These rules, ordered by their total values, are:

Response Rule (starting with)    Total Value
"draw a long line"                        80
"draw a line"                             50
"draw a long inclined line"               30

From this evaluated list, the GL does not always choose the response rule with the highest total value. Researchers have shown that always doing so would limit GL's learning. This is because once the program chooses such a response rule, it reinforces it and never looks for another response rule, thus potentially missing a new one that may be even better. But how can this be?

Giving values to the concepts of a response rule is the result of experience; so, in a partially novel present situation, the total value may not reflect the true value of a response rule. Therefore, GL chooses from the list by evaluated chance. This means that it chooses "randomly" any rule from the list, but more often those with higher total value.
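One way to realize "evaluated chance" is a value-weighted random choice: any rule on the list may be picked, but rules with higher total values are picked proportionally more often. The proportional weighting (and the shift to keep weights positive) is an assumed concrete realization; the text does not specify the distribution:

```python
# Sketch of choice by "evaluated chance": a value-weighted random
# selection over the shortlist. The proportional weighting is an
# assumption; the text only says higher values are chosen more often.

import random

def choose_by_evaluated_chance(scored_rules):
    rules = [r for r, _ in scored_rules]
    # Shift weights so they stay positive even if some totals are negative.
    min_v = min(v for _, v in scored_rules)
    weights = [v - min_v + 1 for _, v in scored_rules]
    return random.choices(rules, weights=weights, k=1)[0]

scored_rules = [("draw a long line", 80),
                ("draw a line", 50),
                ("draw a long inclined line", 30)]
picks = [choose_by_evaluated_chance(scored_rules) for _ in range(10_000)]

# The highest-valued rule is chosen most often, but every rule gets a chance.
print(picks.count("draw a long line") > picks.count("draw a long inclined line"))
# → True
```

Because the lowest-valued rule still has nonzero weight, GL occasionally tries it and can discover that it works better than expected in a novel situation.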

 

Adjusting the Response Part
Once the GL has chosen a response rule, it may adjust the response part by using rule patterns. The GL creates these rule patterns during periods of external inactivity, during the "sleep" period. If some concepts exist in the present situation that the GL does not find in the concepts of the situation side of the selected response rule, it then looks up rule patterns applicable to these missing concepts and adjusts the response accordingly.

For example, suppose we have a response rule: "draw a line" -> (GL draws a line). But the present situation is: "draw a long line". The response rule is applicable: all three concepts of the situation side of the response rule exist in the present situation ("draw", "a", "line"). However, the concept "long" does not exist in the response rule. The GL looks for a rule pattern to make the line "long". If found, the GL applies it, modifying the response side of the response rule and thus creating a new response rule. This new response rule is added to all the other rules in memory, but it does NOT replace the old rule; GL maintains the old response rule as well.
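The adjustment step could be sketched as follows. The pattern table here (simple string substitutions keyed by the missing concept) is a purely hypothetical stand-in for GL's actual rule patterns, which the text does not specify:

```python
# Sketch of the response-adjustment step: for each present-situation
# concept missing from the rule's situation side, look up a rule
# pattern and apply it to the response side. The RULE_PATTERNS table
# is a hypothetical illustration, not GL's actual representation.

RULE_PATTERNS = {
    "long": lambda response: response.replace("a line", "a long line"),
}

def adjust_response(rule_situation, rule_response, present_situation):
    response = rule_response
    for concept in present_situation:
        # Concept is in the situation but not covered by the rule:
        # try to find a rule pattern for it and adjust the response.
        if concept not in rule_situation and concept in RULE_PATTERNS:
            response = RULE_PATTERNS[concept](response)
    return response

new_response = adjust_response(
    ["draw", "a", "line"], "GL draws a line", ["draw", "a", "long", "line"])
print(new_response)
# → GL draws a long line
```

The adjusted response would then be stored as a new response rule alongside the old one, matching the text's note that the old rule is kept.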



Last Edited 11 April 2013 / Walter Fritz
Copyright © New Horizons Press