Symbolic Logic:Learning:Symbolic Replacement

Data compression implies learning. A simple mechanism for learning is symbolic replacement. In other words, replacing a string of symbols with a single symbol. If the string of symbols occurs enough times the resulting representation of the sequence of symbols will be compressed (need less information to describe it). This compresion implies that some knowledge has gained about the structure of the data.

Replacement rules by themselves have little ability to represent semantic knowledge. Data compression needs to be applied based on rules that are cabable of describing any statement about the data. Second order replacement is the application of replacement to obtain a family of rules from an original rule. It turns replacements into a Turing Complete language.

Second order replacement implements the representation of facts by symbols, and implements the ideas of a naive mathematics. Learning second order replacement rules naturally leads to mathematics.

Simple learning algorithms may obtain rules from data. Algorithms that look for repeated sequences give simple replacement rules.

These simple rules may be compared using the diff algorithm. The diff algorithm compares sequences and finds sub-sequences that are the same and sub sequences that are different. The sub sequences are that are the same form the Longest common sub-sequence problem.

The subsequences that are different may be represented by multiple replacement rules for the same symbol. Symbols with this kind of rule set behave like sets.

Further analysis of the sets then leads to rules that behave like variables. A variable is implemented by a symbol that is replaced with a set symbol.

Language of Replacement
We would like the computer to learn the rules that describe the behaviour of the world. Before learning can take place there needs to be a rule language in which the rules can be written. Mathematics is such a language, but the mathematical language is already high level. Learning based on a lower level primitive language would allow the learning system to construct the rules that describe mathematics.

Mathematics is based on variables. How do you define what a variable is? Naively a variable is a symbol that is always replaced by the same value. Take for example the commutative law.


 * A + B = B + A

means


 * 1 + 2 = 2+ 1
 * 13 + 14 = 14 + 13

In the above examples the A and B symbols have been replaced. In one equations implementation, the two As and the two Bs are replaced by the same symbol.

Backus–Naur Form
Backus–Naur Form (BNF) uses replacement rules to define syntax. Statement in BNF define replacements for individual symbols by strings of symbols. Sentences in the language are stringss of "terminal" symbols. Non-terminal symbols are symbols to be replaced in generating the sentences in the language. To generate a sentence in the language, start with a non terminal symbol representing the language (or set of strings in the language) and apply replacements until all "terminal" symbols are replaced. This process generates a sentence in the language. For example, the non-terminal symbol is defined by,

So starting with we can apply replacements to generate strings that look like numbers.

Second Order Replacement
The same symbol defined in BNF may be replaced by different values. Second order replacement is an addition to BNF that gives it the power to define variables. A second order replacement is a replacement rule applied to a replacement rule pattern.

A replacement rule pattern is written like,

A replacement rule pattern may only be applied as a replacement after all non terminal symbols to the right of --> have been replaced. The replacement rule patterns in the example may be transformed into a specific replacement rules,

The commutative law of addition may be represented by,

By applying the replacements this becomes,

which is one example of the commutative law.

Local symbols
When defining rules it is useful to group rules and restrict the scope of symbols. Curly brackets group statements, while the @ symbol indicates that the symbol following the @ is local to the group of statements. This means that the same name used inside the group of statements identifies a different symbol outside the group of statements.

In the above examples the  symbol in the group of statements for "commutative law of addition" is a different symbol from the  symbol in the group of statements for "commutative law of multiplication".

You can think of the application of a group of statements as taking a copy of the group of statements. Then, before using "-->" statements replace all non terminal symbols on the right of "-->", which reduces the statement down to a simple "->" replacement.

Rules of second order replacement

 * 1) A replacement rule pattern may not be applied until all non terminal symbols on the right of "-->" have been replaced.
 * 2) A name for a symbol, which is declared local in a group of statements, identifies a different symbol outside the group of statements.

Learning
A learning system does not have to be perfect to be useful. The goal here is to give a series of simple transformation laws which, by repeated application, allow syntax and logic to be built up. There may be more data than processing power, so the goal is to have simple rules that extract data efficiently, without attempting to be perfect in the data extraction.

Choose the application of these rules, so as maximise the reduction of the total information content.

Repeating symbols
When a string of symbols is repeated, create a new symbol to represent the string of symbols.

cat cat cat

cat is repeated, and the above string may be represented by,

Suffix Trees may be used to efficiently find repeated strings of symbols.

Identify sets of symbols
There is a problem here. There seems to be very little information available about syntax in strings. So even if we had a perfect learning system there is not much information to be gleened. Also learning is computationally expensive. Back to the drawing board :(

The suffix trees find repeated strings of symbols, and also identify the set of symbols that come after a symbol. These sets form the basis for analyzing the syntax of the language.

The computer does not know that 0..9 are digits and A..Z are alphabetic unless we add extra information at the start. Without this information only the sets of symbols that occur after or before a symbol give information about syntax. For example take the syntax of an arithmetic expression,

For example,

556*384+64+(7 + myvar)

Where sets of symbols overlap, component sets formed out of the difference between sets form the basis sets.

An element of a set is an instance, but also a member of a set, and a member of a set that contains it. Each instance has a multi-level identity. So "0" is a digit which is a alpha-numeric character, which is an ASCII character.

Each symbol in a set is known by its hierarchy. Objects are also known by their hierarchy, down to the level of classification of the object instance.

Each object is an instance of every set to which it belongs.

Merging strings to create second order replacements
Where two strings may be made the same, by replacement of a symbol, create a replacement pattern for that symbol.

For example take the string, xqyxxx aqbaaa

If we replace x by a and y by b, xyxxx is transformed into abaaa. In this situation replacement rules may constructed to represent both strings as a pattern of symbols,

The rules are,

then the original string xqyxxx aqbaaa

may be represented as, xy ab

We can see this because the above rule group may be expanded as,

or as,

so,

Detailed algorithm
In comparing two strings of symbols, for each position compare the symbols.
 * If the symbols are different,
 * If there is no existing replacement for the symbol create a replacement rule.
 * The merged symbol is the relacement symbol.
 * If the symbols are the same the merged symbol is the symbol.

Counting
Consider the string

kkkk hhhh ddd ccc

Merging may be applied to "kkkk hhhh"

so,

Similarly merging may be applied to "ddd ccc".

So,

The rule for <4> will be simplified to,

So the computer is well on the way to learning to count.

Links

 * Symbolic Logic:Learning
 * Intelligence and Reasoning