Symbolic Logic:Learning:Hierarchical Object Classification

Objects and things may be classified to predict behaviour. This is an example of the compresion of data, leading to inductive inference.

This is described as Hierarchical Cluster Analysis, except that the approach generally described differs from what I describe here.

The problem
An object or thing may be described by a tuple of data giving attributes of that thing. Some attributes may be missing.

A set of things is described by a set of tuples.

We wish to classify objects into a class inheritance hierarchy so that we can predict attributes we dont yet know about an object.

If you have a set of tuples of attributes, each tuple representing an object or thing, then you could classify the things into an inheritance tree.

Minimizing complexity
The representation of a data set may be compressed by organising the information into an inheritance tree. The amount of compression indicates the predictive power of the inheritance tree.

Computational Complexity
The amount of processing required to create the inheritance tree may be large. At the expense of accuracy we would like the inheritance tree to be added to incrementally, and the time taken to add an element to the tree to be of order Log N.

The Complexity Tree
The complexity (information content) of a tree is the sum of the complexity of each branch of the tree. In calculating the effect on the complexity of the model of adding a tuple under a node, the complexity of the branch alone needs to be considered.

The Model
Let an InheritanceClassificationTree has a multiway tree made up of treenodes.
 * TreeNode m_RootTreeNode

Each TreeNode has attributes
 * Tuple nodeTuple
 * List childList

A ValueTuple is a structure of named attribute values. Each value may be missing or have a value of a particular type.

A ClassificationTuple has the samed named attributes. However each attribute is described by a ClassificationAttribute.


 * ClassificationAttribute has attributes
 * bool missing
 * bool value
 * ClassificationAttribute has attributes
 * bool missing
 * bool set<(string, frequency)>
 * bool number of values
 * ClassificationAttribute has attributes
 * Attributes
 * bool missing
 * bool number of values (for non missing values in immediate child nodes only).
 * bool sum of values
 * bool sum of squares of values
 * Methods
 * From these values the average and standard deviation may be computed.

Both ValueTuple and ClassificationTuple inherit from Tuple.

Calculating Complexity
We wish to calculate the change in complexity when a new tuple is added to a tree node.

Adding a tuple adds the data for the tuple to this node, but encodes it using the tree node. The complexity depends on the complexity.

Adding a tuple also changes classification attributes.

Encoding Boolean Value
If the boolean value is not missing in the classification attribute then it must be missing or have the same value attribute.

Encoding String Value
If the string value is not missing in then it is encoded using log of the relative frequency of the string.

Encoding Number Value
If the string value is not missing in then its information content is approximated as the standard deviation * number of values.

The Algorithm
InheritanceClassificationTree.ClassifyTuple(Tuple newTuple) is
 * if rootTreeNode is null then
 * rootTreeNode = new TreeNode(newTuple)
 * else
 * rootTreeNode.AddTuple(newTuple).

TreeNode Constructor
Set up the tree node with a classification tuple, and a value tuple. TreeNode.TreeNode(Tuple newTuple)
 * classificationTuple = new ClassificationTuple
 * classificationTuple.UpdateClassification(newTuple)
 * AddList(newTuple)

TreeNode.AddTuple
Add a tuple to a tree node. Get back the changes to the classification tuple, for use in updating the classification of any parent node.

void TreeNode.AddTuple(Tuple newTuple, Tuple oldClassificationTuple, Tuple newClassificationTuple) is
 * oldClassificationTuple = nodeTuple.Clone
 * let bestFitNode = FindBestFitNode(newTuple)
 * if bestFitNode
 * bestFitNode.AddTuple(newTuple, oldReclassificationTuple, newReclassificationTuple)
 * nodeTuple.UpdateClassification(oldReclassificationTuple, -1) // remove effect of old values
 * nodeTuple.UpdateClassification(newReclassificationTuple, 1) // add effect of new values
 * else if push child node down condition // More work needed here.
 * push child node down
 * else
 * nodeTuple.UpdateClassification(newTuple, 1)
 * AddList(new TreeNode(newTuple))
 * newClassificationTuple = nodeTuple

TreeNode.FindBestFit
Find the child node whose classification node is most similar to the tuple. This is the node which encodes the tuple with the least information. Consider the effect on the encoding of the whole sub tree in determining how much extra information is required to encode this tuple.

The tuple should only be considered for adding to a child to the tree if the increase in information size is less than the raw un-encoded size of the tuple.

TreeNode TreeNode.FindBestFit(Tuple newTuple)
 * bextFit = null
 * bestFitInfoDeltaSize = infinity
 * for each childNode
 * childInfoDeltaSize = childNode.GetDeltaSize(newTuple)
 * if childInfoDeltaSize < newTuple.RawSize and childInfoDeltaSize < bestFitInfoSize
 * bextFit = childNode
 * bestFitInfoSize = childInfoDeltaSize
 * return bestFit

TreeNode.GetNewSize
Calculate the change in the information size that would result from adding the tuple.

double TreeNode.GetDeltaSize(Tuple newTuple)
 * newClassification = classificationTupe.Clone
 * newClassification.UpdateClassification(newTuple)
 * return newClassification.GetTotalInfoSize - classificationTupe.GetTotalInfoSize

ClassificationTuple.GetTotalInfoSize
Calculate the amount of information required to encode a tuple under a classification tuple.

double ClassificationTuple.GetTotalInfoSize
 * totalInfoSize = 0
 * For each attribute
 * totalInfoSize += attribute.GetTotalSize
 * ret totalInfoSize

ClassificationTuple.UpdateClassification
Update the tuple classification to include this tuple

double ClassificationTuple.Encode(Tuple newTuple)
 * For each attribute
 * attribute.UpdateClassification(newTuple.attribute)