Simple visual classification
I've devised a system whereby the Rodney robot can visually identify a small number of objects, and give verbal pronouncements of what it sees.
Initially, I tried using a Kohonen-style SOM, but this proved rather limited since it was difficult to encourage the system to reach any sort of stable classification. Instead I abandoned the SOM idea and used an even simpler GOFAI-style system, similar to a binary tree. Each captured image is compared against a database, and if it is sufficiently novel it is added to the database as a new exemplar. Images within the database are organised into a pyramid structure of progressively lower sampling rates. The bottom level of the pyramid corresponds to the actual images observed (30x30 pixels), the next layer contains images of half that size (15x15), and so on. The higher you go up the pyramid, the more abstract the visual representation becomes.
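As a rough illustration of the idea (the novelty threshold, pyramid depth and function names here are my own assumptions rather than the actual Rodney code), the exemplar database and pyramid might look something like this:

```python
# Minimal sketch of the exemplar database and image pyramid described above.
# The threshold value and helper names are assumptions for illustration only.
import numpy as np

NOVELTY_THRESHOLD = 0.15   # assumed: mean per-pixel difference above which an image counts as novel

def downsample(image):
    """Halve the resolution by averaging 2x2 blocks (e.g. 30x30 -> 15x15)."""
    h, w = image.shape
    cropped = image[:h - h % 2, :w - w % 2]
    return cropped.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(image, levels=4):
    """Bottom level is the observed 30x30 image; each level above is half the size."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    return pyramid

def novelty(image, exemplars):
    """Distance to the closest stored exemplar (1.0 if the database is empty)."""
    if not exemplars:
        return 1.0
    return min(np.abs(image - e).mean() for e in exemplars)

def observe(image, exemplars):
    """Store the captured image as a new exemplar only if it is sufficiently novel."""
    if novelty(image, exemplars) > NOVELTY_THRESHOLD:
        exemplars.append(image)
    return exemplars
```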
The system is trained explicitly by hand, with me examining the low-level exemplars and assigning object names to them. To a limited extent the system is able to infer the properties of unclassified exemplars by performing a search up the classification pyramid in classic GOFAI (almost chess-like) style. I trained the system under a variety of lighting conditions, and once it had gathered around 700 exemplars it stopped accumulating any further information (the feature space being adequately covered). I then added a speech facility so that the robot pronounces whatever type of object it sees.
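The search up the pyramid could be sketched along these lines, building on the previous snippet; the match threshold and data layout are again invented purely for illustration:

```python
# Sketch of how an unclassified exemplar might borrow a label by searching up
# the pyramid: compare at the finest level first, then at progressively coarser
# (more abstract) levels until a labelled exemplar is close enough.
import numpy as np

MATCH_THRESHOLD = 0.1   # assumed distance below which two representations are "the same object"

def infer_label(query_pyramid, labelled_exemplars):
    """labelled_exemplars: list of (pyramid, label) pairs built as in build_pyramid()."""
    for level in range(len(query_pyramid)):     # level 0 = 30x30, higher = more abstract
        best_label, best_dist = None, float("inf")
        for pyramid, label in labelled_exemplars:
            dist = np.abs(query_pyramid[level] - pyramid[level]).mean()
            if dist < best_dist:
                best_label, best_dist = label, dist
        if best_dist < MATCH_THRESHOLD:
            return best_label                   # confident match at this level of abstraction
    return None                                 # no sufficiently close labelled exemplar
```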
At present the robot has a limited visual vocabulary of "bob", "face", "person", "picture", "telly" and "cup of tea". Whether this sort of system would scale up to a larger vocabulary is an open question, but even with this limited set it is possible to start to give the robot some degree of social interaction.
The robot already has a simple behaviour arbitration scheme based on the levels of various stimuli, such as average illumination and the amount of visual motion. Using the classification system, a new "social" stimulus can be added such that when the robot detects the "bob" or "face" objects it increments its level of social stimulus. If the social stimulus level rises above a certain amount the robot utters words such as "happy" or "good", and if social objects are absent for a period of time it says "I'm lonely". Conceivably, a reinforcement learning scheme could be used to try to maximise the robot's level of social stimulus over time; the robot would then need to learn to behave in ways which facilitate continued social interaction with a human teacher.
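A minimal sketch of how the social stimulus might be folded into the arbitration loop, with the thresholds, decay rate and update interval invented for illustration, could look like this:

```python
# Only the overall behaviour (incrementing on "bob"/"face", speaking when the
# level is high, complaining when lonely) follows the description above; the
# constants and function names are assumptions.
import time

SOCIAL_OBJECTS = {"bob", "face"}
HAPPY_THRESHOLD = 5.0        # assumed level above which the robot says "happy" or "good"
LONELY_TIMEOUT = 60.0        # assumed seconds without a social object before "I'm lonely"
DECAY = 0.95                 # assumed per-cycle decay of the stimulus level

def update_social_stimulus(detected_object, state, speak):
    """Call once per classification cycle with the name of the detected object (or None)."""
    now = time.time()
    state["social"] *= DECAY
    if detected_object in SOCIAL_OBJECTS:
        state["social"] += 1.0
        state["last_social"] = now
    if state["social"] > HAPPY_THRESHOLD:
        speak("happy")
    elif now - state.get("last_social", now) > LONELY_TIMEOUT:
        speak("I'm lonely")
    return state

# Example usage, reusing infer_label() from the previous sketch:
# state = {"social": 0.0}
# state = update_social_stimulus(infer_label(pyramid, labelled), state, speak=print)
```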
- Bob
|