Artificial Intelligence Depot
Simple visual classification
Training a robot to visually recognise a limited number of objects
 
Simple visual classification

I've devised a system whereby the Rodney robot can visually identify a small number of objects, and give verbal pronouncements of what it sees.

Initially, I tried using a Kohonen-style SOM but this proved to be rather limited since it was difficult to encourage the system to reach any sort of stable classifications. Instead I've abandoned the SOM idea and used an even simpler GOFAI-style system similar to a binary tree. Each captured image is compared against a database and if it is sufficiently novel then it is added into the database as a new exemplar. Images within the database are organised into a pyramid structure of progressively lower sampling rates. The bottom level of the pyramid corresponds to the actual images observed (30x30 pixels). The next layer contains images of half the size (15x15), and so on. The higher you go up the pyramid, the more abstract the type of visual representation.

The system is trained explicitly by hand, with me examining the low-level exemplars and assigning object names to them. To a limited extent the system is able to infer the properties of unclassified exemplars by performing a search up the classification pyramid in classic GOFAI (almost chess-like) style. I trained the system under a variety of lighting conditions, and once it had gathered around 700 exemplars it stopped gathering any further information (the feature space being adequately covered). I then added a speech facility where the robot pronounces whatever type of object it sees.
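The search up the classification pyramid might work along these lines: a minimal sketch in which each level holds its own (exemplar, label) pairs, and the input is abstracted one step whenever the finer level fails to produce a labelled match (the `halve` reduction and the threshold are assumptions, not details from the post):

```python
import numpy as np

def halve(img):
    """Abstract an image one pyramid step by 2x2 averaging (assumed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def infer_label(image, levels, threshold):
    """`levels` is a list, finest first, where each entry is a list
    of (exemplar, label) pairs. Try the finest level; on failure,
    abstract the image and retry at the next, coarser level."""
    for exemplars in levels:
        for ex, label in exemplars:
            if label and np.linalg.norm(image - ex) < threshold:
                return label
        image = halve(image)
    return None
```

This captures the idea that an exemplar with no label of its own can still inherit one from a coarser, more abstract representation higher up the pyramid.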

At present the robot has a limited visual vocabulary of "bob", "face", "person", "picture", "telly" and "cup of tea". Whether this sort of system would scale up to a larger vocabulary is an open question, but even with this limited set it is possible to start to give the robot some degree of social interaction.

The robot already has a simple behavior arbitration scheme based on the levels of various stimuli such as average illumination and level of visual motion. Using the classification system a new "social" stimulus can be added into the system such that when the robot detects the "bob" or "face" objects it increments its level of social stimulus. If the "social" stimulus level rises above a certain amount the robot issues words such as "happy" or "good". If social objects are absent for a period of time the robot says "I'm lonely". Conceivably, a reinforcement learning scheme could be used in order to try to maximise the robot's level of social stimulus over time. The robot would need to learn to behave in some way which facilitated continued social interaction with a human teacher.
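The arbitration of the "social" stimulus described here could be sketched as follows; the increment, decay rate, and thresholds are illustrative guesses rather than values from the post:

```python
class SocialDrive:
    """Sketch of the social-stimulus scheme: detecting a social
    object raises the stimulus level; its absence lets it decay and
    eventually triggers a lonely utterance."""
    def __init__(self, threshold=5.0, lonely_after=20):
        self.level = 0.0
        self.threshold = threshold
        self.steps_without_social = 0
        self.lonely_after = lonely_after

    def update(self, detected_label):
        """Feed in the label of the object currently seen; returns
        an utterance, or None if there is nothing to say."""
        if detected_label in ("bob", "face"):
            self.level += 1.0
            self.steps_without_social = 0
        else:
            self.level = max(0.0, self.level - 0.2)  # slow decay
            self.steps_without_social += 1
        if self.level > self.threshold:
            return "happy"
        if self.steps_without_social > self.lonely_after:
            return "I'm lonely"
        return None
```

A reinforcement learner could then treat `level` as the reward signal to be maximised over time.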

- Bob

136 posts.
Sunday 16 December, 12:05
Simple is Beautiful

It's interesting how the SOM approach fails for such a practical application. How long did it take you to code up and realise this?

If you're going to go down the social learning way, you could get Rodney to train itself when you tell it what it sees. For example, it could look at you and say "person", so you'd correct it, getting it to associate "bob" with the current image. How well would your binary-tree-like approach work for incremental learning?

Then, this opens up a whole new area for voice recognition... (just trying to keep you busy ;)

935 posts.
Tuesday 18 December, 20:44
Classifier systems

The SOM idea didn't necessarily fail. In fact the pyramid system that I'm using instead is similar to a SOM in that input images are classified according to their Euclidean distance from a set of stored examples. The main difference is that where the dimensions of the traditional SOM remain fixed, the pyramid system expands to cover the feature space.

In my training example the pyramid system eventually reached a feature space size of around 700 exemplars, whereas the SOM that I was using before only used 10x10 = 100 possible features. To some extent this explains why the SOM never reached any stable classifications, but even if the map dimensions had been larger this would not have got around the problem of classifications shifting around unpredictably over the two dimensional surface of the SOM. The pyramid system is also slightly more unconventional in that it uses a recursive chess-like search through the feature space.
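The contrast with a fixed-dimension SOM can be made concrete with a sketch of an exemplar set that expands to cover the feature space; the novelty threshold and the unlabelled-until-trained behaviour are assumptions:

```python
import numpy as np

class ExpandingClassifier:
    """Unlike a SOM with a fixed map size, the exemplar set grows:
    a sufficiently novel input simply becomes a new exemplar, so
    classifications cannot drift around an existing map."""
    def __init__(self, novelty_threshold=10.0):
        self.exemplars = []  # list of (vector, label) pairs
        self.novelty_threshold = novelty_threshold

    def classify(self, x):
        """Return the label of the nearest exemplar, or store `x`
        as a new unlabelled exemplar if nothing is close enough."""
        if not self.exemplars:
            self.exemplars.append((x, None))
            return None
        dists = [np.linalg.norm(x - v) for v, _ in self.exemplars]
        i = int(np.argmin(dists))
        if dists[i] > self.novelty_threshold:
            self.exemplars.append((x, None))  # novel: store unlabelled
            return None
        return self.exemplars[i][1]

    def label(self, index, name):
        """Hand-training step: assign a name to a stored exemplar."""
        v, _ = self.exemplars[index]
        self.exemplars[index] = (v, name)
```

In the SOM, a 10x10 map caps the feature count at 100; here the exemplar count is bounded only by how much novelty the robot encounters.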

As for the incremental learning system that you describe it would be possible to do this provided that the speech recognition was accurate, or at least if there were an accurate way of recording speech waveforms for analysis. This is only usually possible with current technology if the microphone is very close to your mouth. A while ago I hunted around for the ideal speech recognition gadget - a radio mic - but couldn't find anyone selling them.

The main problem with the pyramid classifier at the moment is that, like an elephant, it never forgets anything, so the feature space just keeps on increasing with the robot's experience. I'll need to include some system whereby exemplars which are infrequently used, or which are not associated with important events, are selectively removed over time.
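A selective-removal scheme of the kind proposed might look like this sketch, where rarely used, unimportant exemplars are pruned first (the usage-and-recency scoring rule is an assumption; the post only states the goal):

```python
import time

class ExemplarStore:
    """Bounded exemplar database: when it overflows, the least-used,
    oldest, non-important entry is forgotten."""
    def __init__(self, max_size=700):
        self.items = {}  # id -> dict(vector, uses, last_used, important)
        self.max_size = max_size
        self._next_id = 0

    def add(self, vector, important=False):
        self.items[self._next_id] = dict(
            vector=vector, uses=0, last_used=time.time(),
            important=important)
        self._next_id += 1
        if len(self.items) > self.max_size:
            self.prune()

    def touch(self, key):
        """Record that an exemplar matched an input."""
        self.items[key]["uses"] += 1
        self.items[key]["last_used"] = time.time()

    def prune(self):
        """Forget the least-used, oldest exemplar that is not
        associated with an important event."""
        candidates = [k for k, v in self.items.items()
                      if not v["important"]]
        if candidates:
            victim = min(candidates,
                         key=lambda k: (self.items[k]["uses"],
                                        self.items[k]["last_used"]))
            del self.items[victim]
```

Flagging an exemplar `important=True` (e.g. one tied to a strong stimulus) exempts it from pruning entirely in this version.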

The amount of time taken to code the SOM was almost zero, since I wrote it more than a year ago and it's on my web site. The pyramid system took most of Sunday morning to code.

- Bob

136 posts.
Wednesday 19 December, 02:44
Training and Memory

Still, you could just use a console and text based interaction for initial training. Then you could move to voice recognition when it works, if you have time ;)

As for forgetting stuff, my current work is leaning that way. I've got my bot to learn the level, but I want it to forget parts of it that haven't been visited for a while. I'm going to have a memory class which handles 'forgetting' when something isn't recalled for a while. The more you access an item, the less likely it is to be forgotten. One thing I'll have to expand upon is the heuristic for forgetting details, so the implementation doesn't collapse due to crucial information being forgotten. Also, I'll need to rearrange the level representation based on the information forgotten.
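A minimal version of such a memory class might look like this; the decay constant, the refresh amount on recall, and the forget threshold are all illustrative, and the "crucial information" heuristic is deliberately left out:

```python
class Memory:
    """Each recall refreshes an item's strength; items that go
    unrecalled decay over time and are eventually forgotten."""
    def __init__(self, decay=0.1, forget_below=0.2):
        self.strength = {}  # key -> (value, strength in [0, 1])
        self.decay = decay
        self.forget_below = forget_below

    def store(self, key, value):
        self.strength[key] = (value, 1.0)

    def recall(self, key):
        """Fetch a memory; accessing it makes it harder to forget."""
        if key not in self.strength:
            return None
        value, s = self.strength[key]
        self.strength[key] = (value, min(1.0, s + 0.5))
        return value

    def tick(self):
        """One time-step: everything decays; weak items are dropped.
        A heuristic protecting crucial items would go here."""
        for key in list(self.strength):
            value, s = self.strength[key]
            s -= self.decay
            if s < self.forget_below:
                del self.strength[key]
            else:
                self.strength[key] = (value, s)
```

Calling `tick()` once per game update gives unvisited parts of the level a finite lifetime, while frequently recalled parts persist.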

I'm not sure how much of this is applicable to your problem, but it's something to think about.

935 posts.
Wednesday 26 December, 19:46
Forgetting

That's a good point. For a visual memory it's probably more important to forget specific details of an image first, leaving an increasingly vague impression. That sort of system would be an efficient use of resources.

- Bob

136 posts.
Sunday 30 December, 06:53