1. Monday, July 9th, 2007
We have been working for the last week on setting up an experiment using pyrobot that looks into combining error anticipation and Simple Recurrent Networks.
1.1. Ideas
We were talking about the kind of staging observed in the IAC experiments, and I (Xaq) remembered that the experiments with error anticipation (EA) showed that a network trying to anticipate its own error was able to map patterns that it couldn't learn into a single point. I imagined a robot moving around its environment trying to predict its sensory state, while using EA. If there were things the robot couldn't learn, maybe the EA would help it segregate those patterns into a separate area of hidden representation space. Then maybe a more powerful network could take over and try to learn those patterns, or the network would somehow augment itself to be more able.
1.2. The Environment
The environment we decided to set up to explore these ideas is relatively simple. One robot, which we call the Watcher, is stationary in the corner of an empty room, looking out into the room. Another robot, called the Mover, is equipped with several different behaviors that move it around the room. The Watcher will watch the Mover and try to predict its movement.
1.2.1. The Mover
The behaviors that the Mover has right now are 'Do Nothing', 'Loop', 'Pace', 'Wander', 'Bounce', 'Stagger', and 'Teleport'.
'Do Nothing' does nothing.
'Loop' crosses the room until it gets to the other end, then resets the pose of the Mover to the other end of the room, so the Mover is always moving in the same direction across the Watcher's field of vision.
'Pace' moves back and forth across the room.
'Wander' turns less and increases speed the closer the longest range sensor is to the front of the robot. 'Wander' was originally designed to explore a room, but the lack of features in the room causes it to just circle around the middle.
'Bounce' drives forward at full speed except when it nears a wall in which case it turns to avoid it.
'Stagger' starts with zero translation and rotation and randomly increments them.
'Teleport' uses the simulation object to randomly change the pose of the robot on each step.
These behaviors are designed to be increasingly non-linear in order to make learning each behavior an increasingly difficult task for the Watcher.
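To make the difference between two of the simpler behaviors concrete, here is a minimal sketch of 'Pace' and 'Loop' reduced to motion along one axis. It is not taken from MoverBrain.py: the function names, the `speed` parameter, and the room bounds `lo` and `hi` are all assumptions made for illustration, and no pyrobot calls are used.

```python
def pace_step(x, direction, speed=0.05, lo=0.0, hi=1.0):
    """One step of a hypothetical 'Pace': walk along one axis, reverse at each end.

    Returns the new position and the (possibly flipped) direction, +1 or -1.
    """
    x_next = x + direction * speed
    if x_next >= hi:
        return hi, -1   # reached the far wall, turn around
    if x_next <= lo:
        return lo, 1    # reached the near wall, turn around
    return x_next, direction


def loop_step(x, speed=0.05, lo=0.0, hi=1.0):
    """One step of a hypothetical 'Loop': always move the same way, then reset.

    On reaching the far end the pose jumps back to the start, so the Mover
    always crosses the Watcher's field of vision in the same direction.
    """
    x_next = x + speed
    if x_next >= hi:
        return lo       # pose reset to the other end of the room
    return x_next
```

With 'Pace' the next position depends on the current direction as well as the current position, which is one reason a network with a context layer seems like a reasonable fit for the Watcher.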
1.2.2. The Watcher
The Watcher is situated in the lower left corner of the room, oriented directly toward the opposite corner. It is equipped with a camera. Its task is to predict the camera image it will see on the current time step given a previous image. The previous image is set as the input of a neural network equipped with a context layer and an EA layer.
1.3. So Far...
Both brains are written. The Watcher needs a little bit more work, mostly for data collection machinery.
edit: I've attached the files so you can play with them: EARoom.py MoverBrain.py WatcherB3.py
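For readers who do not open the attached files, the following is a rough sketch of the kind of network described in 1.2.2: a simple recurrent network whose hidden layer also feeds a set of EA units. It is not the code in WatcherB3.py. The class name `WatcherSketch`, the choice to drive the EA units from the hidden layer, the sigmoid activations, the absence of bias terms, and the weight range are all assumptions, and only the forward pass is shown (no training).

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def make_weights(n_out, n_in, rng):
    """Small random weight matrix stored as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def layer(weights, inputs):
    """Fully connected sigmoid layer (no bias, to keep the sketch short)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]


class WatcherSketch:
    """Hypothetical SRN with an extra EA output bank; forward pass only."""

    def __init__(self, n_in, n_hidden, n_out, seed=0):
        rng = random.Random(seed)
        # The hidden layer sees the current input plus the previous hidden state.
        self.w_hidden = make_weights(n_hidden, n_in + n_hidden, rng)
        self.w_output = make_weights(n_out, n_hidden, rng)
        # Assumption: one EA unit per output unit, reading from the hidden layer.
        self.w_ea = make_weights(n_out, n_hidden, rng)
        self.context = [0.5] * n_hidden

    def step(self, inputs):
        hidden = layer(self.w_hidden, list(inputs) + self.context)
        prediction = layer(self.w_output, hidden)
        anticipated_error = layer(self.w_ea, hidden)
        self.context = hidden  # becomes the context for the next time step
        return prediction, anticipated_error, hidden
```

Because the context is overwritten on every call, presenting the same input twice in a row generally gives two different hidden representations, which is what lets such a network respond to movement history rather than position alone.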
2. Tuesday, July 10, 2007
After posting yesterday afternoon we let two instances of the environment run overnight. Both Watchers were observing the 'Pace' behavior. One had 10 hidden units and one had 20. The one with 20 hidden units showed significantly improved learning over the one with 10, which we had been using up to this point. We will be using 20 hidden units from now on.
Once we had the robot trained sufficiently, we tested it out by manually moving the Mover robot. It didn't always make the correct prediction, but it did make a prediction in the right direction. It was sensitive to previous movement, not just the position.
The next steps are to continue testing the Watcher on other behaviors, and to implement a slightly different environment to study hidden representations.
In the new environment, the Mover will perform the Stagger behavior. The Stagger behavior is actually made of two separate behaviors: an Avoid behavior, which turns away from obstacles, and the random movement that defines the behavior. While the random movement is inherently unpredictable, the Avoid behavior follows a mathematical function and should be predictable. To test whether or not this is true for our network, the Mover will change color when it is in the Avoid mode. In this way, color is analogous to the flag bits in the original EA experiment. We hope to see a similar clustering of hidden representations: those representing Avoid spread out and organized to reflect the knowledge the network has about that behavior, and those representing the random Stagger movement clustered into a small area to reflect their unpredictability.
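The two-part structure of Stagger can be sketched as a single step function that also reports which mode it is in. This is an illustration under assumptions, not the code in MoverBrain.py: the `front_distance` reading, the `avoid_distance` threshold, the particular turn formula, and the `increment` size are all made up for the example.

```python
import random


def stagger_step(translate, rotate, front_distance, rng,
                 avoid_distance=0.5, increment=0.1):
    """Return (translate, rotate, avoiding) for one hypothetical control step."""
    if front_distance < avoid_distance:
        # Avoid mode: a deterministic function of the sensor reading,
        # so in principle the Watcher could learn to predict it.
        turn = 1.0 - front_distance / avoid_distance
        return 0.0, turn, True
    # Random mode: nudge both speeds by a random amount, clamped to [-1, 1].
    translate = max(-1.0, min(1.0, translate + rng.uniform(-increment, increment)))
    rotate = max(-1.0, min(1.0, rotate + rng.uniform(-increment, increment)))
    return translate, rotate, False
```

The returned `avoiding` flag is the piece of information that would be shown to the Watcher as a predictability signal, whether through color or some other channel.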
3. Friday, July 13, 2007
I was mistaken yesterday when I said we would use color to indicate predictability. Instead we indicated it by appending 5 extra units to the input layer and turning them on if the Mover was in Avoid mode.
The network seemed unable to learn the behavior even after letting it train for 18 hours. We decided to take a different approach.
Instead of a camera image, the Watcher now receives scaled x and y values from the Mover. While we access the simulation object to obtain these values, one can imagine a real life overhead camera with a blob filter performing the same function.
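A minimal sketch of the scaling step is given here. The target range of [0, 1], the assumption that the room's origin is at one corner, and the names `scale_position`, `room_width`, and `room_height` are not from the original code; they are just one plausible way to turn simulator coordinates into network inputs.

```python
def scale_position(x, y, room_width, room_height):
    """Map simulator coordinates to [0, 1] on each axis, clamping at the walls."""
    sx = min(1.0, max(0.0, x / room_width))
    sy = min(1.0, max(0.0, y / room_height))
    return sx, sy
```

For example, a Mover at (2.5, 1.0) in a 5 by 4 room would be presented to the network as (0.5, 0.25).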
Data display window:
Information about the actual Mover is in blue. The dot is the position of the Mover 10 steps earlier, which is the input to the Watcher network. The cross is the current position of the Mover, which is the target of the network.
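Pairing each target with the position from 10 steps earlier can be done with a short history buffer. The sketch below is an assumption about how this might be organized, not the code in WatcherB3.py; the class name `DelayedPairs` and its `push` method are hypothetical.

```python
from collections import deque


class DelayedPairs:
    """Hold recent positions and emit (input, target) pairs `delay` steps apart."""

    def __init__(self, delay=10):
        # delay + 1 slots: the oldest entry is the input, the newest is the target.
        self.history = deque(maxlen=delay + 1)

    def push(self, position):
        """Record the current position; return a pair once enough history exists."""
        self.history.append(position)
        if len(self.history) < self.history.maxlen:
            return None
        return self.history[0], self.history[-1]
```

The first 10 calls return `None` because there is not yet a position old enough to serve as the input.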
Network output is in red. The red plus is the network's predicted position of the Mover. If it is correct, it will overlay the blue cross. The black line represents the error. The absolute values of the x and y components of that vector are the x and y error. The red line represents the error anticipation. If target - output = error, then output + error = target. Output + error anticipation is therefore the point where the network "thinks" the target is, and the red line points from the output to that point.
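The arithmetic behind the black and red lines can be written out as follows. This is only a restatement of the relations above in code; the function name `display_vectors` and the use of plain (x, y) tuples are assumptions for illustration.

```python
def display_vectors(target, output, anticipation):
    """Compute the quantities drawn in the data display window.

    target, output, and anticipation are (x, y) tuples.
    """
    # Black line: vector from the output to the actual target.
    error = (target[0] - output[0], target[1] - output[1])
    # The x and y error are the absolute values of its components.
    abs_error = (abs(error[0]), abs(error[1]))
    # Red line ends here: where the network "thinks" the target is.
    believed_target = (output[0] + anticipation[0], output[1] + anticipation[1])
    return error, abs_error, believed_target
```

When the anticipation equals the true error, `believed_target` coincides with `target`, and the red and black lines overlap.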
We will begin running experiments with this new system to see if it can produce the behavior we want.
4. Friday, August 3rd, 2007
I ran a trial using the new environment. I trained for 10000 steps on the Stagger behavior and saved the hidden representations in 200-step batches. Then I used the PCA analysis program that Neil was using to analyze those files. The results showed that after 10000 steps the Watcher was unable either to learn the behavior successfully or to develop structure in the hidden representations that corresponds to the predictability of the input. This could be because the task is too difficult, because the two behaviors are not distinct enough from each other, or because the network simply didn't have enough time to learn.
PCA representation after 10000 steps:
All the work I did this summer is in the "watching" directory.
watching/
- multiRobot/
  - EaRoom.py
  - MoverBrain.py
  - MultiRobot.py
  - WatcherB3.py
- old/
  - WatcherB2.py
  - WatcherBrain.py
- PCA/
  - Data/
    - True/
      - .hiddens and .labels files
  - EaRoom.py
  - MoverBrain.py
  - MultiRobot.py
  - WatcherB3.py
- XandY/
  - All files in this directory have mysteriously disappeared. They evolved into the files in the PCA directory.
- EaRoom.py
- MoverBrain.py
- WatcherB3.py
