CAROLL: a Cheap Autonomous Robot for On-Line Learning

Francesco De Comité

This document describes the design, construction and programming of an inexpensive autonomous wheeled robot, using as few electronic parts and special skills as possible. It is built around a cheap but capable micro-controller, with enough program and data memory to support sophisticated learning schemes (reinforcement learning, neural network implementations, population-based incremental learning...).

Overview

Figure 1 shows a general view of Caroll, while figures 2 and 3 show views from above and from below respectively. The whole robot is mounted on a DVD box and is powered by two packs of 4 AA batteries; more power can be allocated to the motors if available. The motherboard can be connected to an in-circuit serial debugger, which enables programming, debugging, and reading and writing the contents of the EEPROM. I personally use Microchip's ICD2 programmer-debugger. If you know of other programmer-debuggers for this chip, please let me know!



Figure 1: A general view of Caroll







Figure 2: A view of Caroll from above






Figure 3: A view of Caroll from below




Building the robot

Mechanical parts

Figure 3 shows how Meccano parts are used to attach the motors to the DVD box.

Light sensors

Light sensors, built from photo-resistors, have an analog output which is converted inside the micro-controller by the built-in A/D converters.
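
As an illustration, here is a minimal sketch of reading one photo-resistor through the micro-controller's built-in A/D converter. This is not the author's code: the register names and bit layout follow the PIC18 family (bit 0 of ADCON0 = ADON, bit 1 = GO/DONE, bits 2..5 = channel), and a right-justified 10-bit result is assumed; the exact configuration depends on the actual chip and compiler.

    #include <xc.h>   /* PIC18 device header, assuming the XC8 compiler */

    /* Read one light sensor and return the 10-bit conversion result. */
    unsigned int read_light_sensor(unsigned char channel)
    {
        volatile unsigned char t;
        ADCON0 = (unsigned char)((channel << 2) | 0x01); /* select channel, ADC on */
        for (t = 0; t < 50; t++)        /* crude acquisition delay */
            ;
        ADCON0 |= 0x02;                 /* set GO/DONE: start the conversion */
        while (ADCON0 & 0x02)           /* wait for GO/DONE to clear */
            ;
        return ((unsigned int)ADRESH << 8) | ADRESL;  /* right-justified result */
    }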

A light sensor follows the schematic shown in figure 4. The wiring, using a 22 kΩ adjustable resistor, is described in figure 5. Figures 6, 7 and 8 are pictures of the final unit, which give an idea of how to solder everything together. Figure 9 shows a sensor in position.


Figure 4: Light sensor schematics




Figure 5: Light sensor wiring







Figure 6: Light sensor (front view)



Figure 7: Light sensor (view from above)



Figure 8: Light sensor (view from below)


 

Figure 9: Light sensor in position


 

Proximity sensors

These sensors are built from an IS471F and an infrared LED. They can detect obstacles at a distance of about 10 cm, depending on the obstacle's colour. The output is digital (0 if an obstacle is in front, 1 otherwise). Figures 10 and 11 show the final module. Connections are, as for the light sensors: GND, +5V, micro-controller, from top to bottom. Figure 12 shows the sensor in position.
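
Since the output is already digital, reading these sensors is a plain input read. A minimal sketch, assuming the four IS471F outputs are wired to the low nibble of PORTB (the pin assignment is a guess):

    #include <xc.h>   /* PIC18 device header, assuming the XC8 compiler */

    unsigned char read_proximity(void)
    {
        return PORTB & 0x0F;   /* per bit: 0 = obstacle in front, 1 = clear */
    }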


Figure 10: Proximity sensor from above



Figure 11: Proximity sensor from below



Figure 12: Proximity sensor in position



The motherboard

The final layout of the parts on the board strongly depends on personal skill...

Figure 13 shows the step preceding the final soldering: the experimental board.

The coding wheel is used to choose how the robot behaves when powered on: either reflex moving (seeking the maximum light) or learning. Details on the choices are given in the program file.
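
A minimal sketch of the power-on dispatch follows. Only "wheel value 4 selects reflex moving" is stated in the text (see below); the port the wheel is read from and the two loop functions are hypothetical names.

    #include <xc.h>   /* PIC18 device header, assuming the XC8 compiler */

    void reflex_loop(void);                  /* seek the maximum light      */
    void learning_loop(unsigned char mode);  /* Q-learning and its variants */

    void main(void)
    {
        unsigned char mode = PORTD & 0x0F;   /* coding wheel (assumed port) */
        if (mode == 4)                       /* 100 in binary               */
            reflex_loop();
        else
            learning_loop(mode);
    }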


Figure 13: Motherboard wiring





Figure 14 shows the connections from the PIC to the ICD2 programmer.


Figure 14: Motherboard with its connection to the ICD2 programmer





Programming

The program gives the robot two basic behaviours, reflex moving and learning, plus some variations; both behaviours are coded inside the same program. Choosing the value on the coding wheel before switching on the robot selects the robot's behaviour.

Reflex moving

This behaviour is chosen when the coding wheel is set to 4 (100 in binary). The program simply reads the four light sensors, orders them according to their values, and chooses the direction where the light intensity is maximal (i.e. where the digital value read from the sensor is minimal).
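
A sketch of one reflex step, reusing the read_light_sensor() sketch from the light-sensor section; the steer() helper is a hypothetical name. As noted above, more light gives a lower digital value, hence the search for the minimum reading.

    unsigned int read_light_sensor(unsigned char channel);
    void steer(unsigned char direction);   /* turn toward direction 0..3 */

    void reflex_step(void)
    {
        unsigned char i, dir = 0;
        unsigned int  v, best = read_light_sensor(0);
        for (i = 1; i < 4; i++) {
            v = read_light_sensor(i);
            if (v < best) {                /* brighter than the best so far */
                best = v;
                dir  = i;
            }
        }
        steer(dir);
    }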

A little work remains to be done in order to also integrate the proximity sensors...

Learning

The algorithm used here is simple Q-learning, whose update rule is Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)). For details on reinforcement learning, see Sutton and Barto's book, and in particular their description of the algorithm.
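
A compact sketch of one tabular Q-learning step, in the spirit of the algorithm used here. The table sizes match the discussion below (96 states, 4 actions); alpha and gamma are passed as parameters, since their tuned values are not given in the text.

    #define N_STATES  96
    #define N_ACTIONS 4

    float Q[N_STATES][N_ACTIONS];

    void q_update(unsigned char s, unsigned char a, float r,
                  unsigned char s_next, float alpha, float gamma)
    {
        float best = Q[s_next][0];
        unsigned char i;
        for (i = 1; i < N_ACTIONS; i++)     /* max over next actions */
            if (Q[s_next][i] > best)
                best = Q[s_next][i];
        /* Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)) */
        Q[s][a] += alpha * (r + gamma * best - Q[s][a]);
    }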

The program source is here.

This is the linker script.

You can also download the hex file.

If you manage to use the hex file with a programmer other than the MPLAB ICD2, please let me know...

Some comments on the program

A state, based on the values of the robot's sensors, could have been defined by the ordering of the four light sensors together with the configuration of the proximity sensors. There are 24 possible orderings of the 4 light sensors and, at first glance, 16 possible configurations of the (digital, binary) proximity sensors. This would lead to a total of 24 × 16 = 384 different states. As there are four possible actions, this gives a total of 1536 (state, action) couples. Since reinforcement learning, in its basic form, needs to store all the possible Q(s,a) values, it would be impossible to store those values either in the EEPROM (1024 bytes) or in the data memory (3968 bytes < 1536 × sizeof(float) = 6144 bytes).

If the robot cannot store the Q(s,a) values in its EEPROM, it is impossible to dump the EEPROM afterwards to see what has been learned, and incremental learning (that is to say, starting to learn from previously learned values) is also impossible, as the EEPROM is the only non-volatile memory.

For these reasons, we restricted the state description, keeping the 24 light-sensor orderings but only 4 proximity configurations. This restricts the number of states to 24 × 4 = 96, the number of (state, action) couples to 96 × 4 = 384, and the space occupied by 384 floats to 384 × 4 = 1536 bytes.
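
A sketch of the state encoding implied by these counts: the rank (0..23) of the permutation that orders the four light readings, combined with a 2-bit proximity component (0..3). How the 16 raw proximity patterns are collapsed to 4 is not specified in the text, so prox2 is left to the caller; the actual encoding in the program may differ.

    unsigned char encode_state(const unsigned int light[4], unsigned char prox2)
    {
        unsigned char i, j, rank = 0;
        for (i = 0; i < 4; i++) {
            unsigned char smaller = 0;
            for (j = i + 1; j < 4; j++)       /* Lehmer-code digit for slot i */
                if (light[j] < light[i])
                    smaller++;
            rank = rank * (4 - i) + smaller;  /* accumulate in factorial base */
        }
        return rank * 4 + (prox2 & 0x03);     /* 24 orderings x 4 = 96 states */
    }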

The 1536 bytes of Q values are now reasonable compared to the size of the data memory, but still a little too much if you want to store the values in the EEPROM (1536 > 1024).

We then decided to store an integer approximation of Q(s,a): as those values always lie between 0 and 10 (due to the elementary reward and the value of gamma), before storing a value we multiply it by 6553.5 and take the integer part, which fits in 16 bits and is stored in the EEPROM.

The inverse process is applied when a stored value needs to be extracted from the EEPROM.

We then use only 768 EEPROM bytes (384 × 2), with little loss of precision, and still have room to store other information (values of alpha, epsilon...).
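
A sketch of this 16-bit fixed-point round-trip: Q values lie in [0, 10] and 10 × 6553.5 = 65535, so the scaled integer part fits exactly in two EEPROM bytes, and 384 values occupy 384 × 2 = 768 bytes. The two EEPROM routines are hypothetical stand-ins for the compiler's own.

    void eeprom_write_byte(unsigned int addr, unsigned char value);
    unsigned char eeprom_read_byte(unsigned int addr);

    #define Q_SCALE 6553.5f

    void store_q(unsigned int addr, float q)
    {
        unsigned int fixed = (unsigned int)(q * Q_SCALE);  /* integer part */
        eeprom_write_byte(addr,     (unsigned char)(fixed >> 8));
        eeprom_write_byte(addr + 1, (unsigned char)(fixed & 0xFF));
    }

    float load_q(unsigned int addr)
    {
        unsigned int fixed = ((unsigned int)eeprom_read_byte(addr) << 8)
                           | eeprom_read_byte(addr + 1);
        return (float)fixed / Q_SCALE;     /* inverse of the scaling above */
    }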

Conclusions

Some tuning is still necessary to enhance the learning process: values of alpha, rank, gamma...

There is still plenty of room in the program memory (the current program occupies only 10% of it!) to consider other learning schemes, neural networks for example.

Feel free to ask for more information or to report bugs: Francesco.De-Comite@univ-lille1.fr

Should you find this article useful, may I ask you to send me a postcard from where you live? My address is on my main page.

