adtree and xadtree
user guide

F. De Comité   D. Devigne

Postscript zipped version

French version

1   Introduction

adtree is an implementation of Freund & Mason [FM99] Alternating Decision Tree Algorithm, extended to multilabel classification [CGT01].

xadtree is a graphic interface designed to simplify the use of the algorithm.

2   Requirements

adtree is written in basic C, and should compile and run on any platform, provided the following softwares are available :



3   Installation

In case of problem, send us a mail, together wih the error message : francesco.de-comite@univ-lille1.fr.

4   Data Format

The program needs at least two files : Two example files are provided in subdirectories:

The file format is very close to Roos Quinlan's C4.5 format [Qui93], and in fact all the files from the UCI [MM98] can be processed by adtree. In addition, new datasets allowing examples to have multiple classes are recognized and run by adtree.

The first part of file name must be the same for all three files.

For both files, a vertical bar (|) stands for the beginning of a comment, continuing till the end of the line. Comments and blank lines can be inserted anywhere.

A line can end with a '.', but it is no compulsory. If the last line of the file ends with a '.', it must be followed by carriage return.

4.0.1   Format of the .names file

The first line lists the names of the classes, separated with a comma. A class name can use any character, excepted comma. To put a comma in a class name, one must preceed it with a \.

Each of the following lines describes an attribute, in the same order they will appear in the data files.

An attribute description begins with the name of the attribute (any sequence of characters), followed by ':' and the attribute type.

The attribute type is one of the following :

| An example of  .names  file (this line is a comment)
velo,auto,ballon à gaz,fusée. 

nombre de roues : continuous. 
marque : ignore.
inconvenients : discrete 6.
destination : nulle part,ecole,la mer,ciel,lune,mars

4.0.2   Format of the .data file

An example is composed with the list of its attribute values, in the same order they were declared in the .names file, comma separated.

Attributes values are followed with the list of all classes an example belongs to, each class name being separated from the next one with a comma. (this is a difference beetween adtree and boostexter).

The following .data file corresponds to the above .names. Second example belongs to two classes, sixth example belongs to no class.

2,Eddy Merckx,crevaison,ecole,velo.
4,Peugeot,crevaison,nulle part,velo,auto
4,Opel,consomme de l'essence,la mer,auto
0,Mongolfier,atterrit n'importe où,ciel,ballon à gaz,fusée
0,Saturn 5,coûte cher,lune,fusée
8,Aldi,pas garanti,nulle part.

5   adtree

5.1   Basic commands

Provided adtree binary is in your path, the following command :

adtree -f filename

will build a ten nodes tree, and display it in text mode.

For the above short example, the first three rules will be :

Number of rules : 10
-------------------------
Rule 0 
if TRUE 
(Number of items verifying the precondition : 6)
 Class :                         velo    auto    ballon  fusée
         If TRUE then            -0.34   -0.34   -0.80   -0.34
Else 0
-------------------------
Rule 1 
if True 
(Number of items verifying the precondition : 6)
                               Class :          velo    auto    ballon  fusée
         If nombre de roues >= 1.00 then         0.00    0.00    -2.63   -2.63
                              Class :            velo    auto    ballon  fusée
         If nombre de roues < 1.00 then          -2.29   -2.29   0.00    2.29
Else 0
-------------------------
Rule 2 
If nombre de roues >=  1.00000
(Number of items verifying the precondition : 4)
                                Class :          velo    auto    ballon  fusée
         If inconvenients =crevaison then        2.67    0.00    -1.38   -1.38
                                 Class :         velo    auto    ballon  fusée
         If inconvenients !=crevaison then       -2.67   0.00    -1.38   -1.38
Else 0
The following figure shows the first four rules of the tree build from the exemple.




5.1.1   Reading the results

Decision structure is a tree, each rule corresponds to a branch, or part of a branch.

Branches are made of successive confidence nodes and test nodes. The root is a confidence node.

Classifying an example means make it go down the tree, according to the tests it encounters on its way, adding accordingly the values it meets.

Rule 0 corresponds to the root of the tree : each example is credited with -0.8 for the class ``ballon'', and -0.34 for the other classes.

Rule 1 is directly under the root, the value of harvested coefficients depends on the attribute ``nombre de roues'' (number of wheels). If the number of wheels is less than one, the chance to have an example of class ``velo'' (bicycle) decreases, which is shown by the -2.29 coefficient value. On the other hand, this increases the chances to have a ``fusée''(rocket).

Rule 2 is mode deeply situated in the tree : it is used only for examples having at least one whell.

When an example has a missing value for an attribute used in the test, it is cut into pieces, and those pieces follow all the branches under this test.

5.2   adtree command line options

The option :

adtree -h

gives the list of available options.

 Options:
         usage : adtree -f file_name
         Options : 
         -u : Evaluation on test set
         -b #n : number of boosting steps (default 10)
         -v : verbose
         -h : this text
         -n : non normalized weights (default : normalized)
         -d : drawing a postcript visualization of the tree
         -t : enable n_ary tests
         -B : boostexter behavior
         -r : random choice of precondition to develop
         -O : display One-Error
         -H : display Hamming loss
         -C : display confusion matrix and class by class results
         -A : display coverage and avg prec
Detail :

6   xadtree

xadtree is a graphical interface, calling adtree to create trees, classifying examples, display statistics.

xadtree can load files, choose adtree's options, build and visualize trees, show how an example go through a tree...xadtree was written by Damien Devigne (devigne@lifl.fr), student in Maîtrise d'Informatique at the University of Sciences (Lille).

Development is still in progress, please contact us for remarks, bugs and wishes.

6.1   Using xadtree

The command xadtree will launch the program.

6.1.1   Main Window

The first window opens, the only possible action is to enter a file name, without extension in the Data field.

(We plan to allow users to enter different file names, but it is still not implemented : button Use different filenames)






Once the name entered and validated, files .data,.names, and eventually .test, are loaded.






One can then :




Many buttons are reacting when the mouse passes upon them, so they could be later used for new functionnalities (attribute caracteristics, for example ...)

6.2   Configuration window

Allows the user to choose among adtree options :








Once the options are chosen, the user can, from left to right :

6.3   Result window

The two ways to reach this screen are from button New Adtree in main window, or button Save and Create new tree of configuration window.

It consists of two or three parts, whether a test set is available or not :






This page contains also four buttons :

Test item
: To see the path followed by an example through the tree.
Clear path
: to clear path information and come back to the original tree.
Create PostScript
: display, and save a PostScript representation of the tree.
Toggle Tree Display/ Stats Display
: to toggle between the tree window and the statistics window.
All those buttons and window are described hereunder.

6.3.1   The tree window

Blue lines are confidence nodes, green lines are test nodes.

A confidence line is splitted into three zones. From left to right :

A test line contains :

6.3.2   Data and Test Windows

Both windows are only lists of increasing numbers. They are the rank of the examples in the data files.

Selecting one example makes button Test item active.

6.3.3   Test Item

When an example is selected in either Data or Test window, one can visualize the path followed by this example through the tree.

Green branches are those followed by the example. Red branches are not taken by it.

If an example has a missing value for a test in the tree, all the branches are green : the example follows all of them.

The value inside the brackets (at the top of the tree) is the maximum value of the coefficients vector. If the classification is said to give only one classification per example, this corresponds to the class found by adtree.

6.3.4   Clear path

Clear all red branches, and return to the original tree.

6.3.5   Create PostScript

Call a function which :

6.3.6   Toggle Tree Display / Stats Display

Toggle between the tree window and the statistics window.

The statistics window displays One-Error, Hamming loss, coverage and average precision.

Confusion matrices give also class by class informations.






References

[CGT01]
F. De Comité, R. Gilleron, and M. Tommasi. Learning multi-label alternating decision trees and applications. In Gilles Bisson, editor, Proceedings of CAP'01 : Conférence en Apprentissage Automatique, pages 195--210, 2001.

[FM99]
Yoav Freund and Llew Mason. The alternating decision tree learning algorithm. In Proc. 16th International Conf. on Machine Learning, pages 124--133, 1999.

[MM98]
C.J. Merz and P.M. Murphy. UCI repository of machine learning databases, 1998.

[Qui93]
J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, San Mateo, CA, 1993.

[SS00]
Robert E. Schapire and Yoram Singer. Boostexter: A boosting-based system for text categorization. Machine Learning, 39(2/3):135--168, 2000.

This document was translated from LATEX by HEVEA.