
Calculating Optimal Tree-Based Classification

Download script to determine the optimal Tree-Based Classification

View and compare trees built by different algorithms


The script "tree_based_classification" takes:
 - a rooted binary tree in Cayley representation
 - an attribute file with the list of sequence names and corresponding reference categories:

'seq_name'   'any_field'   'classification_level_1' ........
'-----------'   '--------' .....
'-----------'   '--------' .....


  The fields in the attribute file are separated by tab characters, and no field may contain a tab character.
  The user should also specify the classification level for which the optimal tree-based classification will be found (the default is 1).
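
  As an illustration of the attribute file layout, here is a minimal Python sketch that reads such a file and extracts the reference category at a chosen level. It is not part of the script: the function name read_attributes is hypothetical, and the sketch assumes that the file has no header row and that level k is stored in column k + 2 (after 'seq_name' and 'any_field').

```python
def read_attributes(lines, level=1):
    """Map each sequence name to its reference category at the given level.

    Assumptions (not confirmed by the script itself): no header row,
    column 1 is the sequence name, column 2 is a free field, and
    classification level k is column k + 2.
    """
    categories = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")  # fields never contain tabs
        categories[fields[0]] = fields[level + 1]
    return categories


# Example with two in-memory rows; an open file object works the same way.
rows = ["seqA\tnote1\tkinase\tTK", "seqB\tnote2\tkinase\tSTK"]
level_1 = read_attributes(rows)           # default level is 1
level_2 = read_attributes(rows, level=2)
```

  Splitting on the tab character alone is sufficient here because the format rules out tabs inside fields.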


  - The script finds an optimal set of non-overlapping GOOD subtrees for some categories.
  - All sequences in a GOOD subtree for some category are assigned this category.
  - In the optimal tree-based classification, the number of correctly classified sequences, i.e. those whose
  assigned category coincides with the reference category, is maximal.
  The main result is summarized as the number (or percent) of misclassified sequences.
  The script also produces detailed information:
  the list of GOOD categories and corresponding subtrees, with information on their composition.
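
  To make the scoring concrete, the following Python sketch evaluates one candidate set of non-overlapping subtrees. It does not search for the optimum and does not define what makes a subtree GOOD. The nested-tuple tree encoding and the names leaves and score are hypothetical, and counting sequences outside every chosen subtree as misclassified is an assumption.

```python
from collections import Counter


def leaves(tree):
    """Return the leaf names of a binary tree given as nested tuples."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)


def score(tree, assignments, reference):
    """Score a list of (subtree, category) pairs against reference categories.

    Returns (misclassified count, misclassified percent, composition report).
    Sequences not covered by any subtree count as misclassified (assumption).
    """
    seen = set()
    correct = 0
    report = []
    for subtree, category in assignments:
        names = leaves(subtree)
        if seen.intersection(names):
            raise ValueError("subtrees overlap")
        seen.update(names)
        # Composition: how many sequences of each reference category
        # fall inside this subtree.
        composition = Counter(reference[name] for name in names)
        correct += composition[category]
        report.append((category, dict(composition)))
    total = len(leaves(tree))
    misclassified = total - correct
    return misclassified, 100.0 * misclassified / total, report


tree = ((("a", "b"), "c"), ("d", "e"))
reference = {"a": "X", "b": "X", "c": "Y", "d": "Y", "e": "Y"}
assignments = [((("a", "b"), "c"), "X"), (("d", "e"), "Y")]
result = score(tree, assignments, reference)
```

  In this example sequence "c" lies in the subtree assigned to category X but has reference category Y, so it is the only misclassified sequence out of five.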




© Copyright 2007 SRI International. All Rights Reserved.