Classification of colon cancer based on the expression of randomly selected genes

Author(s): X.H. Tan, R. Cheng, H.P. Hu and Y.P. Bai

To ascertain the relationship between gene expression and colon cancer localization, we propose a classification method based on random gene selection and a self-organizing map (SOM) network. Gene groups of different sizes were selected randomly from the 54,675 genes measured in 53 patients with Union for International Cancer Control (UICC) stage II colon cancer. The patients were divided into a training set of 36 and a validation set of 17. For each group size (1000, 100, 50, 30, 10, 5, and 3 genes), 1000 random gene groups were drawn. The minimum misclassification ratio for each group size was between 3/17 and 4/17, and approximately 1-7% of the gene groups had a misclassification ratio below 0.25. Moreover, most gene groups (about 82-89%) had a misclassification ratio lower than 0.4. Analysis of the gene groups with low misclassification ratios showed that they had few genes in common. This revealed that colon cancer localization is not associated with a single gene group but with many gene groups. Furthermore, K-fold cross validation was used to test the reliability of the possible informative genes, and the results indicated that using gene expression to classify colon tumor localization was not feasible.
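The random-selection procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The self-organizing map is replaced here by a simple nearest-centroid classifier so that the example needs only the standard library. The data are synthetic, and all function names (`nearest_centroid_predict`, `misclassification_ratio`, `random_group_ratios`, `fraction_below`, `make_synthetic`) and the toy sizes (200 genes, 50 groups of 10 genes) are assumptions made for illustration. Only the 36/17 split and the 0.25 and 0.4 thresholds come from the abstract.

```python
import random

def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (NOT the SOM of the study): label of the closest class mean."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroid = [sum(col) / len(rows) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def misclassification_ratio(train_x, train_y, valid_x, valid_y, genes):
    """Fraction of validation samples misclassified using only the given gene indices."""
    reduced_train = [[row[g] for g in genes] for row in train_x]
    errors = 0
    for row, truth in zip(valid_x, valid_y):
        sample = [row[g] for g in genes]
        if nearest_centroid_predict(reduced_train, train_y, sample) != truth:
            errors += 1
    return errors / len(valid_y)

def random_group_ratios(train_x, train_y, valid_x, valid_y, group_size, n_groups, rng):
    """Draw n_groups random gene groups of one size and score each of them."""
    n_genes = len(train_x[0])
    ratios = []
    for _ in range(n_groups):
        genes = rng.sample(range(n_genes), group_size)  # sampling without replacement
        ratios.append(misclassification_ratio(train_x, train_y, valid_x, valid_y, genes))
    return ratios

def fraction_below(ratios, threshold):
    """Share of gene groups whose misclassification ratio is below the threshold."""
    return sum(r < threshold for r in ratios) / len(ratios)

def make_synthetic(n_samples, n_genes, rng):
    """Hypothetical toy data: every tenth gene carries a weak class signal."""
    labels = [i % 2 for i in range(n_samples)]
    data = [[rng.gauss(0.5 * y if g % 10 == 0 else 0.0, 1.0) for g in range(n_genes)]
            for y in labels]
    return data, labels

rng = random.Random(0)
train_x, train_y = make_synthetic(36, 200, rng)   # 36 training patients, as in the abstract
valid_x, valid_y = make_synthetic(17, 200, rng)   # 17 validation patients, as in the abstract
ratios = random_group_ratios(train_x, train_y, valid_x, valid_y,
                             group_size=10, n_groups=50, rng=rng)
print("minimum ratio:", min(ratios))
print("share below 0.25:", fraction_below(ratios, 0.25))
print("share below 0.4:", fraction_below(ratios, 0.4))
```

With 17 validation patients, every ratio is a multiple of 1/17, so a ratio below 0.25 corresponds to at most 4 errors and a ratio below 0.4 to at most 6 errors. The printed values depend entirely on the synthetic data and say nothing about the study's results.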