Prognosis prediction of stage II colon cancer by gene expression profiling
3565 Background: This study had two aims: 1) to identify a prognosis signature (PS), based on microarray gene expression measurements, in stage II colon cancer patients and to assess its accuracy with resampling techniques; 2) to assess the accuracy, also with resampling techniques, of a previously proposed 23-gene PS.
Methods: Colon tumor mRNA samples from 50 patients were profiled using the Affymetrix HGU133A GeneChip (22283 sequences). In the first part, the 50 patients were randomly divided into 2 groups (G1 and G2) of equal size, each of which served in turn as training set and as validation set. In the second part, the 50 patients were randomly divided into 1600 pairs of training (size=n) and validation (size=50-n) sets. Informative genes were selected on the training set by taking the 30 genes most differentially expressed between patients who recurred and those who remained disease-free; the accuracy of this PS was assessed by comparing the prognosis predicted by diagonal linear discriminant analysis (DLDA) with the actual outcome for all validation set patients. Using the same random splits, the accuracy of the 23-gene PS was assessed with a DLDA that used the learning set patients as reference samples.
Results: The 30-gene PS identified from G1 (G2) patients yielded an 80% (84%) prognosis prediction accuracy when applied to G2 (G1) patients. With resampling techniques, the prediction accuracy increased steadily with the learning set (LS) size: 65.5% (range=52.5–75%) with LS of size 10, and 82.7% (range=60–100%) with LS of size 40. Comparing the composition of the 100 PS obtained for a given value of n suggested a high instability in the selection of informative genes: with LS of size 10, 7 genes were part of at least 10% of signatures; with LS of size 40, 7 genes were part of all 100 signatures. The accuracy of the previously proposed 23-gene PS also increased with the learning set size.
Conclusion: Microarray gene expression profiling is a promising technique for predicting the prognosis of stage II colon cancer patients. The present study also highlights the high instability of informative gene selection and suggests that resampling techniques are useful for obtaining an honest assessment of prognosis prediction accuracy. No significant financial relationships to disclose.
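
The resampling procedure described under Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation. Several points are assumptions: the abstract does not say how "most differentially expressed" was measured, so an absolute Welch-type t statistic is used here; the DLDA ignores class priors; splits whose training set has fewer than 2 patients in either class are redrawn; and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def top_genes(X, y, k=30):
    """Indices of the k genes with the largest absolute t-like statistic.
    The ranking criterion is an assumption; the abstract does not specify it."""
    a, b = X[y == 1], X[y == 0]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    t = (a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)
    return np.argsort(-np.abs(t))[:k]

def dlda_fit(X, y):
    """Per-class gene means and pooled per-gene variance (diagonal covariance)."""
    means = np.stack([X[y == c].mean(axis=0) for c in (0, 1)])
    resid = X - means[y]                      # y must hold integer labels 0/1
    var = (resid ** 2).sum(axis=0) / (len(y) - 2)
    return means, var + 1e-12

def dlda_predict(model, X):
    """Assign each sample to the class with the smallest variance-scaled distance."""
    means, var = model
    d = (((X[:, None, :] - means[None, :, :]) ** 2) / var).sum(axis=2)
    return d.argmin(axis=1)

def resample_accuracy(X, y, n_train, n_splits=100, k=30, seed=0):
    """Repeat random training/validation splits; return the accuracy of each
    split and the fraction of signatures in which each gene was selected."""
    if not 0 < n_train < len(y) or min(np.sum(y == 0), np.sum(y == 1)) < 2:
        raise ValueError("need 0 < n_train < len(y) and >= 2 patients per class")
    rng = np.random.default_rng(seed)
    accs, counts = [], np.zeros(X.shape[1], dtype=int)
    while len(accs) < n_splits:
        perm = rng.permutation(len(y))
        tr, va = perm[:n_train], perm[n_train:]
        if min(np.sum(y[tr] == 0), np.sum(y[tr] == 1)) < 2:
            continue                          # assumed rule: redraw degenerate splits
        genes = top_genes(X[tr], y[tr], k)    # selection uses training data only
        model = dlda_fit(X[tr][:, genes], y[tr])
        pred = dlda_predict(model, X[va][:, genes])
        accs.append(np.mean(pred == y[va]))
        counts[genes] += 1
    return np.array(accs), counts / n_splits

if __name__ == "__main__":
    # Synthetic stand-in data (50 patients, 500 genes); not the study's data.
    rng = np.random.default_rng(1)
    y = np.array([1] * 15 + [0] * 35)
    X = rng.normal(size=(50, 500))
    X[y == 1, :10] += 1.0                     # 10 hypothetical informative genes
    for n in (10, 40):
        accs, freq = resample_accuracy(X, y, n_train=n)
        print(n, accs.mean(), accs.min(), accs.max(), int(np.sum(freq == 1.0)))
```

Selecting genes inside each split, on the training patients only, is what keeps the validation accuracy honest: a signature chosen on all 50 patients and then tested on a subset of them would give an optimistic estimate. The returned selection frequencies correspond to the stability comparison reported under Results, in which genes are counted by the share of signatures that contain them.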