Feature subset selection

Feature Subset Selection (FSS) is a method for finding a subset of genes (features) that, either alone or in pairs, can discriminate between groups of samples. To score genes on how well each one separates the groups on its own, set the ranking method to Individual ranking. This is equivalent to running a traditional per-gene test, for example a t-test if t-score is selected as the scoring function. Genes can also be scored on how well they discriminate between groups when paired with another gene. Two ranking methods are available for this analysis: Greedy pairs ranking and All pairs ranking.

Bø TH and Jonassen I: New feature subset selection procedures for classification of expression profiles. Genome Biology, 3(4):research0017.1-0017.11, 2002.

Statistical tests available:

A number of different statistical tests are available for scoring the genes.
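
As an illustration of what Individual ranking with a t-score does, the following minimal sketch scores each gene on its own and sorts the genes by the absolute score. It is not J-Express code: the function names, the sample data, and the choice of Welch's unequal-variance t statistic are assumptions made for this example, and the exact t-score variant used by J-Express may differ.

```python
import math
from statistics import mean, variance


def t_score(group_a, group_b):
    """Welch's t statistic for two lists of expression values (assumed variant)."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    if se == 0.0:
        # No spread within either group: separation is perfect or absent.
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / se


def rank_individually(expression, group_a_idx, group_b_idx):
    """Return gene names ordered from best to worst absolute t-score.

    expression maps a gene name to its values across all samples;
    the index lists say which samples belong to each group.
    """
    scores = {}
    for gene, values in expression.items():
        a = [values[i] for i in group_a_idx]
        b = [values[i] for i in group_b_idx]
        scores[gene] = abs(t_score(a, b))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical data: three genes measured in six samples (two groups of three).
expression = {
    "geneA": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],
    "geneB": [2.0, 2.5, 1.8, 2.1, 2.4, 1.9],
    "geneC": [5.0, 4.0, 6.0, 7.0, 6.5, 8.0],
}
ranking = rank_individually(expression, [0, 1, 2], [3, 4, 5])
print(ranking)
```

In this made-up data set geneA separates the two groups most clearly and geneB hardly at all, so geneA ends up at the top of the ranked list.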

Running FSS

Click the Feature Subset Selection/ANOVA button or select Methods | Supervised analysis | Feature Subset Selection/ANOVA from the J-Express menu bar.
Here you can see a list of the groups you have created. If the list is empty (apart from the one group named ALL), you have to:
1. Create groups of the samples you want to compare first.
2. Then open a new FSS window and select the groups you want to compare.
If you want to compare two groups, select FSS. If you want to compare more than two groups, select ANOVA. Click Next.
Select the FSS parameters or ANOVA values you want as results.

FSS:
1. Select a scoring method.
2. Choose the ranking method.
3. Mark the format of your data. If your data has been log2 transformed, then it is important to mark this, as it will have an effect on the fold change values that will be calculated. If your data has not been transformed, then set the value to Linear.
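
The data format setting matters because a fold change is computed differently on the two scales. The sketch below shows the common convention (a ratio of means for linear data, two raised to the difference of means for log2 data). The function name and the format labels are assumptions for this example, and the exact formula used by J-Express is not documented here.

```python
from statistics import mean


def fold_change(group_a, group_b, data_format):
    """Fold change of group A over group B under a common convention.

    data_format is "log2" for log2-transformed values or "linear" for
    untransformed values (hypothetical labels for this sketch).
    """
    mean_a, mean_b = mean(group_a), mean(group_b)
    if data_format == "log2":
        # A difference on the log2 scale corresponds to a ratio on the linear scale.
        return 2.0 ** (mean_a - mean_b)
    if data_format == "linear":
        return mean_a / mean_b
    raise ValueError("data_format must be 'log2' or 'linear'")


log2_a, log2_b = [3.0, 3.0], [1.0, 1.0]      # log2 values of 8 and 2
correct = fold_change(log2_a, log2_b, "log2")
mislabeled = fold_change(log2_a, log2_b, "linear")
print(correct, mislabeled)
```

The same values give a fold change of 4 when correctly marked as log2 and 3 when mistakenly marked as Linear, which is why the format has to match the data.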
ANOVA:
1. Select whether you want the score columns in the result to be listed as F values or P values.
  - F is a statistic from the ANOVA analysis.
  - If you want to see the P-values calculated from the F-values, select P-values.
2. Mark the format of your data. If your data has been log2 transformed, then it is important to mark this, as it will have an effect on the fold change values that will be calculated. If your data has not been transformed, then set the value to Linear.
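
For reference, the F value for a single gene in a standard one-way ANOVA can be computed as in the sketch below. This is a textbook formulation written for this example, not J-Express code, and the function name is an assumption. The corresponding P-value is the upper-tail probability of the F distribution with (k - 1, n - k) degrees of freedom, where k is the number of groups and n the total number of samples; it is not computed here.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for one gene.

    groups is a list of lists, one list of expression values per sample group.
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the samples around their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical values for one gene in three groups of three samples.
f_value = anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f_value)
```

A large F value means the group means differ by much more than the spread within the groups, so genes with high F values (or low P-values) are ranked as the best discriminators.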

The table on the left shows the ranked list of your genes. Select a few genes with good scores in this list to see how well they separate the groups you are comparing. The plot on the right shows the separation of the samples. Each spot has the colour of its group in the analysis. If the spots in the plot are too small, right-click in the plot area and increase the spot size.

If you select one gene from the table, the data matrix values for this gene are plotted against themselves. Look at the spots along the diagonal and check whether spots with different colours are clearly separated. If one gene does not separate the groups well, try selecting two genes. The values of one gene are then plotted against the values of the other. This spreads the data out into two dimensions and may give a better separation. You can continue selecting more genes to see whether the separation improves. When more than two genes are selected, the first and second principal components are used as the axes of the plot. Note: there is no point in plotting too many genes. Since it takes a long time to calculate principal components for many genes, it is advisable to uncheck Update Plot for large selections. For more help on this component, press F1.
If you have a Gene Graph Viewer open, you can see the expression profiles of your selected genes there (remember to click the Shadow Unselected button). The names of the different states (shown underneath the plot) in the profiles are coloured with the group colours. Check whether the values in the profiles differ between the groups.
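
To make the principal component step concrete, the following sketch projects samples described by more than two selected genes onto their first two principal components. It is an illustration only: it uses simple power iteration with deflation, all function names and data are assumptions for this example, and J-Express may compute the components differently.

```python
import math


def center_columns(rows):
    """Subtract each gene's mean so that every column has mean zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance_matrix(centered):
    """Sample covariance between genes (columns) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenvector(matrix, iterations=500):
    """Approximate the dominant eigenvector by power iteration."""
    d = len(matrix)
    vec = [float(i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            break
        vec = [v / norm for v in nxt]
    return vec


def first_two_components(rows):
    """Return (PC1, PC2) coordinates for each sample (row)."""
    centered = center_columns(rows)
    cov = covariance_matrix(centered)
    d = len(cov)
    axes = []
    for _ in range(2):
        v = leading_eigenvector(cov)
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        axes.append(v)
        # Deflation: remove the component just found before searching for the next.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(r[k] * axis[k] for k in range(d)) for axis in axes] for r in centered]


# Hypothetical data: six samples (rows) by three selected genes (columns);
# the first three samples belong to one group, the last three to another.
samples = [
    [1.0, 1.1, 0.9], [1.2, 0.9, 1.0], [0.9, 1.0, 1.1],
    [3.0, 3.1, 2.9], [3.2, 2.9, 3.0], [2.9, 3.0, 3.1],
]
coords = first_two_components(samples)
for pc1, pc2 in coords:
    print(round(pc1, 3), round(pc2, 3))
```

In this made-up example the two groups fall on opposite sides of zero along the first component, which is the kind of separation to look for in the plot. The sign of each component is arbitrary, so only the relative positions of the spots are meaningful.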
Back to the FSS window: There are different ways of outputting the results:
- Branch Selection. This will add a new branch to the J-Express Project tree. This data set contains only the genes you have selected.
- Click FSS | Store Rank List. This will add a new branch to the J-Express Project tree. This data set contains all the genes in the data set. If you right-click on the newly added node you can open the component. This will display a table containing the score results from the analysis.
- Click File | Save Table. This allows you to save the table to a tab-delimited text file.