About
This tool identifies Arabidopsis genes with similar expression patterns. It does this by calculating the correlation coefficients for all gene expression vectors as compared to an expression pattern that you define with a graphic input tool, or to an expression pattern associated with an AGI ID or gene name that you specify. The output of this tool displays an eFP image depicting the expression data and correlation coefficients for each gene that meets your cut-off criterion, or specified number of hits.
Expression Angler was written by Jamie Waese, Jim Fan, Asher Pasha and Nicholas J. Provart.
If you find the tool useful, please cite: Ryan S. Austin, Shu Hiu, Jamie Waese, Asher Pasha, Nina Wang, Jim Fan, Curtis Foong, Robert Breit, Alan Moses, Nicholas J. Provart (manuscript in preparation), New BAR Tools for Mining Expression Data and Exploring Cis-Elements in Arabidopsis thaliana.
How do I use it?
Option 1: Define a custom expression pattern (aka "custom bait").
Use this mode if you are looking for genes with a particular expression pattern. For example, if you were looking for genes that are only expressed in mature seeds, you might use a query that looks like this:
Step 1: Use the dropdown menu to select a view. The "Developmental Map" view shows the whole plant at various stages of development. The "Abiotic Stress" view allows you to query expression patterns from multiple experiments at various time points (e.g., cold, heat, wounding). The "Chemical Stress" view allows you to query expression patterns from several chemical and hormone treatment experiments. The "Tissue Specific" views allow you to query expression patterns in specific tissues (e.g., Root, Embryo Development, Guard and Mesophyll Cells, etc.).
Step 2: Set values by clicking on a tissue and using the popup panel. Use the slider to adjust the relative abundance for gene expression in the selected tissue, or use the buttons to set maximum or minimum relative expression. Because the calculation for the Pearson correlation coefficient mean-centers and standardizes each profile by the standard deviation, the absolute value that you enter is not important. It is, rather, the shape of the profile that is important.
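To see why only the shape of the profile matters, here is a minimal sketch in plain Python. The function name `pearson_r`, the five-tissue bait and the gene profile are all made up for illustration; this is not Expression Angler's own code. Scaling or shifting the bait leaves the r-value unchanged.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [v - mean_x for v in x]   # mean-centre each profile
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        # A completely flat profile has no shape, so r is undefined.
        raise ValueError("r is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom

# Hypothetical custom bait over five tissues: expressed only in the last one.
bait = [0, 0, 0, 0, 100]
# The same shape entered with different absolute values (scaled and shifted).
bait_rescaled = [10 + 0.5 * v for v in bait]
# Made-up expression profile for one gene.
gene = [12.0, 15.0, 9.0, 11.0, 840.0]

r_original = pearson_r(bait, gene)
r_rescaled = pearson_r(bait_rescaled, gene)
print(r_original, r_rescaled)  # the two values agree up to rounding error
```

Note that a bait with the same value in every tissue has no shape at all, which is why the sketch rejects it.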
The 'Exclude From Search' button removes samples from the correlation calculation. If, for instance, you know that a gene is up-regulated in seeds, but are interested in whether it is co-expressed with genes in other tissues, you may wish to drop the seed samples from the calculation, as otherwise you will identify large numbers of genes that are seed-specific. This is very useful for identifying genes which are:
- expressed only in certain tissues
- expressed only under certain conditions
- expressed only in a mutant but not wild-type.
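The effect of excluding samples can be sketched as follows. This is an illustration only: the sample names, the expression values and the helper `pearson_r_excluding` are hypothetical, and the tool's internal implementation may differ. Two genes that look almost identical because both are seed-specific can turn out to behave quite differently once the seed samples are dropped.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("r is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom

def pearson_r_excluding(x, y, excluded):
    """Pearson r computed only over sample indices not in `excluded`."""
    keep = [i for i in range(len(x)) if i not in excluded]
    return pearson_r([x[i] for i in keep], [y[i] for i in keep])

# Hypothetical samples and made-up expression values.
samples = ["root", "leaf", "flower", "seed_early", "seed_late"]
gene_of_interest = [5.0, 9.0, 7.0, 300.0, 900.0]
candidate = [20.0, 11.0, 16.0, 310.0, 880.0]

# Indices of the samples to leave out of the calculation.
seed_idx = {i for i, name in enumerate(samples) if name.startswith("seed")}

r_all = pearson_r(gene_of_interest, candidate)                # dominated by the seed signal
r_no_seed = pearson_r_excluding(gene_of_interest, candidate, seed_idx)
print(r_all, r_no_seed)
```

With all samples included the two profiles are strongly correlated; with the seed samples excluded, the correlation over the remaining tissues is negative.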
You can also use these buttons to adjust all the tissues at once:
For those who prefer a text-based interface over a graphic display, click on the 'Table View' tab at the top of the right panel. Here you can manually enter values for each tissue:
Step 3: Limit the number of results you'd like Expression Angler to return. Because you don't know whether the profile you are entering is realistic, it is probably better to select the 'Top 10' or 'Top 25' option than to specify an exact r-value cutoff. You can then judge whether the profile is realistic from the profiles of the genes returned, regardless of their r-values: look for strong expression signals in the samples that your custom bait was designed to pick out.
You can also use the slider to choose the number of best (or worst) correlated genes you'd like to see. Did you notice that the slider has two handles? Use these to enter a lower and an upper r-value. The highest possible r-value is 1, which means that two vectors are a perfect match. An r-value of 0 means no correlation, and -1 is a perfect anti-correlation, i.e. the expression response is exactly opposite to that of your gene of interest. The default lower r-value cutoff is 0.75, and the default upper r-value cutoff is 1.00. A tighter r-value range will result in fewer matches.
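The two ways of limiting results can be sketched in a few lines of Python. The function names, the gene labels and the r-values below are invented for illustration, and treating both slider bounds as inclusive is an assumption rather than documented behaviour.

```python
def select_by_r_range(scores, lower=0.75, upper=1.00):
    """Keep genes whose r-value lies in [lower, upper], best match first."""
    hits = [(gene, r) for gene, r in scores.items() if lower <= r <= upper]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

def select_top_n(scores, n=10):
    """Keep the n best-correlated genes, whatever their r-values are."""
    return sorted(scores.items(), key=lambda hit: hit[1], reverse=True)[:n]

# Made-up r-values for five hypothetical genes.
scores = {"GeneA": 0.97, "GeneB": 0.81, "GeneC": 0.42, "GeneD": -0.05, "GeneE": -0.88}

default_hits = select_by_r_range(scores)                       # default 0.75 to 1.00
tighter_hits = select_by_r_range(scores, lower=0.90)           # tighter range, fewer matches
anti_hits = select_by_r_range(scores, lower=-1.0, upper=-0.75) # anti-correlated genes
top_three = select_top_n(scores, n=3)
```

A fixed cutoff can return nothing at all if the bait is unrealistic, whereas a top-N selection always returns the closest available profiles, which is why the 'Top 10' or 'Top 25' options are the safer starting point for a custom bait.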
Step 4: Press the 'Search' button.
Option 2: Select by AGI ID
Use this mode if you are looking for genes with similar expression patterns to a particular gene of interest. For example, if you were looking for genes with similar expression patterns to ABI3, you would follow these steps:
Step 1: Enter an AGI ID or gene alias in the input box. An auto-suggest feature will list available gene names as you type.
Step 2: Limit the number of results you'd like returned.
Step 3: Press the 'Search' button.
Results
After doing its calculations, Expression Angler displays a results panel like this:
Select whichever genes you'd like to download, or press the 'Select All' button to select the entire list. Then press the green 'Get data for selected genes' button to download expression levels for all the selected genes.
How does it work?
The metric for measuring similarity of expression patterns is the Pearson correlation coefficient, commonly denoted by r. It is calculated for two gene expression vectors (series of values over a given number of samples) as follows:
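In standard notation (the symbols are labels chosen here, not taken from the original figure), for two expression vectors x and y measured over the same n samples, with means x̄ and ȳ, the textbook definition of the Pearson correlation coefficient is:

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

The numerator measures how the two profiles vary together around their own means, and the denominator rescales by the spread of each profile, so r always lies between -1 and 1.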
Note that the Pearson correlation coefficient effectively normalizes the magnitude of the expression vector. Thus the Expression Angler program will identify those genes which respond in a similar manner. That is, genes which have a relatively moderate expression level in, say, Sample 1, a high relative expression level in Sample 2, a low relative expression level in Sample 3, and so on, will be identified as similar (will be scored with a higher r-value), even if their absolute expression levels are dramatically different. For the two expression vectors for Genes X and Y above, the Pearson correlation coefficient is one, that is, the genes are responding identically! You may wish to manually examine the absolute expression levels on the output, especially if you are planning to use the genes as markers - you would want relatively strong expressors in this case. All genes having a correlation coefficient higher than the specified threshold will be returned, with the stipulation that a given gene must map to a probe set on the GeneChip. As a rule of thumb, genes with an r-value below 0.7 are not very well co-expressed, although such a correlation may still be statistically significant. See Usadel et al. 2009 for a discussion of how to convert r-values to p-values.
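To illustrate how a modest r-value can still be statistically significant, the sketch below converts r to an approximate two-sided p-value using Fisher's z-transformation with a normal approximation. This is one common textbook approach and is not necessarily the method discussed by Usadel et al. 2009. The function name `approx_p_value` is made up here, the calculation assumes independent samples, and it makes no correction for testing many genes at once.

```python
from math import atanh, erfc, sqrt

def approx_p_value(r, n_samples):
    """Approximate two-sided p-value for the null hypothesis of no correlation.

    Uses Fisher's z-transformation: atanh(r) * sqrt(n - 3) is treated as
    approximately standard normal when the true correlation is zero.
    """
    if n_samples <= 3:
        raise ValueError("need more than 3 samples")
    if abs(r) >= 1.0:
        return 0.0  # a perfect correlation is off the scale of the approximation
    z = atanh(r) * sqrt(n_samples - 3)
    return erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))

# The same r-value judged against compendia of different sizes.
p_large = approx_p_value(0.3, 79)  # e.g. a compendium with 79 samples
p_small = approx_p_value(0.3, 10)  # a hypothetical 10-sample data set
print(p_large, p_small)
```

The same r-value gives a much smaller p-value when more samples are available, so a correlation well below 0.7 can be significant in a large compendium even though the co-expression itself is weak.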
If you are interested in genes that are anti-correlated with your gene of interest, choose the bottom 10 or 25 genes to view. The genes that have the most opposite expression pattern to your gene of interest are at the very top of the list that is returned, which is sorted by increasing r-value.
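A bottom-N search of this kind might look like the following sketch, in which every gene is scored against the gene of interest and the list is sorted by increasing r-value. The gene labels, the profiles and the helper `bottom_n` are hypothetical and only illustrate the ordering described above.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("r is undefined for a constant profile")
    return sum(a * b for a, b in zip(dx, dy)) / denom

def bottom_n(query, profiles, n):
    """Return the n genes with the lowest r-values, most anti-correlated first."""
    scored = [(gene, pearson_r(query, profile)) for gene, profile in profiles.items()]
    return sorted(scored, key=lambda hit: hit[1])[:n]

# Made-up profiles over five samples.
query = [1.0, 2.0, 3.0, 4.0, 5.0]
profiles = {
    "GeneA": [2.0, 4.0, 6.0, 8.0, 10.0],  # same shape as the query
    "GeneB": [5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of the query
    "GeneC": [3.0, 1.0, 4.0, 1.0, 5.0],   # weakly related
}

worst_two = bottom_n(query, profiles, n=2)
print(worst_two)  # GeneB comes first, with an r-value of -1 up to rounding
```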
Data Sources
The data sets used by this tool come from the Bio-Analytic Resource, NASCArrays, the AtGenExpress Consortium's Tissue, Abiotic Stress, Pathogen or Hormone compendia, and other compendia compiled by the BAR curators. The AtGenExpress developmental data set was produced by Markus Schmid and Jan Lohmann (MPI Tübingen) and is published in Schmid et al., 2005, Nature Genet. 37: 501-6. The other AtGenExpress data sets are best documented at TAIR. Click the following links to go to documentation for Abiotic Stress, Biotic Stress and Hormone data sets. All contain gene expression data for ~22000 Arabidopsis genes generated using the ATH1 Affymetrix Whole Genome GeneChip.
The expression compendium from the BAR consists of 93 samples, with plant age, experiment type, tissue type, and treatment information appended. The AtGenExpress Developmental compendium consists of expression level measurements from 79 tissue samples in triplicate, again with meta-information appended. The data from NASCArrays are from 392 samples. [Thanks to David Craigon at NASCArrays for making this data set available in a file, called supercluster.txt, that was downloaded on the 23rd of February, 2004.] Other compendia that we have compiled include the AtGenExpress Plus - Extended Tissue Compendium encompassing the AtGenExpress Developmental Map and cell-type-specific samples as denoted in our eFP views here and here, a Root Compendium as denoted in this eFP view, a Seed Compendium as per this eFP view, and finally a Natural Variation Compendium as in this eFP view. The actual GEO or NASCArrays or other identifiers for each sample are available on the Table View of this tool.
Note that because each data set contains different samples, the r-values returned for a given gene may differ from one compendium to another. This is also informative, especially if you are searching for secondary roles for your gene of interest: leave out the tissue types where it is known to play a major role (see Usadel et al. 2009 for a discussion of this phenomenon). The AGI ID to probe set conversion is based on a file from TAIR, indicated on the 'About' popup of the BAR homepage.