Forward genetic screens hold great promise for identifying mutants with cell wall phenotypes of interest; however, the time and cost required to physically map these mutations can be considerable. We have developed a method to quickly and efficiently map recessive mutations identified in such screens using next-generation genomic technology, and validated the approach by mapping three Arabidopsis cell wall mutations. Briefly, F2 lines generated by crossing a mutant plant to a mapping line are sequenced en masse using a next-generation sequencing platform such as the Illumina Genome Analyzer, producing tens of millions of short reads, which are aligned to a reference genome using existing public domain software. We then apply a novel approach that identifies the mutation of interest by examining the distribution of SNPs between the mapping and mutant genomes. All three causal mutations were successfully identified and functionally confirmed in planta.
To identify cell wall-related mutants, we developed a screen for mutants that exhibit hypersensitivity to a herbicide (flupoxam) that specifically interferes with cell wall biosynthesis.2 This strategy relies on the principle that mutations in genes that lead to altered cell wall structure will exacerbate the deleterious effects caused by the herbicide. A known cell wall mutant (mur11)3 and two putative mutants, designated fph1 and fph2, were identified in the screen.
For each mutant, a mapping population of 80 F2 plants was sequenced en masse on Illumina's Genome Analyzer with 38-cycle paired-end reads. Between 115 and 300 million reads per mutant were then mapped to the TAIR9 genome using the Maq short read mapper, giving depths of 29x, 74x and 39x, respectively.1 Genome-wide SNPs were then extracted from the alignments and filtered using the recommended quality score cutoffs.
SNP frequencies in the mapping population, binned at 250 kb intervals, show reproducible patterns of natural variation across each chromosome. However, the non-recombinant region harboring the mutation of interest is readily identifiable as a SNP desert. Such deserts were found in the telomeric regions of chromosomes 3 and 1 for mur11 and fph1, respectively, and in the right arm of chromosome 1 for fph2. Genome-wide SNP frequencies for fph2 are shown below.
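As an illustration of the binning step only, the following minimal sketch counts SNPs in fixed 250 kb bins along one chromosome and reports the sparsest bin as a candidate SNP desert. The function names and the toy positions are hypothetical and are not taken from the analysis described here; real input would be the filtered SNP positions for each chromosome.

```python
from collections import Counter

BIN_SIZE = 250_000  # 250 kb bins, as in the text


def bin_snp_counts(positions, chrom_length, bin_size=BIN_SIZE):
    """Count SNPs per fixed-width bin along one chromosome (1-based positions)."""
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = Counter((pos - 1) // bin_size for pos in positions)
    return [counts.get(i, 0) for i in range(n_bins)]


def sparsest_bin(counts, bin_size=BIN_SIZE):
    """Return (start, end, count) for the first bin with the fewest SNPs."""
    idx = min(range(len(counts)), key=counts.__getitem__)
    return idx * bin_size + 1, (idx + 1) * bin_size, counts[idx]


# Toy data (hypothetical): a 1 Mb chromosome with no SNPs in the third bin.
toy_positions = [10_000, 120_000, 240_000, 260_000,
                 400_000, 499_999, 760_000, 900_000]
toy_counts = bin_snp_counts(toy_positions, chrom_length=1_000_000)
desert = sparsest_bin(toy_counts)
print(toy_counts, desert)
```

On real data a single empty bin is unlikely to be decisive; a run of adjacent low-count bins would be a more convincing desert.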
To distinguish SNPs arising from natural variation from those representing potential causal mutations, we devised a slightly modified version of Illumina's chastity statistic. Termed discordant chastity (ChD), the statistic measures how strongly the reads at a SNP position disagree with the expected reference base. Using the aligned reads covering a SNP, the count of the most frequent base that is not the reference base is compared with that of the next most common base.
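The exact formula is not given here, so the sketch below shows one plausible reading, by analogy with Illumina's chastity (the highest value divided by the sum of the two highest). It assumes that ChD is the count of the most frequent non-reference base divided by the sum of that count and the count of the most common remaining base, which may be the reference base. Under that assumption, a SNP at which pooled reads split evenly between the reference and the alternative base scores 0.5, and a SNP at which all reads carry the alternative base scores 1.0. The function name and input layout are hypothetical.

```python
def discordant_chastity(base_counts, ref_base):
    """ChD under the assumed definition: top non-ref / (top non-ref + runner-up).

    base_counts: dict mapping base -> number of reads showing that base.
    Returns None when no read disagrees with the reference.
    """
    non_ref = {b: n for b, n in base_counts.items() if b != ref_base and n > 0}
    if not non_ref:
        return None  # no non-reference reads, so there is no SNP to score
    top_base = max(non_ref, key=non_ref.get)
    top = non_ref[top_base]
    # Runner-up: the most common of all remaining bases, reference included
    # (an assumption; the text does not say which bases are eligible).
    runner_up = max((n for b, n in base_counts.items() if b != top_base), default=0)
    return top / (top + runner_up)


# Hypothetical pileups at two positions whose reference base is A.
segregating = discordant_chastity({"A": 20, "G": 20}, ref_base="A")
fixed = discordant_chastity({"A": 0, "G": 40}, ref_base="A")
print(segregating, fixed)
```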
The positional SNP frequencies across the non-recombinant chromosome are partitioned into discordant chastity intervals (window = 0.1, slide = 0.01) and smoothed using kernel density estimation. K-means clustering of the resulting "threads" identifies distinct chastity belts, from which the signal corresponding to natural variation (ChD ~ 0.5) and the signal corresponding to mutation (ChD ~ 1.0) are extracted. The ratio of these two signals then provides an estimate of the mutation position. Repeating the procedure at finer kernel sizes (adjustment = 0.5 and 0.25) decreases the amount of smoothing employed in the kernel density estimation, yielding a more empirical representation of the data and thus a more refined estimate of the mutation position.
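The core idea of the ratio step can be sketched as follows. This is a simplified illustration, not the procedure itself: two hand-picked ChD bands stand in for the sliding ChD windows and the k-means clustering, a plain Gaussian kernel is assumed, and the band limits, bandwidth, function names and toy data are all hypothetical. The sketch takes the ratio of the mutation-like density to the natural-variation density along the chromosome and reports where it peaks.

```python
import math


def gaussian_kde(points, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate of `points` at each grid value."""
    if not points:
        return [0.0] * len(grid)
    norm = len(points) * bandwidth * math.sqrt(2 * math.pi)
    return [
        sum(math.exp(-0.5 * ((g - p) / bandwidth) ** 2) for p in points) / norm
        for g in grid
    ]


def estimate_mutation_position(snps, grid, bandwidth, eps=1e-12):
    """Return the grid position where the mutation/natural density ratio peaks.

    snps: iterable of (position, ChD) pairs for one chromosome.
    """
    # Hand-picked ChD bands stand in for the k-means "chastity belts" (assumption).
    mutation = [pos for pos, chd in snps if chd >= 0.9]
    natural = [pos for pos, chd in snps if 0.4 <= chd <= 0.6]
    d_mut = gaussian_kde(mutation, grid, bandwidth)
    d_nat = gaussian_kde(natural, grid, bandwidth)
    # eps keeps the ratio finite where the natural-variation density vanishes.
    ratio = [m / (n + eps) for m, n in zip(d_mut, d_nat)]
    return grid[max(range(len(grid)), key=ratio.__getitem__)]


# Toy chromosome in kb (hypothetical): natural variation on both sides of a
# SNP desert between 6,000 and 10,000 kb, with mutation-like SNPs near 8,000 kb.
toy_snps = (
    [(p, 0.5) for p in range(500, 6001, 500)]
    + [(p, 0.5) for p in range(10000, 15501, 500)]
    + [(7800, 1.0), (8000, 0.97), (8200, 0.95)]
)
toy_grid = list(range(0, 16001, 100))
estimate = estimate_mutation_position(toy_snps, toy_grid, bandwidth=500)
print(estimate)
```

Rerunning with a smaller `bandwidth` mimics the refinement described above: less smoothing gives a sharper but noisier ratio. The ratio is only informative where natural-variation SNPs flank the candidate region on both sides, as in the toy data.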
SNPs localized using the above ratio are then annotated for their potential to cause a non-synonymous change in coding sequence or to disrupt a splice site, and filtered (ChD > 0.85). This produced 5, 1 and 2 candidate mutations for the three mutants, respectively.
Since it would be desirable to know the depth of coverage required to apply our procedure effectively, we replicated the analysis using incrementally increasing amounts of data. As our original analysis employed 7 lanes of an Illumina GA flow cell, the procedure was simply repeated using data from a single lane up to all 7 lanes. Although resolution was poor with a single lane, the mutation was still present in the results.