Phenotypic assessment of mesotrione responses in resistant and susceptible waterhemp genotypes
Samples used for RNA-seq were harvested prior to the development of visual mesotrione treatment symptoms; therefore, herbicide-treated and mock-treated control plants were maintained in the greenhouse for 3 weeks after mesotrione application to assess phenotype responses. Both genotypes responded to the mesotrione application as expected. The resistant population initially displayed the major HPPD-inhibiting herbicide characteristics of chlorosis and bleached meristematic growth followed by necrosis but recovered by the third week after application. Visual comparison of mock-treated resistant and mock-treated susceptible to mesotrione-treated resistant (Fig. 1) at 3 weeks after treatment showed slight differences, primarily minor stunting and sparse tissue damage within the canopy of the mesotrione-treated resistant population. Conversely, the mesotrione-treated susceptible population sustained heavy tissue bleaching and eventually necrosis and plant death. These observations and comparisons verified the proper herbicide response of both genotypes to mesotrione treatment.
RNA-Seq and de novo assembly of the waterhemp transcriptome
Purified RNA from three replicates of 16 samples (3, 6, 12, and 24-h samples of mesotrione-treated or mock-treated susceptible and resistant genotypes) were sent to the Iowa State University DNA Facility for the creation and sequencing (100 base pair, paired-end sequencing) of 48 multiplex libraries. A total of 2.45 billion raw reads were produced. Following the sequence clean up described in the Materials and Methods, 2.36 billion sequences were used for de novo transcript assembly using the program Trinity (version 2.0.6, [24]) with three different kmer lengths (k = 25, 29, and 32). Sequences from all samples were used to yield a broad representation of the waterhemp transcriptome and allow the identification of genes expressed in a genotype, treatment or time-specific manner.
When comparing the three assemblies (v1, v2 and v3) generated with differing kmer lengths (25, 29, and 32, respectively) we noted that as the kmer length increased, transcript number decreased and Contig N50 increased (Table 1). The Contig N50 is a weighted median of contig (contiguous overlapping sequences) length where 50% of the assembled nucleotides are contained in contigs greater than or equal to the length of the Contig N50; it can be used as an important measurement in assembly evaluations and was a major factor in the decision of which assembly to use for our analysis [46, 47]. In addition, we visualized contig length distribution for each of our different assemblies (Additional file 1). As suggested by the contig statistics, increasing kmer size increased average contig length and decreased the number of contigs. This was especially evident for contigs smaller than 1000 base pairs (Log10 3). Therefore, we chose to focus on the third assembly (v3, kmer = 32) for subsequent analysis. Following selection of the v3 assembly, we still needed to remove sequences that lacked open reading frames (ORFs), were redundant, or were expressed at extremely low levels. From the initial v3 assembly containing 451,199 transcripts, TransDecoder [25] was used to identify all transcripts with ORFs> 100 base pairs and remove redundant transcripts, leaving 128,737 transcripts. Similarly, kallisto [26] identified 97,944 lowly-expressed transcripts in the v3 assembly. Cross-referencing the TransDecoder and kallisto datasets resulted in 119,635 transcripts with ORFs> 100 bp and read counts > 10. Finally, a series of BLASTN analyses described in the materials and methods were used to eliminate transcripts with the best homology to non-plant species or transcripts with no significant hits. This left 113,893 transcripts as the basis of our de novo waterhemp transcriptome (v4) used for differential expression analyses. Sequences for the v4 assembly can be found in Additional file 2.
Functional annotation of the waterhemp transcriptome
The waterhemp v4 assembly was annotated using BLASTX (E < 10− 10, [27]) against predicted proteins from A. thaliana (The Arabidopsis Information Resource version 10 [TAIR10], [31]), sugar beet (Refbeet v1.2, [28]) and Uniref100 (version 1/22/2016, [32]) and using BLASTN (E < 10− 20, [27]) against predicted transcripts from grain amaranth (Phytozome v12.1, Amaranthus hypochondriacus v2.1, [33]). The best A. thaliana hits were used to assign the gene ontology (GO) biological processes and the molecular function terms [34] to each transcript of the v4 assembly. Annotations for the v4 assembly can be found in Additional file 3.
To verify the accuracy and coverage of the v4 assembly, GO biological process terms inferred from homology with A. thaliana were mapped to GO slim terms using custom Perl scripts. GO slim term abundance was then compared between the waterhemp v4 transcriptome assembly and all predicted proteins of the A. thaliana, poplar, papaya, asparagus, sugar beet, and grain amaranth genomes (Fig. 2). Waterhemp, sugar beet, and grain amaranth all belong to the Amaranthaceae family [28, 33], poplar, papaya and asparagus are dioecious species [35,36,37], and A. thaliana is a well-established plant model [48]. For each GO slim term, the abundance of assigned transcripts was measured as a percentage relative to the entire transcriptome or genome, allowing us to normalize for any potential genome duplications within a given species. We found that for thirteen of the fourteen GO slim terms, the v4 waterhemp transcriptome assembly was comparable to the genomes of the six other species. This suggests the breadth of the waterhemp v4 transcriptome is consistent with the breadth of the A. thaliana, poplar, papaya, asparagus, sugar beet, and grain amaranth genomes. The only exception was the GO slim term ‘unknown biological process’ which was overrepresented in A. thaliana, compared to the six other species.
Identification of waterhemp transcripts differentially expressed in response to mesotrione
To allow identification of DETs responding to mesotrione treatment in each genotype, individual samples were mapped to the v4 waterhemp assembly using the protocol described in the Trinity user manual [24]. In total, 782,456,581 reads were mapped to the assembly. The raw expression counts were normalized across samples using the Trimmed Mean of M-values (TMM) method [40] in edgeR [41]. Following visual inspection, all replicate samples were considered good quality. Isoforms were considered expressed if they contained at least 1 count per million across three samples or replicates. Of the 113,893 isoforms in the v4 assembly, 72,697 were considered expressed (v4 isoforms). The average length for the expressed isoforms was 1580 base pairs and contigs assumed a normal distribution (Additional file 1).
edgeR was used to identify DETs responding to mesotrione treatment relative to mock-treated controls within each genotype (herbicide resistant and susceptible) across time and at specific time points (3, 6, 12, and 24 h after treatment (HAT)). DET expression is reported as a log2 fold change (log2 FC). A log2 FC greater than 1 indicates a DET is induced by the mesotrione treatment, while a log2 FC less than one indicates a DET is repressed by the mesotrione treatment. DETs with an FDR < 0.05 [43] are considered significantly differentially expressed in response to mesotrione treatment (Additional file 4).
We identified 89, 62, 61, and 1983 DETs in the resistant waterhemp genotype at 3, 6, 12, and 24 HAT, respectively, and 500, 77, 61, and 565 DETs were identified in the susceptible waterhemp genotype at 3, 6, 12, and 24 HAT, respectively (Table 2). We plotted the number of DETs per genotype within each time point to analyze expression trends across time (Fig. 3). The susceptible waterhemp genotype exhibited large fluxes in DET expression across time. At 3 HAT the susceptible genotype induced 409 transcripts suggesting a quick initial response to the mesotrione treatment. The response diminishes in the middle two time points but then increases again at 24 HAT. In contrast, the resistant waterhemp genotype demonstrated little response to mesotrione treatment at 3, 6 and 12 HAT while a large number of transcripts respond at 24 HAT. Remarkably, while symptoms in response to HPPD herbicide treatments can take as long as 1 week to develop, both resistant and susceptible waterhemp genotypes responded within three HAT. Furthermore, few DETs overlapped between time points within a given genotype (Fig. 4) or between genotypes (Table 2). At 3, 6, 12, and 24 HAT we found 7.7, 3.7, 8, and 3.4% of DETs were common to both waterhemp genotypes, respectively, suggesting a rapid and dynamic response to mesotrione treatment (Table 2).
In addition to identifying transcripts responding to mesotrione treatment at specific time points, we also identified transcripts responding to mesotrione treatment across time. We identified 2091 and 1246 DETs responding to mesotrione treatment across time in the resistant and susceptible genotypes, respectively (Additional file 4). Of these, only 330 DETs were common to both waterhemp genotypes. This reaffirms that the resistant and susceptible genotypes have different responses to the mesotrione treatment.
Characterization of mesotrione responsive transcripts
While differential expression is useful in identifying individual transcripts found in response to the mesotrione treatment, we were interested in identifying transcripts responding to mesotrione treatment that might have similar functions or act in the same molecular pathway. Therefore, for each time point by genotype combination, we used a Fisher’s Exact Test [44] with a Bonferroni correction [45] to identify gene ontology biological process terms [49] significantly overrepresented (P < 0.05) among DETs, relative to the waterhemp v4 assembly (Additional file 5). In the resistant waterhemp genotype, we identified 11 and 12 GO terms significantly overrepresented at 3 and 24 HAT. No significant GO terms were identified at 6 and 12 HAT. Combining all DETs from the resistant waterhemp genotype, we identified 18 significantly overrepresented GO terms. In the susceptible waterhemp genotype, we identified 34, 3, 2 and 24 significant GO terms at 3, 6, 12 and 24 HAT, respectively. Combining all DETs from the susceptible waterhemp genotype, we identified 39 significantly overrepresented GO terms.
To allow direct comparison between resistant and susceptible waterhemp genotypes, we compared unique transcript counts for significant GO terms (P < 0.05) identified at specific time points and over time in both genotypes (Fig. 5). To aid in data visualization, GO terms with DETs that perfectly overlapped with a larger, significant GO term were removed. In addition, only GO terms with at least 10 DETs in either the resistant or susceptible waterhemp genotype are shown. Using this approach, we were able to identify 18 GO terms significantly overrepresented only in the susceptible waterhemp genotype, nine GO terms significantly overrepresented only in the resistant waterhemp genotype and nine GO terms significantly overrepresented in both waterhemp genotypes.
GO terms uniquely overrepresented in the susceptible waterhemp genotype response were largely associated with stress and defense responses including ‘response to osmotic stress’ (GO:0006970), ‘response to hyperosmotic salinity’ (GO:0042538), ‘response to other organism’ (GO:0051707), ‘response to virus’ (GO:0009615), ‘response to wounding’ (GO:0009611), and ‘respiratory burst involved in defense response’ (GO:0002679). Other significantly overrepresented GO terms were associated with metabolism including ‘lignin metabolism’ (GO:0009809), ‘flavonoid metabolism’ (GO:0009813), ‘coumarin metabolism’ (GO:0009805), ‘cellular modified amino acid’ (GO:0042398), ‘pentacyclic triterpenoid biosynthesis’ (GO:0019745), ‘sterol biosynthesis’ (GO:0016126), ‘acetyl-CoA metabolism’ (GO:0006084), ‘phenylpropanoid metabolism’ (GO:0009698), and ‘polyamine catabolism’ (GO:0006598). Other significant GO terms included ‘protein peptidyl-prolyl isomerization’ (GO:0000413), ‘peptidyl-proline modification’ (GO:0018208), and ‘chaperone-mediated protein complex assembly’ (GO:0051131). For 11 of the 18 significantly overrepresented GO terms unique to the susceptible waterhemp genotype, we observed more DETs in the susceptible than the resistant genotype.
GO terms significantly overrepresented in both waterhemp genotypes included ‘response to cyclopentenone’ (GO:0010583), ‘response to endoplasmic reticulum stress’ (GO:0034976), ‘response to hydrogen peroxide’ (GO:0042542), ‘response to high light intensity’ (GO:0009644), ‘response to reactive oxygen species’ (GO:0000302), ‘response to cadmium ion stress’ (GO:0046686), and ‘response to salt stress’ (GO:0009651), ‘heat acclimation’ (GO:0010286), and ‘toxin catabolism’ (GO:0009407). For five of the nine GO terms significant in both genotypes, a greater number of DETs were observed in the susceptible waterhemp genotype.
GO terms uniquely overrepresented in the resistant waterhemp genotype were quite varied in their functions. Similar to the responses in the susceptible waterhemp genotype, we identified GO terms associated with response to stress (i.e., ‘hyperosmotic response’ (GO:0006972), ‘response to temperature stimulus’ (GO:0009266), and ‘response to karrikin’ (GO:0080167)). Interestingly, a number of GO terms were associated with energy metabolism including ‘amylopectin biosynthesis’ (GO:0010021), ‘proteasomal protein catabolism’ (GO:0010498), ‘gluconeogenesis’ (GO:0006094) and ‘trehalose biosynthesis’ (GO:0005992). Other significant GO terms observed in the resistant waterhemp genotype included ‘cytoskeleton reorganization’ (GO:0007010) and ‘transcription’ (GO:0006351).
To understand how DETs in these GO terms responded to mesotrione treatment, we compared their expression patterns and expression profiles between resistant and susceptible waterhemp genotypes (Fig. 6). Of the 4799 total DETs, 1311 and 3034 were uniquely significantly differentially expressed in response to mesotrione treatment in the susceptible and resistant waterhemp genotypes, respectively. A total of 454 DETs were significantly differentially expressed in both genotypes. We compared DET expression patterns across ten overrepresented gene ontology terms identified above including ‘cytoskeleton organization’, ‘gluconeogenesis’, ‘hyperosmotic response’, ‘response to cadmium’, ‘response to high light intensity’, ‘response to salt stress’, ‘response to wounding’, ‘sterol biosynthesis’, ‘toxin catabolism’, and ‘trehalose biosynthesis’. When we examined the DETs common to both the resistant and susceptible waterhemp genotypes, we found that the majority of these genes were induced in both genotypes. However, in the susceptible genotype, expression was strongly induced 3 HAT, weakly expressed 6 and 12 HAT, and again strongly induced 24 HAT. A similar response occurred in the resistant genotype, however the dip in gene expression was largely restricted to the 12 HAT timepoint. In contrast, genes repressed in response to mesotrione were weakly repressed at 3, 6, and 12 HAT, but strongly repressed at 24 HAT.
For DETs unique to the susceptible or resistant waterhemp genotypes, we observed differences in the number and expression of DETs depending on the GO terms of interest. The GO terms ‘cytoskeleton organization’, ‘gluconeogenesis’, and ‘trehalose biosynthesis’ were largely unique to the resistant waterhemp genotype response and were repressed by mesotrione treatment. Aside from those DETs common to both genotypes, few DETs were observed in the susceptible waterhemp genotype. For the GO terms, ‘response to cadmium’, ‘response to salt stress’, ‘response to high light intensity’, and ‘toxin catabolism’, DETs unique to the susceptible waterhemp genotype were largely induced, while DETs associated with these GO terms in the resistant waterhemp genotype were largely repressed. Unique DETs associated with the GO term ‘sterol biosynthesis’ were repressed in the susceptible waterhemp genotype but had mixed expression in the resistant waterhemp genotype, while unique DETs associated with the GO terms ‘response to wounding’ and ‘hyperosmotic response’ had mixed expression among unique susceptible and unique resistant DETs.
Clustering of herbicide-responsive DETs
To identify DETs likely clustered in the waterhemp genome, we used reciprocal BLASTN (E < 10− 20, [27]) to predict transcript location relative to the grain amaranth genome (Phytozome v12.1, Amaranthus hypochondriacus v2.1, [33]) and then tested for overlap using a sliding window of 100,000 bp centered on transcription start sites. We identified 302 unique DETs that fell into 67 clusters (Fig. 7, Additional file 6). Sixty clusters had multiple genotypes, 7 clusters were unique to the resistant genotype, and no cluster was only associated with the susceptible genotype. DET clusters were identified on all grain amaranth chromosomes except for chromosome 16. The most clusters were found on grain amaranth chromosome 6 and the most unique DETs were found on grain amaranth chromosome 12. The biggest cluster of DETs consisted of 8 transcripts. Finding evidence of DET clustering relative to the grain amaranth genome suggests coordinate regulation of gene expression in response to herbicide treatment within clusters.
Identification of candidate single nucleotide polymorphisms
To identify candidate SNPs that could be used for mapping herbicide resistance and other traits in the future, we called variants between the resistant and susceptible genotypes relative to the v4 transcriptome. We identified 189 high-quality candidate SNPs (Fig. 8, Additional file 7). Accounting for multiple SNPs located on the same transcript we identified 137 transcripts that contained at least one SNP. To determine if SNP-containing transcripts were associated with specific functions, we queried the biological process GO terms associated with these transcripts. Identified GO terms included: ‘nuclear-transcribed mRNA catabolic process’ (GO:0000956), ‘DNA-templated regulation of transcription’ (GO:0006355), ‘protein glycosylation’ (GO:0006486), ‘response to xenobiotic stimulus’ (GO:0009410), ‘DNA-templated positive regulation of transcription’ (GO:0045893), ‘response to ethylene’ (GO:0009723), ‘response to abscisic acid’ (GO:0009737), ‘response to gibberellin’ (GO:0009739), ‘response to salicylic acid’ (GO:0009751), ‘response to jasmonic acid’ (GO:0009753), and ‘response to cadmium ion’ (GO:0046686). Collectively, these data suggest that SNP-containing transcripts can be associated with defense and stress responses.
To predict the location of SNP-containing transcripts we used reciprocal BLASTN (E < 10− 20, [27]) against transcripts from grain amaranth (Phytozome v12.1, Amaranthus hypochondriacus v2.1, [33]) to identify orthologous genes. SNPs for which a putative ortholog could not be confirmed were removed from the mapping analysis. We found candidate SNPs distributed across the grain amaranth genome (Fig. 8, Additional file 7). A high concentration of SNPs (64 of the 118 mapped) was located on chromosome 15. These SNPs correspond to 43 unique transcripts. Several of these transcript annotations were associated with defense responses. The SNP concentration and transcript annotations suggest this chromosome could be important for determining herbicide responses in resistant and susceptible populations.