Transcriptome analysis of embryo maturation in maize
© Teoh et al.; licensee BioMed Central Ltd. 2013
Received: 23 August 2012
Accepted: 28 January 2013
Published: 4 February 2013
Skip to main content
© Teoh et al.; licensee BioMed Central Ltd. 2013
Received: 23 August 2012
Accepted: 28 January 2013
Published: 4 February 2013
Maize is one of the most important crops in the world. With the exponentially increasing population and the need for ever increased food and feed production, an increased yield of maize grain (as well as rice, wheat and other grains) will be critical. Maize grain development is understood from the perspective of morphology, hormone responses, and storage reserve accumulation. This includes various studies on gene expression during embryo development and maturation but a global study of gene expression of the embryo has not been possible until recently. Transcriptome analysis is a powerful new tool that can be used to understand the genetic basis of embryo maturation.
We undertook a transcriptomic analysis of normal maturing embryos at 15, 21 and 27 days after pollination (DAP), of one elite maize germplasm line that was utilized in crosses to transgenic plants. More than 19,000 genes were analyzed by this method and the challenge was to select subsets of genes that are vitally important to embryo development and maturation for the initial analysis. We describe the changes in expression for genes relating to primary metabolic pathways, DNA synthesis, late embryogenesis proteins and embryo storage proteins, shown through transcriptome analysis and confirmed levels of transcription for some genes in the transcriptome using qRT-PCR.
Numerous genes involved in embryo maturation have been identified, many of which show changes in expression level during the progression from 15 to 27 DAP. An expected array of genes involved in primary metabolism was identified. Moreover, more than 30% of transcripts represented un-annotated genes, leaving many functions to be discovered. Of particular interest are the storage protein genes, globulin-1, globulin-2 and an unidentified cupin family gene. When expressing foreign proteins in maize, the globulin-1 promoter is most often used, but this cupin family gene has much higher expression and may be a better candidate for foreign gene expression in maize embryos. Results such as these allow identification of candidate genes and promoters that may not otherwise be available for use. mRNA seq data archived in NCBI SRA; Accession number: ACC=SRA060791 subid=108584.
Our laboratory is using the maize embryo to express foreign proteins for industrial applications [1–3]. For example, genes for an endocellulase, E1, and an exocellulase, CBH I, have been transformed into maize, plants recovered, and seed collected. The original maize tissue culture germplasm, Hi-II, is transformation competent but not agronomically productive . Thus, high-expressing transformants must be bred into elite germplasm for improved field performance to optimize the production system for commercialization. In each case, when original transformants are bred into elite germplasm, higher accumulation of the target protein can be achieved by selection [3, 5, 6]. While this has been observed empirically many times in corn, the mechanism of this phenomenon is not known. In order to understand the genetic basis of this mechanism, an understanding of the genes involved in normal embryo development is critical. The embryo maturation stage is critical for our studies because the foreign genes of interest, i.e. the cellulases, are expressed from the globulin-1 promoter—an embryo seed storage protein promoter that is active during the mid-maturation phase of embryo development .
The process of seed development in maize is understood from the perspective of morphology, storage protein accumulation and hormone responses [8–12]. For example, Kiesselbach  published one of the earliest treatises on the development of maize, including the gametes, embryo and seed. His was primarily a visual study using the light microscope with limited sub-cellular detail though elegantly detailed on the developmental timeline. In addition, Kriz  added detail on reserves by showing that the globulins (1 and 2) are the most abundant storage proteins in the embryo. These proteins are formed during maturation and degraded during germination, providing carbon and nitrogen sources for the growing seedling. In the endosperm, zeins are the major storage reserves and their accumulation is intricately staged . These reserves are also degraded during germination to feed the growing embryo. McCarty  reviewed the viviparous mutants of maize and their phenotypic responses to abscisic acid (ABA) and gibberellins (GAs) during embryo development and maturation. ABA is best known for its control of plant responses to stress, including drought . Similarly, dessication in seeds is controlled by ABA . Collectively these studies describe a framework that defines seed development and maturation. However, to fully understand these processes, we must take advantage of the new technologies that are available, such as transcriptome sequencing.
Specifically for the embryo, Kiesselbach  showed that at 13–15 days after pollination (DAP) the embryo has only the rudimentary structures of a scutellum (cotyledon) and meristem. By 21 DAP, the embryo structure is apparent with the shoot and root tips discernible. By 25 DAP, the embryonic structures are fully formed with shoot tip and radical covered by the coleoptile and the coleorhiza, respectively . According to Vernoud et al. , the embryo starts maturation at 15 DAP and storage reserves are accumulating by 21 DAP. We used these early studies to design transcriptome sequencing experiments of maturing embryos.
Previous gene expression studies describe the regulation of seed development and maturation, including patterns for cell division, DNA replication, induction of morphological changes, storage protein accumulation and desiccation for dormancy [14–17]. Vernoud et al.  provide an overview of gene expression during maize embryogenesis, including descriptions of mutants and cloned genes. Microarray profiling of gene expression changes during embryo development was performed by Lee et al. . Their custom array included 900 genes from EST libraries that were predicted to be involved in metabolism and embryogenesis. They found groups of genes that were expressed at different stages during seed development, moving from cell division activities at early stages to storage reserve synthesis and dessication. Genome-wide microRNA genes are also being surveyed to eventually determine their functional role in regulation of seed growth and development .
We undertook a transcriptomic analysis of normal maturing embryos of one elite germplasm line that is utilized for the crosses to transgenic events originally made in Hi II germplasm. In this study, we analyzed the transcriptome of embryos from 15, 21 and 27 DAP to determine the genes that are expressed at these time points. More than 19,000 genes were analyzed by this method and the challenge was to choose subsets of genes that are vitally important to embryo development and maturation for the initial analysis. We describe the changes in expression of genes relating to primary metabolic pathways, DNA synthesis, late embryogenesis proteins, and embryo storage proteins. We describe changes shown through transcriptome analysis and confirmed expression levels of a subset of genes in the transcriptome through qRT-PCR.
Summary of trimming and alignment
Mapped reads (%, mapped/trimmed)
Uniquely mapped reads (%, unique/trimmed)
Uniquely mapped reads in genes (%, in genes/unique)
Number of genes expressed at each stage of embryo maturation
Total number of genes expressed
Differential expression of genes between pairwise comparisons of embryo stages
No. of genes tested1
Up-regulated (≥ 1)
Down-regulated (≤ 1)
S15 vs S21
S21 vs S27
S27 vs S15
Embryo maturation follows embryo pattern formation and differentiation, beginning at 15 DAP and lasting for about 30 days . Embryo maturation is marked by growth, active accumulation of reserve substances and some developmental events. Growth is characterized by a period of active precursor biosynthesis, DNA synthesis and cell division [19, 20], and reserve accumulation is characterized by the biosynthesis and deposition of storage proteins, fatty acids and starch [21–23].
Three prominent genes that encode storage proteins in the embryo, grmzm2g067919 (glb1), grmzm2g026703 (glb2) and grmzm2g078441, a gene in the cupin family of unknown function, are highly expressed (Figure 4). The transcripts of these three genes taken together constitute about 99% of the transcripts of all storage protein genes. The globulin proteins are recognized as the major storage proteins in the mature embryo . However, the transcriptome data suggest that their expression levels are not as high as grmzm2g078441. The transcript level of grmzm2g078441 even at its lowest (15 DAP) is higher than the peak levels of glb1 and glb2 (Figure 4A). Because of our interest in embryo expression of foreign genes, we performed quantitative PCR to confirm the high levels of these critical genes from the transcriptome. As shown in Figure 4B, expression level of the unknown cupin gene is 8-fold higher than glb-1 at 15 DAP, approximately equal at 21 DAP and 65% higher at 27 DAP. Although the qPCR results do not exactly mirror the transcriptome results, this unknown cupin gene holds interest for further analysis.
The oleosins represent the lipid storage proteins that are part of the structural unit of the single layer membrane surrounding lipid bodies in seed. Seven genes are assigned to the oleosins and three are highly represented in the maturing embryo transcriptome (Figure 12). The most highly represented oleosin gene is grmzm2g337229 (oleosin 1), which at 21 DAP represents 75% of all the oleosin transcripts in the transcriptome.
Among the eight LEA genes in our transcriptome data, two stood out very prominently. These two LEA transcripts, grmzm2g162659_t01 and ac233879.1_fgt002, are homologs of Arabidopsis ATEM1 and ATEM6, respectively. These two LEA transcripts increased dramatically (30- to 50-fold) from 15 DAP to 21 DAP. The LEA transcripts have been shown to be inducible by abscisic acid (ABA) but do not require vp1[30, 31]. The transcripts of the other LEA genes do not accumulate to a significant level (Figure 13).
Numbers of SNPs discovered per chromosome and per genotype via comparison to the reference genome
Transcriptomics is a powerful tool to analyze gene expression within any living system. The data generated in a single experiment will answer many questions about the system of interest. These data will also generate a host of new questions that can be explored in more detail using these same techniques on related samples. The drawback to this generation of megadata sets is that the analysis of the data requires large computing capability, multiple software packages and decisions based on value judgments about how to parse the data into package sizes that make sense and can be interpreted biologically.
The Mapman Pathway and Pageman programs have allowed us to observe changes in gene expression patterns that correlate with the maturing phases of the maize embryo. At 15 DAP, the embryo is at the transition between late development and early maturation, therefore many of the genes associated with the early events of embryo development are beginning to wane in their expression levels. For examples, genes encoding precursor biosynthesis (lipids, amino acids), DNA and RNA synthesis, as well as cell division and organization functions are more highly represented at 15 DAP than at later stages (Figure 5). At 21 and 27 DAP which are firmly within the embryo maturation phase, gene expression for activities such as growth and build-up of reserves rose, while genes from the earlier phase dropped. These results are similar to those observed by Lee et al.  using microarray techniques. The advantage of transcriptomics over microarrays is the breadth of the discovery potential with RNA sequencing, in contrast to microarrays which are based on a limited number of gene sequences.
Almost 30% of the genes in our transcriptome do not have functions assigned to them. Many of these genes are unique to the specific stage of the embryo while others showed big differences in the level of expression in a pairwise comparison. Some of these uncategorized genes such as grmzm2g409101 and grmzm2g075042 are expressed at very high levels and are coordinately expressed with globulin-1 genes (data not shown). Further functional analysis will provide deeper insight into the roles these genes play in the maturation of the embryo or in the accumulation of storage proteins, roles that will be identified through mutations and network prediction and manipulation. Our transcriptome data also showed that the total number of expressed genes was highest at 21 DAP and decreased as the embryos increased in maturity. This is in agreement with data published by Lee et al.  and Luo et al.  who showed accumulation of individual mRNAs during maize kernel development were much lower after 25 DAP. The 21 DAP embryo also represents a transition stage from 15 DAP to 27 DAP and a large number of genes were shared between 21 DAP and the other two stages. Davidson et al.  found 22,493 genes expressed in 25 DAP B73 embryos, very near the number that we discovered in maturing embryos from SP114.
We focused on several sets of genes to show the utility of our data in understanding the gene expression changes at specific time points of embryo maturation. For example, the 2 histone genes that encode histones H1 and H4 showed very high expression throughout maturation, suggesting a very important role for these two genes during this critical time. In contrast, the LEA genes have been proposed to play a role in desiccation tolerance [35, 36] which probably accounts for their increase in expression level later in embryo maturation. We are also interested in the accumulation of storage proteins in the maize embryo because understanding the expression pattern of these genes may help us understand how these genes are regulated. One of our most interesting observations relates to genes in the cupin family that encode storage proteins within the embryo. In our bio-factory experiments, we use the globulin-1 promoter  to drive expression of foreign genes for industrial enzyme production [3, 5]. The goal of the bio-factory production system is to increase foreign protein accumulation by as much as possible to lower the cost of production. Thus, the observation of a transcript in the cupin family, unknown gene grmzm2g078441 in Figure 4, that shows significantly higher expression than the globulin-1 gene (grmzm2g067919) at 15 and 27 DAP, suggests that its promoter would be a more effective promoter for foreign gene expression. Belanger and Kriz  found glb-1 and 2 to be the most abundant proteins in the embryo. The question is why the cupin transcript appears to be expressed at a higher level than the globulins if this is the case. One explanation could be that the transcript does not produce a protein, but is a type of pseudo-gene. Another explanation is that the studies were performed on different types of maize, W64A and Va26 in the 1991 study, and SP114 in this study. In addition, globulins are produced by a multigene family and possibly were not fully accounted for by the alignment, whereas the cupin gene transcript could be higher, though the protein not as abundant. Each of these possibilities could be addressed in further studies.
“The GO project has developed three structured controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner” (http://www.geneontology.org/GO.doc.shtml). This useful tool allows cross-species comparisons of gene functions because of the uniformity of gene annotation language. It also reduces complexity of gene expression categories to 3 functional categories in order to find unique functions in a particular transcriptome. A GO term that stands out as highly over-represented is notable for its activity and describes the basic functions going on at that particular time and place. When GO analysis was applied to the embryo maturation transcriptome, a few terms were found that were unique or shared at each stage (Figure 14). These terms will drive some interesting analysis in the future.
Alexandrov et al., through large scale sequencing of maize cDNAs, showed the distribution of mRNA characteristics associated with their promoters, transcriptional start site predictors, and GC content, especially in the third position of the codon. It would be interesting to understand if the genes expressed in a particular tissue at a particular time differ in their specific structural characteristics based on these authors’ analysis. For example, groups of genes in one of the GC content modal groups may be preferred in one or another tissue or developmental event.
The results reported here provide a baseline for further studies on individual genes or groups of genes that will elucidate how a corn embryo matures and begins to shut down for dormancy. Results from those types of studies can be utilized to enhance genes that would improve yield for increased food or feed productivity. In this world of ever-increasing populations, such outcomes will be critical.
Numerous genes involved in embryo maturation have been identified, many of which show significant changes in expression level during the progression from 15 to 27 DAP. An expected array of genes involved in primary metabolism was identified. Of particular interest are the storage protein genes, globulin-1, globulin-2 and an unidentified cupin family gene. When expressing foreign proteins in maize, the globulin-1 promoter is most often used, but this cupin family gene has much higher expression and may be a better candidate for foreign gene expression in maize embryos. Results such as these allow identification of candidate genes and promoters that may not otherwise be available for use. The transcriptome data show patterns of expression of different genes involved in embryo development and storage protein accumulation. The transcriptome data will also serve as valuable resources for functional characterization of maize genes as more than 30% of transcripts represented un-annotated genes, leaving many functions to be discovered. As genes in Arabidopsis and other model systems are annotated, the identification of some of these novel genes will be accomplished.
The maize (Zea mays L) inbred SP114 (USP #6,252,148) was grown in the greenhouse at the Arkansas Biosciences Institute in Metro Mix 200 (SunGro Horticulture, Bellevue, WA) soilless medium and fertilized with Osmocote. Temperature and light cycles were set at 27°C to 31°C for 16-h light and 20°C to 24°C for 8-h dark. The embryos were isolated at 15, 21 and 27 days after pollination (DAP) under aseptic conditions, frozen in liquid nitrogen and stored at −80°C until used for RNA extraction.
The frozen embryos were ground into a fine powder in liquid nitrogen and homogenized in TRI Reagent solution (Ambion, Austin, TX). Total RNA at 15, 21 and 27 DAP was isolated following the RNA Isolation protocol from Invitrogen (Carlsbad, CA) and purified using Qiagen RNAeasy Mini Spin Columns (Qiagen, Valencia, CA). The concentration and purity of the total RNA were determined using an ND-1000 Spectrophotometer Nanodrop system (Thermo Scientific, Wilmington, DE) as well as RNA gel electrophoresis (Formaldehyde buffer system).
cDNA library construction and sequencing of the transcriptome were contracted to Tufts Core Facility at Tufts University School of Medicine, Boston, MA. The cDNA libraries were constructed following the procedures outlined in the manufacturer’s manual (Illumina, Inc, San Diego, CA). The sequencing of the transcriptome was done using the Illumina Genome Analyzer II /Solexa (Illumina, Inc, San Diego, CA). The cDNA library for the 21 DAP (S21) samples was run in a single Illumina flow cell lane while the cDNA libraries for 15 DAP (S15) and 27 DAP (S27) were each paired with a transgenic sample of the same age. The number of trimmed reads was 11.6 million for S15, 18.8 million for S21, and 15.4 million for S27. Single-end reads were obtained with ranges in length from 64 (S15 and S27) to 66 (S21) bases.
Raw reads were trimmed to remove low-quality nucleotides via a custom Data2Bio (Ames, Iowa) trimming script. GSNAP (Genomic Short-read Nucleotide Alignment Program, version 2010-07-37) , which allows for gapped alignments, including intron-spanning alignments, was used to map trimmed reads to the reference genome. Only reads with one unique best match in the reference genome and ≤ 2 mismatches every 36 bp, and ≤ 3 bp tails were used for subsequent analyses. The read depth of each gene was computed based on the coordinates of mapped reads and annotated locations of genes in the reference genome.
All reads were aligned to the reference genome for Zea mays, ZmB73AGPv1; Mitochondrian (AY506529.1) and Chloroplast (X8563.2). The alignment and initial analysis of the transcriptome data were done by Data2Bio (Ames, Iowa). Further analysis was conducted in-house using CLC Genomics Workbench (Cambridge, MA). Visualization of the mapping and pathways was carried out using publicly available software including Mapman (http://mapman.gabipd.org/)  and Integrated Genomic Viewer (IGV) (http://www.broadinstitute.org/software/igv/) .
Normalization was conducted using a method that corrects for biases introduced by RNA composition and differences in the total numbers of mapped reads in the two samples . Normalized read counts were used to calculate fold-changes (FC) and statistical significance. Fisher’s exact test was used to test the null hypothesis that expression of a given gene is not different between the two samples. Only genes having at least 50 mapped reads from the two samples combined were tested. Genes identified as candidates for differential expression were further filtered by correcting for multiple testing  and a false discovery rate of 0.1% (q-value). Statistically significant variation can be a consequence of either biological or technical variation in gene expression between the two samples.
The software goatools (https://github.com/tanghaibao/goatools) was used to perform the GO analysis. Over- and under-representation of certain GO terms were determined based on Fisher’s exact test. Two multiple correction controls (Bonferroni and permutation to control false discovery rate)  were implemented.
SNPs were called via comparisons to the reference genome, ZmB73AGPv1. Sequence variants identified by Genomic Short-read Nucleotide Alignment Program (GSNAP) were further filtered to identify SNPs using uniquely mapped reads. SNP sites were called if they have ≥3 reads supporting it, minimum SNP base quality value ≥ 15, and rare allele coverage among all the reads must exceed 0.8 which stringently controls false SNP discovery potentially derived from sequence errors and paralogs.
To verify RNA-seq results, quantitative real-time PCR was conducted using SYBR green (Bio-Rad) and CFX384 Real-Time PCR detection system (Bio-Rad, Hercules, CA). SYBR® green primers for qPCR were designed using AlleleID® 7 software (Premier Biosoft, Palo Alto CA). To ensure target specificity gene sequences were blasted against non-redundant database (GenBank, NCBI) to determine cross homology with other sequences. Gene sequences were analyzed for secondary structures to avoid designing primers in these regions. Primers were designed to obtain a product between 75 and 200 bp. Primers were synthesized by Integrated DNA Technologies, Inc., San Diego, CA. Two-step RT-qPCR was performed using SYBR Green detection chemistry. cDNA was synthesized from 1 μg of total RNA and oligo(dT) primers, using the iScript™ Select cDNA Synthesis kit (Bio-Rad), following the manufacturer’s procedure. Quantitative real time PCR was carried out in a total volume of 5 uL containing 0.5 uL of template and 4.5 uL of master mix. The following amplification program was used: denaturation at 95°C for 30 s, 40 cycles of amplification (95°C for 10 s, 60°C for 30 s) and a melting curve program (from 65°C to 95°C, with an increment of 0.5°C for 5 s). Three reference genes (Additional file 2: Table S1) were used to normalize expression and these values were then compared to reads per kilobase of exon per million mapped reads (RPKM) estimates. All PCR reactions were done in triplicate on 384-well full-skirt PCR plates (USA Scientific, Ocala, FL).
EEH has worked in foreign gene expression in maize embryos for 15 years and is interested in embryo development as it informs protein accumulation.
Days after pollination
Quantitative real time polymerase chain reaction
Genomic short-read nucleotide alignment program
Reads per kilobase of exon per million mapped reads
Tricaboxylic Acid Cycle
Lipid transfer proteins
Late embryogenesis abundant
Globulin 2 gene
Single nucleotide polymorphism
Data2Bio, Pat Schnable, Wei Wu; The Plant Powered Production (P3) Center is funded through the RII: Arkansas ASSET Initiative (AR EPSCoR) I (EPS-0701890) and II (EPS-1003970) by the National Science Foundation and the Arkansas Science and Technology Authority. This research is partially supported by the National Institute of Health grants from the National Center for ResearchResources (5P20RR016460-11) and the National Institute of General Medical Sciences (8P20GM103429-11).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.