DNA barcoding of the Lemnaceae, a family of aquatic monocots
© Wang et al. 2010
Received: 7 June 2010
Accepted: 16 September 2010
Published: 16 September 2010
Members of the aquatic monocot family Lemnaceae (commonly called duckweeds) represent the smallest and fastest growing flowering plants. Their highly reduced morphology and infrequent flowering result in a dearth of characters for distinguishing between the nearly 38 species, which are tiny, closely related and often morphologically similar.
We developed a simple and rapid DNA-based molecular identification system for the Lemnaceae based on sequence polymorphisms. We compared the barcoding potential of the seven plastid markers proposed by the CBOL (Consortium for the Barcode of Life) plant-working group for discriminating species of land plants, using 97 accessions representing 31 species of the family Lemnaceae. A Lemnaceae-specific set of PCR and sequencing primers was designed for four plastid coding genes (rpoB, rpoC1, rbcL and matK) and three noncoding spacers (atpF-atpH, psbK-psbI and trnH-psbA) based on the Lemna minor chloroplast genome sequence. We assessed the ease of amplification and sequencing for these markers, examined the extent of the barcoding gap between intra- and inter-specific variation by pairwise distances, and evaluated identification success using both direct sequence comparison ("best close match") and the construction of a phylogenetic tree.
Based on its reliable amplification, straightforward sequence alignment, and rates of DNA variation between species and within species, we propose that the atpF-atpH noncoding spacer could serve as a universal DNA barcoding marker for species-level identification of duckweeds.
The cost of DNA purification and sequencing has dropped considerably in recent years, so that identification of individual species by DNA barcoding has become an independent and more sensitive alternative to purely morphology-based classification for distinguishing closely related species; it also defines systematic relationships through analysis of genetic distance. The key element for a robust barcode is a suitable threshold between inter- and intra-specific genetic distances. Sequence variation between species has to be high enough to tell them apart, while the distances within species must be low enough for them to cluster together. The mitochondrial cytochrome c oxidase subunit I (COI) gene has proven to be a reliable, cost-effective, and easily recovered barcode marker for identifying animal species [2–4], but its application in the plant kingdom is impeded by a slow nucleotide substitution rate, which is insufficient for the diagnosis of individual species [5, 6]. However, the Consortium for the Barcode of Life (CBOL) plant-working group recently proposed seven leading candidate sequences for use as barcoding markers. Four plastid coding genes (rpoB, rpoC1, rbcL and matK) and three noncoding spacers (atpF-atpH, psbK-psbI and trnH-psbA) have been selected based on previous investigations among different plant families [8–10]. Nevertheless, the utility of each of these sequences for individual families within the plant kingdom is hardly predictable [11, 12].
Although there have been attempts to use the single locus matK, a combination of two loci, rbcL and trnH-psbA, and even multi-locus combinations as barcoding sequences, the use of a unified barcode for the identification of all land plants would be difficult due to the conflicting needs of different researchers. For example, an optimal barcode marker that has been determined empirically to distinguish plants at the family level may prove less useful for making accurate species-level identifications. Most of the proposed plant barcode markers were designed primarily for identifying distantly related organisms in biodiversity hotspots such as Panama and Kruger National Park in South Africa. So far, only a few studies have been devoted to developing unified barcodes suitable for making identifications within a family, within a genus, or between closely related sister species. A test of seven other candidate barcoding sequences in the family Myristicaceae was applied to eight species within a genus and yielded two suitable barcodes. Recently, it has been shown that all three markers (rbcL, trnH-psbA and matK) can discriminate 4 sister species of Acacia across three continents. The marker matK has been reported to distinguish 5 Dendrobium species. More complex approaches have been developed for subfamily-level identification of larger groups of related plants. Although an extensive barcode study of 31 Carex species suggested that a single locus or even multiple loci cannot provide a resolution of greater than 60%, it did not include some of the new markers (atpF-atpH and psbK-psbI). When atpF-atpH and psbK-psbI were included for distinguishing Carex and Kobresia, matK identified 95% of the species as a single locus and 100% when combined with another marker. However, this study used material from a well-defined regional perspective, the Canadian Arctic Archipelago, where the number of co-existing closely related species is limited. Our objective was to determine whether one or more of the markers proposed by the CBOL plant-working group would serve as an optimal marker for species-level identification within the family Lemnaceae.
The members of the family Lemnaceae, commonly called duckweeds, comprise 38 species in five genera. They are all aquatic plants that grow on or below the surface of the water all over the world, and they include the smallest flowering plants. They are ideal material for physiological, biochemical, and genomic studies because of their direct contact with the medium, rapid growth and relatively small genome sizes. They are also a valuable platform for biomanufacturing through genetic engineering, as shown by recent progress towards duckweed-based commercial products. They can be easily maintained by vegetative reproduction in aseptic cultivation for decades. The small size of the plant is ideal for maintaining diverse accessions and therefore for evolutionary studies at the DNA level. Some species, such as Lemna minor, are used by the Environmental Protection Agency for measuring water quality because their growth rates are sensitive to a wide range of environmental contaminants such as metals, nitrates, and phosphates. Indeed, wastewater treatment with duckweed has been proposed as a "green" way to remediate municipal water supplies. Rapid growth also offers practical applications of duckweeds as a biofuel crop. Some duckweeds form starch-rich over-wintering fronds called turions, which can be easily induced from vegetative fronds by cold shock, starvation, or treatment with abscisic acid [26, 27]. Owing to their size and density, both vegetative fronds and turions are much more easily harvested than microalgae, which makes duckweeds an attractive feedstock for bioethanol production that does not compete for agriculturally productive land.
Given these potential uses, the 160-Mb Spirodela polyrhiza genome has been selected for whole genome sequencing by the DOE-JGI community-sequencing program (CSP). A reference genome within this family will be invaluable for gene discovery and evolutionary analysis of aquatic monocot species. Furthermore, from a systematic point of view, classification based solely on morphological characteristics has been a significant challenge. The most readily observed anatomical feature of the minute and highly reduced duckweeds is their fronds, with or without roots. These few and somewhat variable morphological characters, together with rarely emerging flowers or fruits, make identification of duckweeds extremely difficult even for professional taxonomists. Complementing traditional classification methods with a DNA-based method would be highly applicable for such a family of species. It would permit these species to be classified in a highly reproducible and cost-effective manner because DNA-based methods are independent of the morphology, integrity, and developmental stage of the organism and can distinguish among species that superficially look alike.
Here, we present a simple and accessible protocol to barcode duckweeds and establish a sequence database against which unknown species may be compared and tentative species identifications can be validated. This database also provides a high-resolution phylogenetic resource for this important monocot family.
Success ratios of PCR amplification and sequencing for seven candidate barcoding markers.
Columns: Max. length of product*; Min. length of product*; # tested samples; % success of PCR and sequencing
Measurement of inter- and intra-specific divergences for seven barcoding markers.
Aligned length (bp)*
Mean interspecific number of substitutions
Mean interspecific Kimura 2-parameter distances: 0.1648 ± 0.0221; 0.1133 ± 0.0120; 0.0715 ± 0.0061; 0.0633 ± 0.0068; 0.0338 ± 0.0051; 0.0303 ± 0.0050; 0.0216 ± 0.0038
Mean intraspecific Kimura 2-parameter distances: 0.0072 ± 0.0015; 0.0058 ± 0.0014; 0.0019 ± 0.0003; 0.0008 ± 0.0002; 0.0069 ± 0.0008; 0.0006 ± 0.0002; 0.0004 ± 0.0002
Mean interspecific P-distances: 0.1435 ± 0.0156; 0.0986 ± 0.0095; 0.0671 ± 0.0052; 0.0601 ± 0.0059; 0.0327 ± 0.0048; 0.0295 ± 0.0048; 0.0212 ± 0.0037
Mean intraspecific P-distances: 0.0066 ± 0.0012; 0.0057 ± 0.0014; 0.0019 ± 0.0003; 0.0008 ± 0.0002; 0.0062 ± 0.0007; 0.0006 ± 0.0002; 0.0004 ± 0.0002
Identification success based on "best close match" tools.
psbK-psbI + atpF-atpH
trnH-psbA + atpF-atpH
rpoC1 + atpF-atpH
rbcL + atpF-atpH
Number of monophyletic species recovered with the best two phylogenetic methods for six markers
Although the location of most grouped ecotypes in the taxonomic trees did not change from marker to marker, a close examination consistently revealed two interesting connections. First, despite the fact that very little is known about how cross pollination occurs in these tiny flowering plants, L. japonica has been suspected to originate from a hybridization event between L. minor and L. turionifera based on morphological characters. Our data indicate that the sequence of each of the seven tested markers from L. japonica 7182 was always identical to, and clustered with, that of L. minor (Figure 3). Since the chloroplast is maternally inherited in many (but not all) plants, our data are consistent with L. japonica arising from a cross between L. minor and L. turionifera in which L. minor was the maternal parent.
The second connection was S. polyrhiza 9203, which consistently clusters with S. intermedia rather than other S. polyrhiza in all seven tested markers (Figure 3). We examined 34 ecotypes of S. polyrhiza from the collection using the atpF-atpH marker and found four additional ecotypes that grouped closely with S. intermedia (Additional file 4). This suggested that these accessions might have been misidentified as S. polyrhiza due to the overlap in morphological characteristics between these species.
Here, we present data validating the most useful DNA barcoding markers for the family Lemnaceae from among those proposed by the CBOL plant-working group. Such a fundamental, family-wide analysis lays the groundwork for phylogenetic and genomic studies. Our samples represent a worldwide collection from the same family with many sister species (Figure 1 and 3, Additional file 1). Specimens in previous taxonomic classifications using barcoding markers were mainly from distantly related groups belonging to broadly different families that originated from local or more defined regions, such as the National Park, the Amazon, and the Panama region. Because of the diversity of the collection that has accumulated over the years, duckweeds provide a unique system to test the proposed barcoding markers for closely related species. Furthermore, it is difficult to classify members of this family by morphology alone. Therefore, we can not only validate the universal application of barcoding markers, but also apply it to species that may be solely dependent on such an approach for conservation. An advantage of universal barcoding markers is that universal primers can be designed from a reference sequence, which in this case was that of L. minor. The primers worked very well for all the samples (31 species and 97 ecotypes), with PCR amplification and sequencing success rates better than 95%, except in the case of matK, which yielded a rate as low as 71% (Table 1). In addition, a PCR annealing temperature lower than the optimum for Lemna minor permits primers to anneal to the target sequences despite sequence polymorphism in related species. Interestingly, most PCR failures occurred in the Wolffioideae subfamily (Additional file 1). The locus matK has been shown to be very variable in numerous phylogenetic studies [34, 35], and other studies have also noted the difficulties of its utilization due to PCR failure and lack of truly universal primer sites [9, 10].
Further improvement of primer designs for matK for other targets could increase amplification success, but might fail because of less conserved sites near the most variable sequences of the locus. Although matK DNA sequences exhibited the highest interspecific variation among the four coding markers (Table 2), the low percentage of successful PCR amplification and sequencing in duckweeds would restrict its extensive use.
It was not surprising that the noncoding spacers showed dramatically higher sequence variability than the coding markers (Table 2). Given the slow evolutionary rate of rpoB, rpoC1 and rbcL (especially rbcL, which is strongly recommended for barcoding across all land plants), they work well to distinguish distantly related species either alone or when combined with other more variable regions [6, 9]. However, their sequence polymorphisms might not be sufficient to distinguish closely related species. The noncoding spacers psbK-psbI and trnH-psbA were the most polymorphic plastid sequences, with variable sequence length in duckweeds (Table 1). The size of trnH-psbA in Spirodela (~504 bp) was 218 bp longer than in the other four genera (~286 bp). The length of the psbK-psbI sequence was the most variable, ranging from ~185 bp in S. polyrhiza to ~479 bp in S. intermedia even though they are sister species (Table 1 and Figure 3). These significant length variations, caused by deletions/insertions, simple sequence repeats and rearrangements, were problematic for accurate alignment, but could potentially be adapted for simple diagnostic tests that would not require DNA sequencing. Furthermore, the high sequence polymorphism of the aligned sequences of psbK-psbI and trnH-psbA could offer greater distinction between species in a diverse set of genera in certain families [5, 8]. Still, caution is needed for intraspecies comparison, where the relatively higher intraspecific distance compromised their power in barcoding duckweed species. One essentially has to cluster samples into two groups, one for ecotypes of the same species and one for species-to-species comparison (Table 2, 3, and 4, Figure 3). Failure to do so would prevent the detection of true differences between congeneric species and conspecific ecotypes and therefore impede the use of a universal duckweed barcode (Figure 2).
Although previous studies showed that atpF-atpH as a barcoding marker was inferior to psbK-psbI, trnH-psbA and matK based on distantly related species [5, 8, 9], our data suggested that it was the most promising barcoding marker for duckweeds with respect to high PCR amplification success, ease of alignment, and sufficient sequence divergence (Table 1, 2, 3, 4 and Figure 2). Our results therefore differ from the conclusions drawn from evaluating barcoding markers in unrelated species. Although it was shown that barcoding plants by more than one region tended to be more effective [11–13], combining atpF-atpH with any of the other markers resulted in only slight increases or drops in the rate of successful species identification compared to atpF-atpH alone (Table 3), indicating that the discriminatory power of atpF-atpH has already reached an optimum. When the atpF-atpH marker was combined with other markers, the reduced resolution lowered the differential value without complementary benefits. A similar finding that a combination of matK and trnH-psbA did not improve species identification has been reported as well.
One of the most significant applications of DNA barcoding is to overcome taxonomic obstacles, where it is difficult to identify unknown or wrongly named species in a family with similar morphology (Figure 3). Furthermore, DNA barcoding could offer a primary screen for further characterization of cryptic species. Although scientists within the duckweed community have been trying to resolve the question of whether L. japonica (Lj) originated from hybridization of L. minor (Lm) and L. turionifera (Lt), preliminary attempts to cross Lm and Lt (50 crosses) to reproduce the hybridization event were not successful. The key problem is that flowering is very rare and the flower is small, which makes outcrossing extremely tedious. Here, the sequences from the seven tested chloroplast markers of L. japonica 7182 were always identical to, and clustered with, those of L. minor (Figure 3). Therefore, we used the limited nuclear markers available (glyceraldehyde-3-phosphate dehydrogenase, histone 3 gene, beta-1,2-xylosyltransferase isoform 1, expression control elements from the Lemnaceae family) to uncover the relationship among them by polymorphisms. Unexpectedly, the sequences showed great conservation and there was not sufficient variation to answer this question. However, the identical alleles in L. japonica 7182 and L. minor support the assumption that L. japonica might have come from the cross of L. minor and L. turionifera.
Generally, the more derived the members of the duckweed family are, the simpler their morphologies. The reduction in size and simplification in structure make the fronds more mobile and better able to adapt to variable conditions. S. intermedia is characterized by slightly more primitive features (more nerves, roots, and ovules) than S. polyrhiza, which suggests that S. intermedia differentiated into S. polyrhiza, potentially through gradual morphological reduction and isolation. However, such gradual differences are sometimes difficult to distinguish due to overlapping characteristics. Our study of 34 ecotypes of S. polyrhiza using the atpF-atpH marker showed five ecotypes that clustered with S. intermedia (Additional file 4), a species that is mainly restricted to South America. Good supporting evidence comes from S. polyrhiza 9203 (Figure 3). Of the five ecotypes, three originate from South America, while the other two are from India. Therefore, a refined classification is necessary to determine whether the other four ecotypes, besides S. polyrhiza 9203, should be classified as S. intermedia rather than S. polyrhiza.
Both phylogenetic data and our barcoding data showed that the closely related species pairs W. gladiata and W. oblonga, and L. minuta and L. valdiviana, could not be separated from each other (Figure 3). These sister species share identical sequences for the barcoding markers, so separating them would require a search for additional barcoding markers with greater sequence polymorphism. In fact, no universal DNA barcoding marker has so far been reported to distinguish more than 90% of the species tested [8, 32]. Elucidation of recently evolved species sharing identical barcoding sequences still needs further taxonomic or case-by-case morphological, flavonoid, and allozyme analyses. On the other hand, the use of next-generation sequencing technologies and corresponding software applications is emerging, where low-pass coverage of different specimens could provide the necessary resolution.
In this study we have demonstrated that the atpF-atpH noncoding spacer could serve as a universal DNA barcoding marker for species-level identification of duckweeds. This marker will allow unknown species to be identified and new species of duckweeds to be explored, by reason of its reliable amplification, straightforward sequence alignment, and rates of DNA variation between species and within species. The DNA barcodes developed in this study are a significant contribution to the taxonomic structure of duckweeds compared with insensitive morphological classification.
The Lemnaceae collection originated from the Institut für Integrative Biologie (Zürich, Switzerland), the BIOLEX company (North Carolina, USA), and the University of Toronto Culture Collection of Algae and Cyanobacteria (UTCC, Toronto, Canada), where it was maintained for many years. Detailed information about many of these accessions is included in Dr. Landolt's monographic study. In total, 97 ecotypes representing 31 species (81.6% of the known species) were sampled in this study. Since the intraspecific distance is very important for evaluating a suitable barcoding marker, 2 to 8 representatives per species were included for 19 species, whereas the other 12 species are represented by a single ecotype. Moreover, the selected ecotypes represent a worldwide geographical distribution (Figure 1). A summary of all specimens included in this study is listed in Additional file 1.
List of primers for the seven proposed DNA barcoding markers.
Amplicon size (Lemna minor)
Ta Optimum (Lemna minor)
Reverse: 5'-AAAGTTTGAGAGTAAGCAT-3'
Reverse: 5'-CGCGCGTGGTGGATTCACAATCC-3'
Reverse: 5'-ATCCGGTCCATCTAGAAATATTGGTTC-3'
Reverse: 5'-GCTTTTATGGAAGCTTTAACAAT-3'
Reverse: 5'-TCGGATGTGAAAAGAAGTATA-3'
Reverse: 5'-CAATTAGCATATCTTGAGTTGG-3'
Reverse: 5'-ATGTCACCACAAACAGAGACTAAAGC-3'
Genetic distance was calculated using pairwise alignments of sequences between and within species (Table 2). The average intraspecific distance was calculated from the mean pairwise distance within each species that had more than one representative, which eliminated biases due to unbalanced sampling among taxa. We evaluated conspecific and congeneric variability for each pair of marker sequences by Wilcoxon signed rank tests (Additional file 2 and 3). Median and Mann-Whitney U tests were executed to examine the extent of the DNA barcoding gap/overlap between intra- and inter-specific divergences.
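As an illustration of the distance measures reported in Table 2, the following is a minimal Python sketch of the uncorrected P-distance and the Kimura 2-parameter (K2P) distance for two aligned sequences, plus a per-species averaging of intraspecific distances. It is not the software used in this study. The function names and any sequences passed to them are hypothetical, and skipping gapped or ambiguous sites is an assumption made for simplicity.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}  # C and T are the pyrimidines


def count_differences(seq_a, seq_b):
    """Return (compared_sites, transitions, transversions) for two aligned sequences.

    Assumption: sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    return sites, transitions, transversions


def p_distance(seq_a, seq_b):
    """Uncorrected proportion of differing sites."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    return (ts + tv) / sites


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance: -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    where P and Q are the proportions of transitions and transversions.
    Raises ValueError for saturated pairs where the logarithm is undefined."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p, q = ts / sites, tv / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def mean_intraspecific(seqs_by_species, dist=k2p_distance):
    """Average the per-species mean pairwise distances, using only species with
    more than one representative, so heavily sampled species do not dominate."""
    per_species = []
    for seqs in seqs_by_species.values():
        if len(seqs) < 2:
            continue
        pairs = [dist(a, b) for a, b in combinations(seqs, 2)]
        per_species.append(sum(pairs) / len(pairs))
    return sum(per_species) / len(per_species)
```

For two toy 10-bp sequences that differ by one transition and one transversion, `p_distance` gives 0.2 and `k2p_distance` gives about 0.234, showing how the K2P correction raises the estimate relative to the raw proportion of differences.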
To assess success in species assignment or identification within our data set, we adopted the "best match" function in the program TAXONDNA (Table 3). We calculated pairwise distances as uncorrected pairwise distances and compared two sequences over at least 300 bp, except for psbK-psbI (230 bp). We suppressed indels when computing distances. The threshold was set at the value below which 95% of all intraspecific pairwise distances were found. Since the best match was based on direct sequence comparison with other conspecific ecotypes, the analysis only counted species with multiple ecotypes per species.
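The matching logic described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not TAXONDNA's implementation: the function names are hypothetical, sequences are assumed to be aligned and upper-case, indel positions are skipped, and the 95% threshold uses a nearest-rank percentile, which may differ from the program's own calculation.

```python
import math


def p_distance(seq_a, seq_b):
    """Uncorrected pairwise distance; positions with a gap in either sequence are skipped."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a in "ACGT" and b in "ACGT"]
    return sum(a != b for a, b in pairs) / len(pairs)


def intraspecific_threshold(records, quantile=0.95):
    """records: list of (species, sequence) tuples.
    Returns the nearest-rank value below which `quantile` of all
    intraspecific pairwise distances fall."""
    dists = sorted(
        p_distance(s1, s2)
        for i, (sp1, s1) in enumerate(records)
        for sp2, s2 in records[i + 1:]
        if sp1 == sp2
    )
    index = max(math.ceil(quantile * len(dists)) - 1, 0)
    return dists[index]


def best_close_match(query_species, query_seq, references, threshold):
    """Classify one query against reference (species, sequence) tuples.

    'no match'  : the closest reference is farther away than the threshold
    'correct'   : all closest references belong to the query's species
    'ambiguous' : closest references include the query's species and others
    'incorrect' : closest references belong only to other species
    """
    scored = [(p_distance(query_seq, seq), sp) for sp, seq in references]
    best = min(d for d, _ in scored)
    if best > threshold:
        return "no match"
    best_species = {sp for d, sp in scored if d == best}
    if best_species == {query_species}:
        return "correct"
    if query_species in best_species:
        return "ambiguous"
    return "incorrect"
```

In use, each accession would be removed from the reference set in turn and queried against the remaining accessions, and the proportion of "correct" outcomes gives the identification success for a marker. This leave-one-out loop is also why only species with more than one ecotype can be scored.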
The other criterion used to measure success of species identification was based on generating a phylogenetic tree. We built trees with MEGA 4.1 using UPGMA and maximum parsimony (MP), the two methods reported to perform best among tree-building techniques for DNA barcoding. UPGMA trees were made from K2P distances. The MP trees were constructed using the close neighbor interchange (CNI) method with search level 1. The initial tree for the CNI search was created by random addition for 10 replications. Each tree contains the bootstrap values as calculated by the software from 500 replicates. Here, we counted only the species that clustered successfully as monophyletic groups among the species with multiple conspecific individuals (Figure 3, Additional file 4, Table 4).
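The tree-based criterion can be illustrated with a small Python sketch that clusters accessions by UPGMA and then counts the species whose accessions form an exclusive cluster. The trees in this study were built with MEGA 4.1; the pure-Python UPGMA below is a stand-in for illustration only. It does not perform bootstrapping or MP searches, and its function names and input format are assumptions.

```python
from itertools import combinations


def upgma_clusters(labels, dist):
    """Run UPGMA and return every cluster formed, as frozensets of leaf labels.

    dist maps frozenset({a, b}) -> distance between leaves a and b.
    The distance between two clusters is the unweighted mean over all leaf pairs.
    """
    clusters = [frozenset([label]) for label in labels]
    formed = list(clusters)

    def average_distance(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest average distance.
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        formed.append(merged)
    return formed


def monophyletic_species(species_of, formed):
    """Return species with more than one accession whose accessions make up
    exactly one cluster of the tree (i.e. no other species falls inside it)."""
    members = {}
    for label, species in species_of.items():
        members.setdefault(species, set()).add(label)
    return sorted(
        species
        for species, labels in members.items()
        if len(labels) > 1 and frozenset(labels) in formed
    )
```

A species is counted as recovered only when its accessions are grouped together to the exclusion of all others. If one accession clusters with a different species, as S. polyrhiza 9203 did with S. intermedia, the species is not counted as monophyletic.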
We thank Elias Landolt from Institut für Integrative Biologie (Zürich, Switzerland), Lynn Dickey, Nirmala Rajbhandari, and Peaches Staton from BIOLEX (North Carolina, USA), and Judy Acreman (UTCC, Toronto, Canada) for their generous provision of duckweed ecotypes. The research described in this manuscript was supported by the Selman A. Waksman Chair in Molecular Genetics.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.