Interactions between genetics and environment shape Camelina seed oil composition

Background: Camelina sativa (gold-of-pleasure) is a traditional European oilseed crop and emerging biofuel source with high levels of desirable fatty acids. A twentieth-century germplasm bottleneck depleted genetic diversity in the crop, leading to recent interest in using wild relatives for crop improvement. However, little is known about seed oil content and genetic diversity in wild Camelina species.
Results: We used gas chromatography, environmental niche assessment, and genotyping-by-sequencing to assess seed fatty acid composition, environmental distributions, and population structure in C. sativa and four congeners, with a primary focus on the crop’s wild progenitor, C. microcarpa. Fatty acid composition differed significantly between Camelina species, which occur in largely non-overlapping environments. The crop progenitor comprises three genetic subpopulations with discrete fatty acid compositions. Environment, subpopulation, and population-by-environment interactions were all important predictors for seed oil in these wild populations. A complementary growth chamber experiment using C. sativa confirmed that growing conditions can dramatically affect both oil quantity and fatty acid composition in Camelina.
Conclusions: Genetics, environmental conditions, and genotype-by-environment interactions all contribute to fatty acid variation in Camelina species. These insights suggest careful breeding may overcome the unfavorable FA compositions in oilseed crops that are predicted with warming climates.

diploid progenitor species, C. neglecta J. Brock et al. and C. hispida Boiss [15,16]. Genomic and cytological evidence indicate that this allopolyploidization event occurred prior to C. sativa's domestication from its wild progenitor, the hexaploid species C. microcarpa Andrz. ex DC [16,17]. With similar genome sizes and well-documented interfertility [18,19], crosses between C. microcarpa and C. sativa could increase genetic diversity in the crop and introduce traits for agronomic improvement. Camelina microcarpa has been estimated to harbor roughly twice the genetic diversity of C. sativa [17], which further suggests that this wild species could be valuable for breeding programs. However, little is known about C. microcarpa and its potential for agricultural improvement, especially regarding seed oil composition. In addition to C. microcarpa, other close relatives of C. sativa include the tetraploid species C. rumelica Velen., and the diploid species, C. hispida, C. laxa C. A. Mey, and C. neglecta. The genus comprises ~7-8 species in total [15,20]. Several additional species-rank entities were recognized in the past, often based on minor morphological differences (see historical overviews in: [21,22]; see also [17], and references therein), and some authors continue to recognize numerous narrowly-defined species (e.g., [23]).
Fatty acids are a primary seed energy source in > 80% of all flowering plant species [24]. Studies in model systems have established that both genetic and environmental factors play a role in determining their composition and total content within the seed. In maize, a genome-wide association study (GWAS) has documented that variation in kernel oil content and FA composition are controlled in part by enzymes involved in oil biosynthesis [25]. Similarly, in soybean, domestication-related genomic signatures of selection for increased oil content overlap oil content QTLs and genomic regions containing FA biosynthesis genes [26]. In Arabidopsis, a GWAS analysis identified the fatty acid desaturase gene FAD2 as contributing to natural variation in seed FA composition [27]. Evidence for environmental influences on FA synthesis has been documented in controlled growth experiments using Arabidopsis thaliana (L.) Heynh. and several oilseed crop species, which have demonstrated temperature-dependent plastic responses in seed oil production [28-30]. Consistent with these findings, field trials of C. sativa genotypes cultivated across multiple years have revealed environmental effects in seed FA composition and oil content [31].
There is also evidence from wild species that variation in seed FA composition may play a role in local climatic adaptation. In general, higher latitudes and cooler climates are associated with decreased FA saturation in seeds; this has been documented in Salvia, Helianthus and Arabidopsis [27,30,32,33]. Unsaturated FAs have lower melting points than saturated FAs, and while less energy-dense, are potentially more easily metabolized during germination in colder climates than saturated FAs. Climate-associated FA variation has thus been proposed to reflect an adaptive tradeoff between saturated FAs (high-energy, but less easily metabolized in colder climates) and the lower melting-point unsaturated FAs (lower-energy, but better suited to colder germination conditions) [33]. Within species, variation among populations in seed FA content may potentially reflect adaptive phenotypic plasticity across heterogeneous environments, and/or genetic factors that underlie local climatic adaptation. For the particular case of Camelina, the extent to which wild populations show climate-associated FA variation has not been examined, nor is it known whether such variation, if present, is attributable to genetic or environmental factors.
The present study was conducted with the goal of assessing environmental and genetic contributors to seed FA composition and content in Camelina species. Using wild population sampling and a combination of phenotypic and genetic assessments, we addressed the following questions: 1) Does seed FA composition differ among Camelina species, and to what extent is this variation associated with environmental differences in regions where they occur? 2) For the geographically widespread crop progenitor species, C. microcarpa, are latitude, elevation, local climate, and/or genetic substructure important predictors of seed FA composition? 3) For its domesticated derivative, to what extent can the environment alone elicit plasticity in seed FA composition? To address these questions, we analyzed the FA composition of mature seeds from wild-collected Camelina species, examined population structure of C. microcarpa, and conducted a growth chamber experiment with C. sativa to determine the degree of phenotypic plasticity in seed FA composition and total oil content.

Seed oil composition differs among wild Camelina species
Total oil content varied widely among Camelina seed samples (19.01-41.91%) as inferred by FAME analysis. Average seed oil content was highest in the domesticated species, C. sativa (37.41% ± 3.69) and lowest in C. laxa (31.63% ± 3.64); however, after correcting for multiple comparisons the only significant differences were between C. sativa and C. microcarpa (LMM, p = 0.007) and between C. sativa and C. hispida (LMM, p = 0.042) (Table 1). For FA composition, several FAs were found to vary widely among species (Fig. 1), such as eicosenoic acid (20:1), which was higher in C. rumelica relative to all other species. Erucic acid (22:1) showed the greatest relative differences among all species, with C. microcarpa having the highest levels at 2.66% ± 0.51 and C. laxa having the lowest levels at 0.74% ± 0.03 of seed oil.
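The shorthand used for individual fatty acids (e.g., 20:1, 22:1) gives carbon number and number of double bonds, so a profile can be grouped into saturated (SFA), mono-unsaturated (MUFA), and poly-unsaturated (PUFA) classes and summarized as a saturated-to-unsaturated (S/U) ratio. The following is a minimal sketch of that bookkeeping, not the pipeline used in this study; the function names and the example profile values are hypothetical and purely illustrative.

```python
def classify_fatty_acid(shorthand):
    """Return 'SFA', 'MUFA', or 'PUFA' from lipid shorthand such as '18:3'."""
    _carbons, double_bonds = shorthand.split(":")
    n = int(double_bonds)
    if n == 0:
        return "SFA"
    if n == 1:
        return "MUFA"
    return "PUFA"


def summarize_profile(profile):
    """Sum percentages per FA class and compute the saturated/unsaturated ratio."""
    totals = {"SFA": 0.0, "MUFA": 0.0, "PUFA": 0.0}
    for shorthand, percent in profile.items():
        totals[classify_fatty_acid(shorthand)] += percent
    unsaturated = totals["MUFA"] + totals["PUFA"]
    totals["S/U"] = totals["SFA"] / unsaturated
    return totals


# Hypothetical profile (percent of total FAs); NOT measured values from this study.
example_profile = {
    "16:0": 6.0, "18:0": 3.0, "20:0": 3.0,   # saturated
    "18:1": 15.0, "20:1": 15.0, "22:1": 3.0,  # mono-unsaturated
    "18:2": 20.0, "18:3": 35.0,               # poly-unsaturated
}
summary = summarize_profile(example_profile)
```

Grouping by double-bond count keeps the class definitions explicit and independent of any particular chromatography software.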
We used random forest analyses to assess whether seed FA composition as a whole could be used to distinguish between species and to identify the most important FAs for differentiating them. Notably, our best random forest model was able to predict 90.8% of the species labels based on FA composition alone with a kappa of 0.825, suggesting strong predictive ability [34]. The two most informative FAs in the best model were erucic acid (22:1, mean decrease in accuracy = 119.06) and eicosenoic acid (20:1, mean decrease in accuracy = 115.45) (Supplemental Figure S1a). Although the random forest design was unbalanced due to an excess of C. microcarpa observations, high accuracy was nonetheless achieved for species with fewer observations; the one exception was the crop species C. sativa, which was not consistently distinguished from its wild progenitor, C. microcarpa (Supplemental Figure S2). These results indicate that while FA compositions superficially appear similar across Camelina species (Fig. 1), they are nonetheless readily distinguishable between species using random forest models.
[Figure legend fragment] Camelina hispida (blue) n = 6, C. laxa (yellow) n = 3, C. microcarpa (red) n = 57, C. rumelica (green) n = 17, C. sativa (purple) n = 6. Values for S/U are represented as the proportion of saturated to unsaturated fatty acids. Total seed oil is represented as the percent oil relative to seed weight.
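Because the design is unbalanced, raw accuracy alone can overstate classifier performance, which is why kappa is reported alongside it. As a hedged illustration, Cohen's kappa can be computed from a confusion matrix as the observed agreement corrected for the agreement expected by chance. The matrix below is hypothetical and does not reproduce the study's random forest output.

```python
def accuracy_and_kappa(confusion):
    """Compute accuracy and Cohen's kappa.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(n_classes)) / total
    # Chance agreement: product of marginal frequencies, summed over classes.
    expected = 0.0
    for i in range(n_classes):
        row_total = sum(confusion[i])
        col_total = sum(row[i] for row in confusion)
        expected += (row_total / total) * (col_total / total)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Hypothetical two-class confusion matrix (illustrative values only).
confusion = [[40, 10],
             [5, 45]]
accuracy, kappa = accuracy_and_kappa(confusion)
```

In this toy case the accuracy is 0.85 but kappa is 0.7, showing how kappa discounts agreement that would arise from class frequencies alone.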

Camelina species occur in distinct environments
Camelina accessions used in this study originate from a broad geographical context, including the Caucasus (eastern Turkey, Georgia, and Armenia), Ukraine, and the eastern Rocky Mountain range of the U.S. (where Camelina species occur as introduced weeds). Environmental niche analyses revealed significant differences in the environments where wild Camelina species were found (F = 6.387, p < 0.001). Further analysis revealed that most pairwise species comparisons were also significantly different (Table 2; see also Supplemental Figure S3). At the intraspecific level, we found significant differences in environments between geographically distinct regions of C. microcarpa (F = 20.144, p < 0.001). This finding suggests the possibility of unique environmental niches for the geographically disparate populations of this species (see also Supplemental Figure S3). Together, these results suggest Camelina species largely occupy different climatic niches from each other, and that for the single species with extensive population sampling, geographical regions of that species' range may differ environmentally as well.

Population structure of C. microcarpa
Cross-validation scores obtained from ADMIXTURE were lowest for K = 2 (CV error = 0.2548) and K = 3 (CV error = 0.2553), indicating that these are the two best-supported K values. At K = 2, accessions in the native range fell into two distinct subpopulations, corresponding largely to the Caucasus (eastern Turkey, Georgia, Armenia) and Ukraine; most introduced U.S. accessions fell into the Ukrainian subgroup, although several were in the Caucasus subgroup (Supplemental Figure S4). At K = 3, the Ukrainian accessions were further split into two subgroups corresponding largely to northern and southern parts of the country, with U.S. collections falling mostly in the northern Ukrainian subgroup (Supplemental Table S1). Statistical models at K = 3 yielded lower AICc values than those at K = 2; thus, K = 3 provided a stronger model fit and was chosen for subsequent analyses (Fig. 2). The Caucasus genetic subgroup showed high genetic differentiation from both the northern and southern Ukraine subgroups (FST = 0.303 and 0.314, respectively), whereas the two Ukrainian subgroups exhibited much less differentiation from each other (FST = 0.042).
Results from principal component analysis (PCA) of the genetic data were highly congruent with ADMIXTURE results (Supplemental Figure S5). Distinct clusters are evident for the Caucasus and the two Ukrainian populations, and U.S. accessions were clustered with the northern Ukrainian and Caucasus accessions. The first principal component (PC1) accounted for 65.2% of the total variation and separated the Caucasus subpopulation from the two Ukrainian groups. The second principal component (PC2) accounted for only 3.5% of the total variation and separated the northern and southern Ukrainian genotypes. These patterns of cluster separation are consistent with pairwise FST measures in the ADMIXTURE analysis.
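For readers less familiar with pairwise differentiation statistics, the idea behind FST can be sketched with a simplified Wright-style calculation from allele frequencies at biallelic loci: the proportional reduction in expected heterozygosity within subpopulations relative to the pooled population. This is an illustrative assumption only; it is not the estimator used to obtain the values reported above, it ignores sample-size corrections, and the allele frequencies are hypothetical.

```python
def pairwise_fst(allele_freqs):
    """Simplified Wright-style FST for two equally weighted populations.

    allele_freqs is a list of (p1, p2) tuples: the frequency of one allele
    at each biallelic locus in population 1 and population 2.
    Uses a ratio of sums across loci: FST = sum(HT - HS) / sum(HT).
    """
    ht_sum = 0.0  # expected heterozygosity of the pooled population
    hs_sum = 0.0  # mean expected heterozygosity within populations
    for p1, p2 in allele_freqs:
        p_bar = (p1 + p2) / 2.0
        ht_sum += 2.0 * p_bar * (1.0 - p_bar)
        hs_sum += (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    if ht_sum == 0.0:
        return 0.0  # no variation at any locus, so differentiation is undefined
    return (ht_sum - hs_sum) / ht_sum


# Hypothetical allele frequencies: one strongly differentiated locus, one shared.
fst = pairwise_fst([(0.9, 0.1), (0.5, 0.5)])
```

Values near zero indicate populations that share allele frequencies, whereas values approaching one indicate nearly fixed differences.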

Population-by-environment interactions shape C. microcarpa oil traits
Fatty acid composition of the three C. microcarpa genetic subpopulations was broadly similar. Nonetheless, the northern Ukraine population showed a distinct FA profile compared to the others (Supplemental Figure S6), and random forest analysis was able to categorize these three populations based solely on FA composition with 72.6% accuracy (kappa = 0.532), providing some support for unique overall FA composition between these three groups (Supplemental Figure S1b). The distinguishable FA composition of C. microcarpa populations potentially suggests a genetic component to observed FA differences between populations.
We sought to determine whether population structure and environmental conditions interact to influence FA composition in C. microcarpa. To account for collinearity between environmental measures, a PCA was generated using all 19 BioClim variables for the local climate of each accession (Supplemental Figure S7). We used PC1 and PC2, which together accounted for 73% of the variation in environment, as variables in our models. Larger values of PC1 were associated with increased annual/diurnal range in temperature, maximum temperature of the warmest month, temperature seasonality, and isothermality, whereas lower values for PC1 were indicative of higher precipitation. On the other hand, values of PC2 were almost entirely driven by various temperature measurements such as annual mean temperature (Supplemental Figure S7).
Linear mixed modeling (LMM) uncovered interactions between population identity and these climate PCs as important predictors for FA measures. Figure 3 displays the important predictors and the size of their effects on the response variable. Moreover, robust regression showed that overall patterns were not influenced by outliers (Supplemental Table S3), and calculation of variance inflation factors showed that models did not exhibit multicollinearity among predictors. Interactions between PC1 and population identity were important predictors of mono-unsaturated fatty acids (MUFAs), poly-unsaturated fatty acids (PUFAs), and total oil, whereas interactions between PC2 and population identity were important predictors for MUFAs and PUFAs. Population identity and climate PCs individually were also important predictors for many traits independent of their interaction effects. Saturated fatty acids (SFAs) were the only group of FAs that did not include a population-by-environment interaction; however, climate (PC1) did affect SFAs in our model. Thus, larger PC1 values (associated with higher maximum monthly temperature and seasonality measures) resulted in increased SFAs; this provides some support for the hypothesis that plants in warmer climates have increased seed SFA content which may enhance germination efficiency in warm climates [33]. Across all FA measures, genetic population was found to be an important predictor six times, environment two times, and genetic population-by-environment interactions six times.
[Figure legend fragment] Colored predictors shown had confidence intervals in which the lower bound (7.5%) and upper bound (92.5%) did not overlap zero in the linear mixed effect models, and their 85% confidence intervals did not overlap zero when robust regression was performed. The 85% confidence intervals were consistent with the model selection method (see Methods).
These results indicate that genetics, environment, and their interactions all have an important effect on FA accumulation in C. microcarpa seeds (Fig. 3). In contrast, latitude and elevation were uninformative. For total oil content, both linear mixed models and robust regression analyses indicated that SFAs and MUFAs each had a negative relationship with total oil, whereas PUFAs were positively related to the amount of total oil (Fig. 3, Supplemental Table S3). Seed circularity, used as a proxy for plant health and abiotic stress, was only informative in the SFA model, indicating that less circular seeds had higher SFAs. As with oil composition LMMs, latitude and elevation were uninformative variables that did not improve model fit for total oil content.
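The criterion used to flag important predictors, an 85% confidence interval (7.5% and 92.5% bounds) that does not overlap zero, can be illustrated with a short sketch. It assumes a normal approximation around a coefficient estimate and its standard error, which may differ from how the intervals were actually obtained from the fitted models; the estimate and standard error below are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(estimate, std_error, level=0.85):
    """Two-sided confidence interval under a normal approximation (an assumption)."""
    # For level = 0.85 the upper bound sits at the 92.5% quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def excludes_zero(interval):
    """True if the whole interval lies on one side of zero."""
    lower, upper = interval
    return lower > 0.0 or upper < 0.0


# Hypothetical coefficient and standard error (illustrative values only).
ci_85 = confidence_interval(0.5, 0.3, level=0.85)
ci_95 = confidence_interval(0.5, 0.3, level=0.95)
important_at_85 = excludes_zero(ci_85)
important_at_95 = excludes_zero(ci_95)
```

With these toy numbers the predictor is retained under the 85% criterion but not under a 95% interval, which shows how the choice of interval width affects which predictors are flagged.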

Temperature elicits plasticity and GxE interactions for seed oil traits
Using the crop species, C. sativa, as an experimental model, we uncovered a highly plastic response for seed oil development between the cold (12°C) and warm (30°C) growth chamber treatments. FA composition varied greatly for each accession between treatments (Supplemental Figure S8). Mixed models showed that PUFAs and total oil decreased in the warm treatment while SFAs increased (Fig. 4, Supplemental Table S2, p < 0.000001), and MUFAs showed a marginally significant increase in the warm temperature treatment (p = 0.075).
The winter genotype PI 650155 displayed the lowest degree of environmental plasticity, with a 37.1% increase in total oil in cold treatment relative to warm treatment, while the spring genotypes Suneson and PI 652885 showed 77.4 and 89.9% increases, respectively, in total oil in the cold treatment (Supplemental Table S4). Taken together, these data provide strong evidence that FA composition and oil content are both environmentally plastic traits in C. sativa, specifically with regard to growth temperature, and that there are strong GxE effects.
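The plasticity comparison above expresses total oil in the cold treatment as a percent increase relative to the warm treatment. A minimal worked sketch of that arithmetic follows; the genotype labels and oil values are hypothetical placeholders, not the measurements in Supplemental Table S4.

```python
def percent_increase(cold, warm):
    """Percent increase of the cold-treatment value relative to the warm-treatment value."""
    return (cold - warm) / warm * 100.0


# Hypothetical mean total oil (% of seed weight) as (warm, cold) pairs.
oil_by_genotype = {
    "genotype_A": (20.0, 30.0),
    "genotype_B": (20.0, 36.0),
}

plasticity = {
    name: percent_increase(cold, warm)
    for name, (warm, cold) in oil_by_genotype.items()
}
# Rank genotypes from least to most plastic.
ranked = sorted(plasticity, key=plasticity.get)
```

Expressing the change relative to the warm-treatment baseline makes genotypes with different absolute oil contents directly comparable.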

Discussion
Understanding the environmental and genetic factors that influence Camelina seed FA composition is a necessary first step for future plant breeding and agriculture, and can also shed light on mechanisms of local environmental adaptation in wild species. We examined the role of these factors in shaping FA composition and content. Wild Camelina species were found to have unique FA profiles and to largely occur in different environments (Fig. 1, Table 2, Supplemental Figures S2 and S3). For the crop wild progenitor, C. microcarpa, three genetic subpopulations were discovered, which correspond to different geographical regions within the native range of the genus (Fig. 2, Supplemental Figure S5). Both local environment and subpopulation identity of C. microcarpa accessions were found to influence seed FA composition, including genotype-by-environment interactions (Fig. 3). Within the crop species, and when controlling for genetic background, we found that temperature alone elicits large changes in FA composition and oil content of seeds (Fig. 4). From these observations we can conclude that environment, genetics, and genotype-by-environment interactions all play a strong role in determining seed FA composition in the genus, revealing a complex path in determining seed oil characteristics. Below we discuss these findings in the context of FA variation in Camelina species across environments and their potential implications for oilseed agriculture.
[Figure legend fragment] , and total oil by seed weight in three replicates each of three C. sativa accessions grown at 12°C (blue) and 30°C (orange). P-values < 0.001 denoted with ***

Camelina species harbor unique variation in seed oil composition
Characterizing natural variation in agriculturally relevant FAs, such as the antinutritive erucic acid (22:1), holds important relevance for crop development. While FA compositions of the Camelina species studied herein appear superficially similar (Fig. 1), nearly all species could be readily distinguished based on FA composition using random forest models (Supplemental Figure S2). The predominant exception, the lack of separation between the domesticated species C. sativa and its wild progenitor C. microcarpa, can likely be accounted for by the very close evolutionary relationship of these two species. The lack of differentiation in FA composition between the crop and its progenitor further suggests that FA composition was not a major target of selection during C. sativa's domestication. In contrast to composition, total oil content was significantly elevated in the crop species compared to the wild progenitor (LMM, p = 0.007), consistent with selection for increased seed oil content during domestication. This pattern of selection on oil content but not composition during seed crop domestication has also been observed in several domesticated species relative to their predomesticates, including chickpea (Cicer), soybean (Glycine), grass pea (Lathyrus), common bean (Phaseolus), and pea (Pisum) [35].
Random forest analyses revealed that variation in erucic acid (22:1) was the most informative FA for distinguishing between Camelina species; at the intraspecific level, palmitoleic acid (16:1) was most informative for distinguishing genetically differentiated subpopulations within C. microcarpa, although due to its low abundance, 16:0 and 22:1 are likely more biologically informative (Supplemental Figure S1b). Fatty acids such as these, which differ significantly between evolutionarily diverged groups within Camelina, warrant further study. Knowledge of the genetic basis of this variation could provide an important avenue for producing a more desirable FA profile in C. sativa and potentially other oilseed crop species.

Geographical and climatic distributions of Camelina species
While there is considerable overlap among the environments where the sampled Camelina species occur (Supplemental Figure S5), our data provide evidence that there is detectable environmental differentiation among some members of the genus (Table 2). For example, C. hispida and C. rumelica are present in similar environments but occur in significantly different environments from all other species (Table 2). In principle these patterns could be indicative of adaptive differences for the climates in which these species occur [33]; future population level experiments would be required to test this hypothesis.
Although the sampling for our study provided a broad representation of Camelina species diversity, it did not include one extant species, C. neglecta, as wild population samples were not available. This newly described species is known from a few collections in France [15]. Previous research has reported a unique seed FA composition and exceptionally high erucic acid content in C. neglecta when grown in controlled environments [36]. Additional sampling and characterization of C. neglecta populations may provide a promising avenue for crop improvement, as recent studies have uncovered up to two of the three subgenomes of C. sativa to be derived from C. neglecta or a close relative [16,37]. Resynthesis of this hexaploid crop may prove possible as is the case in Brassica and the 'Triangle of U' [38], thus facilitating additional natural diversity and agronomic traits for crop improvement [15].

Camelina microcarpa population differentiation and taxonomic identity
Population structure analyses based on genome-wide SNPs revealed three distinct genetic subpopulations of C. microcarpa (Fig. 2c), with a predominately Caucasus population that shows high differentiation from both northern and southern Ukrainian populations (FST > 0.30 for both pairwise comparisons). The lack of admixture between the Caucasus population and the two Ukraine populations (Fig. 2a,b), despite genotypes sometimes occurring in close proximity (e.g., in introduced U.S. locations), may indicate that these populations are divergent enough to have evolved reproductive isolating barriers that prevent admixture. Therefore, crossing experiments would be valuable to determine whether they are genetically compatible.
The genetic substructure we detect in C. microcarpa may also have implications for the current taxonomic ambiguities related to C. sativa and its congeners. Camelina microcarpa was formally described by Augustin Pyramus de Candolle [39] based on a specimen collected by Antoni Andrzejowski in the western and/or western-central part of Ukraine (Podillya / Podolia region). The species was provisionally named by Andrzejowski as C. microcarpa Andrz., but the name was not properly published before its validation by de Candolle in 1821. Thus, in our opinion, the type of the name C. microcarpa has not been properly designated yet, as the application of plant names at the rank of family and below requires nomenclatural types (Principle II and Art. 7.1 of the International Code of Nomenclature for algae, fungi and plants (ICN): [40]). If the lack of admixture we observe between the Ukrainian and Caucasus subpopulations reflects reproductive isolating barriers, a separate species designation may be warranted for one of the two groups. Given the close relationship of the Caucasus subpopulation to the crop species, this could have important implications for the taxonomic identity of the crop's wild progenitor species. The correct type designation will be discussed in detail in a separate nomenclatural note (Mosyakin & Brock, in preparation).
A recent study on Camelina spp. that sampled extensively across Eurasia has revealed several ETS sequence ribotypes for C. microcarpa which are predominantly split between western ribotypes in Europe and eastern ribotypes in Asia [41]. However, that study did not incorporate samples from Turkey, Georgia, or Armenia, where our Caucasus population of C. microcarpa was predominantly found. Thus, it is unclear whether our study is missing an additional subpopulation found in Asia, or if our Caucasus population represents the same population as the Asian 'eastern' ribotype group. Another recent study on wild Camelina species also uncovered a C. microcarpa population that is genetically distinct from other C. microcarpa and C. sativa accessions; in this case, however, the geographical sampling suggests that the distinct genetic group corresponds to the Ukrainian populations identified in the present study.
Interestingly, introduced populations in the U.S. include representatives of at least two of these subpopulations (Caucasus, northern Ukraine) (Fig. 2a,c). These data provide, for the first time, evidence of multiple introduction events of C. microcarpa as a weed into the U.S. In Canada, a recent survey of wild Camelina species has uncovered some individuals that are morphologically similar to C. microcarpa but which were discovered to be tetraploid according to flow cytometry and chromosome counts [42].
Using flow cytometry, we tested a random sample of 12 of our C. microcarpa collections from the eastern Rocky Mountains to determine whether any were likely to be tetraploid. All genome size measurements were consistent with hexaploidy (Supplemental Table S1). Our results thus do not provide evidence for multiple ploidy states of C. microcarpa within the U.S., and they eliminate the possibility that ploidy variation could be responsible for the distinct genetic differentiation and FA composition reported herein.

Genotype and environment jointly affect oil traits in wild populations
Our study provides support for the capacity of environmental variables, including temperature and precipitation, to elicit changes to the FA composition and content of seed oil crops. In our models, large values of PC1, a proxy for maximum temperature of the warmest month and annual range of temperature, significantly increased saturated FAs (SFAs) and decreased oil content (Fig. 3). These findings agree with the notion that warm temperatures result in elevated levels of SFA [33]. Consistent with this pattern, previous studies have also revealed increased unsaturated FAs at low temperatures in flax, canola, and sunflower [28,43,44]. A study in C. sativa showed that high temperature decreases total seed oil and PUFAs, and higher precipitation improved oil content and PUFAs [45]. However, these main effects of the environmental predictors in our models were also strongly affected by interactions with population identity for MUFAs, PUFAs, and total oil; this suggests that populations could be evolving adaptations in response to climate differently. After accounting for interactions between environment and population, we identified phenotypic differences in FA measures between subpopulations, consistent with our random forest models for C. microcarpa. The Caucasus and southern Ukraine populations display lower SFAs with higher PUFAs and total oil when compared to the north Ukraine population (Fig. 3). Common gardens should be performed to more conclusively evaluate whether local adaptation is responsible for these differences.
An interesting outcome of our study is that environmental variables affect the same trait to different degrees between genetic populations (PCxSubpopulation interactions in Fig. 3). All FA response variables yielded at least one genotype-by-environment interaction with the exception of SFAs. A previous study did not find a significant genotype-by-environment interaction for FA composition in cultivated C. sativa [31]; however, the low genetic diversity in cultivated C. sativa [12-14] may be responsible for those results. Therefore, it is unclear whether the genotype-by-environment interactions described here are unique to C. microcarpa or might also exist in C. sativa. The ability to disentangle environment from the genetic component of seed oil composition allows for the identification of populations which may be desirable for introgression-based approaches to biofuel improvement in C. sativa and merits further investigation. Finally, the northern Ukraine population appears to be widespread as it occurs throughout the U.S. and Ukrainian ranges and may represent a good candidate population for introgression-based approaches to crop improvement due to its unique genotype-by-environment interactions and FA composition (lower erucic acid and higher total oil, see Supplemental Figure S6).
None of our models showed an effect of latitude or elevation on seed oil traits. These findings contradict observations in other systems such as Helianthus and Arabidopsis [30,33] in which FA composition was found to vary across latitude. As related to previous hypotheses on local adaptation of seed oil composition, we do not see direct evidence of this in C. microcarpa. One potential explanation is the broader climatic and geographic sampling of Helianthus spp. [33], which may have revealed more coarse-scale patterns with latitude which we did not find in our study. Coarse climatic measures such as latitude and elevation are only proxies for actual environmental factors; thus, we advocate the use of finer-resolution climatic data such as those available from Bioclim [46].

Camelina exhibits plasticity in seed oil composition in response to temperature
Previous research shows that high temperatures have a detrimental impact on seed oil content and composition [28,29,43,44]. This may be caused by a reduced period of seed maturation, preventing developing seeds from continuing lipid biosynthesis, in addition to reduced desaturation efficiency at high temperatures [29,43]. Field trials in C. sativa have previously reported increases in the polyunsaturated α-linolenic acid in mild climates relative to warmer ones [47,48]. Our growth experiment of C. sativa cultivated in two temperature regimes (12°C and 30°C) yielded strong support for environmental plasticity in seed oil content and FA composition and is consistent with previous studies. In the warm condition, lines of C. sativa exhibited a significant reduction in PUFAs and total seed oil and significantly elevated levels of SFAs relative to the cold (Fig. 4). Furthermore, plants had reduced levels of the omega-3 α-linolenic acid (18:3) and increased erucic acid (22:1) in the warm condition (Supplemental Figure S8). Phenotypic plasticity observed in C. sativa seed FA composition and oil content reported herein is also largely congruent with that observed in growth trials of Arabidopsis thaliana conducted at 10°C and 30°C [29], with the exception of 16:0, 18:2, and 22:1, which showed opposite responses to high temperatures in C. sativa relative to A. thaliana. Finally, the elevated levels of 18:1 in C. sativa grown at high temperature (Supplemental Figure S8) indicate that lipid biosynthesis may have been inefficient or prematurely halted, as 18:1 is a known substrate for both FA desaturation and elongation [49]. Thus, temperature alone elicits a plastic response in FA composition and oil content in C. sativa. Notably, these insights suggest that rising temperatures resulting from climate change could have a detrimental effect on cultivation of C. sativa and other oil-seed crops through less favorable oil composition and decreased oil yield.

Conclusions
Our study indicates that Camelina species often occupy specific environmental niches and that at both the species and population levels, FA compositions are distinguishable among genetically differentiated groups. Within C. microcarpa, environmental factors and genetic background both play a role in FA composition and total oil content, with many genotype-by-environment interactions. When controlling for genetic background, temperature alone was shown to elicit a large phenotypic shift in FAs. Thus, the present study supports the dogma that environment and genetics together determine complex phenotypes but also that populations respond to environmental conditions differentially through genotype-by-environment interactions. Considering the wide geographical distribution of C. microcarpa, and evidence presented herein of at least three genetically distinct populations, as well as differences in oil composition between populations, we believe that further studies on this wild predomesticate may uncover useful variation for agricultural improvement via introgression into C. sativa. Traditional morphology-based taxonomy should be applied in combination with molecular and experimental approaches along with broader geographical sampling to achieve a better understanding of geographical patterns, population structure, genetic relationships, and infraspecific taxonomy of the C. sativa and C. microcarpa species complex. Furthermore, insight into the effects of environment on seed oil quality in Camelina may be useful for future studies examining the ecological functions of seed oils and how climate change will affect wild plant populations.

Sample collections
To sample widely across the environmental and geographical range of wild Camelina populations, mature seeds were collected by J. Brock in the field from Turkey  Table S1). Species determinations followed [17], and additional determinations were carried out by J. Brock with assistance from Ihsan Al-Shehbaz (Missouri Botanical Garden), with representative vouchers deposited at MO and ARIZ. All plant material was collected in compliance with institutional and international guidelines. Samples from Turkey, Armenia, and Georgia were previously described in [17], and collections were carried out in collaboration with Hacettepe University (Turkey), the National Academy of Science of Armenia, and the Georgian Academy of Sciences. None of the taxa collected are regulated weeds, listed on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species or regional Red Lists, or protected under the Convention on International Trade in Endangered Species of Wild Fauna and Flora (CITES).
Collections focused on C. sativa's wild progenitor (C. microcarpa) and three closely related wild species (C. rumelica, C. hispida and C. laxa). Turkey and the Caucasus are likely the center of diversity for Camelina species and are where every extant species can be found except for the newly described species C. neglecta [15]. All C. rumelica, C. hispida, and C. laxa accessions used in this study were found in this region. The geographically widespread weedy species, C. microcarpa, was recovered from a broader geographical region including eastern Turkey, Georgia, Armenia, and Ukraine, as well as the western U.S., where it is an introduced weed. The sampled range also includes areas of historic C. sativa cultivation, particularly Ukraine, although only one C. sativa accession (JRB 153, from Turkey) was found growing outside of an agricultural context; all other C. sativa accessions (JRB 179, 180, 181, 188, and 190) were collected from rural Ukrainian family farms where the species was being cultivated on a small scale as an oilseed crop. Aside from these Ukrainian crop collections, all other accessions used in analyses were wild or weedy. No permissions were required for collecting the samples. GPS coordinates and mature seeds were collected for each accession, and geographical locations of collecting sites were mapped using ArcMap v.10.6 (ESRI, Redlands, CA, USA) and World Topo basemaps: https://www.arcgis.com/home/. The newly described species C. neglecta is the only extant Camelina species that was not sampled for the study, as wild collections were not available.

Fatty acid phenotyping and analysis
Relative abundance and composition of seed FAs were determined for field-collected seeds of 89 Camelina accessions (including 57 C. microcarpa, 6 C. sativa, 17 C. rumelica, 6 C. hispida, and 3 C. laxa). Determinations were performed with a Fatty Acid Methyl Ester (FAME) extraction protocol slightly modified from Augustin et al. [5] as follows: Seed samples were weighed in triplicate for each accession (3-15 mg seeds per replicate). Individual replicates were then added to glass tubes with screw tops and ground using a glass stir rod with 1.5 mL 2.5% sulfuric acid in methanol before the addition of 500 μL toluene. An internal standard (50 μg mg⁻¹ triheptadecanoin) was added to each sample before incubation at 95°C for 50 min. Samples were cooled to room temperature before the addition of 1 mL hexane and 1 mL 1 M NaCl followed by rapid mixing. Samples were centrifuged at 1500 rpm for 5 min, and the resulting hexane layer was transferred to glass autosampler vials. FAME analysis was performed by GC-FID on a Thermoquest Trace Ultra GC system with an Agilent HP-INNOwax column (30 m × 250 μm × 0.25 μm) using helium as the carrier gas. GC conditions were as follows: 60°C for 1 min, increasing to 185°C at a rate of 40°C min⁻¹, increasing to 235°C at a rate of 5°C min⁻¹ followed by a 5 min hold. FAME species were identified by retention time compared to known standards, and relative FA abundance was determined by individual peak area divided by total area of all peaks (Supplemental Table S1). Total FAME abundance was quantified relative to the triheptadecanoin internal standard to estimate total seed oil content. FA values for all accessions were based on all three technical replicates except for accession JRB_275, where one replicate was excluded due to instrument peak integration errors.
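The two quantities described above (relative FA abundance and total oil estimated against the internal standard) can be illustrated with a minimal Python sketch. It is not the authors' processing script. The peak areas, the standard mass, and the function names are hypothetical, and the sketch assumes that the internal standard peak (labelled "17:0" here) is excluded from the total area and that detector response is equal per unit mass for all FAMEs.

```python
def relative_abundance(peak_areas, standard="17:0"):
    """Share of summed peak area per FA, excluding the internal standard (assumption)."""
    sample = {fa: area for fa, area in peak_areas.items() if fa != standard}
    total = sum(sample.values())
    return {fa: area / total for fa, area in sample.items()}


def oil_content_percent(peak_areas, standard_ug, seed_mg, standard="17:0"):
    """Estimate FAME mass as a percentage of seed mass.

    Assumes equal detector response per unit mass for every peak.
    """
    sample_area = sum(a for fa, a in peak_areas.items() if fa != standard)
    fame_ug = sample_area / peak_areas[standard] * standard_ug
    return fame_ug / (seed_mg * 1000.0) * 100.0


# Hypothetical peak areas for one technical replicate (arbitrary units).
areas = {"16:0": 60.0, "18:1": 140.0, "18:2": 200.0, "18:3": 400.0, "17:0": 100.0}
rel = relative_abundance(areas)
oil = oil_content_percent(areas, standard_ug=500.0, seed_mg=10.0)
```

With these made-up numbers, 18:3 accounts for half of the sample peak area, and the FAME mass is four times the standard mass.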
A random forest analysis [50] was performed with randomForest in R to determine whether seed FA composition could be used to predict species and to identify the major FA predictors. FA composition was defined by the relative abundance of 12 FAs present in seed oils. The random forest algorithm draws subsamples of the data with subsets of the total FA profile and generates decision trees for prediction. This process is bootstrapped to generate a model with better predictive ability than individual decision trees. A subset of the FA composition data (70% of the total) was used as a training set to create 5000 trees. Accessions were divided between training and test datasets so that the model had to predict completely novel accessions. To improve model fit in the face of unbalanced designs, the random forest algorithm was implemented with stratified sampling such that only one technical replicate was included in each iteration of the algorithm. Randomly sampling five and four FA variables at each split resulted in the most accurate model for Camelina species and C. microcarpa populations, respectively. The resulting model was used to predict species in the remaining 30% of the data. We assessed models with both the accuracy of predictions on the testing set and the kappa statistic [34], which is more informative for unbalanced designs; both were obtained with the confusionMatrix function in the caret package in R [51].
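To illustrate why the kappa statistic is more informative than raw accuracy for unbalanced designs, the following Python sketch computes Cohen's kappa from a confusion matrix. It is a stand-alone illustration, not the caret implementation, and the confusion matrices are hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = truth, columns = prediction)."""
    n = sum(sum(row) for row in confusion)
    # Observed agreement is the overall accuracy.
    observed = sum(confusion[i][i] for i in range(len(confusion))) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance given the marginal class frequencies.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical test set with 20 samples of a common class and 5 of a rare class.
useful_model = [[18, 2], [1, 4]]
majority_only = [[20, 0], [5, 0]]  # always predicts the common class

kappa_useful = cohen_kappa(useful_model)
kappa_majority = cohen_kappa(majority_only)
```

The majority-only classifier reaches 80% accuracy on this hypothetical test set, yet its kappa is zero because that agreement is no better than chance given the class frequencies.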
To assess the relationship between seed morphology, FA composition and environmental factors, we measured seeds from accessions used in FA phenotyping. Greater circularity is often an indicator of higher seed fitness in Camelina, where abiotically stressed plants typically exhibit lower seed circularity (J. R. Brock, unpublished observations). Seeds of each accession were imaged on a Canon LiDE110 office scanner, and images were saved at 600 dpi resolution. Image files were processed in the SmartGrain analysis software [52] for measurements of seed width, length, area, perimeter, and circularity. Because these measures are all highly correlated, we chose only circularity as a measure. Seed Detection Intensity and Nogi Detection Intensity were set to 'rough' to allow for maximal identification of seeds before curating each scan by hand to eliminate incorrect seed identifications. Average values for seed circularity were then used in statistical analyses.
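As a brief illustration of the circularity measure, the sketch below uses the common definition 4πA/P², where A is area and P is perimeter. Whether SmartGrain uses exactly this formula is an assumption here and should be checked against the software documentation; the shapes are hypothetical.

```python
import math


def circularity(area, perimeter):
    """Circularity as 4*pi*A / P^2 (assumed definition): 1.0 for a circle, lower otherwise."""
    return 4.0 * math.pi * area / perimeter ** 2


# A unit circle scores 1; a 2 x 1 rectangle (area 2, perimeter 6) scores lower.
circle_score = circularity(math.pi, 2.0 * math.pi)
rectangle_score = circularity(2.0, 6.0)
```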

Environmental niche analyses
Climatic variables for population locations were generated from GPS coordinates of each individual seed collection using the WorldClim 1.4 dataset [46] at the highest available spatial resolution (~1 km²). Environmental differences between locations were then assessed using permutational multivariate analyses of variance (PERMANOVAs) [53] with Euclidean distances. We tested whether there was significant dispersion between factors, which could bias the PERMANOVA results [54], in R using the betadisper function in the vegan package [55]. Dispersion tests are similar to Levene's test in univariate ANOVAs. For comparisons between species, we used unscaled and untransformed data because scaling and transformations resulted in significantly different dispersions. For comparisons between regions of C. microcarpa samples, we log transformed and scaled (by subtracting the mean and dividing by the standard deviation) the data to yield no difference in dispersion. PERMANOVAs were performed with the adonis function in the vegan package with 10,000 permutations. If a significant PERMANOVA was found, we performed pairwise comparisons with false discovery rate (FDR) corrected p-values. PERMANOVA results were visualized with non-metric multidimensional scaling (NMDS) plots.
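The logic of a one-factor PERMANOVA on Euclidean distances can be sketched in pure Python as follows. This is a simplified illustration of the general technique, not the vegan adonis implementation; the function names, the toy coordinates, and the small permutation count are all assumptions made for the example.

```python
import random
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pseudo_f(points, labels):
    """Pseudo-F ratio partitioning summed squared distances among and within groups."""
    n = len(points)
    groups = sorted(set(labels))
    ss_total = sum(sq_dist(points[i], points[j])
                   for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(sq_dist(points[i], points[j])
                         for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(points, labels, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value from shuffled labels."""
    rng = random.Random(seed)
    f_obs = pseudo_f(points, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point ties with the observed value.
        if pseudo_f(points, shuffled) >= f_obs - 1e-9:
            hits += 1
    return f_obs, (hits + 1) / (permutations + 1)


# Hypothetical two-variable "climate" data for two well-separated groups.
points = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
labels = ["A"] * 4 + ["B"] * 4
f_obs, p_value = permanova(points, labels)
```

Because the test relies on distances to group centroids, unequal within-group dispersion can inflate the pseudo-F, which is why a dispersion check accompanies the analysis described above.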

Genotyping by sequencing
The influence of population structure on intraspecific FA composition was examined for the crop wild progenitor, C. microcarpa, which was also the species with the most FA phenotype data. DNA was extracted from 83 accessions, including 56 of the 57 accessions phenotyped for FA composition, using either a modified CTAB DNA extraction protocol [17] or DNeasy Plant Mini Kits (Qiagen, Valencia, CA, USA). Twenty-seven additional samples without FA composition data were added from collections of J.R.B. and the USDA GRIN germplasm collection to bolster sample sizes for population structure analyses (Supplemental Table S1). Genotyping-by-sequencing (GBS) libraries were then prepared with a method modified from Elshire et al. [56]. Raw sequence reads were processed to generate a filtered SNP dataset for population structure analyses. A modified version of the Fast-GBS program [57] was implemented to enable the use of paired-end sequencing data. Within the pipeline, Sabre (https://github.com/najoshi/sabre) was used to sort and filter barcodes, and Cutadapt [58] was used to trim reads of the barcode region. Paired-end reads were then aligned to the C. sativa reference genome JFZQ01 [59] using BWA. Variants were called with the Platypus variant caller (https://github.com/andyrimmer/Platypus). PLINK [60] was then used with the following conditions: Minor Allele Frequency = 0.05, Genotyping = 0.1, Missing data per individual = 0.74. This resulted in 261,529 variants from 83 samples. VCFtools was used to generate the final SNP dataset using the following parameters: Max Missing = 0.5, Minor Allele Count = 3, Minimum Quality Score = 30. One sample (JRB 120) was removed due to excessive missing genotype data; 248,195 variants were removed due to missing genotype data; 3184 variants were removed due to deviations in the Hardy-Weinberg exact test; and 1450 variants were removed based on the minor allele threshold. A total of 8700 variants in 82 accessions remained in the final strict filtering dataset.
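The variant and sample counts reported above are internally consistent, as the following short Python check shows. The counts are taken directly from the text; the variable names are illustrative only.

```python
# Counts reported in the text for the filtering steps.
initial_variants = 261_529
removed = {
    "missing genotype data": 248_195,
    "Hardy-Weinberg exact test": 3_184,
    "minor allele threshold": 1_450,
}
remaining_variants = initial_variants - sum(removed.values())

initial_samples = 83
remaining_samples = initial_samples - 1  # JRB 120 removed for missing data
```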

Population structure analysis
Genetic substructure within C. microcarpa was assessed with ADMIXTURE v.1.3.0 [61] on the final dataset at K values from 1 to 10 with ten iterations for each K. FST measures of genetic differentiation between subpopulations were output by ADMIXTURE. Cross validation scores were obtained from ADMIXTURE for each K value. Cross validation is used to estimate error in a predictive model, where the lowest cross validation error values from ADMIXTURE represent optimal K-values. ADMIXTURE results were displayed in pong [62]. As a complementary analysis, filtered SNPs output from PLINK were also used as input in a principal component analysis (PCA) in R to visualize clusters of genetically similar accessions.
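Selecting K from cross validation error can be sketched as below. The error values are invented purely for illustration and are not results from this study; the function name and the choice to average across replicate runs are likewise assumptions.

```python
def rank_k_by_cv_error(cv_errors):
    """Rank K values by mean cross validation error across replicate runs (lowest first)."""
    means = {k: sum(errs) / len(errs) for k, errs in cv_errors.items()}
    return sorted(means, key=means.get), means


# Hypothetical CV errors from three replicate runs per K (not data from the study).
cv_errors = {
    1: [0.62, 0.63, 0.62],
    2: [0.51, 0.52, 0.51],
    3: [0.49, 0.50, 0.49],
    4: [0.55, 0.54, 0.56],
}
ranking, mean_errors = rank_k_by_cv_error(cv_errors)
best_k = ranking[0]
```

Ranking all K values, rather than returning only the minimum, makes it easy to carry the two best-supported values forward into downstream models.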

Genotype-by-environment interaction analyses
We investigated the effect of environment and population structure on FA composition using linear mixed effect models (LMMs) fit using maximum likelihood in the lme4 package [63]. LMMs are models used to account for variation explained by fixed effects (variables of interest) and random effects. Response variables included proportions of saturated fatty acids (SFAs), mono-unsaturated fatty acids (MUFAs), polyunsaturated fatty acids (PUFAs), and total oil content in seeds. The two most optimal population structure outputs from the ADMIXTURE analysis (K = 2 and K = 3; see Results) were included as categorical variables. To account for correlations between the BioClim variables, we performed a principal component analysis (PCA), a way to partition variation into uncorrelated components, and included the first two components in our models as fixed effects. We included latitude and elevation in our models to investigate whether these coarse measures of environment are important when accounting for more direct environmental variation using principal components (Supplemental Table S2). Seed circularity was included as a proxy for plant health, where lower values may be indicative of abiotic stress. All continuous variables were centered by subtracting the mean and scaled by dividing by the standard deviation to control for varying scales of measurement between different variables. Accession was included as a random effect to account for variation between oil measurements of the same genotype. 
Total oil was included as a covariate in models of SFAs, MUFAs, and PUFAs to understand how these measures change with total oil content, while SFA was included as a covariate in the total oil model such that:
Total oil ~ SFA + elevation + latitude + seed circ. + PC1 * PC2 * population
SFA, MUFA, PUFA ~ total oil + elevation + latitude + seed circ. + PC1 * PC2 * population
We selected the best model using small-sample-size-corrected Akaike Information Criteria (AICc) and the decision tree in Leroux [64]. When confidence intervals were required to choose between models, we used 85% confidence intervals since this is consistent with model selection using AIC [65]. We ensured that models were not affected by multicollinearity by calculating variance inflation factors for all final models and assessed the influence of outliers by using robust mixed-model regression from the robustlmm package [66] presented in Supplemental Table S3.
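The small-sample correction used in AICc can be written out explicitly. The sketch below uses the standard formula AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of estimated parameters and n is the sample size; the log-likelihoods and parameter counts are hypothetical and the model names are placeholders.

```python
def aicc(log_likelihood, k, n):
    """Small-sample-corrected AIC; requires n > k + 1."""
    if n <= k + 1:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = 2 * k - 2 * log_likelihood
    return aic + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical candidate models: (maximized log-likelihood, number of parameters).
candidates = {"full": (-100.0, 12), "reduced": (-103.0, 5)}
n_obs = 50
scores = {name: aicc(ll, k, n_obs) for name, (ll, k) in candidates.items()}
best_model = min(scores, key=scores.get)
```

In this made-up case the reduced model is preferred: its slightly lower likelihood is outweighed by the penalty on the seven extra parameters of the full model.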
We also used linear mixed models to test for differences in total oil content among species. Accession was included as a random effect. P-values were generated with the lmerTest package using Satterthwaite's method [67] and corrected for multiple comparisons with FDR.
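A false discovery rate correction can be sketched as follows, assuming the Benjamini-Hochberg procedure (the text does not state which FDR method was used, so this choice is an assumption). The p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
```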

Growth chamber experiments
To assess effects of growth temperature on seed oil content and composition, three lines of C. sativa (Suneson, PI 650155, PI 652885) were cultivated in two controlled environment growth chambers with equivalent levels of light set at 200 μmol m⁻² s⁻¹. One chamber mimicked a warm climate with a constant temperature of 30°C, while the other was a cold climate chamber with a day temperature of 12°C and night temperature of 10°C. Both chambers were set on 16 h day, 8 h night regimes. Accession PI 650155 is a winter variety that requires vernalization to initiate flowering. Plants from this accession were therefore exposed to an 8-week period of vernalization at 4°C before placement in the growth chambers. Seed samples from controlled growth trials were then run in FAME analysis, with three technical replicates for each of three biological replicates per accession, per condition. FAMEs were analyzed by GC-MS on a Thermo Trace Ultra GC with Thermo ITQ-900 MS system with an Agilent HP-INNOwax column (30 m × 250 μm × 0.25 μm) using a helium carrier gas. GC method conditions were as follows: 70°C for 7 min, increasing to 185°C at a rate of 70°C min⁻¹, increasing to 260°C with a 6 min hold, decreasing to 70°C at a rate of 120°C min⁻¹ with a 2 min hold. FAME species and total seed oil content were determined as described above for wild seed samples (Supplemental Table S4).
Growth chamber experimental results were analyzed with mixed models using temperature treatment as a fixed factor. To account for the nested design of the study, we included a nested random term with individual plants nested within accessions. P-values for comparisons between SFAs, MUFAs, PUFAs, and total oil were computed using Satterthwaite's method as described above.