Variation in morpho-physiological and metabolic responses to low nitrogen stress across the sorghum association panel

Access to biologically available nitrogen is a key constraint on plant growth in both natural and agricultural settings. Variation in tolerance to nitrogen deficit stress and productivity in nitrogen limited conditions exists both within and between plant species. However, our understanding of changes in different phenotypes under long term low nitrogen stress and their impact on important agronomic traits, such as yield, is still limited. Here we quantified variation in the metabolic, physiological, and morphological responses of a sorghum association panel assembled to represent global genetic diversity to long term, nitrogen deficit stress and the relationship of these responses to grain yield under both conditions. Grain yield exhibits substantial genotype by environment interaction while many other morphological and physiological traits exhibited consistent responses to nitrogen stress across the population. Large scale nontargeted metabolic profiling for a subset of lines in both conditions identified a range of metabolic responses to long term nitrogen deficit stress. Several metabolites were associated with yield under high and low nitrogen conditions. Our results highlight that grain yield in sorghum, unlike many morpho-physiological traits, exhibits substantial variability of genotype specific responses to long term low severity nitrogen deficit stress. Metabolic response to long term nitrogen stress shown higher proportion of variability explained by genotype specific responses than did morpho-pysiological traits and several metabolites were correlated with yield. This suggest, that it might be possible to build predictive models using metabolite abundance to estimate which sorghum genotypes will exhibit greater or lesser decreases in yield in response to nitrogen deficit, however further research needs to be done to evaluate such model.


Introduction
Malthus predicted that exponential population growth would always surpass linear increases in food production resulting in constant famine [1]. Both dramatic increases in total agricultural land and technological innovations have staved off Malthusian catastrophy in the 20th and early 21st century. One of the key technological innovations was invention and widespread adoption of the Haber-Bosch process, which reduces atmospheric nitrogen gas (N 2 ) to reactive forms of N, to provide an abundance and reliable source of nitrogen fertilizer for agriculture [2]. The widespread adoption of synthetic nitrogen fertilizers has dramatically increased crop yields but these increases have not come without some negative externalities, including increased greenhouse gas emissions and decreases in rural water quality [3]. In addition, for many non-irrigated agricultural systems the cost of fertilizer is either the single biggest variable input cost of production, or the second biggest after the cost of seed [4]. As the human population continues to grow and populations around the world shift to more calorie intensive diets, incentives and pressure on agricultural productivity will increase as well [5,6]. It has been estimated that only 30-40% of nitrogen fertilizer is taken up and utilized by crops [7]. Increasing the nitrogen use efficiency of major agricultural crops would enable farmers to meet these growing requirements for food production with stable or decreasing applications of nitrogen fertilizer, increasing farmer profitability, and decrease the environmental and energy footprint of agriculture [8].
Substantial genetic variation in nitrogen use efficiency exists within crop plants [9,10]. Between 1969 and 2010 European wheat breeders increased the nitrogen use efficiency of wheat by an estimated one third of one percent per year [11]. The global impact of a 1% increase in nitrogen use efficiency is estimated to be $1 billion dollars per year [12]. Understanding the genes controlling variation in nitrogen use efficiency and the other phenotypes associated with these differences would aid in both evaluating the feasibility of increasing nitrogen use efficiency in different crops -while sustaining the high yields necessary to meet global demand for food -and, where feasible, designing breeding strategies to achieve such an increase. However, nitrogen use efficiency is a complex trait and multiple morpho-physiological and metabolic mechanisms likely play roles in determining how well or poorly a given plant genotype can compensate for limited N availability in different environments and at different life stages.
Understanding the morpho-physiological and metabolic mechanisms associated with differences in tolerance for nitrogen deficit stress in agriculturally relevant environments represents a stepping stone to the subsequent identification of genetic loci associated with response to low nitrogen stress and finally to crop improvement via breeding or engineering. To date the majority of research on the morpho-physiological and metabolic responses of plants to nitrogen deficit stress has been conducted in controlled environment conditions, particularly emphasizing severe stress applied early in development [13][14][15]. The nitrogen deficit stress experienced by crops in agricultural settings is typically less extreme and may not produce obvious visual effects, but is sufficient to result in substantial grain or biomass decrease over the course of a growing season. Collecting phenotypic and metabolic data from large sets of genotypes experiencing agriculturally relevant degrees of stress under field conditions can provide substantial insight into natural variations in stress response and tolerance within individual crop species [16].
Here we quantified crop yield and eight morpho-physiological traits from a large and diverse sorghum population (Sorghum bicolor L.) grown to maturity in field conditions under both nitrogen limiting and non nitrogen limiting conditions. For a subset of 24 replicated genotypes, large scale metabolic profiling was conducted from leaf tissue collected at the flowering stage. Significant plasticity and genotype x environment interactions were observed for both yield and a subset of metabolic traits, while substantially less genotype x environment interaction was observed for morpho-physiological traits. The abundance of several metabolites at flowering exhibited significant correlations with plant performance (e.g. yield) at maturity.

Genetic variability of sorghum's response to differential nitrogen application
A population of 347 sorghum genotypes drawn from the Sorghum Association Panel (SAP) [17] were grown under two nitrogen treatments with replication in Lincoln, Nebraska: low nitrogen (LN; no supplemental nitrogen) and high nitrogen (HN; 90 kg/ha, following local agronomic recommendations to avoid nitrogen limitations on yield in sorghum). A mixture of manually scored -leaf number, flag leaf length, flag leaf width, plant height, days to flowering-and phenotypes estimated from hyperspectral reflectance data -specific leaf area (SLA), chlorophyll content (CHL) and nitrogen (N), phosphorus (P) and potassium (K) content following previous workflow [18] -were collected from plants grown under both conditions (Table S1).The overall hyperspectral reflectance profile of sorghum leaves collected from plants grown in HN and LN treatments was similar (Fig. S1a), and neither of the first two principle components clearly separated the two treatments (Fig. S1b). Ground truth data were obtained for five traits: CHL, SLA, N, P, and K content from 265 samples, and partial least squares regression (PLSR) were used to predict values for scored six traits for whole panel based on hyperspectral data. Raw spectral data used for PLSR model building, as well as prediction traits were provided in Table S2. Employing five-fold cross validation with ground truth samples the accuracy (R 2 ) of phenotypes estimated from hyperspectral reflectance data varied from 0.18 for P to 0.82 for CHL (Table  S3). Similar performance was observed in the validation set ( n = 80 , Table S3 and Fig. S2) indicating models were not overfit. Prediction accuracy for K and P were low (R 2 in validation set 0.22 and 0.25; Table S3), and these traits were excluded from downstream analyses.
The effect of N treatment was statistically significant for all traits evaluated, except for plant height (p < 0.05; likelihood ratio test (LRT); Fig. 1). Flag leaf width and flag leaf length were reduced by approximately 3.5% and 6.5% respectively under LN treatment. Plants grown under LN took 4% more time to flower. Larger differences were observed in chlorophyll and nitrogen content, with reductions of 15.3% and 13.8% respectively under LN treatment. However, the single largest impact of low nitrogen stress was observed on grain product, with a 48% reduction in grain yield under LN treatment.
While overall population level responses to nitrogen deficit treatment were statistically robust, individual genotypes often exhibited different degrees of response to nitrogen treatment. A mixed model, considering genotype, treatment, and the interaction between genotype and treatment (genotype-by-environment, GxE) effects, was fit to each individual phenotypic dataset. A majority of the total variation in plant height and flowering time was explained by differences between genotypes ( ∼91% and ∼85% respectively together on HN and LN; Fig. 2a). In the case of plant height none of variance was attributed to treatment or genotype by environment interaction. For flowering time, only ∼ 4% of variance were explain by treatment effect and ∼ 3% by GxE. The high degree of genetic control and low GxE effect is reflected in the high degree genetic correlation across treatment conditions for these traits: 0.86 for plant height and 0.8 for flowering time (Fig. S3). Variance in traits related to leaf (leaf number, leaf width, and leaf length) were also mostly explained by genetic factors (> 60% for each of these three traits). However, proportion of variance not explained by any of the factors in the model (e.g. the residual) was substantially greater for each of the three leaf related traits compared to plant height and flowering time, leading to lower correlations across treatment ( ∼0.6; Fig. S3). Traits estimated from hyperspectral data (CHL, N, SLA) were comparatively much more plastic across environments (Fig. 2a), but only modest amounts of variance was attributed to GxE for each of these traits. One explanation for this, is that although our PLSR models were accurate (R 2 > 0.6), it might be still not sufficient to precisely capture GxE.
Extensive plasticity of grain yield in response to nitrogen deficit stress was observed across the study population. Among analyzed traits, grain yield exhibited by far the largest proportion of variance attributable to GxE (Fig. 2a), resulting in only moderate genetic correlation between treatment (0.42, Fig. S2). Genotypes with high grain yield in the HN treatment tended to be somewhat more sensitive to low nitrogen stress than genotypes with low grain yield, even under the HN treatment ( Fig.  S4). However, the correlation between the responses of grain yield to nitrogen deficit stress and grain yield under HN was modest ( ∼-0.3; Fig. S4). This reflects the relatively larger GxE effect of nitrogen treatment on yield. Although highly yielding genotypes on HN are more sensitive to LN stress, this reaction is not consistent and yield of some genotypes are less affected by LN stress.
Coefficients of variation were calculated for each variance component for each trait, following the approach described in [19]. Plant height exhibited the largest relative variation, particularly variation attributed to genetic factors (Fig. 2b), likely reflecting the effects of multiple large effect dwarfing genes segregating for functionally distinct alleles among the lines of the sorghum association panel [20]. The second largest relative variance was observed for grain yield, in particular for genetic factor under HN. However, relative variance from treatment conditions and GxE were also large, and in fact larger than the variance for any component among the remaining traits.

Metabolomic changes in sorghum leaves under long-term low nitrogen stress
As morpho-physiological traits scored in this study did not appear to explain the plasticity of sorghum grain yield across different nitrogen availability treatments, we next sought to characterize the responses of a large suite of metabolic phenotypes to differential nitrogen availability in the adult leaves across a subset of sorghum genotypes of the SAP. A set of 24 genotypes were selected to represent the phenotypic and genetic diversity of the SAP (Fig. 1, S5; [21]). Sampling was timed to coincide with anthesis, with a total of 96 leaf samples collected from two independent plots per genotype per treatment. Each sample was quantified via liquid chromatography -high-resolution mass spectrometry (LC-HRMS) analysis. In order to maximize the number of metabolites detected and quantified each sample was analyzed using both RP (reverse phase) and HILIC (hydrophilic interaction liquid chromatography) separations in both positive and negative ion mode, resulting in the detection and quantification of 115,782 mass spectral features. After filtering out features that were detected in less than 80% of samples, and further manual quality control (as described in the methods section), the number of features was reduced to 3,496, of which 145 could be assigned high confidence annotations (Table S4).
No obvious differences were observed in the distribution of estimated abundance values for high confidence metabolites (n = 145) between HN and LN conditions (Fig. 3a). Samples collected from plants grown in HN or LN were not clearly separated by either of the first two principal components of variation for the abundance of this set of high confidence annotated metabolites although samples collected from plants grown in LN exhibited a tighter distribution of PC1 values than did samples collected from plants grown in HN (Fig. 3b). After correcting for multiple testing via false discovery rate (FDR, [22]), the abundance of 62 metabolites changed significantly between samples collected from  (Fig. 3c). The vast majority of statistically significant changes in metabolite abundance were modest in size, defined as a less than two fold change in abundance between treatments. The majority of observed free amino acids (17/33) were significantly more abundant in samples collected from plants grown under HN but the amino acids acetylcarnitine and L-carnitine were significantly more abundant in samples collected from plants grown under LN (Fig. 3d, Table S4). In contrast, half of the phenolic compounds confidently identified in this dataset, such as the eight flavonoids, were significantly more abundant in samples collected from plants grown under LN (Table S4).
Similar results to those observed with the set of annotated metabolites were observed when analysed all 3,496 identified mass features. Overall abundance of those compounds was similar across treatment ( Fig. S6a-b).
Although 337 compounds were significantly different across two treatment condition (FDR < 0.05; Fig. S6c), those changes were rather small, with only 28 compounds being changed larger than two fold between treatments. Finally, PCA based on 3,496 mass features did not separate two treatment conditions (Fig. S6d). Interestingly, many of these unidentified metabolites showed relatively high heritability, with mean value 0.6 under HN and 0.68 under LN (Fig. S7). This suggests that natural variation in the contents of these compounds is genetically controlled, which makes a good prospect for furthering their identification and uncovering their biological meaning through genetic studies. In case of known metabolites, variation in the abundance of individual flavonoid and flavonoid glycosides compounds tended to be the most heritable across independent field plots of the same genotype grown in the same environment (Fig. S8).
A similar variance partitioning strategy to that employed for morpho-physiological traits was used to partition variance for each annotated metabolite. For each metabolite a mixed model was fit, including terms for genotypes (genetic effects), differences between N treatments (environmental effects), and genetic differences in the degree of response to N supply (genotypeby-environment, GxE). Likely as a result of the much smaller overall number of datapoints for each metabolic trait relative to each morpho-physiological traits, this model could only be successfully fit for 46 of 145 metabolites. Differences between genotypes typically explained around half of the variance for different metabolites ( ∼28% on HN and ∼31% on LN), while the variance explained by environmental factor was much lower ∼ 2% (Fig. 4). The GxE effect explained on average of ∼ 7% of variance across the 46 metabolites where a mixed model was successfully fit. Despite the fact, that this value was not very high, it was higher than the average variance explained by the GxE effect for morpho-physiological traits ( ∼1%). The coefficient of variation for each variable for each metabolite vary, but no clear pattern can be observed across different classes of metabolites (Fig.  S9). A wide range of different patterns are exhibited by individual metabolites in response to LN stress across different genotypes. Glucose and sucrose both belong to the set of 83 metabolites which did not show any statistically significant differences in abundance between samples collected from plants grown in high nitrogen and plants grown in low nitrogen but which do exhibit consistent patterns of difference in abundance between genotypes across treatments (Fig. 5a-b). Serine, one the amino acids with a statistically significant difference in abundance between samples collected from plants grown in HN and LN exhibits a consistent decreased in abundance across genotypes with ∼15% of variance explained by the environmental factor (Fig. 5c). Glutamic acid and allantonin both exhibited large GxE effects of ∼15% and ∼ 30% variance for these two metabolites explained by GxE respectively (Fig. 5d-e). Genotypes with comparatively high glutamic acid content in HN saw larger reductions in glutamic acid content in LN. Genotypes with comparatively lower glutamic acid content in HN saw smaller reductions in LN. Previous study found a decrease in salicylic acid content under low nitrogen stress in sorghum root [23]. Here we found increase in salicylic acid content  (Fig. 5f ). This response is consistent across majority of genotypes, although strength of this reaction slightly vary, with genotypes with low salicylic acid content under HN indicating a higher increase under LN.

Correlation between metabolites and yield
The abundance of metabolites was correlated to some degree with observed grain yield values from the same plots ( Fig. 6; Table S5). The correlation coefficients between metabolite abundance and grain yield in individual environments (HN or LN) were positively correlated with each other (r = 0.36, p < 0.05; Fig. 6). However, in only a modest number of cases where the correlations between the abundance of individual metabolites and grain yield statistically significant including six metabolites in HN and eleven in LN. Five metabolites were statistically significantly correlated with grain yield in both environments: 4-hydroxymandelonitrile, aconitic acid, ascorbic acid, benzamide and glucose. The strength of the correlations between grain yield and metabolite abundance where relatively modest (r < 0.6) even for those metabolites where statistically significant relationships were observed.
Two machine learning approaches, elastic-net regression (GLMNET) and random forest (RF), were evaluated for their potential to predict variation in plot level grain yield from combined metabolite abundance data. Three sets of input data were evaluated with each of the two machine learning approaches. First, the set of all detected metabolites ( n = 3, 496 ). Second, the set of 145 metabolites with confident annotations. Finally, a set of 145 metabolites selected randomly from the complete set of 3,496 detected metabolites. Both algorithms achieved moderate prediction accuracy however, the accuracy of their predictions was either equivalent to or only modestly exceeded, the prediction accuracy of a simple linear model fit to only the treatment effect, which was able to predict 29% of the total variance in sorghum grain yield data (Fig. S10). Ascorbic acid showed the greatest contribution to the accuracy of the GLMNET model (   (Fig. 6).

Discussion
Natural variation in tolerance to nitrogen deficient growing conditions has been widely studied in both crops and other plant species. However, the majority of these studies occurred in controlled environments and imposed substantial nutrient deficits that produced visible phenotypic responses even at seedling stages. Here we examined natural variation in both the morpho-physiological and metabolomic impact of long term low intensity nitrogen deficit at a level sufficient to alter grain yield and fitness in sorghum but which does not produce obvious visible stress symptoms.
Grain yield is a complex phenotype that is determined by a number of different component phenotypes (e.g. yield component traits). In Arabidopsis branching number is correlated with yield and previous study found large plasticity of this trait in response to low nitrogen stress [19]. In case of rice, various traits such as tiller number, grain number per penile, or 1,000 -grain weight are associated with yield. Interestingly, only tiller number were affected by low nitrogen stress [10]. Finally, in case of maize 1,000 -kernel weight were also not affected by low nitrogen, but substantial decreases in kernel number per cob were observed [13]. This observation highlights the complexity of how plant yield can be affected by low nitrogen stress. In this study grain weight per panicle was used to represent sorghum yield, and consistently with research done on Arabidopsis [19], large plasticity in response to low nitrogen stress was observed in this trait.
While grain yield decreased substantially under nitrogen limited conditions for the vast majority of sorghum genotypes, rank order grain yield under high nitrogen conditions was only modestly correlated with rank order grain yield under low nitrogen conditions (Spearman correlation = 0.44; Fig. S3). This suggests efforts to increase grain yield under nitrogen limited conditions will require separate field trials, evaluations and selections from breeding efforts to increase grain yield under non-nitrogen limited conditions. Yield under non-nitrogen limited conditions was negatively correlated with the size of the decrease in yield observed when nitrogen was limited ( ∼-0.3; Fig. S4). However, because of large GxE effect, this reduction is not consistent across highly yielding genotypes. Some of the reductions are characterized by relatively low loss in yield under low nitrogen conditions. This indicates that it should be possible to produce varieties not only with high yield under high nitrogen condition but also more robust to low nitrogen stress. In contrast to grain yield, the morpho-physiological traits did not exhibit significant degrees of change in response to the degree of nitrogen limitation applied in this study, and of the traits which did exhibit significant effects -such as leaf nitrogen content and specific leaf area -the effects of treatment and genotype were largely independent of each other (Fig. 2). While changes in chlorophyll concentration were quantifiable using both handheld chlorophyll concentration meter and hyperspectral reflectance data, plants in the nitrogen limited field were not visibly chlorotic (personal observation). Metabolite abundance was characterized for a subset of sorghum genotypes in both conditions in an attempt to identify other phenotypes with potential value to predict how the grain yield of different sorghum varieties will respond to nitrogen limitation. The overall pattern of metabolite abundances did not exhibit substantial differences between nitrogen limited and non-nitrogen limited conditions (Fig. 3a-c). This is consistent with both the limited degree of change observed for morpho-physiological traits and the goal of imposing a degree of nitrogen limitation sufficient to alter fitness/grain yield but not so severe that it dramatically altered plant growth.
While overall differences in metabolite abundance between conditions were modest, the metabolites that did exhibit significant differences between treatments were consistent with expectations for nitrogen limited grown plants. Decreases in the abundance of many amino acids were observed ( Fig. 3d; Table S4). Consistent with reports from studies of nitrogen deficit experiments in seedlings and adult maize leaves [13], sorghum roots [23], and maize, sorghum, and Paspalum vaginatum seedlings [24]. Disturbance in serine metabolism was previously found to play key role in limiting maize yield under low nitrogen conditions [9]. In addition, serine plays an important role in photorespiration [25], although in plants utilizing the C4 photosynthetic pathway, including both maize and sorghum this pathway is much less active than in plants utilizing the C3 photosynthetic pathway. We did not observe significant variation in the degree of decreased serine abundance observed among sorghum genotypes (Fig. 5c). Genetically controlled diversity for this trait may still be discovered in profiling of a larger panel of diverse sorghum lines under nitrogen limited and non nitrogen limited conditions. If no such diversity is found it may prove impossible to reduce this response to nitrogen constrained growth via conventional breeding and selection strategies. In contrast, while the abundance of glutamic acid also declined in nitrogen limited conditions, the degree of decline varied significantly among sorghum genotypes (Fig. 5d). In previous field studies of maize, the abundance of glutamic acid was negatively correlated with yield under heat and water stress but not under control conditions [16]. We observed a similar negative correlation between glutamic acid abundance and yield under both control and nitrogen limited conditions (Fig. 6). One potential explanation for this relationship might be, that genotypes with higher yield remobilise more nitrogen resources, including those coming from glutamic acid to fill kernels, and therefore they have lower level of glutamic acid. This result highlight potential importance of glutamic acid metabolism on yield in C4 crops under stress conditions. Genes involved in glutamic acid metabolism were enriched among those exhibiting differential mRNA expression between older maize inbreds (pre-1960s) and maize inbreds developed and selected by breeders in the modern era [26]. These observations suggest that glutamic acid metabolism may already have been an indirect target of selection during crop improvement in maize. If so, the data presented here suggest that glutamic acid may also represent an interesting metabolic marker when selecting for better performing sorghum genotypes although further validation is certainly needed.
Overall, our results highlight that grain yield in sorghum, unlike many morpho-physiological traits, exhibits substantial variability of genotype specific responses to long term low severity nitrogen deficit stress. Differences in the eight morpho-physiological traits scored in this study explained only ∼ 9% of variance in yield. Metabolic responses to long term low severity nitrogen deficit stress exhibited a higher proportion of variability explained by genotype specific responses than did morpho-pysiological traits and a number of individual metabolites were associated with yield variation under one or both nitrogen treatments. It may be possible to build predictive models using metabolite abundance to estimate which sorghum genotypes will exhibit greater or lesser decreases in yield in response to nitrogen deficit, however data from a larger number of genotypes grown across multiple sites will be necessary to train and evaluate such models. Large scale metabolic profiling will likely require targeted metabolomics using feature selection approaches to identify an informative subset of the metabolites profiled in this study.

Field experiment, and phenotypic data and tissue collection
A replicated field trial was planted at the University of Nebraska-Lincoln's Havelock Farm Location (N 40.861, W 96.598) on June 08, 2020. The experiment was laid out in a RBCD design, initially with two blocks each under high nitrogen (80 lbs/acre) and low nitrogen (no supplemental nitrogen) treatment conditions and 416 plots per block, including 347 genotypes from the sorghum association panel [17], and BTx623 as a repeated check. Each plot consisted of a single 2.3 meter row of plants from a single genotype, with 0.76 meter spacing between parallel and sequential rows. Before supplementing soil with nitrogen, soil sample from each block were taken and sent to a commercial lab for nitrate content analysis (Ward Laboratories, Inc., Kearney, NE). The nitrate content in blocks which was later supplement with nitrogen (HN) were 19.5, and 19.9 ppm respectively. In blocks without fertilizer (LN) the nitrate content were 13.7 and 9.45 ppm.
A mixture of hand measured traits and traits predicted from hyperspectral data (see below) were employed to assess the response of sorghum to nitrogen deficit stress. The date of flowering for each plot was scored when 50% of surviving plants reached anthesis. Plant height was measured after flowering and was defined as the distance from the soil surface to the collar of the flag leaf. The number of panicles produced by each plot (panicles per plot) were counted by direct observation. For each plot, one to three panicles were hand harvested, dried, threshed, and the resulting grain weighed. The weight of grain per harvested panicle was multiplied by the count of panicles per plot to estimate grain yield per plot. The difference in number of collected panicles primary come from difference in germination rate of different genotypes. A small number of sorghum genotypes had extremely poor germination rate and, as a number of panicles from this field project were reserved for another experiment that was not consistent with collecting total grain mass per panicle, it was not always possible to collect three panicles per plot.
Between 5 and 12 August 2020, hyperspectral reflectance data was collected from the second leaf from top of the plant from single plant per block using a FieldSpec4 (Malvern Panalytical Ltd., formerly Analytical Spectral Devices), following the protocol outlined in [18]. A set of 265 leaf samples (130 from HN and 135 from LN) were selected for ground truth measurements. Leaf chlorophyll concentration (CHL) was measured with a handheld chlorophyll concentration meter (MC-100, Apogee Instruments, Inc., Logan, UT), and leaf area (LA) was measured with a leaf area meter (LI-3100, LI-COR Biosciences, Lincoln, NE). Next, samples were placed in a oven set to 50 • C and dried over 72 h. Dry weight (DW) of the leaves was then recorded with digital balance. Specific Leaf Area (SLA, m 2 /kg) was calculated as LA/ DW. Finally, dried plant leaves were sent to a commercial lab (Ward Laboratories, Inc., Kearney, NE) where the samples were ground, homogenized, and analyzed for analysis of nutrient content: nitrogen, potassium and phosphorus.
For 96 plots representing 24 genotypes replicated in two blocks each under sufficient and low nitrogen treatments, leaf tissue was collected for metabolomics analysis. For each plot, a single plant was selected, avoiding edge plants where possible. From this plant eight leaf punches of 0.33 cm 2 in area were collected from the middle section of the leaf below the flag leaf (e.g the penultimate leaf ) and immediately frozen in liquid nitrogen. Samples were collected between 9:00 AM and 1:00 PM on August 12 2020.

Modeling traits based on hyperspectral data
Five models were developed to predict chlorophyll, nitrogen, phosphorus, and potassium concentration as well as specific leaf area from hyperspectral reflectance data, following the approach described in [18]. Measured intensity values for each wavelength were zero centered and scaled to unit variance. Wavelengths below 450 nm and above 2400 nm were discarded. Predictive models were built separately for each trait using partial least squares regression implemented in the pls v.2.8.0 [27] and caret [28]. Prior to the modeling, data were split into training ( n = 185 ) and validation set ( n = 80 ). This was done to avoid the risk of misleadingly high prediction accuracy resulting from over fitting. Decisions regarding model tuning and performance evaluation were made based on root mean squared error (RMSE) of five-fold cross validation using training set ( n = 185 ). After final models were trained, their performance was evaluated using the validation set ( n = 80 ). Final models were applied to equivalently zero centered and scaled hyperspectral reflectance measurements collected from the remaining sorghum plots.

Untargeted metabolomics using LC-MS/MS
Samples were extracted using cold methanol:acetonitrile (50:50, v/v) spiked with 100 M of CUDA (12-[(cyclohexylcarbamoyl)amino]dodecanoic acid). The tissue samples were disrupted and homogenized by adding 2 stainless steel beads (SSB 32) using the TissueLyserII (Qiagen) at 20 Hz for 5 mins. After centrifugation at 16,000 g, the supernatants were collected and the same extraction was repeated on the pellet one more time. The supernatants were pooled and vacuum dried down using a SAVANT speed-vac. The pellets were re-dissolved in 100 µ L of 30% methanol. Blank tubes were extracted alongside the samples to remove contaminant background from the data analysis. In addition, an aliquot of the samples was pooled to make a quality control (QC) sample which was run between every 10 samples in order to correct for batch effect. Two separate LC-MS/MS workflows running on a Thermo Vanquish LC system interfaced with a Thermo QE-HF mass spectrometer were used to profile the metabolites. For the hydrophobic compounds, a ACCQ-TAG ULTRA C18 column (1.7 µ m , 2.1 mm Ã-100 mm, Waters) was used flowing at 0.3 mL/min at 40 • C. The gradient of the mobile phases A (0.1% formic acid in water) and B (0.1% formic acid in acetonitrile) was as follow: 2% B for 2 min, to 50% B in 11 min, to 90% B in 2 min, hold at 90% B for 1 min, to 2% B in 0.5 min. The QE-HF was run in a data-dependent acquisition mode triggering on single charge peaks using a mass range of 67 to 1000 m/z at 60,000 resolution, with an AGC target of 3e6 and a maximum ion time of 100 ms for both positive and negative ion scans. The isolated ions were further fragmented by HCD using isolation window of 1.6 m/z and scanned at a resolution of 15,000. For the polar compounds, a XBridge Amide 3.5(4.6 x 100 mm, Waters) was used flowing at 0.4 mL/min at 45 • C. The gradient of the mobile phases A (10 mM ammonium formate/0.125% formic acid in water) and B (10 mM ammonium formate/0.125 formic acid in 95% acetonitrile) was as follow: 100% B for 2 min, to 70% B in 5.7 min, to 40% B in 1.8 min, to 30% in 0.75 min, to 100% B in 2.5 min. The QE-HF was run in a data-dependent acquisition mode triggering on single charge peaks using a mass range of 60 to 900 m/z at 60,000 resolution, with an AGC target of 1e6 and a maximum ion time of 100 ms for both positive and negative ion scans. The isolated ions were further fragmented by HCD using isolation window of 1.6 m/z and scanned at a resolution of 15,000.

LC-MS/MS data analysis
Data from LC-MS/MS analysis were processed with MS-Dial software v4.70 for peak detection, deconvolution, alignment, quantification, normalization, and identification [29]. Background peaks detected in blank extracts were filtered out. Intensity drift was corrected using the local regression (LOESS) for QC batch normalization, and zero intensities were replaced by 10% of the minimum peak height. The identification was done using the curated mass spectral public libraries (http:// prime. psc. riken. jp/ compms/ msdial) for MS/MS positive (290,915 entries, April 2021) and MS/MS negative (36,848 entries, April 2021). Metabolites missing in more than 80% of the total samples were removed. The remaining 3,496 metabolites from all four analytical conditions, that is from separation on HILIC and RP column with negative and positive ion mode, were manually checked for Gaussian chromatographic peak and, peak alignment and MS/MS profile. Identified metabolites were classified either as level I when peak matched to m/z and retention from an in-house library prepared from authentic standards, or as level II based on their spectral similarities with public/commercial spectral libraries in accordance with the Metabolomics Standards Initiative guidelines [30].