
Identifying genetic determinants of forage sorghum [Sorghum bicolor (Moench)] adaptation through GWAS

Abstract

Background

Forage sorghum is a highly valued crop in livestock feed production due to its versatility, adaptability, high productivity, and resilience under adverse environmental conditions, making it a crucial option for sustainable forage production. This study aimed to investigate ninety-five forage sorghum genotypes and identify marker-trait associations (MTAs) for adaptive traits, including yield and flowering, through genome-wide association studies (GWAS).

Results

Using 41,854 polymorphic SNPs, a GWAS involving the GLM, MLM, and FarmCPU models was performed to analyse fourteen adaptive traits. Population structure analysis revealed two subpopulations. Linkage disequilibrium (LD) plots showed varying degrees of LD decay across the chromosomes, with an average LD decay distance of 19.49 kbp. Twelve common significant QTNs, linked to 17 putative candidate genes, were co-detected by at least two GWAS methods. Three QTNs were associated with days to 50% flowering; two each with leaf-to-stem ratio and number of nodes per plant; and one each with plant height, leaf width, number of leaves per plant, stem girth, and internodal length. Six candidate genes were associated with days to 50% flowering, two each with leaf width, stem girth, leaf-to-stem ratio, and number of nodes per plant, and one each with plant height, number of leaves per plant, and internodal length.

Conclusion

FarmCPU was identified as the most suitable and effective among all the models for controlling both false positives and false negatives. Further in-depth analysis of the newly discovered QTNs may lead to the identification of new candidate genes for the trait of interest. These studies elucidate gene functions and could transform forage sorghum breeding through marker-assisted selection and transgenic approaches, accelerating the development of superior forage sorghum varieties and enhancing global food security.


Introduction

Sorghum (Sorghum bicolor L. Moench), a resilient C4 cereal, holds immense potential for cultivation in regions with limited water resources and poor soil conditions where other cereals fail. Its resilience and natural diversity make it a valuable resource for developing sustainable agricultural systems [1, 2]. Forage sorghum plays a crucial role in livestock production, especially in areas facing feed shortages [3]. Indian livestock, which make up about 20% of the world's livestock population, depend on sorghum for health and nutrition [4] in the form of fresh chop, hay, and silage. The dairy and meat industries' demand for high-quality green and dry feed is increasing, particularly in northeastern regions like Assam. Forage sorghum is suitable for these areas due to its rapid growth, high biomass production, and minimal resource input [5]. Early maturing cultivars provide quick biomass, which is essential for short growing seasons or unpredictable weather [6].

Adaptive traits are phenotypes optimized for specific environments. In forage sorghum, as in other crops, adaptive traits such as forage yield, plant height, and flowering time are largely governed by numerous quantitative trait loci (QTLs) [7, 8, 9, 10]. Understanding the genetic structure of these complex traits under natural and artificial selection is crucial for effective breeding [11]. Sorghum's resilience to harsh environments, diverse germplasm, small genome size, biofuel potential, and role as a model for tropical grasses with complex genomes make it an ideal subject for plant genomics [1, 12]. Although extensive research has been done in grain and sweet sorghum, the very limited genomic resources available for forage sorghum impede the determination of forage yield-attributing traits and slow the breeding process [13]. As a result, the development of high-yielding and adaptable forage sorghum cultivars remains a significant challenge, owing to the lack of sufficiently diverse genetic resources and of a comprehensive understanding of the genetic architecture of adaptive traits. The complexity of forage breeding necessitates modern tools for greater efficiency and accuracy, because traditional biparental mapping is hindered by limited recombination events and allelic diversity, which restricts the resolution of genetic architecture.

In recent years, association mapping (AM) has emerged as a pivotal tool in molecular breeding. It uses historic recombination to uncover associations between genetic markers and traits of interest across diverse germplasm, without the need to construct a time-intensive mapping population [14]. Sorghum, with its small genome size and moderate linkage disequilibrium (LD), is an ideal species for identifying natural variation underlying complex agronomic and adaptive traits through genome-wide association studies (GWAS) [1, 15, 19]. The success of association studies hinges on the careful selection of germplasm, the quality of genotypic and phenotypic data, and the implementation of robust statistical analyses for detecting and validating marker-phenotype associations [16]. GWAS leverage high-density single nucleotide polymorphism (SNP) markers, historical recombination in diverse panels, and advances in high-throughput genotyping and computational capacity to overcome the limitations of traditional linkage mapping, revolutionizing the mapping of complex traits such as adaptive traits [17, 18]. The resulting genome-wide SNP variation map accelerates molecular breeding by expanding the pool of germplasm available for crop improvement, while improving the efficacy of GWAS, marker-assisted selection, and genomic selection strategies [20].

In this investigation, three GWAS models, FarmCPU (Fixed and Random Model Circulating Probability Unification), GLM (general linear model), and MLM (mixed linear model), were employed to navigate the complexities of marker-trait associations (MTAs). The GLM integrated with principal component analysis (PCA) mitigates false positives stemming from population structure, whereas the MLM, which incorporates a kinship matrix alongside PCA, further reduces spurious associations linked to familial relatedness [21, 22]. Although the MLM is effective at minimizing false positives, it can yield false negatives [23], and as a single-locus model it is inadequate for complex traits influenced by multiple loci. The FarmCPU model, which uses a modified MLM called the multiple loci linear mixed model (MLMM), mitigates confounding between testing markers and kinship through a stepwise approach, enhancing statistical power while controlling both false positives and false negatives [24, 25]. GWAS has revolutionized plant genetics and breeding by pinpointing genetic loci linked to traits without prior gene knowledge, and functional characterization of the identified QTLs is essential for understanding their biological roles [26, 27]. The identification of candidate genes via GWAS therefore holds potential for advancing plant breeding and strengthening global food security. In forage sorghum, GWAS enables researchers to decipher the genetic underpinnings of adaptive traits, which is crucial for developing resilient, high-yielding cultivars tailored to diverse agricultural environments.

Given the dearth of genomic resources specific to forage sorghum, this study aimed to identify quantitative trait loci (QTLs) and associated candidate genes for adaptive traits using GWAS, thereby improving the efficacy of genomics-assisted breeding. The effectiveness and reliability of GWAS were evaluated by comparing the performance of the FarmCPU, GLM, and MLM models, with the aim of deepening our understanding of the genetic architecture behind these complex traits. The goal is to provide valuable insights for developing improved forage sorghum cultivars tailored to specific agro-ecological conditions, optimizing sorghum as a feed crop in resource-limited regions, making breeding programs more efficient, and contributing to sustainable livestock production and food security.

Materials and methods

Plant material and field experiments

Initially, India lacked forage-specific sorghum lines, relying instead on Sudan-sorghum cross derivatives and Sudan-type sorghum for forage. The lines in this study were developed from grain-type sorghums found suitable for forage. Pedigree populations were created, and promising forage lines were selected, though they were not tested in multi-location trials (MLTs) for adaptation. However, their pedigree and genetic diversity suggest adaptability. Several well-adapted varieties, including PC615, MP Chari, CSV 33MF, CSV 21 F, UMPC503, and SSG-59-3, are now cultivated across India. This study evaluated 95 forage sorghum genotypes (Supplementary Table 1) across five environments during the 2020–21 rabi, summer, and kharif seasons. Field trials were conducted at two locations: Assam Agricultural University (AAU), Jorhat, and the Indian Institute of Millets Research (IIMR), Hyderabad. The environments were E1 (Rabi 2020 at IIMR), E2 (Rabi 2020 at AAU), E3 (Summer 2021 at AAU), E4 (Kharif 2021 at AAU), and E5 (Rabi 2021 at AAU). While Hyderabad is ideal for sorghum forage, Assam is a non-traditional region for its cultivation. Data pooled across environments were summarized as BLUP estimates and used for further analysis. The experiment was laid out in a randomized complete block design with two replications at each location. The genotypes were planted at a spacing of 45 × 20 cm. Each plot consisted of a single row that was 3 m long and contained 15 plants of each genotype. The experiment followed conventional agronomic practices and plant protection measures.

Phenotyping

A total of fourteen yield traits were recorded on five randomly selected competitive plants of each genotype in each replicate. These traits were days to 50% flowering (FDF, in days), plant height at 50% flowering (PH, in cm), number of leaves per plant (NLP), leaf length (LFL, in cm), leaf width (LFW, in cm), leaf area index (LAI), leaf-to-stem ratio (LSR), stem girth (SGT, in mm), number of nodes per plant (NNP), internodal length (IL, in cm), panicle length (PL, in cm), dry matter content (DMC, in %), dry fodder yield per plant (DFYP, in g) and green fodder yield per plant (GFYP, in g). The best linear unbiased prediction (BLUP) provides accurate estimates in multi-environment trials. BLUP estimates of the 95 forage genotypes for all the morphological traits were integrated with the SNP genotyping data in the VCF file using VCFtools during GWAS analysis and are presented in Supplementary Table 2.

Statistical analysis

The analyses were conducted in RStudio, using R version 4.1.2 [28, 29]. The metan package [30] was used to estimate the BLUP values of the genotypes across the environments, which were then used for genetic variability and correlation analyses. The variability package [31] was used for ANOVA, genetic variability parameters, and correlation studies. The ggplot2 package version 3.3.4 [32] was used for visualization.

The simplest linear model with interaction effects used to analyse data from multi-environment trials, as given by Olivoto and Lúcio [30], is

$$y_{ijk} = \mu + \alpha_{i} + \tau_{j} + (\alpha\tau)_{ij} + \gamma_{jk} + \epsilon_{ijk}$$

where $y_{ijk}$ is the response variable observed in the kth block of the ith genotype in the jth environment; $\mu$ is the grand mean; $\alpha_{i}$ is the effect of the ith genotype; $\tau_{j}$ is the effect of the jth environment; $(\alpha\tau)_{ij}$ is the interaction effect of the ith genotype with the jth environment; $\gamma_{jk}$ is the effect of the kth block within the jth environment; and $\epsilon_{ijk}$ is the random error. In a mixed-effect model in which $\alpha_{i}$ and $(\alpha\tau)_{ij}$ are random effects, the best linear unbiased prediction (BLUP) of the phenotype is based on the mixed model of Henderson [33]:

$$y = X\beta + Zu + \epsilon$$

where y, β, u, and ε represent the observed phenotype, fixed effect vector, random effect vector, and residual, respectively, and X and Z represent incidence matrices.
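
For reference, the BLUP of u in this model is commonly obtained from Henderson's mixed model equations. The form below is a sketch under the assumption that var(u) = G and var(ε) = R; neither G nor R is defined in the text above, and the variance structures actually fitted by the metan package are not stated here.

$$\begin{bmatrix} X^{\top}R^{-1}X & X^{\top}R^{-1}Z \\ Z^{\top}R^{-1}X & Z^{\top}R^{-1}Z + G^{-1} \end{bmatrix} \begin{bmatrix} \hat{\beta} \\ \hat{u} \end{bmatrix} = \begin{bmatrix} X^{\top}R^{-1}y \\ Z^{\top}R^{-1}y \end{bmatrix}$$

Solving this system gives the estimate $\hat{\beta}$ of the fixed effects and the prediction $\hat{u}$ of the random effects at the same time. The $G^{-1}$ term shrinks the genotype predictions toward zero.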

DNA isolation and genome sequencing

In 2019-20, sorghum seeds were sown in coffee pots, and tender young leaf samples were harvested for DNA isolation. Genomic DNA from each of the forage sorghum lines under study was isolated using a Plant DNA Mini Kit following the manufacturer's protocol, and its quality was assessed by 0.8% agarose gel electrophoresis. The samples were sent to Nx-Gen Bio (NGB Diagnostics Pvt Ltd.), New Delhi, where a genotyping-by-sequencing protocol with 2 × 151 bp chemistry was used for SNP discovery. The raw data acquired on the high-throughput sequencing platform (Illumina sequencers) were first transformed to sequence reads by base calling with rMVP software.

SNP genotyping

The FASTX Toolkit (version 0.0.13) was used to separate samples from the combined data based on barcodes. Its Barcode Splitter program (fastx_barcode_splitter.pl) reads FASTA/FASTQ files and splits them into several smaller files based on barcode matching. FastQC (version 0.11.5, http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used for quality checking. The parameters checked included the base quality score distribution, sequence quality score distribution, average base content per read and GC distribution in the reads. The universal Illumina adapter (AGATCGGAAGAGC) was removed using Trim Galore (version 0.6.2, https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), a wrapper script that automates quality and adapter trimming as well as quality control. Reads were mapped against the sorghum reference genome using the MEM algorithm of BWA (version 0.7.5). Picard (version 1.48) and SAMtools (v 0.1.19) were used to handle the SAM files generated by BWA.

Variant calling was performed using the GATK pipeline (version v3.6, https://gatk.broadinstitute.org/hc/en-us) for all samples. Variants were filtered to include only SNPs (indels were eliminated). After alignment with the reference genome (Sorghum bicolor v3.1.1), 3,300,867 SNPs were found in the 95 forage sorghum genotypes. Filtering was performed to remove loci with a high frequency of missing data, loci with low minor allele frequency, and SNPs that were not assigned to any chromosome, and to reduce heterozygous calls in favour of homozygous ones. BCFtools version 1.9 was used to merge all sample-specific SNPs, which were subsequently filtered with VCFtools at an MAF of 0.05 and 90% missing data. Afterwards, all the SNPs were combined to create a raw SNP file and filtered based on a 20% threshold for missing data, ensuring that at least 80% of the samples had a call at each SNP. Following this filtering process, a set of 41,054 high-quality filtered SNPs was selected for further analysis.
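
The MAF and missing-data filters described above can be illustrated with a minimal Python sketch. It is not the VCFtools implementation used in this study. The genotype encoding (alternate-allele counts 0, 1, 2 with NaN for missing calls), the function name `filter_snps`, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np


def filter_snps(genotypes, maf_min=0.05, max_missing=0.20):
    """Return a boolean mask of SNPs that pass MAF and missing-data filters.

    genotypes: (n_samples, n_snps) array of alternate-allele counts (0, 1, 2),
    with np.nan marking a missing call (an assumed encoding).
    """
    g = np.asarray(genotypes, dtype=float)
    called = ~np.isnan(g)
    n_called = called.sum(axis=0)
    missing_rate = 1.0 - n_called / g.shape[0]
    # Alternate-allele frequency among called genotypes (two alleles per call).
    alt_sum = np.nansum(g, axis=0)
    with np.errstate(invalid="ignore", divide="ignore"):
        alt_freq = alt_sum / (2.0 * n_called)
    maf = np.minimum(alt_freq, 1.0 - alt_freq)
    return (missing_rate <= max_missing) & (maf >= maf_min)


# Hypothetical toy data: 5 samples (rows) by 4 SNPs (columns).
nan = np.nan
toy = np.array([
    [0, 0, nan, 2],
    [1, 0, nan, 2],
    [2, 0, 1, 2],
    [0, 0, 2, 1],
    [1, 0, 0, nan],
])
mask = filter_snps(toy)
print(mask)
```

In this toy matrix the second SNP is monomorphic and the third has 40% missing calls, so only the first and fourth SNPs are retained.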

SNP diversity, population structure, principal component, and linkage disequilibrium analyses

SNP statistics were computed using VCFtools [34] on the web platform SNiPlay [35]. SNP-based molecular diversity parameters were analysed using SNiPlay. Population structure was determined using the STRUCTURE model [36]. The parameters were set to the program defaults, with K values ranging from 1 to 10. Structure Harvester [37] was used to establish an optimum K value from the exploratory STRUCTURE analysis using the Evanno "Delta-K" method [38]. Based on the first five principal components of the PCA across all cultivars, population structure was incorporated as a covariate for the fixed effects using EIGENSTRAT smartpca [21].
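
The Delta-K criterion can be sketched in a few lines of Python. This is one common formulation, using the mean log-likelihood at each K and the standard deviation across replicate runs, and it is not the Structure Harvester code itself. The function name `evanno_delta_k` and all log-likelihood values below are hypothetical.

```python
import numpy as np


def evanno_delta_k(lnp_by_k):
    """Compute delta K from replicate STRUCTURE log-likelihoods.

    lnp_by_k: dict mapping K -> list of Ln P(D) values from replicate runs.
    Returns a dict K -> delta K for each K that has both neighbours
    (K - 1 and K + 1) and a non-zero standard deviation.
    """
    ks = sorted(lnp_by_k)
    mean = {k: float(np.mean(lnp_by_k[k])) for k in ks}
    sd = {k: float(np.std(lnp_by_k[k], ddof=1)) for k in ks}
    delta = {}
    for k in ks:
        if (k - 1) in mean and (k + 1) in mean and sd[k] > 0:
            # Absolute second-order rate of change of the mean log-likelihood.
            second_diff = abs(mean[k + 1] - 2.0 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Hypothetical replicate log-likelihoods for K = 1..4 (three runs each).
runs = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4000.0, -4001.0, -3999.0],
    3: [-3950.0, -3960.0, -3940.0],
    4: [-3930.0, -3950.0, -3910.0],
}
delta_k = evanno_delta_k(runs)
best_k = max(delta_k, key=delta_k.get)
print(delta_k, best_k)
```

With these made-up values the log-likelihood rises sharply from K = 1 to K = 2 and then flattens, so delta K peaks at K = 2.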

Genome-wide association analysis

The GWAS was conducted using the R package 'rMVP' [25] with three of its models: GLM (general linear model), MLM (mixed linear model), and FarmCPU (fixed and random model circulating probability unification). rMVP estimates variance components using the efficient mixed-model association eXpedited (EMMAX) algorithm. The polymorphic SNPs and the BLUP values of the fourteen forage sorghum adaptation traits across environments were used to identify marker-trait associations across all cultivars. The models incorporated fixed and random factors to account for population structure and cryptic kinship. The Bonferroni correction for the identified associations was calculated in rMVP. To identify potential SNP associations, a significance threshold of 10% FDR was set [39]. Manhattan plots generated in rMVP were used to plot the SNP marker sites against their corresponding p values to visualize the significant SNP loci. Marker-trait associations were declared at a significance threshold of -log10(P) above 3.0. Only the peak SNP was considered and used to estimate the phenotypic variance.
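
The multiple-testing criteria mentioned above can be illustrated with a short Python sketch. It shows a Bonferroni threshold and the Benjamini-Hochberg step-up rule for a 10% FDR. It is not the rMVP implementation, and the function names and p values are hypothetical.

```python
import numpy as np


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test p-value threshold under the Bonferroni correction."""
    return alpha / n_tests


def benjamini_hochberg(p_values, fdr=0.10):
    """Boolean mask of tests declared significant by the BH step-up rule."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # The i-th smallest p value is compared with fdr * i / m.
    below = ranked <= fdr * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.nonzero(below)[0].max()  # largest rank meeting the rule
        mask[order[: cutoff + 1]] = True
    return mask


# Threshold for the number of SNPs reported in this study.
threshold = bonferroni_threshold(41854)
print(threshold, -np.log10(threshold))

# Hypothetical p values for five markers.
toy_p = [0.001, 0.2, 0.03, 0.04, 0.9]
significant = benjamini_hochberg(toy_p, fdr=0.10)
print(significant)
```

For 41,854 tests at alpha = 0.05 the Bonferroni threshold is roughly 1.2 × 10^-6, which is far stricter than a -log10(P) cutoff of 3.0 (P = 0.001).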

Putative candidate gene analysis

The identified SNPs were aligned with the reference genome of Sorghum bicolor v3.1.1 using the Phytozome13 database [19, 40] to extract flanking sequences of a defined size (e.g., 10 kb upstream and downstream). These flanking sequences were then annotated using Phytozome13 (https://phytozome-next.jgi.doe.gov/info/Sbicolor_v3_1_1) to identify potential candidate genes associated with the SNPs. Genes with known or predicted functions related to the studied adaptive traits were prioritized as candidates. We further compared the identified QTL regions with the Sorghum QTL Atlas [41] to assess whether they overlapped with previously reported QTLs for similar forage sorghum traits. This comparison strengthened the credibility of our candidate gene selection.
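
The flanking-window search can be sketched as follows. The sketch assumes gene annotations are available as simple records of chromosome, start, and end. The `Gene` record, the function `genes_near_snp`, and the gene names and coordinates in the example are hypothetical and are not taken from Phytozome13; only the SNP position comes from this study.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int


def genes_near_snp(chrom: str, pos: int, genes: List[Gene], flank: int = 10_000) -> List[str]:
    """Return IDs of genes overlapping the window [pos - flank, pos + flank]."""
    lo, hi = pos - flank, pos + flank
    return [
        g.gene_id
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical annotation records (placeholder names and coordinates).
annotation = [
    Gene("geneA", "Chr04", 67_815_000, 67_818_000),  # inside the window
    Gene("geneB", "Chr04", 67_835_000, 67_840_000),  # beyond 10 kb downstream
    Gene("geneC", "Chr02", 67_820_000, 67_821_000),  # different chromosome
]

# SNP position for days to 50% flowering on chromosome 4, as reported in this study.
hits = genes_near_snp("Chr04", 67_820_914, annotation)
print(hits)
```

A gene is counted as a candidate when any part of it overlaps the window, so genes that start outside the window but extend into it are also returned.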

Results

Phenotypic variations and correlations

The descriptive statistics and genetic parameters are presented in Table 1, Supplementary Table 2 and Supplementary Fig. 2. The effects of genotype and genotype × environment interaction were significant (P < 0.001) for sorghum forage yield and its attributing traits according to the likelihood ratio test (LRT) against the chi-square value. Internodal length, dry forage yield per plant, green forage yield per plant, leaf-to-stem ratio, and leaf area index showed a high genotypic coefficient of variation (GCV). High heritability was observed for all studied traits. The correlation analyses for the 14 adaptive traits of the 95 forage sorghum genotypes are presented in Fig. 1. Green forage yield per plant (g) exhibited a significant and positive genotypic association with dry forage yield per plant, leaf area index, number of nodes per plant, leaf width, leaf length, stem girth, plant height and number of leaves per plant.

Fig. 1

Correlation plot depicting inter trait relationships in forage sorghum. Notes: * significant at P < 0.05; *** significant at P < 0.001

Table 1 Genetic variability of the fourteen adaptive traits of 95 forage sorghum genotypes

Molecular markers, population structure, principal component analysis, and LD

A total of 41,854 high-quality SNP markers were identified across all 10 forage sorghum chromosomes. The minor allele frequency was in the range of 0.016 to 0.5, whereas the major allele frequency was between 0.5 and 0.98. The largest number of SNP variants fell in the MAF range of 0.06 to 0.08, and the smallest at an MAF of 0.48 (Fig. 2). The total chromosome length spanned by these SNPs was 684.4 Mb, with chromosomes 1 and 9 being the longest (81.00 Mb) and shortest (59.6 Mb), respectively (Fig. 3). Chromosome 1 had the greatest number of SNPs (5982), and chromosome 7 had the lowest number (2617 SNPs), with an average of 4105. The highest SNP density was recorded on chromosome 3 (0.78 per Mb), and the lowest was found on chromosome 7 (0.4 per Mb), with an average SNP density of 0.59 per Mb.

The log-likelihood from STRUCTURE (Fig. 4) and the highest delta K value (139.13), which occurred at K = 2, indicated the presence of two optimal subpopulations (Supplementary Table 3). The marginal likelihood curve began to plateau at K = 2, with population subgroups consistent with the results obtained using STRUCTURE, as shown in Fig. 5, and this result was supported by Supplementary Table 4. The 95 genotypes were divided into 2 subgroups (pop1 and pop2), which was consistent with the results of the principal component analysis based on the contributions of the first two components (Fig. 6), and this result was supported by Supplementary Table 5. Of the 95 forage sorghum genotypes, 54 fell into population 1, which comprised mostly B-lines and very few varieties that act as restorers; 22 were grouped into population 2, which comprised a few popular varieties and B-lines; and 19 were classed as admixed, with mixed characteristics of both populations.

LD was estimated for the 10 chromosomes of forage sorghum, and LD plots showed varying degrees of LD decay. The LD decay distance was greatest (approximately 21 kb) on chromosome 6 and smallest (approximately 18.5 kb) on chromosomes 1 and 8 (Table 2 & Supplementary Fig. 2). The average LD decay distance across all forage sorghum genotypes for all chromosomes was 19.49 kbp.

Fig. 2

Minor allele frequency in 95 forage sorghum genotypes

Fig. 3

Chromosome wise SNP distribution and SNP density in 95 forage sorghum genotypes

Fig. 4

Calculation of optimum subpopulations (K) for 95 forage sorghum genotypes based on the magnitude of the delta K number

Fig. 5

Population structure of 95 sorghum accessions based on inferred ancestry and a graph of the estimated membership proportion for K = 2

Fig. 6

Principal component analysis results for 95 forage sorghum genotypes

Table 2 LD decay as measured by the r2 averaged in distance intervals across the 10 sorghum chromosomes

Loci significantly associated with adaptive traits

The Q-Q plot analysis showed that the GLM exhibited the highest deviation and identified the most significant SNPs, followed by FarmCPU and MLM (Figs. 1 and 2, Supplementary Figs. 16). The GLM suggested false positives for all traits except the leaf-to-stem ratio, which showed a uniform distribution. The MLM showed no deviation except for the number of nodes per plant, indicating a false negative. The FarmCPU model indicated controlled false positives and negatives, identifying true causal variants for days to 50% flowering, leaf width, and internodal length, but suggested false negatives for plant height and stem girth, and false positives for the number of leaves and nodes per plant, with a uniform distribution for the leaf-to-stem ratio. Using 41,854 polymorphic SNPs (missing rate ≤ 10% and MAF ≥ 5%), a GWAS involving the GLM, MLM, and FarmCPU models was performed to analyse fourteen adaptive traits. The number of significant SNPs detected differed among the GWAS methods, based on the phenotypic data for the different environments and the BLUP values. The BLUP mean values of the adaptive traits are presented in Supplementary Table 2. The significant SNPs associated with adaptive traits are presented in Table 3. Eight of the fourteen adaptive traits had significant SNPs or QTNs (quantitative trait nucleotides): days to 50% flowering, plant height, leaf width, number of leaves per plant, stem girth, leaf-to-stem ratio, number of nodes per plant and internodal length. Across these eight traits, 41 significant SNPs were detected through FarmCPU, 5 through MLM, and 372 through GLM. Significant SNPs common to more than one method were selected by comparing all the methods, as these were considered more reliable than SNPs detected by a single method.

A total of 12 common significant QTNs linked to candidate genes were co-detected by at least two GWAS methods. These QTNs were considered stable across the different GWAS methods and environments. The number of significant common SNPs or QTNs identified was three for days to 50% flowering; two each for the leaf-to-stem ratio and number of nodes per plant; and one each for plant height, leaf width, number of leaves per plant, stem girth and internodal length.
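
The co-detection criterion (a QTN retained only when at least two GWAS methods report it) amounts to a simple set operation. The sketch below is illustrative only. The function `co_detected` and the example hit lists are hypothetical, apart from the stem girth SNP on chromosome 9, which this study reports as detected by FarmCPU and GLM.

```python
from collections import defaultdict


def co_detected(hits_by_model, min_models=2):
    """Keep SNPs reported by at least `min_models` GWAS models.

    hits_by_model: dict mapping model name -> iterable of (chrom, pos) tuples.
    Returns a dict mapping each retained SNP -> sorted list of supporting models.
    """
    support = defaultdict(set)
    for model, hits in hits_by_model.items():
        for snp in hits:
            support[snp].add(model)
    return {
        snp: sorted(models)
        for snp, models in support.items()
        if len(models) >= min_models
    }


# Hypothetical hit lists; only the Chr09 SNP is taken from the text.
hits = {
    "FarmCPU": [("Chr09", 55133329)],
    "GLM": [("Chr09", 55133329), ("Chr01", 1000)],
    "MLM": [],
}
common = co_detected(hits)
print(common)
```

The placeholder SNP on chromosome 1 is dropped because only one model supports it.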

Table 3 List of significant QTNs simultaneously co-detected by two or more GWAS methods for adaptive traits in forage Sorghum

All identified QTNs for eight adaptive traits are summarized in Table 3. The corresponding Manhattan and Q-Q plots for days to 50% flowering (FDF) and plant height (PH) are shown in Fig. 7; for number of leaves per plant (NLP) and leaf width (LFW) in Fig. 8; for stem girth (SGT) and leaf-to-stem ratio (LSR) in Fig. 9; and for number of nodes per plant (NNP) and internodal length (IL) in Fig. 10. For days to 50% flowering (FDF), the three most significant common QTNs were identified on Chr 4: 67,820,914 bp, Chr 2: 74,592,550 bp, and Chr 5: 55,175,640 bp. The QTN on Chr 4 increased FDF in MLM and GLM models (-7.75) but reduced it in FarmCPU (-4.34) due to T–G allele substitution. The QTN on Chr 2 increased FDF in GLM (-6.12) and MLM (-5.13) but reduced it in FarmCPU (-3.19) due to G–A allele substitution. The QTN on Chr 5 increased FDF in GLM (8.3) and MLM (8.1) but reduced it in FarmCPU (3.51) due to G-A allele substitution, indicating late flowering. For plant height (PH), a significant common SNP was found on Chr 2: 59,718,618 bp, which increased height in MLM and GLM but decreased it in FarmCPU due to A–G allele substitution. For the number of leaves per plant (NLP), a significant SNP on Chr 5: 67,306,771 bp had a reducing effect across all models due to G–C and G–A allele substitutions, leading to fewer leaves. For leaf width (LFW), a significant SNP on Chr 8: 59,331,815 bp reduced leaf width due to A–G and C–A allele substitutions. For stem girth (SGT), the SNP on Chr 9: 55,133,329 bp identified by FarmCPU and GLM had a reducing effect due to G–C allele substitution. For the leaf-to-stem ratio (LSR), the SNP on Chr 6: 51,978,249 bp had an increasing effect due to C–A allele substitution, while the SNP on Chr 7: 9,351,448 bp had varying effects across models due to T-C allele substitution. 
For the number of nodes per plant (NNP), the SNP on Chr 10: 4,831,595 bp increased NNP due to C-A allele substitution, and the SNP on Chr 2: 56,651,140 bp reduced it due to C-G allele substitution. For internodal length, the SNP on Chr 5: 67,306,771 bp identified by FarmCPU and GLM had a positive effect due to G-C allele substitution.

Fig. 7

Manhattan and Q–Q plots displaying significant associations of QTNs with days to 50% flowering (FDF) and plant height (PH) using the FarmCPU, MLM and GLM models

Fig. 8

Manhattan and Q–Q plots displaying significant associations of QTNs with the number of leaves per plant (NLP) and leaf width (LFW) using the FarmCPU, MLM and GLM models

Fig. 9

Manhattan and Q–Q plots displaying significant associations of QTNs with stem girth (SGT) and the leaf-to-stem ratio (LSR) using the FarmCPU, MLM and GLM models

Fig. 10

Manhattan and Q–Q plots displaying significant associations of QTNs with the number of nodes per plant (NNP) and internodal length (IL) using the FarmCPU, MLM and GLM models

Identification and functional characterization of candidate genes

In this study, the significant SNPs in each stable QTL associated with the eight traits were regarded as potential candidate gene regions. A total of 17 putative candidate genes were identified by screening the annotated genes for the studied adaptive traits in the reference genome (Sorghum bicolor v3.1.1) at Phytozome13. The candidate genes for adaptive traits in forage sorghum identified using the FarmCPU, MLM and GLM models are listed in Table 4. These candidate genes were identified at their respective associated loci, and their functions were annotated from the Sorghum bicolor genome [1] and its reference genome [19]. Six candidate genes were identified for days to 50% flowering, and two genes each were associated with leaf width, stem girth, leaf-to-stem ratio, and number of nodes per plant. Additionally, one gene each was linked to plant height, number of leaves per plant, and internodal length.

Table 4 List of candidate genes for adaptive traits in forage sorghum using FarmCPU, MLM and GLM-based GWAS (Paterson et al. [1]; McCormick et al. [19])

For days to 50% flowering, the SNP at 67,820,914 bp on chromosome 4 was associated with the genes Sobic.004G349175, Sobic.004G349300, and Sobic.004G349400. Sobic.004G349175 encodes an ethylene receptor-like protein 2, Sobic.004G349300 encodes a calmodulin-binding receptor-like cytoplasmic kinase 1, and Sobic.004G349400 encodes a RING-type domain-containing protein. Additionally, the SNP at 74,592,550 bp on chromosome 2 is associated with the gene Sobic.002G393000, which encodes pyrrolidone-carboxylate peptidase. On chromosome 5, the SNP at 55,175,640 bp is associated with the gene Sobic.005G126600, which encodes a transport inhibitor response 1-like protein. For plant height, the SNP at 59,718,618 bp on chromosome 2 lies in the genic region of Sobic.002G205500, which encodes cellulose synthase. For the number of leaves per plant, the gene Sobic.005G188400 on chromosome 5 encodes a Remorin C-terminal region family protein. For leaf width (LFW), the SNP at 59,331,815 bp on chromosome 8 is associated with Sobic.008G159901, encoding serine/threonine-protein kinase NEK4, and Sobic.008G160000, encoding a tetratricopeptide repeat protein. For stem girth, the SNP at 55,133,329 bp on chromosome 9 is linked to Sobic.009G201800, encoding nuclear pore complex protein Nup53, and Sobic.009G201900, encoding peptidyl-tRNAs. For the leaf-to-stem ratio, the SNP at 51,978,249 bp on chromosome 6 is associated with the gene Sobic.006G162000, encoding kinesin heavy chain. On chromosome 7, the SNP at 9,351,448 bp is associated with Sobic.009G074300, encoding 3-epi-6-deoxocathasterone 23-monooxygenase. For the number of nodes per plant, the SNP at 4,831,595 bp on chromosome 10 is linked to Sobic.010G061800, encoding a nodulin-like protein. Another SNP at 56,651,140 bp on chromosome 2 is associated with Sobic.002G18370, encoding L-ascorbate oxidase. For internodal length, the SNP at 67,306,771 bp on chromosome 5 lies in the genic region of Sobic.005G188400, which encodes a Remorin_C domain-containing protein.

Discussion

The improvement of fodder productivity based on the integration of good traits into a single cultivar. However, compared with those of rice [41] and maize [42], the available sorghum genome resources are limited for genetic analysis of fodder yield traits. Systematic evaluation and rational utilization of germplasm resources play a key role in the improvement of sorghum forage germplasm resources. Through the rational utilization of breeding materials, ideal varieties with allelic variation can be obtained for fodder yield traits. There was sufficient genetic variation among the 95 forage sorghum genotypes based on LRT test and genetic variability studies, indicating the potential for developing high-yielding forage cultivars. In the present study, strong correlations among the individual traits were revealed by the BLUP data, providing a basis for direct or simultaneous selection for high forage yield improvement. These results were consistent with those of other recent studies [43, 44]. SNP markers are abundant and can be used to study genetic diversity in sorghum [45, 46]. Sorghum landraces might have experienced intense selection on agronomic and adaptive genes since domestication. Genetic diversity, population structure, and selection signature studies can enhance resource conservation and evolutionary genomics studies. In this study, 41,854 polymorphic SNPs were generated, creating an important genomic resource for Sorghum bicolor, phenotype research, molecular breeding and related species in the future. The population structure of 95 forage sorghum lines was characterized using genome-wide SNPs, which are a priority in decisions regarding selection strategies and plant breeding options for elite sorghum varieties and hybrids [47]. The whole-genome SNPs were used to analyse the population structure and genetics of the forage sorghum germplasms. 
The largest number of SNP variants fell in the MAF range of 0.06 to 0.08 and the smallest at a MAF of 0.48; similar results were reported by Enyew et al. [48]. SNP distribution and SNP density varied significantly among the chromosomes, which suggests that desired genes can be located with less effort and time on chromosomes with high SNP density. These findings are similar to those reported by Ruperao et al. [44], Mace et al. [49] and Yan et al. [50]. This study has established a valuable resource for future research on Sorghum bicolor and related species, including phenotype studies and molecular breeding.
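The MAF summary above can be illustrated with a minimal Python sketch. It assumes biallelic SNPs with diploid genotypes coded as alternate-allele counts (0, 1, 2) and a bin width of 0.02; the function names, the coding, and the bin width are illustrative assumptions and are not taken from the pipeline used in this study.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """MAF of one biallelic SNP.

    genotypes: alternate-allele counts per diploid individual (0, 1 or 2);
    None marks a missing call (assumed coding, for illustration only).
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes for this SNP")
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def maf_histogram(mafs, bin_width=0.02):
    """Count SNPs per MAF bin; each key is the lower edge of a bin."""
    last_bin = round(0.5 / bin_width) - 1
    counts = Counter()
    for maf in mafs:
        # Small epsilon guards against floating-point error at bin edges;
        # a MAF of exactly 0.5 is placed in the last bin.
        idx = min(int(maf / bin_width + 1e-9), last_bin)
        counts[round(idx * bin_width, 4)] += 1
    return dict(sorted(counts.items()))
```

With this binning, the bin that starts at 0.06 covers the 0.06 to 0.08 range mentioned in the text, and the last bin starts at 0.48.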

Population structure, which divides the population into kin-related subgroups, can generate false positives in association mapping and lower its power [51]; hence, its impact must be neutralized [52]. LD within a population forms the basis of GWAS, but it is often influenced by mutation, genomic rearrangements, genetic drift, population stratification, and natural selection [14]. STRUCTURE infers population groupings from the genotypes of the sampled individuals, their populations of origin, and the allele frequencies in all populations [36]. An 80% group-membership threshold was used to classify cultivars: genotypes with a membership probability ≥ 80% were assigned to the corresponding subgroup, while genotypes below 80% were considered admixed. In the present study, ADMIXTURE and principal component analysis produced the same population structure, dividing the 95 accessions into two subgroups. Similar results were reported by Niu et al. [47], Chakrabarty et al. [53] and Morris et al. [54].
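The 80% membership rule described above can be sketched in a few lines of Python. The function name, the label format, and the example membership values are hypothetical; only the 0.80 threshold comes from the text.

```python
def assign_subgroup(membership, threshold=0.80):
    """Classify one genotype from its ancestry (Q) coefficients.

    membership: list of membership probabilities, one per subgroup,
    as produced by a model-based clustering run (illustrative input).
    Returns 'subgroup_<k>' (1-based) when the largest coefficient
    reaches the threshold, otherwise 'admixed'.
    """
    best = max(range(len(membership)), key=lambda k: membership[k])
    if membership[best] >= threshold:
        return f"subgroup_{best + 1}"
    return "admixed"


# Hypothetical Q values for three genotypes with K = 2
for name, q in {"G1": [0.92, 0.08], "G2": [0.35, 0.65], "G3": [0.20, 0.80]}.items():
    print(name, assign_subgroup(q))
```

A genotype whose largest coefficient is exactly 0.80 is assigned to a subgroup here, matching the "≥ 80%" wording.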

Characterizing linkage disequilibrium (LD) is essential for designing association studies, interpreting association peaks, and transferring alleles in marker-assisted selection (MAS) [54]. To assess mapping resolution for genome scans and GWAS, the average LD decay and localized LD patterns were measured for each chromosome. The overall LD decay across chromosomes averaged 19.49 kbp, consistent with findings in sorghum by Boatwright et al. [55] and Olatoye et al. [56]. Sorghum, a partially cross-pollinated crop, shows LD levels between those of self-pollinated species such as rice (75–150 kbp) and cross-pollinated plants such as maize (~2 kbp) [54]. Variations in LD reflect differences in outcrossing and recombination rates, and the LD decay distance guides the size of the candidate intervals searched for QTLs/genes. This study adds to our understanding of crop evolution across diverse agroclimatic regions.
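As an illustration of how an LD decay distance can be derived, the following Python sketch bins pairwise r² values by physical distance and reports the first bin, after the peak, whose mean r² has dropped to half of the maximum bin mean. The half-maximum criterion, the 5 kbp bin size, and the function name are assumptions made for this example; this section does not state which criterion was used to obtain the 19.49 kbp estimate.

```python
def ld_decay_distance(pairs, bin_size_bp=5_000):
    """Estimate an LD decay distance from pairwise LD values.

    pairs: iterable of (distance_bp, r2) for SNP pairs on one chromosome.
    Returns the midpoint (bp) of the first distance bin beyond the peak
    whose mean r2 is at or below half of the maximum bin mean, or None
    if r2 never decays that far (half-maximum rule is an assumption).
    """
    sums, counts = {}, {}
    for dist, r2 in pairs:
        b = int(dist // bin_size_bp)
        sums[b] = sums.get(b, 0.0) + r2
        counts[b] = counts.get(b, 0) + 1
    if not sums:
        return None
    means = {b: sums[b] / counts[b] for b in sums}
    peak_bin = max(means, key=means.get)
    half_max = means[peak_bin] / 2
    for b in sorted(means):
        if b > peak_bin and means[b] <= half_max:
            return (b + 0.5) * bin_size_bp
    return None


# Made-up toy values, not data from this study
toy = [(1000, 0.8), (3000, 0.6), (7000, 0.5), (12000, 0.4), (18000, 0.3), (22000, 0.2)]
print(ld_decay_distance(toy))
```

Averaging such per-chromosome estimates would give a genome-wide figure comparable in kind to the one reported above.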

Of the 14 traits, 8 showed significant QTNs. Some genetic variants may have minimal effects, making it hard to detect significant associations without a large sample size [57]. Small sample sizes also reduce statistical power, hindering the detection of true associations [58]. Epistasis, population stratification, and weak LD between SNPs and causal variants can further obscure associations [57, 58]. For days to 50% flowering, this study identified desirable QTNs for early maturity on chromosomes 4 and 2 and an undesirable QTN on chromosome 5. For the number of nodes per plant, desirable QTNs were found on chromosome 10 and undesirable ones on chromosome 2. For the leaf-to-stem ratio, desirable QTNs were located on chromosome 6 and undesirable ones on chromosome 7. Additionally, desirable QTNs were identified for plant height and internodal length, while undesirable effects were noted for NNP and leaf width. These promising QTNs merit further evaluation for inclusion in breeding programmes.

For days to 50% flowering, significant SNPs were identified on chromosome 2, and different QTNs on the same chromosome were previously reported by Enyew et al. [43], Faye et al. [59], Tefera [60], Mace et al. [61] and Bai et al. [62]. The QTN on chromosome 4 was associated with flowering time, and QTNs on the same chromosome were previously reported by Schaffasz et al. [63], Mace et al. [61] and Tefera [60]. The QTN on chromosome 5 was also associated with flowering time, and QTNs on the same chromosome were previously reported by Zhao et al. [64] and Nagaraja Reddy et al. [65]. Two significant SNPs (at positions 12,262,531 bp and 1,279,510 bp) were identified on chromosome 1 for plant height, a chromosome previously reported for this trait by Wang et al. [66], Enyew et al. [43] and Mace et al. [61]. Significant SNPs for plant height on chromosome 6 were also previously reported by Enyew et al. [43], Bouchet et al. [67] and Zhao et al. [64]. A significant SNP (at position 59,288,155 bp) associated with stem girth was identified on chromosome 6, a chromosome previously reported for this trait by Schaffasz et al. [63]. Significant SNPs associated with the number of leaves per plant were identified on chromosomes 5 and 4, whereas contradictory results were reported by Lopez et al. [68], Tefera [60], and Fakrudin et al. [69]. Brenton et al. [70] also reported the same candidate gene on chromosome 4 in sorghum, with the same function, for the number of nodes per plant. QTNs with desirable phenotypic effects were identified for days to 50% flowering, plant height, leaf-to-stem ratio and internodal length. Most of the QTNs/SNPs identified for the studied traits in the present investigation were previously unreported. Specifically, three novel QTNs were identified for plant height, two for the number of leaves, and one for stem girth. Novel QTNs were also identified for the remaining traits.
However, further comprehensive studies using diverse large association populations are needed to validate and characterize these novel QTNs. This verification is essential for future breeding programmes. GWAS uncovers the genetic loci associated with target traits without prior knowledge of the genes or their functions. The identified QTLs are usually located in noncoding regions of the genome or in regions with unknown functions. Therefore, functional characterization of the QTLs is necessary to understand their biological roles and their contributions to the trait of interest, according to Huang et al. [41], Hao et al. [18] and Varshney et al. [26].

The candidate genes linked to flowering time in sorghum are summarized in Table 4. The Sorghum bicolor genome published by Paterson et al. [1] and the improved reference assembly of McCormick et al. [19] enabled the identification of gene functions. This study identified six key flowering time genes. The first, Sobic.004G349175 on chromosome 4, encodes an ethylene receptor-like protein 2, a two-component sensor histidine kinase that regulates flower opening via ethylene signaling [1, 19] and affects the genetic circuits that control flowering [71]. Ethylene has been shown to both inhibit and accelerate flowering in different species [72, 73]. The second gene, Sobic.004G276400 on chromosome 4, encodes PHOTOPERIOD-INDEPENDENT EARLY FLOWERING 1 (PIE1), which interacts with FLOWERING LOCUS C (FLC) and other pathways to influence flowering [74]. Another gene, Sobic.004G349300, encodes calmodulin-binding receptor-like cytoplasmic kinase 1, a negative regulator of flowering [75]. The gene Sobic.004G349400 encodes a RING-type domain protein involved in photomorphogenesis and flowering time in Arabidopsis [76, 77]. On chromosome 2, Sobic.002G393000 encodes pyrrolidone-carboxylate peptidase, which is involved in proteolysis related to flowering [78]. A maturity gene (maturity2) is located 6.61 Mb from this gene, suggesting potential colocalization [59]. Another gene, Sobic.005G126600, encodes a transport inhibitor response 1-like protein, which is crucial for floral processes such as circadian rhythms and development [79, 80]. The SNP at 59,718,618 bp in Sobic.002G205500, which encodes cellulose synthase, is linked to plant height through its influence on cell wall biosynthesis and biomass accumulation [81,82,83]. The gene Sobic.005G188400 on chromosome 5, which affects leaf number, encodes remorin, which is essential for cell-to-cell biomolecule transport and carbohydrate accumulation [84, 85].
Sobic.008G159901, associated with leaf width, encodes serine/threonine-protein kinase NEK4, which is important for cell cycle regulation [86], while Sobic.008G160000 encodes a tetratricopeptide repeat protein, which is vital for early chloroplast biogenesis and plant growth [87]. On chromosome 9, Sobic.009G201800, linked to stem girth, encodes the nuclear pore complex protein Nup53, which is involved in macromolecule trafficking and embryonic development [88, 89]. Sobic.009G201900 encodes peptidyl-tRNA hydrolase 2, which is critical for chloroplast development, though its role in sorghum requires validation [90]. The gene Sobic.006G162000, associated with leaf-to-stem ratio, encodes kinesin heavy chain, which plays roles in trichome development, cytoskeletal organization, and internodal elongation [91, 92]. Sobic.009G074300 on chromosome 9 encodes 3-epi-6-deoxocathasterone 23-monooxygenase, a key enzyme in brassinosteroid biosynthesis, which influences plant height and biomass [93]. The DW1 gene regulates brassinosteroid signaling, limiting internode cell proliferation and resulting in a dwarf phenotype, fewer nodes, and a reduced leaf-to-stem ratio [94, 95]. For node number, Sobic.010G061800 encodes a nodulin-like protein involved in nutrient transport and plant development [96], while Sobic.002G18370 shows similarity to rice L-ascorbate oxidase, which is essential for ascorbic acid biosynthesis and cell growth [97, 98]. Sobic.005G188400, associated with internodal length, encodes a Remorin_C domain protein, which regulates plant development, photoassimilate translocation, and carbohydrate accumulation [24, 99]. Further research is needed to validate these genes for use in sorghum breeding.

Analysis of the Q-Q plots for the eight traits showed that the MLM method identified the fewest significant SNPs, indicating strict criteria but potential false negatives due to overfitting. The GLM method detected the most SNPs, suggesting a higher false-positive rate, whereas the FarmCPU model was the most effective, balancing false positives and false negatives in identifying QTNs. These results are consistent with the findings of Kaler et al. [100], Tamba et al. [101], and Wen et al. [102]; Liu et al. [24] likewise highlighted FarmCPU's reduction of overfitting, the conservatism of MLM leading to false negatives, and the broader detection of GLM increasing false positives.
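One practical way to combine the three models is to keep only the SNPs flagged by at least two of them, as was done for the common QTNs in this study. A minimal Python sketch of that filter follows; the function name and the SNP identifiers are hypothetical placeholders, and the significance threshold applied within each model is assumed to have been applied beforehand.

```python
from collections import defaultdict


def co_detected(hits_by_model, min_models=2):
    """Return SNPs declared significant by at least `min_models` models.

    hits_by_model: dict mapping a model name (e.g. 'GLM', 'MLM',
    'FarmCPU') to the set of SNP identifiers it declared significant.
    Result maps each retained SNP to the sorted list of detecting models.
    """
    models_for = defaultdict(set)
    for model, snps in hits_by_model.items():
        for snp in snps:
            models_for[snp].add(model)
    return {
        snp: sorted(models)
        for snp, models in sorted(models_for.items())
        if len(models) >= min_models
    }


# Placeholder SNP identifiers, not results from this study
hits = {
    "GLM": {"snp_A", "snp_B", "snp_C"},
    "MLM": {"snp_A"},
    "FarmCPU": {"snp_A", "snp_C", "snp_D"},
}
print(co_detected(hits))
```

Raising `min_models` to 3 would keep only the SNPs supported by every model, a stricter filter that trades sensitivity for confidence.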

Conclusion

The genotypes exhibited substantial genetic diversity, as evidenced by population structure and principal component analysis using SNP markers. Forage sorghum, often overshadowed by grain and sweet sorghum, suffers from a scarcity of the genomic resources needed to identify adaptive traits, which impedes breeding efforts. To address this, a GWAS approach was employed to uncover candidate genes and associated markers for adaptive traits. The comparison of three GWAS analytical models highlighted the strengths and weaknesses of each and identified the most efficient model and the most reliable QTNs. Although the MLM method was highly stringent and identified the fewest QTNs, it suffered from potential false negatives. Conversely, the GLM model identified the greatest number of QTNs but raised concerns regarding false positives. Ultimately, the FarmCPU model emerged as the most effective, balancing the identification of QTNs with reliability. Functional characterization of these candidate genes is essential for understanding the regulatory mechanisms of adaptive traits, and the identified QTNs need to be validated with diverse genetic markers across different populations. Such studies elucidate gene functions and enable targeted trait improvement, advancing plant breeding through marker-assisted selection and transgenic plant development. This could accelerate the creation of new forage sorghum varieties with essential traits, potentially transforming forage sorghum breeding and enhancing global food security.

Data availability

The data supporting the study’s findings are available from the authors upon reasonable request.

References

  1. Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J. The Sorghum bicolor genome and the diversification of grasses. Nature. 2009;457(7229):551–6.

  2. Elias M, Chere D, Lule D, Serba D, Tirfessa A, Gelmesa D, Tesso T, Bantte K, Menamo TM. Multi-locus genome‐wide association study reveals genomic regions underlying root system architecture traits in Ethiopian sorghum germplasm. TPG. 2024; e20436.

  3. Ping J, Zhang F, Niu H, Yang H, Lv X, Du Z, Li H, Wang Y. Genetic diversity analysis of germplasm resources of forage Sorghum based on SSR marker. Mol. Plant Breed. 2018;14:4663–70.

  4. Tonapi VA, Talwar HS, Are AK, Bhat BV, Reddy CR, Dalton TJ, editors. Sorghum in the 21st century: Food, fodder, feed, fuel for a rapidly changing world. Singapore: Springer; 2020.

  5. Talukdar JN. Fodder cultivation in Assam. North–East Veterinarian. 2006;5(4):10–12.

  6. Bora SS, Sharma KK, Borah KA, Saud RK. Opportunities and challenges of forage cultivation in Assam-A Review. Forage Res. 2020;45(4):251–7.

  7. Aruna C, Audilakshmi S. A strategy to identify potential germplasm for improving yield attributes using diversity analysis in sorghum. Plant Genet Res. 2008;6(3):187–94.

  8. Barrett RD, Hoekstra HE. Molecular spandrels: tests of adaptation at the genetic level. Nat Rev Genet. 2011;12(11):767–80.

  9. Zou G, Zhai G, Feng Q, Yan S, Wang A, Zhao Q, Shao J, Zhang Z, Zou J, Han B, Tao Y. Identification of QTLs for eight agronomically important traits using an ultra-high-density map based on SNPs generated from high-throughput sequencing in sorghum under contrasting photoperiods. J Exp Bot. 2012;63(15):5451–62.

  10. Boyles RE, Pfeiffer BK, Cooper EA, Zielinski KJ, Myers MT, Rooney WL, Kresovich S. Quantitative trait loci mapping of agronomic and yield traits in two grain sorghum biparental families. Crop Sci. 2017;57(5):2443–56.

  11. Timpson NJ, Greenwood CM, Soranzo N, Lawson DJ, Richards JB. Genetic architecture: the shape of the genetic contribution to human traits and disease. Nat Rev Genet. 2018;19(2):110–24.

  12. Harris K, Subudhi PK, Borrell A, Jordan D, Rosenow D, Nguyen H, Klein P, Klein R, Mullet J. Sorghum stay-green QTL individually reduce post-flowering drought-induced leaf senescence. J Exp Bot. 2007;58(2):327–38.

  13. Xin Z, Wang M, Cuevas HE, Chen J, Harrison M, Pugh NA, Morris G. Sorghum genetic, genomic, and breeding resources. Planta. 2021;254(6):114.

  14. Hao S, Lou H, Wang H, Shi J, Liu D, Baogerile, Tao J, Miao S, Pei Q, Yu L, Wu M. Genome-wide association study reveals the genetic basis of five quality traits in Chinese wheat. Front Plant Sci. 2022;13:835306.

  15. Hamblin MT, Salas Fernandez MG, Casa AM, Mitchell SE, Paterson AH, Kresovich S. Equilibrium processes cannot explain high levels of short-and medium-range linkage disequilibrium in the domesticated grass Sorghum bicolor. Genet. 2005;171(3):1247–56.

  16. Ibrahim AK, Zhang L, Niyitanga S, Afzal MZ, Xu Y, Zhang L, Zhang L, Qi J. Principles and approaches of association mapping in plant breeding. Trop. Plant Biol. 2020;13:212–24.

  17. Zhao Y, Qiang C, Wang X, Chen Y, Deng J, Jiang C, Li J. New alleles for chlorophyll content and stay-green traits revealed by a genome wide association study in rice (Oryza sativa). Sci Rep. 2019;9(1):2541.

  18. Hao H, Li Z, Leng C, Lu C, Luo H, Liu Y, Jing HC. Sorghum breeding in the genomic era: opportunities and challenges. Theor Appl Genet. 2021;134:1899–924.

  19. McCormick RF, Truong SK, Sreedasyam A, Jenkins J, Shu S, Sims D, Mullet JE. The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization. TPJ. 2018;93(2):338–54.

  20. Morrell PL, Buckler ES, Ross-Ibarra J. Crop genomics: advances and applications. Nat Rev Genet. 2012;13(2):85–96.

  21. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38(8):904–9.

  22. Yu J, Pressoir G, Briggs WH, Vroh Bi I, Yamasaki M, Doebley JF, Buckler ES. A unified mixed-model method for association mapping that accounts for multiple levels of relatedness. Nat Genet. 2006;38(2):203–8.

  23. Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Buckler ES. Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010;42(4):355–60.

  24. Liu X, Huang M, Fan B, Buckler ES, Zhang Z. Iterative usage of fixed and random effect models for powerful and efficient genome-wide association studies. PLoS Genet. 2016;12(2):e1005767.

  25. Yin L, Zhang H, Tang Z, Xu J, Yin D, Zhang Z, Liu X. rMVP: a memory-efficient, visualization-enhanced, and parallel-accelerated tool for genome-wide association study. GPB. 2021;19(4):619–28.

  26. Varshney RK, Singh VK, Kumar A, Powell W, Sorrells ME. Can genomics deliver climate-change ready crops? Curr. Plant Biol. 2018;45:205–11.

  27. Liu H, Prashar A, Jones G. Candidate genes and molecular markers associated with physiological traits for heat tolerance in chickpea. Plant Env Dev. 2017;40(8):1652–67.

  28. R Core Team. 2021. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/

  29. Posit team. 2022. RStudio: Integrated Development Environment for R. Posit Software, PBC, Boston, MA. URL http://www.posit.co/

  30. Olivoto T, Lúcio ADC. Metan: an R package for multi-environment trial analysis. MEE. 2020;11(6):783–9.

  31. Popat R, Patel R, Parmar D. 2020. Variability: genetic variability analysis for plant breeding research. R package version 0.1.0.

  32. Wickham H. 2016. ggplot2: Elegant graphics for data analysis (New York: Springer-Verlag). https://ggplot2.tidyverse.org

  33. Henderson CR. Use of all relatives in intra herd prediction of breeding values and producing abilities. JDS. 1975;58(12):1910–6.

  34. Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, 1000 Genomes Project Analysis Group. The variant call format and VCFtools. Bioinfo. 2011;27(15):2156–8.

  35. Dereeper A, Nicolas S, Le Cunff L, Bacilieri R, Doligez A, Peros JP, This P. SNiPlay: a web-based tool for detection, management and analysis of SNPs. Application to grapevine diversity projects. BMC Bioinfo. 2011;12:1–14.

  36. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genet. 2000;155(2):945–59.

  37. Earl DA, VonHoldt BM. STRUCTURE HARVESTER: a website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour. 2012;4:359–61.

  38. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: a simulation study. Mol Ecol. 2005;14(8):2611–20.

  39. McCouch SR, Wright MH, Tung CW, Maron LG, McNally KL, Fitzgerald M, Mezey J. Open access resources for genome-wide association mapping in rice. Nat Commun. 2016;7(1):10532.

  40. Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Rokhsar DS. Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res. 2012;40(D1):D1178–86.

  41. Huang X, Wei X, Sang T, Zhao Q, Feng Q, Zhao Y, Han B. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat Genet. 2010;42(11):961–7.

  42. Jiao Y, Zhao H, Ren L, Song W, Zeng B, Guo J, Lai J. Genome-wide genetic changes during modern breeding of maize. Nat Genet. 2012;44(7):812–5.

  43. Enyew M, Feyissa T, Carlsson AS, Tesfaye K, Hammenhag C, Seyoum A, Geleta M. Genome-wide analyses using multilocus models revealed marker–trait associations for major agronomic traits in Sorghum bicolor. Front Plant Sci. 2022a;13:999692.

  44. Ruperao P, Gandham P, Odeny DA, Selvanayagam S, Thirunavukkarasu N, Das RR, Rathore A. DeepVariant calling provides insights into race diversity and its implication for sorghum breeding. bioRxiv. 2022; 09.

  45. Elangovan M, Kiran Babu P, Seetharama N, Patil JV. Genetic diversity and heritability characters associated in sweet sorghum [Sorghum bicolor (L.) Moench]. Sugar Tech. 2014;16(2):200–10.

  46. Silva KJD, Pastina MM, Guimarães CT, Magalhães JV, Pimentel LD, Schaffert RE, Menezes CBD. Genetic diversity and heterotic grouping of sorghum lines using SNP markers. Sci Agric. 2020;78:e20200039.

  47. Niu H, Ping J, Wang Y, Lv X, Li H, Zhang F, Han Y. Population genomic and genome-wide association analysis of lignin content in a global collection of 206 forage sorghum accessions. Mol Breed. 2020;40(8):1–13.

  48. Enyew M, Feyissa T, Carlsson AS, Tesfaye K, Hammenhag C, Geleta M. Genetic diversity and population structure of sorghum [Sorghum bicolor (L.) moench] accessions as revealed by single nucleotide polymorphism markers. Front Plant Sci. 2022b;12:799482.

  49. Mace ES, Cruickshank AW, Tao Y, Hunt CH, Jordan DR. A global resource for exploring and exploiting genetic variation in sorghum crop wild relatives. Crop Sci. 2021;61(1):150–62.

  50. Yan S, Wang L, Zhao L, Wang H, Wang D. Evaluation of genetic variation among sorghum varieties from southwest China via genome resequencing. TPG. 2018;11(3):170098.

  51. Balding DJ. A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006;7(10):781–91.

  52. Falush D, Stephens M, Pritchard JK. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genet. 2003;164(4):1567–87.

  53. Chakrabarty S, Mufumbo R, Windpassinger S, Jordan D, Mace E, Snowdon RJ, Hathorn A. Genetic diversity analysis and characterization of Ugandan sorghum. bioRxiv. 2022; 01.

  54. Morris GP, Ramu P, Deshpande SP, Hash CT, Shah T, Upadhyaya HD, Kresovich S. Population genomic and genome-wide association studies of agroclimatic traits in sorghum. PNAS. 2013;110(2):453–8.

  55. Boatwright JL, Sapkota S, Jin H, Schnable JC, Brenton Z, Boyles R, Kresovich S. Sorghum Association Panel whole-genome sequencing establishes cornerstone resource for dissecting genomic diversity. TPJ. 2022;111(3):888–904.

  56. Olatoye MO, Hu Z, Maina F, Morris GP. Genomic signatures of adaptation to a precipitation gradient in Nigerian sorghum. G3: Genes, Genomes, Genetics. 2018;8(10):3269–81.

  57. Korte A, Farlow A. The advantages and limitations of trait analysis with GWAS: a review. Plant Meth. 2013;9:1–9.

  58. Alseekh S, Kostova D, Bulut M, Fernie AR. Genome-wide association studies: assessing trait characteristics in model and crop plants. CMLS. 2021;78:5743–54.

  59. Faye JM, Akata EA, Sine B, Diatta C, Cisse N, Fonceka D, Morris GP. Quantitative and population genomics suggest a broad role of stay-green loci in the drought adaptation of sorghum. TPG. 2022;15(1):e20176.

  60. Tefera G. Evaluation and Genome Wide Association Mapping of Ethiopian Sorghum Landraces (Sorghum Bicolor (L.) Moench) Under Moisture Stress conditions at Miesso, Eastern Ethiopia (Doctoral dissertation, Jimma University), 2019.

  61. Mace E, Innes D, Hunt C, Wang X, Tao Y, Baxter J, Jordan D. The Sorghum QTL Atlas: a powerful tool for trait dissection, comparative genomics and crop improvement. Theor Appl Genet. 2019;132:751–66.

  62. Bai C, Wang C, Wang P, Zhu Z, Cong L, Li D, Lu X. QTL mapping of agronomically important traits in sorghum (Sorghum bicolor L). Euphyt. 2017;213:1–12.

  63. Schaffasz A, Windpassinger S, Friedt W, Snowdon R, Wittkop B. Sorghum as a novel crop for Central Europe: using a broad diversity set to dissect temperate adaptation. Agron. 2019;9(9):535.

  64. Zhao J, Mantilla Perez MB, Hu J, Salas Fernandez MG. Genome-wide association study for nine plant architecture traits in Sorghum. TPG. 2016;9(2):06.

  65. Nagaraja Reddy R, Madhusudhana R, Murali Mohan S, Chakravarthi DVN, Seetharama N. Characterization, development and mapping of Unigene-derived microsatellite markers in sorghum [Sorghum bicolor (L.) Moench]. Mol Breed. 2012;29(3):543–64.

  66. Wang Y, Li J, Li M, Li Y, Zhao Z, Li C, Yue J. Genome-wide characterization of remorin genes in terms of their evolution and expression in response to hormone signals and abiotic stresses in foxtail millet (Setaria italica). Diversity. 2022;14(9):711.

  67. Bouchet S, Olatoye MO, Marla SR, Perumal R, Tesso T, Yu J, Morris GP. Increased power to dissect adaptive traits in global sorghum diversity using a nested association mapping population. Genet. 2017;206(2):573–85.

  68. Lopez JR, Erickson JE, Munoz P, Saballos A, Felderhoff TJ, Vermerris W. QTLs associated with crown root angle, stomatal conductance, and maturity in sorghum. TPG. 2017;10(2):04.

  69. Fakrudin B, Kavil SP, Girma Y, Arun SS, Dadakhalandar D, Gurusiddesh BH, Kamatar MY. Molecular mapping of genomic regions harbouring QTLs for root and yield traits in sorghum (Sorghum bicolor L. Moench). PMBP. 2011;1–11.

  70. Brenton ZW, Juengst BT, Cooper EA, Myers MT, Jordan KE, Dale SM, … Kresovich S. Species-specific duplication event associated with elevated levels of nonstructural carbohydrates in Sorghum bicolor. G3: Genes, Genomes, Genetics. 2020;10(5):1511–20.

  71. Ogawara T, Higashi K, Kamada H, Ezura H. Ethylene advances the transition from vegetative growth to flowering in Arabidopsis thaliana. J Plant Physiol. 2003;160(11):1335–40.

  72. Wang Q, Zhang W, Yin Z, Wen CK. Rice CONSTITUTIVE TRIPLE-RESPONSE2 is involved in the ethylene-receptor signalling and regulation of various aspects of rice growth and development. J Exp Bot. 2013;64(16):4863–75.

  73. Li P, Mace ES, Guo Y, Han L, Wang M, He Y, Cai H. Fine mapping of qDor7, a major qtl affecting seed dormancy in sorghum (Sorghum bicolor (L.) Moench). Trop. Plant Biol. 2016;9:109–16.

  74. Choi J, Hyun Y, Kang MJ, In Yun H, Yun JY, Lister C, Choi Y. Resetting and regulation of FLOWERING LOCUS C expression during Arabidopsis reproductive development. TPJ. 2009;57(5):918–31.

  75. Arefian M, Bhagya N, Prasad TK. Phosphorylation-mediated signalling in flowering: prospects and retrospects of phosphoproteomics in crops. Biol Rev. 2021;96(5):2164–91.

  76. Kim JH, Lee HJ, Park CM. HOS1 acts as a key modulator of hypocotyl photomorphogenesis. Plant Signal Behav. 2017;12(5):e1315497.

  77. Sun J, Wang H, Ren L, Chen S, Chen F, Jiang J. CmFTL2 is involved in the photoperiod-and sucrose-mediated control of flowering time in chrysanthemum. Hort Res. 2017; 4.

  78. Hou H, Lin Y, Hou X. Ectopic expression of a pak-choi YABBY gene, BcYAB3, causes leaf curvature and flowering stage delay in Arabidopsis thaliana. Genes. 2020;11(4):370.

  79. Schultz TF, Kiyosue T, Yanovsky M, Wada M, Kay SA. A role for LKP2 in the circadian clock of Arabidopsis. Plant Cell. 2001;13(12):2659–70.

  80. Hong MJ, Kim JB, Seo YW, Kim DY. F-box genes in the wheat genome and expression profiling in wheat at different developmental stages. Genes. 2020;11(10):1154.

  81. Muthamilarasan M, Khan Y, Jaishankar J, Shweta S, Lata C, Prasad M. Integrative analysis and expression profiling of secondary cell wall genes in C4 biofuel model Setaria italica reveals targets for lignocellulose bioengineering. Front Plant Sci. 2015;6:158655.

  82. Petti C, Hirano K, Stork J, DeBolt S. Mapping of a cellulose-deficient mutant named dwarf1-1 in Sorghum bicolor to the green revolution gene gibberellin20-oxidase reveals a positive regulatory association between gibberellin and cellulose biosynthesis. Plant Physiol. 2015;169(1):705–16.

  83. Desprez T, Vernhettes S, Fagard M, Refrégier G, Desnos T, Aletti E, Höfte H. Resistance against herbicide isoxaben and cellulose deficiency caused by distinct mutations in same cellulose synthase isoform CESA6. Plant Physiol. 2002;128(2):482–90.

  84. Wei Z, Tan S, Liu T, Wu Y, Lei JG, Chen Z, Liao K. Plasmodesmata-like intercellular connections by plant remorin in animal cells. bioRxiv. 2019; 791137.

  85. Abel B, Buschle CA, Hernandez-Ryes C, Burkart SS, Deroubaix AF, Mergner J, Ott T. A hetero-oligomeric remorin-receptor complex regulates plant development. bioRxiv. 2021; 2021-01.

  86. Pan Z, Baerson SR, Wang M, Bajsa-Hirschel J, Rimando AM, Wang X, Duke SO. A cytochrome P450 CYP 71 enzyme expressed in Sorghum bicolor root hair cells participates in the biosynthesis of the benzoquinone allelochemical sorgoleone. New Phytol. 2018;218(2):616–29.

  87. Zhang B, Munske GR, Timokhin VI, Ralph J, Davydov DR, Vermerris W, Kang C. Functional and structural insight into the flexibility of cytochrome P450 reductases from Sorghum bicolor and its implications for lignin composition. JBC. 2022; 298(4).

  88. Tamura K. Nuclear pore complex-mediated gene expression in Arabidopsis thaliana. J Plant Res. 2020;133(4):449–55.

  89. Tang Y, Huang A, Gu Y. Global profiling of plant nuclear membrane proteome in Arabidopsis. Nat Plants. 2020;6(7):838–47.

  90. Zhang Q, Wang Y, Shen L, Ren D, Hu J, Zhu L, Qian Q. OsCRS2 encoding a peptidyl-tRNA hydrolase protein is essential for chloroplast development in rice. Plant Growth Regul. 2020;92:535–45.

    Article  CAS  Google Scholar 

  91. Ali A, Veeranki SN, Chinchole A, Tyagi S. MLL/WDR5 complex regulates Kif2A localization to ensure chromosome congression and proper spindle assembly during mitosis. Dev Cell. 2017;41(6):605–22.

    Article  CAS  PubMed  Google Scholar 

  92. Oliver J, Fan M, McKinley B, Zemelis-Durfee S, Brandizzi F, Wilkerson C, Mullet JE. The AGCVIII kinase Dw2 modulates cell proliferation, endomembrane trafficking, and MLG/xylan cell wall localization in elongating stem internodes of Sorghum bicolor. TPJ. 2021;105(4):1053–71.

    CAS  Google Scholar 

  93. Mantilla Perez MB, Zhao J, Yin Y, Hu J, Salas Fernandez MG. Association mapping of brassino steroid candidate genes and plant architecture in a diverse panel of Sorghum bicolor. Theor Appl Genet. 2014;127:2645–62.

    Article  CAS  PubMed  Google Scholar 

  94. Hirano K, Kawamura M, Araki-Nakamura S, Fujimoto H, Ohmae-Shinohara K, Yamaguchi M, Sazuka T. Sorghum DW1 positively regulates brassinosteroid signalling by inhibiting the nuclear localization of BRASSINOSTEROID INSENSITIVE 2. Sci Rep. 2017;7(1):126.

    Article  PubMed  PubMed Central  Google Scholar 

  95. Zhiponova MK, Vanhoutte I, Boudolf V, Betti C, Dhondt S, Coppens F, Russinova E. Brassinosteroid production and signalling differentially control cell division and expansion in the leaf. New Phytol. 2013;197(2):490–502.

    Article  CAS  PubMed  Google Scholar 

  96. Denancé N, Szurek B, Noël LD. Emerging functions of nodulin-like proteins in nonnodulating plant species. PCP. 2014;55(3):469–74.

    Google Scholar 

  97. Veljović-Jovanović S, Vidović M, Morina F. Ascorbate as a key player in plant abiotic stress response and tolerance. AsA. 2017;47:109.

    Google Scholar 

  98. Viviani A, Verma BC, Giordani T, Fambrini M. L-Ascorbic acid in plants: from biosynthesis to its role in plant development and stress response. Agrochimica: Int J Plant chem Soil Sci Plant Nutr. 2021;65(2):151–71.

    Article  CAS  Google Scholar 

  99. Gui J, Liu C, Shen J, Li L. Grain setting defect1, encoding a remorin protein, affects the grain setting in rice through regulating plasmodesmatal conductance. Plant Physiol. 2014;166(3):1463–78.

    Article  PubMed  PubMed Central  Google Scholar 

  100. Kaler AS, Gillman JD, Beissinger T, Purcell LC. Comparing different statistical models and multiple testing corrections for association mapping in soybean and maize. Front Plant Sci. 2020;10:486047.

    Article  Google Scholar 

  101. Tamba CL, Ni YL, Zhang YM. Iterative sure independence screening EM-Bayesian LASSO algorithm for multilocus genome-wide association studies. PLoS Comput Biol. 2017;13(1):e1005357.

    Article  PubMed  PubMed Central  Google Scholar 

  102. Wen YJ, Zhang H, Ni YL, Huang B, Zhang J, Feng JY, Wu R. Methodological implementation of mixed linear models in multilocus genome-wide association studies. Brief Bioinform. 2018;19(4):700–12.

    Article  PubMed  Google Scholar 

Download references

Acknowledgements

Assam Agricultural University, Jorhat, Assam and the Indian Institute of Millet Research, Hyderabad, are acknowledged for providing the necessary facilities and materials.

Funding

The authors affirm that this study received no financial support for its publication.

Author information

Contributions

PPB, RNS and AS designed the research and performed the experiments. BVB and AS contributed experimental materials for the field experiments and provided critical support during the laboratory-based analysis. RNS and PPB provided support during the data analysis. PPB, HV, PS and NB wrote the manuscript with help from all the other authors. All the authors have read and approved the manuscript.

Corresponding author

Correspondence to Ramendra Nath Sarma.

Ethics declarations

Ethics approval and consent for participation

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Behera, P.P., Singode, A., Bhat, B.V. et al. Identifying genetic determinants of forage sorghum [Sorghum bicolor (Moench)] adaptation through GWAS. BMC Plant Biol 24, 1043 (2024). https://doi.org/10.1186/s12870-024-05754-6

  • DOI: https://doi.org/10.1186/s12870-024-05754-6

Keywords