The pear genomics database (PGDB): a comprehensive multi-omics research platform for Pyrus spp.

Background
Pears are among the most important temperate fruit trees in the world, and research on them has expanded considerably in recent years. However, the omics data available for pear cannot be retrieved easily or quickly, which hinders further studies that build on these data.

Description
Here, we present a publicly accessible multi-omics resource platform for pear, the Pear Genomics Database (PGDB). We collected and collated data on genomic sequences, genome structure, functional annotation, transcription factor predictions, comparative genomics, and transcriptomics. We provide user-friendly functional modules to facilitate querying, browsing and use of these data. The platform also includes basic tools, namely JBrowse, BLAST and phylogenetic tree building, together with additional resources for bulk data download and a quick usage guide.

Conclusions
The Pear Genomics Database (PGDB, http://pyrusgdb.sdau.edu.cn) is an online data analysis and query resource that integrates comprehensive multi-omics data for pear. The database is equipped with user-friendly interactive functional modules and data visualization tools, and constitutes a convenient platform for integrated research on pear.

Supplementary Information
The online version contains supplementary material available at 10.1186/s12870-023-04406-5.

Shulin Chen 1, Manyi Sun 2, Shaozhuo Xu 1, Cheng Xue 1, Shuwei Wei 3, Pengfei Zheng 1, Kaidi Gu 1, Zhiwen Qiao 1, Zhiying Liu 1, Mingyue Zhang 1* and Jun Wu 2*

... sequenced and assembled using Illumina HiSeq technology combined with a BAC-by-BAC strategy [5]. After this, the western variety 'Bartlett' (P. communis) was sequenced with Roche's 454 sequencing technology [6]. In recent years, several more pear reference genomes have been published owing to the rapid development of sequencing technologies [7][8][9][10][11][12][13]. These developments further led to the generation of a large amount of transcriptome and population DNA re-sequencing data, making it possible to mine key genes responsible for important agronomic traits and to study the domestication history of pears [14,15]. At present, pear genome and re-sequencing data have been collected in the Genome Database for Rosaceae (GDR), but transcriptome data are lacking. There is therefore an urgent need for a database that can effectively integrate, analyze and disseminate pear multi-omics data, and provide a platform for researchers to quickly access and use these resources. Such databases are already available for a variety of plants, such as bayberry and pineapple [16,17]. Drawing on the strengths of these databases, we constructed the Pear Genomics Database (PGDB). In this study, a total of nine genome sequences, 35 transcriptome datasets, and re-sequencing data from 30 pear accessions were collected. We also included commonly used tools in the PGDB, such as BLAST, JBrowse and phylogenetic tree building, which will facilitate the future development of pear functional genomics and molecular biology.

Database construction and content
The PGDB collects and processes data on genome sequences, annotation, expression, synteny, and re-sequencing, which are stored in a MySQL database server (5.7.34). The web interface is built mainly with the front-end framework Twitter Bootstrap, based on HTML5 (HyperText Markup Language 5), CSS (Cascading Style Sheets) and JavaScript, and allows users to connect various levels of information, query the data and generate results. Data downloads are served by PHP (7.4.21) scripts. The entire website runs on the Apache web server (2.4.48) under the Linux (CentOS 7.6) operating system (Fig. 1).

Synteny data
We identified synteny blocks and homologous gene pairs among the nine pear genomes. The protein sequences of each genome were aligned against those of every other genome and against themselves using BLASTP (E-value ≤ 1e-10). The MCScanX [27] software was then run with default parameters to determine the synteny blocks and homologous gene pairs from the BLASTP results.
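The E-value threshold described above can be illustrated with a short filter over BLASTP results. This is only a sketch under stated assumptions: it assumes the alignments were written in BLAST's 12-column tabular format (`-outfmt 6`), in which the E-value is the 11th column, and the function name `filter_blast_hits` is hypothetical. It is not the pipeline used to build the PGDB.

```python
import csv
import io

EVALUE_CUTOFF = 1e-10  # threshold stated in the text


def filter_blast_hits(handle, cutoff=EVALUE_CUTOFF):
    """Yield (query, subject, evalue) for tabular BLAST hits with evalue <= cutoff.

    Assumes the default 12-column tabular layout:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    """
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue <= cutoff:
            yield query, subject, evalue


# Two made-up hits: only the first one passes the cutoff.
example = io.StringIO(
    "geneA\tgeneB\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t380\n"
    "geneA\tgeneC\t35.0\t80\t52\t2\t5\t84\t10\t89\t0.002\t40.1\n"
)
kept = list(filter_blast_hits(example))
```

The retained hits would then be passed to MCScanX together with the gene position file it requires.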

Marker data
The Krait tool [28] was used to mine simple sequence repeat (SSR) resources in the nine pear genomes. A total of 386,779 SSR markers were identified and divided into five categories, from dinucleotide to hexanucleotide repeats, with minimum repeat numbers of 6, 5, 4, 4 and 4, respectively. The Primer3 software (58) [29] implemented in Krait was used to design SSR primers with the following parameters: polymerase chain reaction (PCR) product size of 100-300 bp; primer length of 20-25 bases, with an optimum of 22 bases; annealing temperature of 50-60 °C; and GC content of 40-60%, with an optimum of 50%. Default values were retained for all other parameters. In addition, 579 pairs of SSR markers were collected from the published literature [30][31][32][33][34][35].
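To make the repeat thresholds concrete, the following sketch finds perfect SSRs with a regular expression, using the minimum repeat numbers given above (6, 5, 4, 4 and 4 for di- to hexanucleotide motifs). It is an illustration only and does not reproduce Krait's algorithm; the function name `find_ssrs` and the output layout are assumptions made for this example.

```python
import re

# Minimum repeat numbers from the text, keyed by motif length (di- to hexanucleotide).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def find_ssrs(sequence):
    """Return sorted (start, end, motif, repeats) tuples for perfect SSRs.

    Coordinates are 0-based and end-exclusive.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_rep in MIN_REPEATS.items():
        # A motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. ATAT, AA),
            # so that each repeat is reported once under its shortest motif.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            repeats = (match.end() - match.start()) // motif_len
            hits.append((match.start(), match.end(), motif, repeats))
    return sorted(hits)


example = "GGC" + "AT" * 6 + "CCG" + "AGC" * 5 + "T"
ssrs = find_ssrs(example)
```

In this example the (AT)6 and (AGC)5 tracts both meet their thresholds, whereas a tract such as (AT)5 would fall below the dinucleotide minimum and would not be reported.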

Database content
The homepage of the PGDB is composed of three main parts. The top navigation bar provides quick links to each module, including 'Tools', 'JBrowse', 'Species', 'Download', and 'About'. The middle part contains a brief introduction to the database and quick links to the 'Tools' and 'Species' modules. The bottom part gives the website's launch date and other information.

Available tools

Search
The 'Search' page provides two retrieval modules (Fig. 2a). In the 'Quick searching' module, users can retrieve detailed gene annotations by selecting the cultivar genome and entering a gene ID. The results page includes the sequences (gene, CDS, and protein), functional annotations (GO, KEGG and InterPro) and any homologous genes (Fig. 2b). Users can choose which information is displayed through the dropdown options. In addition, users can retrieve genomic sequences by entering reference genome coordinates, a function implemented with Bedtools [39]. The results can be viewed online or downloaded for local storage. The 'Sequence fetch' module provides a batch search function for gene, CDS, and protein sequences.
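Coordinate-based retrieval of the kind described above can be sketched in a few lines. The example assumes that users enter 1-based, inclusive coordinates, which are converted to the 0-based, half-open intervals used by the BED format that Bedtools reads; how the PGDB form actually interprets coordinates is not stated in the text. The function names are hypothetical, and the code is a plain-Python stand-in rather than a call to Bedtools.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def parse_fasta(lines):
    """Parse FASTA lines into a dict mapping sequence ID to sequence."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            # The ID is the first whitespace-delimited word of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def to_bed_interval(start_1based, end_1based):
    """Convert 1-based inclusive coordinates to a 0-based half-open interval."""
    if start_1based < 1 or end_1based < start_1based:
        raise ValueError("expected 1 <= start <= end")
    return start_1based - 1, end_1based


def fetch_region(records, chrom, start_1based, end_1based, strand="+"):
    """Return the requested subsequence, reverse-complemented for the minus strand."""
    start, end = to_bed_interval(start_1based, end_1based)
    seq = records[chrom][start:end]
    if strand == "-":
        seq = seq.translate(COMPLEMENT)[::-1]
    return seq


genome = parse_fasta([">chr1 example", "ACGTAC", "GTTT"])
forward = fetch_region(genome, "chr1", 2, 5)
reverse = fetch_region(genome, "chr1", 2, 5, strand="-")
```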

Gene expression
The 'Gene Expression' page provides a search function for genes with annotated RPKM values. Users can reach this function from the navigation bar or from the 'Tools' module in the middle section of the home page. The results are presented as line or bar charts drawn with ECharts [40] that display RPKM values at different developmental stages of pear fruit. The query results can be browsed online or downloaded to support in-depth analyses.
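For readers unfamiliar with the unit, RPKM (reads per kilobase of transcript per million mapped reads) normalizes a gene's read count by both gene length and sequencing depth. The text does not describe how the PGDB values were calculated, so the sketch below only illustrates the standard definition; the function name and the numbers are made up for the example.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = read_count * 1e9 / (gene_length_bp * total_mapped_reads)
    """
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
value = rpkm(500, 2000, 10_000_000)
```

Because of the length term, a gene twice as long with the same read count receives half the RPKM value.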

Synteny
This page provides comparative genomic information for different pear varieties, allowing quick retrieval of genomic collinearity and homologous gene pairs (Fig. 3a). In the 'Synteny Block' module, users can obtain synteny blocks by selecting the pear genome and chromosome of interest. The top half of the results page contains a chart, implemented with HighCharts, showing the quantitative relationship between synteny blocks of the query and compared genomes. The bottom half of the page lists complete synteny block information (block ID, location, source, e-value) (Fig. 3b). Clicking on a synteny block links to detailed information on the homologous gene pairs within that block (Fig. 3c). In the 'Synteny Image' module, synteny images can be constructed between the chromosomes of any two genomes and downloaded for further study (Fig. 3d).

BLAST
This page provides a BLAST tool for sequence alignment, implemented with ViroBlast [41]. Nucleotide and amino acid sequence similarity searches can be performed through a simple input-output interface. We provide three types of query databases: genomic sequences, CDS sequences and protein sequences (Fig. 4a, b). Nucleotide queries can be searched against the nucleotide databases with BLASTN or against the protein database with BLASTX, while protein queries can be searched against the nucleotide databases with TBLASTN or against the protein database with BLASTP. In addition, users can choose TBLASTX, which translates both the nucleotide query and the nucleotide database sequences into protein sequences before comparison.
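The choice of BLAST program follows from the type of the query and the type of the database. The lookup below sketches that rule for the three PGDB database types named above (genomic and CDS databases are nucleotide, the protein database is protein). The dictionary and function names are hypothetical and do not describe ViroBlast's internals.

```python
# Sequence type of each PGDB database category named in the text.
DATABASE_TYPES = {"genome": "nucleotide", "CDS": "nucleotide", "protein": "protein"}

# Valid BLAST programs for each (query type, database type) combination.
PROGRAMS = {
    ("nucleotide", "nucleotide"): ["blastn", "tblastx"],
    ("nucleotide", "protein"): ["blastx"],
    ("protein", "nucleotide"): ["tblastn"],
    ("protein", "protein"): ["blastp"],
}


def valid_programs(query_type, database):
    """Return the BLAST programs applicable to a query type and a PGDB database."""
    if query_type not in ("nucleotide", "protein"):
        raise ValueError("query_type must be 'nucleotide' or 'protein'")
    return PROGRAMS[(query_type, DATABASE_TYPES[database])]


nucleotide_vs_cds = valid_programs("nucleotide", "CDS")
protein_vs_genome = valid_programs("protein", "genome")
```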

SSR markers
PGDB provides a query page for two types of SSR markers: those predicted from the genomes and those reported in the literature. Users search for marker data by entering SSR IDs or selecting specific items. Submitting the search criteria returns detailed information including variety, SSR ID, scaffold, motif, type, repeat, start, end, and length. In addition, for genomic SSR markers, clicking the SSR ID displays primer details such as the forward sequence, reverse sequence, Tm (melting temperature), GC content and product size.

Phylogenetic tree building
This page provides a simple and quick tool for constructing phylogenetic trees. Users input FASTA-formatted sequences, which are aligned with MAFFT (v7.158) [42]. IQ-TREE, which uses a stochastic algorithm to infer phylogenetic trees by maximum likelihood, is then used to build the tree from the aligned sequences [43,44]. Both the aligned sequence file and the NWK (Newick) file containing the phylogenetic tree can be downloaded. Finally, the Phylo.io [45] tool is used to display the phylogenetic tree.
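A comparable align-then-infer workflow can be run locally. The sketch below assumes that `mafft` and `iqtree` executables are installed and on the PATH (IQ-TREE 2 installs its binary as `iqtree2`), and it uses automatic strategy and model selection (`--auto`, `-m MFP`); the options used on the PGDB server are not given in the text, so these choices and the function names are assumptions.

```python
import subprocess
from pathlib import Path


def mafft_command(fasta_path, mafft_bin="mafft"):
    """Build the MAFFT command; MAFFT writes the alignment to standard output."""
    return [mafft_bin, "--auto", str(fasta_path)]


def iqtree_command(alignment_path, prefix, iqtree_bin="iqtree"):
    """Build the IQ-TREE command with ModelFinder model selection (-m MFP)."""
    return [iqtree_bin, "-s", str(alignment_path), "-m", "MFP", "-pre", str(prefix)]


def build_tree(fasta_path, workdir):
    """Align sequences with MAFFT, infer a tree with IQ-TREE, return both file paths."""
    workdir = Path(workdir)
    workdir.mkdir(parents=True, exist_ok=True)
    alignment = workdir / "aligned.fasta"
    prefix = workdir / "tree"
    with open(alignment, "w") as out:
        subprocess.run(mafft_command(fasta_path), stdout=out, check=True)
    subprocess.run(iqtree_command(alignment, prefix), check=True)
    # IQ-TREE writes the maximum-likelihood tree in Newick format to <prefix>.treefile.
    return alignment, Path(str(prefix) + ".treefile")
```

Calling `build_tree("sequences.fasta", "results")` would leave the alignment and the Newick tree in the `results` directory, mirroring the two downloadable files described above.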

Transcription factor
This page provides a search function for predicted transcription factor (TF) and transcriptional regulator (TR) families in the nine pear genomes. The search form allows users to retrieve TF family information by entering a specific gene ID, or to obtain a complete list of the genes in a family by entering the family name. A list of 94 families is provided at the bottom of the search page for reference.

Genome browser
The genome browser is an important tool for visualizing high-throughput sequencing data. JBrowse [46] is a genome browser based on HTML5 and JavaScript with a fully dynamic AJAX interface. We collected genome and annotation information for nine pear varieties, as well as genome re-sequencing data for 30 pear cultivars, which were mapped to the 'Dangshansuli' v1.0 genome [1,47,48]. In addition, we mapped transcriptome data from five pear cultivars at seven stages to the nine pear reference genomes. All of these data can be viewed in JBrowse. On the left-hand side of the genome browser, the 'Available Tracks' panel lists all files that can be displayed. After the files to display are chosen, the information appears in a window on the right-hand side (Fig. 4c). Clicking on different parts of the sequences displays detailed information and allows users to browse gene sequences, structure and annotations (Fig. 4d).

Other options
The 'Species' page contains a brief introduction to the nine pear genomes available and provides links to the relevant literature. The 'About' module contains three parts. The 'Download' page allows users to download genomic information, including FASTA files of the genome assembly, gene, CDS, and protein sequences, and gene structure data in GFF format. The 'Link' page provides quick links to other plant-related databases and resources. The 'Contact' option allows users to contact the administrators of the PGDB.

Conclusion
PGDB currently includes genomic, transcriptomic and re-sequencing data for pear, presented through a user-friendly and practical platform. It can help researchers quickly retrieve, browse and analyze multi-omics data, and promote in-depth studies and the development of pear omics.

Fig. 2 Search page of the PGDB. (a) Search for genetic information and sequences. (b) Genetic information search results, including gene details, GO ①, KEGG ② and InterPro ③ functional annotations, sequences ④, and homologous genes ⑤

Fig. 3 Synteny page of the PGDB. (a) Querying synteny blocks between genomes and drawing synteny images. (b) The synteny blocks in the query and compared chromosomes. (c) The genes contained in each synteny block. (d) The collinear image drawn online