The bEST-DRRD interface
The home page of the bEST-DRRD website (http://www.best.us.edu.pl) contains several links. By choosing ‘Project’, the user can find general information about the project ‘Mutational Analysis of Genes Involved in DNA Repair in Barley’, which is carried out in the Department of Genetics, University of Silesia and coordinated by the International Atomic Energy Agency (IAEA) in Vienna, Austria. The project aims to identify barley sequences homologous to the characterised Arabidopsis genes involved in the mechanisms of DNA repair and to analyse the identified sequences with the TILLING (Targeting Induced Local Lesions IN Genomes) strategy in order to isolate mutants that carry defects in this process. The link ‘Genes’ provides the list of barley genes that have been identified, cloned and characterised during the implementation of the project. These genes have been deposited in the NCBI GenBank database and are now being analysed functionally with TILLING. The link ‘BLAST’ directs the user to the ViroBLAST tool, where barley genomic and EST databases (derived from the HarvEST project) as well as rice and Brachypodium genomic databases may be searched using query sequences. It should be emphasised that in this step any input sequence (not necessarily related to DNA metabolism) may serve as a query. The HarvEST data source has been incorporated into the presented database because it is the main resource for barley ESTs and assemblies, and many resources (Affymetrix GeneChip, Illumina Golden Gate Assay) are based on it. It contains highly curated ESTs (all based on raw data, not only FASTA). The barley ESTs derived from HarvEST are combined into unigenes and BLASTed against rice, Arabidopsis and UniProt. The link ‘Search’ enables the database to be browsed in order to find all Arabidopsis genes that are to be used as the queries.
There the user will find short instructions on how the database may be screened and which categories are available (Table 1). The interface also provides links to websites related to the project and to the bEST-DRRD itself (‘Links’) and allows users to send feedback to the authors of the database (‘Contact’). The link ‘Team’ introduces the individuals involved in the project and lists their tasks.
Browsing the database
The bEST-DRRD may be browsed using several different options (Figure 1). All the Arabidopsis genes in bEST-DRRD may be shown in a single table in alphabetical order, or the genes involved in DNA replication and those involved in DNA damage repair may be displayed separately (also in a table and in alphabetical order). For each process (DNA replication and repair), the genes involved in distinct pathways, such as Origin recognition or Base Excision Repair, may be displayed separately. The repository of Arabidopsis genes involved in DNA replication and repair may also be browsed using gene and/or protein names as well as the accession numbers from TAIR (The Arabidopsis Information Resource). For each Arabidopsis gene listed in the table, the name, short name, function of the gene and the NCBI GenBank accession numbers of the transcript and encoded protein are provided. The NCBI GenBank accession numbers are directly linked to the corresponding entries in the NCBI database. The option ‘search’ in the penultimate column of the table retrieves the barley ESTs that share similarity with the query and are derived from the TIGR, CR-EST and Gene Index Project resources. The results are depicted as bars aligned with the query. For each of these, the database provides the detailed positions of the query-EST alignment (the numbers above the ESTs, which are depicted as blue bars, refer to the nucleotide positions within the EST), the total length of the EST (in parentheses), and the Expect value of the alignment together with the identities (percentage of similarity) between the query and the EST. For each EST, the accession number is provided and linked directly to the EST sequence in FASTA format, together with the orientation of the strands in the query-EST alignment and a view of the alignment showing the positions of the nucleotides in the query and the subject (EST). In the last column, ‘details’ about each gene and its encoded protein are provided.
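The per-EST values described above (alignment positions, Expect value, identities and strand orientation) correspond to fields that standard BLAST reports in its 12-column tabular output. The following is a minimal sketch of how such hits could be parsed and filtered offline. It assumes that results are available in that tabular format, which the bEST-DRRD interface is not stated to export; the identifiers, thresholds and function names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One query-EST alignment taken from a BLAST tabular line."""
    query_id: str
    subject_id: str
    percent_identity: float
    q_start: int
    q_end: int
    s_start: int
    s_end: int
    evalue: float

    @property
    def strand(self) -> str:
        # In BLASTN tabular output a minus-strand hit is reported
        # with the subject start greater than the subject end.
        return "plus" if self.s_start <= self.s_end else "minus"


def parse_tabular(text: str) -> list[Hit]:
    """Parse 12-column tab-separated BLAST output (assumed input format)."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.split("\t")
        # Columns: qseqid sseqid pident length mismatch gapopen
        #          qstart qend sstart send evalue bitscore
        hits.append(Hit(f[0], f[1], float(f[2]),
                        int(f[6]), int(f[7]), int(f[8]), int(f[9]),
                        float(f[10])))
    return hits


def filter_hits(hits, max_evalue=1e-10, min_identity=70.0):
    """Keep hits passing both (illustrative) thresholds, best E-value first."""
    kept = [h for h in hits
            if h.evalue <= max_evalue and h.percent_identity >= min_identity]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Hypothetical example rows; the IDs are placeholders, not real accessions.
    rows = [
        ["AT_QUERY", "EST_A", "85.0", "400", "60", "0", "1", "400", "10", "409", "1e-50", "300"],
        ["AT_QUERY", "EST_B", "72.5", "200", "55", "0", "100", "299", "250", "51", "1e-12", "90"],
        ["AT_QUERY", "EST_C", "65.0", "100", "35", "0", "5", "104", "1", "100", "0.5", "30"],
    ]
    sample = "\n".join("\t".join(r) for r in rows)
    for hit in filter_hits(parse_tabular(sample)):
        print(hit.subject_id, hit.evalue, hit.percent_identity, hit.strand)
```

The thresholds used here are arbitrary defaults; suitable cut-offs depend on the gene family and on how divergent the barley and Arabidopsis sequences are expected to be.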
For each gene, apart from its name and function, the gene structure and the sequences of the mRNA, CDS and encoded protein are provided (in FASTA format) together with the total length of the protein sequence and the corresponding NCBI GenBank and TAIR accession numbers. The ‘Toolbox’ options are also available for each gene. The ‘Publications’ link provides a comprehensive list of papers on the given gene from PubMed and Google Scholar. The sub-cellular localisation of the gene product as well as the spatial and temporal expression profile of each gene are provided through the Arabidopsis eFP Browser (from http://bar.utoronto.ca). Two additional BLAST tools allow the mRNA and/or protein sequence to be used as a query to search the NCBI GenBank database (BLASTN and BLASTP, respectively) for potentially homologous sequences from other species. The ‘Toolbox’ also provides models of conserved domains for the proteins, derived from the Conserved Domains source of the NCBI database, and the putative secondary-structure models of the proteins from ModBase: the Database of Comparative Protein Structure Models (http://modbase.compbio.ucsf.edu/modbase-cgi/index.cgi). Finally, the ‘Toolbox’ contains a description of the pathway that is mediated by the protein of interest. These data are derived from the BioSystems repository of the NCBI database (http://www.ncbi.nlm.nih.gov/biosystems).
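Because the mRNA, CDS and protein sequences are offered in FASTA format, they can be processed with very little code once downloaded. The sketch below is one way to read FASTA text and report sequence lengths, for example to check a protein length against the value shown on the ‘details’ page. The record identifier and sequence in the usage example are made up, and the function names are hypothetical.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Return a mapping from record ID to sequence for FASTA-formatted text.

    The ID is taken as the first whitespace-delimited token after '>'.
    """
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].split()[0]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records[header] = "".join(chunks)  # store the final record
    return records


def sequence_lengths(records: dict[str, str]) -> dict[str, int]:
    """Map each record ID to the length of its sequence."""
    return {name: len(seq) for name, seq in records.items()}


if __name__ == "__main__":
    # Hypothetical record; not a real accession or sequence.
    example = ">EXAMPLE_PROTEIN hypothetical description\nMKTAYIAKQR\nQISFVK\n"
    print(sequence_lengths(parse_fasta(example)))
```

For larger analyses an established library parser would normally be preferable; the hand-written version is shown only to keep the example self-contained.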
The bEST-DRRD as a source of information on sequences related to DNA replication and repair in plants
The presented database contains the barley coding sequences that were identified using the database as a tool. The sequences of these barley genes were confirmed after gene cloning. For each of these genes, additional information and options are provided that allow, among other things, a rapid search for the most conserved eukaryotic homologs using the ‘HomoloGene’ tool of the NCBI database. Additionally, the ‘Toolbox’ provides a model of the conserved domains for each barley protein, derived from the Conserved Domains source of the NCBI GenBank database. As in the Arabidopsis ‘Toolbox’, two additional BLAST tools allow the mRNA and/or protein sequence to be used as a query to search the NCBI GenBank database (BLASTN and BLASTP, respectively) for any potentially homologous sequences. Moreover, the sequences of the barley ESTs that were used as a basis for the gene cloning are available, together with the PCR primers applied during the procedure. Alignments of the homologous protein sequences from barley, rice and Arabidopsis are provided, in which the conserved functional domains are depicted in colours with their respective domain codes. The database also includes secondary-structure predictions generated with the PSIPRED Protein Structure Prediction Server [13] for the barley, rice and Arabidopsis protein homologs.
The database is not intended merely as a repository of barley ESTs; it may also serve as a source of information on the genes, proteins and mechanisms of DNA-related processes in Arabidopsis. The presented database is based on query sequences derived from Arabidopsis because the mechanisms underlying DNA replication and repair have been described most thoroughly in this species. In monocot crops, including rice, only a few genes involved in the DNA repair process have been characterised and functionally validated [14]. Therefore, the Arabidopsis sequences involved in DNA repair that have been identified so far can serve as queries for retrieving sequences from the databases of other species in order to identify homologous genes. Moreover, the contents of the open-access databases (e.g. eFP Browser), which provide information about gene expression profiles (including those of DNA replication and repair-related genes), are far more extensive for Arabidopsis than for any other plant species. This makes Arabidopsis the most suitable model for the computational characterisation of any group of genes, especially because DNA replication and repair mechanisms are highly conserved across many evolutionarily divergent phylogenetic groups. The data concerning the functional characterisation and expression profiles of Arabidopsis genes may therefore serve as cues for identifying the same features in other plant species.
Mutagenic techniques are efficient tools for developing the germplasm collections in model and crop species that facilitate the discovery of desired loci and alleles. Various mutation techniques are applied for the analysis of gene function. One of the powerful strategies of functional genomics is the TILLING approach, which is currently being applied to the analysis of the cloned barley genes. TILLING generates an allelic series of mutations and provides a range of phenotypic severity; it is therefore often preferable in basic research because it allows a more informative insight into the function of the gene and its product than insertional mutagenesis [15, 16]. The induction of mutations within the genes involved in DNA repair may alter the efficiency of this process and shed light on the molecular mechanism of DNA repair in plants. The bEST-DRRD is the first database designed to provide data on the functional characterisation of genes related to DNA replication and repair in monocot crop species.