
Quick Links

  • Feature Search: Search for features by type, optionally filtering on an additional text query.
  • Gene Expression Publications: Search and browse publications whose gene expression data are stored in TBDB.
  • Samples and Conditions: Search for a gene across all published samples and explore the significance of its expression.
  • Gene Profiles: Search for a gene in the published data sets and explore its expression in each publication.
  • My Repository (sign in required): Access your saved microarray data sets for further analysis.
  • BLAST: Search all genomes by sequence similarity.
  • Genes: Gene annotations by functional category.
  • Download: Download data files for sequences and annotations.
  • New User Guide

     

    This guide walks you through the site by searching for a particular gene, viewing its annotation information, and retrieving the corresponding expression data.


    What is TBDB?

    The Tuberculosis Database (TBDB) is an integrated database providing access to TB genomic data and resources relevant to the discovery and development of TB drugs, vaccines and biomarkers. The current release of TBDB houses genome sequence data and annotations for 28+ different Mycobacterium tuberculosis strains and related bacteria. TBDB stores pre- and post-publication gene expression data from M. tuberculosis and its close relatives, and currently hosts data for nearly 1,500 public tuberculosis microarrays and 260 Streptomyces arrays. In addition, TBDB provides access to a suite of comparative genomics and microarray analysis software.

    Finding a Gene with Quicksearch

    Suppose you are studying DosR, the transcription factor known to regulate the hypoxic response of Mycobacterium tuberculosis (Park HD et al., Mol Microbiol. 2003 May;48(3):833-43). Simply enter 'DosR' into either search field on the TB Database home page.



    You will see the search results as below:



    Alternatively, enter the ORF identifier, 'Rv3133c':

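    Both queries lead to the same gene. As a rough illustration of the difference between the two kinds of search term, the Python sketch below guesses whether a term looks like an H37Rv locus tag such as Rv3133c or a gene name such as DosR. The pattern is an assumption based on the usual H37Rv naming convention ("Rv", four digits, an optional capital letter, and a trailing "c" for genes on the complementary strand). It is not how TBDB's own search matches queries, and the function name classify_query is hypothetical.

```python
import re

# Assumed pattern for M. tuberculosis H37Rv locus tags such as "Rv3133c":
# "Rv", four digits, an optional capital letter, and an optional
# lowercase "c" marking a gene on the complementary strand.
LOCUS_TAG = re.compile(r"^Rv(\d{4})([A-Z]?)(c?)$")


def classify_query(query: str) -> dict:
    """Hypothetical helper: guess whether a term is a locus tag or a gene name."""
    term = query.strip()
    match = LOCUS_TAG.match(term)
    if match is None:
        # Anything that does not fit the pattern (e.g. "DosR") is
        # treated as a gene name.
        return {"kind": "gene_name", "term": term}
    number, _suffix, complement = match.groups()
    return {
        "kind": "locus_tag",
        "term": term,
        "number": int(number),
        # A trailing "c" means the gene is on the complementary strand.
        "strand": "-" if complement else "+",
    }


if __name__ == "__main__":
    for q in ["DosR", "Rv3133c", "Rv0001"]:
        print(q, classify_query(q))
```

    Either kind of term can be typed into the TBDB search field directly; the sketch only shows why 'Rv3133c' is recognizable as an identifier rather than a name.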

    Clicking the first item takes you to the results for M. tuberculosis strain H37Rv:



    Click on the entry with the highest relevance score, and you will see the gene details page for Rv3133c:

     

    The Gene Detail Page

    What information is provided for each gene in TBDB?

     

    Downloading Sequence Data and Gene Annotation

    The download page is accessible from the main navigation at the top of the screen: in the "Genomic Data" menu, select the last item, "Download".

     

    This page links to data files for seven TB strains and 26 related organisms. At the top of the page are links to the raw sequence data in FASTA format, with one row per organism; each column represents a different file format. Choose your preferred compression scheme, then click the arrow symbol to start the download.
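    Once a sequence file is downloaded, it can be read outside the browser. The Python sketch below reads a FASTA file into a dictionary keyed by sequence ID, opening it with gzip when the file name ends in .gz. It assumes you chose gzip compression; other compression schemes offered on the page would need a different opener. The function names and the file name in the usage note are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header ends the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        else:
            # Sequence lines are concatenated without line breaks.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def load_fasta(path: str) -> Dict[str, str]:
    """Read a plain or gzip-compressed FASTA file into {id: sequence}."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        # Use the first whitespace-separated token of each header as the ID.
        return {(h.split() or [""])[0]: seq for h, seq in read_fasta(handle)}
```

    For example, load_fasta("H37Rv.fasta.gz") (a hypothetical file name) would return each sequence as one string with line breaks removed, keyed by the first word of its header line.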

     

    For each genome, files are presented in a number of formats to facilitate various analyses:

    • .fasta: A text-based format for representing either nucleic acid or peptide sequences, in which base pairs or amino acids are written as single-letter codes. The format also allows a sequence name and comments to precede each sequence.
    • .gtf: The Gene Transfer Format (GTF) holds information about gene structure. It is a tab-delimited text format based on the General Feature Format (GFF), with additional conventions specific to gene information.
    • .agp: A file describing how primary sequences are assembled into a non-redundant, contiguous sequence, which may be a contig or a chromosome. For more information about the file specification, see the format definition page.
    • .txt: A tab-delimited text file, best viewed in a spreadsheet program for easy sorting.
      For example, "annotation_summary_per_gene.txt" has one row per gene, with all associated features grouped by category (such as PFAM domain or KEGG pathway) and multiple features within a category separated by commas. In contrast, "annotation_summary.txt" lists a gene on multiple rows if it has multiple features in any category: a gene associated with four KEGG pathways appears once per pathway, so you can sort by KEGG ID.
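    The relationship between the two .txt files can be sketched in Python: grouping the multi-row file by gene and comma-joining the distinct values in each column yields a one-row-per-gene layout. The column names used here ("gene", "kegg", "pfam") are hypothetical; check the header row of the downloaded file for the real ones. This is an illustration of the idea, not the procedure TBDB uses to build its files.

```python
import csv
from typing import Dict, Iterable, List


def read_tsv(path: str) -> List[Dict[str, str]]:
    """Read a tab-delimited file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def collapse_per_gene(rows: Iterable[Dict[str, str]], gene_col: str) -> List[Dict[str, str]]:
    """Merge multiple rows per gene into one row, comma-joining each column."""
    merged: Dict[str, Dict[str, List[str]]] = {}
    for row in rows:
        columns = merged.setdefault(row[gene_col], {})
        for col, value in row.items():
            if col == gene_col:
                continue
            values = columns.setdefault(col, [])
            # Keep each distinct, non-empty value once, in first-seen order.
            if value and value not in values:
                values.append(value)
    # Dicts keep insertion order, so genes come out in first-seen order.
    return [
        {gene_col: gene, **{col: ",".join(vals) for col, vals in cols.items()}}
        for gene, cols in merged.items()
    ]
```

    Writing the result back out with csv.DictWriter(..., delimiter="\t") gives a per-gene table; whether it matches annotation_summary_per_gene.txt exactly (for example, in separator spacing) is not guaranteed.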