Quick Links

  • Feature Search: Search for features by type and, optionally, filter the results with an additional text query.
  • Gene Expression Publications: Search and browse publications with gene expression data that are stored in TBDB.
  • Samples and Conditions: Search for a gene across all published samples and explore the significance of its expression in each of them.
  • Gene Profiles: Search for a gene in the published data sets and explore its expression in each publication.
  • My Repository (sign in required): Access your saved microarray data sets for further analysis.
  • BLAST: Search all genomes by sequence similarity.
  • Genes: Browse gene annotations by functional category.
  • Download: Download sequence and annotation data files.
  • Tutorial: Downloading Sequence Data and Gene Annotation

    To reach the download page, open the "Genomic Data" menu in the main navigation bar at the top of the screen and select its last item, "Download".

    This page links to data files for seven strains of TB and 26 related organisms. At the top of the page you will find links to raw sequence data in FASTA format, with one row for each organism and one column for each file format. Choose your preferred compression scheme, then click the arrow symbol to start the download.
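Downloaded files may arrive compressed. As a minimal sketch, assuming the compression scheme is signalled by the file extension (.gz, .bz2, or a .zip archive holding a single file; the schemes actually offered on the page may differ), a downloaded file can be read with the Python standard library alone. The function name `read_downloaded_text` is hypothetical.

```python
import bz2
import gzip
import zipfile
from pathlib import Path


def read_downloaded_text(path):
    """Return the text of a downloaded data file, decompressing it if needed.

    Assumption: the compression scheme is indicated by the file extension.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".gz":
        with gzip.open(path, "rt") as handle:
            return handle.read()
    if suffix == ".bz2":
        with bz2.open(path, "rt") as handle:
            return handle.read()
    if suffix == ".zip":
        # Assumption: each archive holds exactly one data file.
        with zipfile.ZipFile(path) as archive:
            (name,) = archive.namelist()
            return archive.read(name).decode("utf-8")
    # Any other extension is treated as uncompressed text.
    return path.read_text()
```

For example, `read_downloaded_text("genome.fasta.gz")` (a hypothetical file name) returns the same text as the uncompressed file would, so later processing does not need to care which compression scheme was chosen.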

    For each genome, files are presented in a number of formats to facilitate various analyses:

    • .fasta: a text-based format for representing either nucleotide sequences or peptide sequences, in which nucleotides or amino acids are represented using single-letter codes. The format also allows sequence names and comments to precede the sequences.
    • .gtf: the Gene Transfer Format (GTF) is a file format used to hold information about gene structure. It is a tab-delimited text format based on the General Feature Format (GFF), with some additional conventions specific to gene information.
    • .agp: a file that describes how primary sequences can be assembled into a non-redundant, contiguous sequence. The sequence being assembled may be a contig or a chromosome. For more information about the file specification, see the format definition page.
    • .txt: a tab-delimited text file, best viewed in a spreadsheet program for easy sorting.
      For example, the file "annotation_summary_per_gene.txt" has one row per gene, with all associated features grouped by category (such as PFAM domain or KEGG pathway); multiple features within a category are separated by commas. In contrast, the file "annotation_summary.txt" has multiple rows for genes that have multiple features in any category: a gene associated with four KEGG pathways is listed with one row for each pathway, so you can sort by KEGG ID.
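The FASTA layout described above can be read with a short parser. This is a minimal sketch, not a TBDB tool: `parse_fasta` is a hypothetical helper that treats the first word after `>` as the sequence name and the rest of that line as a free-text comment, joins wrapped sequence lines, and skips blank lines and legacy `;` comment lines.

```python
from typing import Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, comment, sequence) for each record in FASTA text lines."""
    name, comment, chunks = None, "", []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and legacy ';' comment lines
        if line.startswith(">"):
            if name is not None:
                yield name, comment, "".join(chunks)
            # Header: first word is the name, the remainder is a comment.
            parts = line[1:].split(maxsplit=1)
            name = parts[0] if parts else ""
            comment = parts[1] if len(parts) > 1 else ""
            chunks = []
        else:
            chunks.append(line)  # sequence may be wrapped over several lines
    if name is not None:
        yield name, comment, "".join(chunks)
```

Given the made-up input `">seq1 hypothetical protein\nATGAAA\nCCCTGA\n>seq2\nMKV*\n".splitlines()`, the parser yields `("seq1", "hypothetical protein", "ATGAAACCCTGA")` and then `("seq2", "", "MKV*")`. Because it accepts any iterable of lines, an open file handle can be passed directly.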
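A GTF line has nine tab-separated columns (sequence name, source, feature, start, end, score, strand, frame, attributes), and the attributes column holds `key "value";` pairs. The sketch below, with hypothetical helper names, turns one line into a dictionary. It assumes no attribute value contains a semicolon and keeps only the last value of a repeated key, so a dedicated GTF/GFF library is a safer choice for production use.

```python
from typing import Dict, Iterable, Iterator

GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_attributes(field: str) -> Dict[str, str]:
    """Turn 'key "value"; key2 "value2";' into a dict (last value wins for repeated keys)."""
    attrs = {}
    for item in field.split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attrs[key] = value.strip().strip('"')
    return attrs


def parse_gtf_line(line: str) -> Dict[str, object]:
    """Split one GTF record into named columns; start/end become integers."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated fields, got {len(fields)}")
    record: Dict[str, object] = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(fields[3])  # 1-based, inclusive
    record["end"] = int(fields[4])
    record["attributes"] = parse_gtf_attributes(fields[8])
    return record


def parse_gtf(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield records from GTF lines, skipping blank lines and '#' comments."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            yield parse_gtf_line(line)
```

For a made-up line such as `contig1<TAB>source1<TAB>CDS<TAB>100<TAB>400<TAB>.<TAB>+<TAB>0<TAB>gene_id "gene1"; transcript_id "gene1.1";`, the record's `attributes` entry becomes `{"gene_id": "gene1", "transcript_id": "gene1.1"}`, which makes it easy to group features by gene.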
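To make the AGP description concrete, the following sketch rebuilds assembled sequences from AGP lines plus a dictionary of component sequences (for example, contigs loaded from a FASTA file). It assumes the AGP 2.0 column layout: gap rows have component type `N` or `U` with the gap length in column 6, while all other rows give the component ID, 1-based start and end, and orientation in columns 6 to 9. Rows must appear in order along each object. The function names are hypothetical, and this is an illustration rather than the tool used to build the TBDB files.

```python
from typing import Dict, Iterable, List

GAP_TYPES = {"N", "U"}  # AGP 2.0 gap component types
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse-complement DNA; ambiguity codes other than N are not complemented."""
    return seq.translate(_COMPLEMENT)[::-1]


def assemble_from_agp(agp_lines: Iterable[str], components: Dict[str, str]) -> Dict[str, str]:
    """Build each AGP object (e.g. a contig or chromosome) from its ordered parts."""
    pieces: Dict[str, List[str]] = {}
    lengths: Dict[str, int] = {}
    for line in agp_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        f = line.rstrip("\n").split("\t")
        obj, obj_beg, obj_end, comp_type = f[0], int(f[1]), int(f[2]), f[4]
        if comp_type in GAP_TYPES:
            piece = "N" * int(f[5])  # column 6 holds the gap length
        else:
            comp_id, comp_beg, comp_end, orient = f[5], int(f[6]), int(f[7]), f[8]
            # AGP coordinates are 1-based and inclusive.
            piece = components[comp_id][comp_beg - 1:comp_end]
            if orient == "-":
                piece = reverse_complement(piece)
            # "+", "?", "0" and "na" are all kept in forward orientation here.
        # Each part must start right after the previous one and match its declared span.
        expected_beg = lengths.get(obj, 0) + 1
        if obj_beg != expected_beg or obj_end - obj_beg + 1 != len(piece):
            raise ValueError(f"coordinates do not match for line: {line.strip()}")
        pieces.setdefault(obj, []).append(piece)
        lengths[obj] = obj_end
    return {obj: "".join(parts) for obj, parts in pieces.items()}
```

With made-up components `{"ctgA": "AACCGGTT", "ctgB": "ACGTTA"}` and three rows placing bases 1 to 4 of ctgA forward, a 3 bp gap, and bases 3 to 6 of ctgB reversed, the object is rebuilt as `AACCNNNTAAC`. The coordinate check catches AGP rows that do not tile the object exactly.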
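The relationship between the two annotation files can be illustrated by expanding comma-separated per-gene values into one row per feature. In this sketch the column names (`gene_id`, `KEGG`, `PFAM`) and all values are made-up placeholders, so check the header row of the real file before adapting it. Only the chosen category is expanded; how "annotation_summary.txt" combines several multi-valued categories is not described above, so the sketch does not try to reproduce that.

```python
import csv
import io
from typing import Dict, Iterable, List


def explode_category(rows: Iterable[Dict[str, str]], column: str) -> List[Dict[str, str]]:
    """Return one row per comma-separated value in `column`.

    Other columns are copied unchanged; a row with no value in `column`
    is kept once, with an empty string in that column.
    """
    result = []
    for row in rows:
        values = [v.strip() for v in (row.get(column) or "").split(",") if v.strip()]
        for value in values or [""]:
            new_row = dict(row)
            new_row[column] = value
            result.append(new_row)
    return result


# Made-up example in the per-gene layout (tab-delimited, commas within a category).
per_gene_text = (
    "gene_id\tKEGG\tPFAM\n"
    "geneA\tpathA,pathB\tPF_X\n"
    "geneB\t\tPF_Y\n"
)
rows = csv.DictReader(io.StringIO(per_gene_text), delimiter="\t")
long_rows = sorted(explode_category(rows, "KEGG"), key=lambda r: r["KEGG"])
for r in long_rows:
    print(r["gene_id"], r["KEGG"], r["PFAM"], sep="\t")
```

After expansion, geneA appears once for each of its two pathways, and sorting on the `KEGG` column groups genes by pathway, mirroring the sorting use case described for "annotation_summary.txt".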