What is the major purpose of the KEGG database? What does KEGG stand for, what kind of data does it contain, and what unique visualization tools does it offer? Write a comprehensive paragraph about this. Is KEGG curated?
KEGG stands for “Kyoto Encyclopedia of Genes and Genomes”. KEGG was developed in 1995 by Kanehisa Laboratories. KEGG is a database resource for understanding the high-level functions of a biological system, such as the cell, the organism, and the ecosystem, from molecular-level information, especially the large-scale molecular data sets generated by genome sequencing. It is a computer representation of the biological system, consisting of genetic information (genes and proteins) and chemical information. It also contains health information covering diseases and drugs. KEGG divides its data into the following categories:
- Systems information: Pathway, Brite, and Module
- Genomic information: Genome, Genes, and SSDB
- Chemical information: Compound, Glycan, Reaction, RClass, and Enzyme
- Health information: Drug, Disease, and Environ
KEGG offers unique visualization tools such as KegHier, an application for browsing BRITE hierarchy files; KegArray, an application for microarray data analysis; and KegDraw, an application for drawing compound and glycan structures. KEGG is a manually curated database.
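Beyond the desktop tools, KEGG entries can also be retrieved programmatically through the KEGG REST service at rest.kegg.jp. The following is a minimal sketch, not an official client: the helper names (`kegg_list_url`, `kegg_get_url`, `fetch_text`) are made up for illustration, and the example identifiers (`hsa00010` for the human glycolysis pathway map, `cpd:C00031` for D-glucose) are chosen only as examples.

```python
from urllib.parse import quote
from urllib.request import urlopen

# Base address of the KEGG REST service.
KEGG_REST_BASE = "https://rest.kegg.jp"


def kegg_list_url(database: str) -> str:
    """Build the URL that lists all entries of a KEGG database (e.g. 'pathway')."""
    return f"{KEGG_REST_BASE}/list/{quote(database, safe=':+')}"


def kegg_get_url(entry_id: str) -> str:
    """Build the URL that returns the flat-file record for one KEGG entry."""
    return f"{KEGG_REST_BASE}/get/{quote(entry_id, safe=':+')}"


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and return its body as text (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Only the URLs are printed here, so the script runs without a network connection.
    print(kegg_list_url("pathway"))
    print(kegg_get_url("hsa00010"))
    print(kegg_get_url("cpd:C00031"))
```

Calling `fetch_text(kegg_get_url("hsa00010"))` would download the record itself; it is left out of the demo so the sketch can be run offline.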
Describe the large sequence-holding and managing institutions that exist in the US, Europe, and Japan. What is each called, what sort of resources does each possess, what kind of data does each contain? What is the specific database repository within each for genomic DNA data?
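In Japan, the institution is the National Institute of Genetics (NIG), which hosts the DNA Data Bank of Japan (DDBJ), its repository for nucleotide sequence data. Other resources of this institution: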
Mouse: Genetic resources with over 100 laboratory mouse strains, plus a genomic database. Database repository: NIG Mouse Genome Database
Zebrafish: zTrap resource, the zebrafish gene trap and enhancer trap database
Hydra: Strain database
Drosophila: NIG-FLY resource, segmentation antibodies data
C. elegans: Gene expression database (NEXTDB)
E. coli: Strain/vector/antibody resources, genome database, TEC database. Database repository: PEC (Profiling of E. coli Chromosome)
In Europe, the large sequence-holding institution is the European Bioinformatics Institute (EMBL-EBI). Some of the tools it provides are Clustal Omega, InterProScan, BLAST, and HMMER. The most used databases at this institution are Ensembl, UniProt, PDBe, Europe PMC, Expression Atlas, and ChEMBL. Its repository for nucleotide sequence data is the European Nucleotide Archive (ENA).
In the US, the institution is the National Center for Biotechnology Information (NCBI).
NCBI groups its many resources into the following categories, each with its own subcategories:
- Chemicals and Bioassays
- DNA & RNA
- Data and Software
- Domains and Structure
- Genes and Expression
- Genetics and Medicine
- Genomes and Maps
- Sequence Analysis
- Training and Tutorials
Some of the commonly used tools and databases include BLAST, Cn3D, CD-Search, E-utilities, GenBank, Genome Workbench, ProSplign, PubChem, dbSNP, and VAST. Of these, GenBank is the repository for genomic DNA sequence data.
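E-utilities, named in the list above, is NCBI's programmatic interface to its databases, including GenBank sequence records. As a rough sketch of how a record might be requested, the code below builds an `efetch` URL for the nucleotide database. The helper name `efetch_url` is hypothetical, and the accession `NC_000913.3` (the E. coli K-12 MG1655 reference genome) is used purely as an example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of the NCBI Entrez Programming Utilities (E-utilities).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(accession: str, db: str = "nuccore", rettype: str = "fasta") -> str:
    """Build an efetch URL that returns one sequence record as plain text."""
    params = {"db": db, "id": accession, "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def fetch_record(accession: str, timeout: float = 60.0) -> str:
    """Download the record for an accession (requires network access)."""
    with urlopen(efetch_url(accession), timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Printing the URL needs no network; fetch_record() would perform the download.
    print(efetch_url("NC_000913.3"))
```

Passing `rettype="gb"` instead of the default requests the GenBank flat-file format rather than FASTA.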