| Votes | By | Price | Discipline | Year Launched |
|---|---|---|---|---|
|  | GenBank | Open source | Interdisciplinary |  |
GenBank is the public nucleotide sequence database maintained by the National Center for Biotechnology Information (NCBI) in the US. It contains an annotated collection of all publicly available DNA (and relevant RNA) sequences.
GenBank is one of the three partner databases of the International Nucleotide Sequence Database Collaboration (INSDC), alongside the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ).
Why it matters
- It serves as a global reference repository of nucleotide sequences: researchers around the world deposit sequences here for access, comparison, reference and reuse.
- It supports transparency, reproducibility and cumulative science: when you publish a sequence (gene, genome, transcript), you usually deposit it in GenBank (or another INSDC database) and receive an accession number, which is cited in your paper.
- It enables bioinformatics tools and databases: many downstream resources (annotation pipelines, comparative genomics, metagenomics) rely on sequences from GenBank. For example, the database underpins search tools like BLAST at NCBI.
How it works (in broad strokes)
- Researchers (or sequencing centres) prepare nucleotide sequence data plus annotation and metadata (organism, gene name, features, etc.) and submit them through tools such as the web-based BankIt form or the NCBI Submission Portal.
- NCBI processes the submission (automated & manual checks) and assigns an accession number to the sequence record.
- The record becomes publicly accessible through search (the Entrez Nucleotide database), bulk FTP download, or programmatic access via the E-utilities API.
- GenBank issues a full release every two months, and data are exchanged daily among the INSDC partners to keep the three databases synchronised.
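Programmatic access through the E-utilities amounts to building HTTP requests against documented endpoints. The sketch below only constructs ESearch and EFetch URLs for the Nucleotide database (`db=nuccore`) and asks for GenBank flat-file text (`rettype=gb`, `retmode=text`); it does not perform any network call. The function names and the `tool` value are hypothetical choices for illustration, and the accession in the usage note is just an example.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term: str, retmax: int = 20, email: Optional[str] = None) -> str:
    """Build an ESearch URL that queries the Nucleotide (nuccore) database."""
    params: Dict[str, str] = {"db": "nuccore", "term": term, "retmax": str(retmax)}
    if email:
        # NCBI asks programmatic users to identify themselves; the tool name
        # "genbank_sketch" is a hypothetical placeholder.
        params["tool"] = "genbank_sketch"
        params["email"] = email
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_genbank_url(ids: List[str]) -> str:
    """Build an EFetch URL that returns records as GenBank flat-file text."""
    params = {
        "db": "nuccore",
        "id": ",".join(ids),  # comma-separated accessions or UIDs
        "rettype": "gb",      # GenBank flat-file format
        "retmode": "text",
    }
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"
```

For example, `efetch_genbank_url(["U49845.1"])` produces `https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=U49845.1&rettype=gb&retmode=text`, which could then be fetched with `urllib.request.urlopen(url).read().decode()`. Keeping URL construction separate from fetching makes it easy to respect NCBI's request-rate limits (three requests per second without an API key) in whatever code does the actual downloading.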
Key features & advantages
- Open access: GenBank data are publicly available and broadly redistributable.
- Extensive coverage: It includes sequences from a wide range of species and projects, from single genes to complete genomes.
- Standardised identifiers: Each sequence has an accession number with a version suffix (accession.version), so a specific revision of a record can be cited unambiguously.
- Integration: GenBank links to other NCBI databases (taxonomy, protein, gene, literature) allowing rich cross-resource queries.
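The accession.version convention can be handled with a few lines of code, for instance when checking whether a cited record has since been updated. This is a minimal sketch: the regular expression is a deliberately loose assumption (letters, an optional underscore, digits) rather than NCBI's formal accession grammar, and `parse_versioned` and `is_newer` are hypothetical helper names.

```python
import re
from typing import NamedTuple

# Loose pattern for "ACCESSION.VERSION": 1-6 capital letters, an optional
# underscore (as in RefSeq-style IDs), digits, then a dot and the version.
# This is an illustrative assumption, not NCBI's formal accession grammar.
_VERSIONED = re.compile(r"([A-Z]{1,6}_?\d+)\.(\d+)")


class VersionedAccession(NamedTuple):
    accession: str  # stable identifier, unchanged across updates
    version: int    # incremented when the record's sequence changes


def parse_versioned(text: str) -> VersionedAccession:
    """Split a string such as 'U49845.1' into accession and version."""
    match = _VERSIONED.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an accession.version string: {text!r}")
    return VersionedAccession(match.group(1), int(match.group(2)))


def is_newer(candidate: VersionedAccession, cited: VersionedAccession) -> bool:
    """True if `candidate` is a later version of the same record as `cited`."""
    return candidate.accession == cited.accession and candidate.version > cited.version
```

Comparing on the accession first and the version second mirrors how GenBank identifiers behave: the accession stays stable across updates, while the version tells you exactly which sequence a paper referred to.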
Limitations & things to watch
- Variable annotation quality: Since submissions come from many sources, the completeness/accuracy of annotation may vary.
- Sequence misassignment and other errors: Some records carry an incorrect species attribution or other errors, and researchers have documented such problems in specific groups of records, so it is worth checking provenance before relying on a sequence.
- Huge scale and growth: The database grows rapidly, which can pose challenges in curation, version control and data handling.
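- Submission obligations: Many journals and funders require sequence data to be deposited in GenBank (or another INSDC database). Submitters are responsible for providing accurate metadata and for setting any release date, for example holding records until the associated paper is published.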
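Because annotation quality depends on what submitters provide, a simple local check before submission can catch obvious problems. The sketch below is a hypothetical pre-submission checker, not NCBI's validator: it flags characters outside the IUPAC nucleotide alphabet, an unusually high share of ambiguity codes (the 5% threshold is an arbitrary assumption), and missing metadata fields. The field names echo GenBank source qualifiers, but treating them as the only required fields is an assumption.

```python
from typing import Dict, List, Sequence

# IUPAC nucleotide codes: A, C, G, T plus the standard ambiguity codes.
IUPAC_DNA = frozenset("ACGTRYSWKMBDHVN")

# Hypothetical minimum metadata for this sketch. The names mirror GenBank
# source qualifiers, but NCBI's real requirements are broader.
REQUIRED_FIELDS = ("organism", "mol_type")


def check_record(seq: str, metadata: Dict[str, str],
                 required: Sequence[str] = REQUIRED_FIELDS,
                 max_ambiguous: float = 0.05) -> List[str]:
    """Return a list of problems found; an empty list means no issues detected."""
    problems: List[str] = []
    residues = "".join(seq.split()).upper()  # drop whitespace and line breaks

    if not residues:
        problems.append("sequence is empty")

    # Characters outside the IUPAC alphabet (digits, punctuation, typos, ...).
    invalid = sorted(set(residues) - IUPAC_DNA)
    if invalid:
        problems.append("non-IUPAC characters: " + "".join(invalid))

    # Ambiguity codes (N, R, Y, ...) are valid, but a high fraction is suspicious;
    # the 5% default threshold is an arbitrary assumption.
    ambiguous = sum(1 for c in residues if c in IUPAC_DNA and c not in "ACGT")
    if residues and ambiguous / len(residues) > max_ambiguous:
        problems.append(f"{ambiguous}/{len(residues)} ambiguous bases")

    for field in required:
        if not metadata.get(field, "").strip():
            problems.append(f"missing metadata: {field}")
    return problems
```

For example, `check_record("ACGTNNNNXZ", {"organism": "Escherichia coli"})` returns `["non-IUPAC characters: XZ", "4/10 ambiguous bases", "missing metadata: mol_type"]`. Returning a list of messages rather than raising on the first problem lets a submitter see everything that needs fixing at once.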
Why your lab/institution might use it
- If your lab generates sequence data (genes, transcripts, microbial genomes, barcodes, metagenomes) you’ll likely need to deposit those sequences in GenBank to enable publishing and sharing.
- For any sequence-based analysis, you’ll use GenBank as a reference: retrieving homologous sequences, comparing your data with known sequences, or searching for similar sequences with BLAST.
- In pitch decks or grant proposals: you might highlight that “all sequence data will be deposited in GenBank (accession numbers will be provided)”, which supports good data management practices and transparency.
- For training students: GenBank records are a practical way to teach how to read sequence metadata, feature tables and version histories, and to appreciate the nuances of annotation.
- For open science / FAIR data: depositing sequences in GenBank makes your work Findable and Accessible to others, and supplying complete, accurate metadata supports Interoperability and Reusability.
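For teaching how a GenBank flat file is laid out, a tiny parser can make the structure concrete: header keywords sit in the first 12 columns, feature keys in columns 6 to 21, and the sequence follows `ORIGIN` until the `//` terminator. The sketch below uses a hypothetical, simplified toy record (the `TOY0001` accession and the `toyA` gene are invented for illustration) and ignores details such as repeated `REFERENCE` blocks. For real work, Biopython's `SeqIO.parse(handle, "genbank")` is the usual tool.

```python
from typing import Dict, List

# Hypothetical, simplified toy record; column positions follow the flat-file
# layout (keywords in columns 1-12, feature keys in columns 6-21).
TOY_RECORD = "\n".join([
    "LOCUS".ljust(12) + "TOY0001  12 bp  DNA  linear  SYN",
    "DEFINITION".ljust(12) + "Hypothetical toy record for teaching.",
    "ACCESSION".ljust(12) + "TOY0001",
    "VERSION".ljust(12) + "TOY0001.1",
    "SOURCE".ljust(12) + "synthetic construct",
    "  ORGANISM".ljust(12) + "synthetic construct",
    "FEATURES".ljust(21) + "Location/Qualifiers",
    "     source".ljust(21) + "1..12",
    " " * 21 + '/organism="synthetic construct"',
    "     gene".ljust(21) + "3..8",
    " " * 21 + '/gene="toyA"',
    "ORIGIN",
    "        1 acgtacgtac gt",
    "//",
])


def parse_flatfile(text: str) -> Dict[str, object]:
    """Extract header fields, feature keys and the sequence from one record."""
    header: Dict[str, str] = {}
    features: List[str] = []
    seq_parts: List[str] = []
    section = "header"
    last_key = None
    for line in text.splitlines():
        if line.startswith("//"):          # end of record
            break
        if line.startswith("FEATURES"):
            section = "features"
        elif line.startswith("ORIGIN"):
            section = "origin"
        elif section == "header":
            key, value = line[:12].strip(), line[12:].strip()
            if key:                        # new (sub-)keyword, e.g. ORGANISM
                header[key] = value
                last_key = key
            elif last_key is not None:     # continuation line
                header[last_key] += " " + value
        elif section == "features":
            key = line[5:21].strip()       # qualifier lines leave this blank
            if key:
                features.append(key)
        else:                              # sequence lines: drop numbers/spaces
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    return {"header": header, "features": features,
            "sequence": "".join(seq_parts).upper()}
```

Running `parse_flatfile(TOY_RECORD)` gives a header whose `VERSION` is `TOY0001.1`, the feature keys `["source", "gene"]` and the sequence `ACGTACGTACGT`, which gives students a concrete link between the printed record and its underlying fields.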
