NCBI BLAST databases

The NCBI BLAST databases are downloaded every Sunday into a new directory named for the download date. Once the download has been verified, the file /datasets/bio/ncbi-db/.ncbirc is updated to point to the new copy. This lets a running job keep using a consistent database throughout its run.
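
To see which copy the configuration file currently points to, a job script could read the path out of it. The following is a minimal sketch, not a Unity-provided tool: the function name ncbirc_blastdb is hypothetical, and it assumes the file uses the standard NCBI layout with a BLASTDB=<path> entry.

```shell
# Print the BLASTDB value from an NCBI configuration file.
# Hypothetical helper; assumes the file contains a "BLASTDB=<path>" line.
# With no argument it reads the file named above; pass another file as $1.
ncbirc_blastdb() {
    # Strip everything up to and including "BLASTDB =" and print the rest.
    sed -n 's/^[[:space:]]*BLASTDB[[:space:]]*=[[:space:]]*//p' \
        "${1:-/datasets/bio/ncbi-db/.ncbirc}" | head -n 1
}
```

Calling ncbirc_blastdb once at the start of a job and writing the result to the job log is one way to record which weekly copy a result was produced from.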

Tools that can use the NCBI databases but do not read this configuration file can use the output of blastdb_path to find the current copy, as in the following example:

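# Load BLAST+ (provides blastdb_path) and DIAMOND
module load blast-plus/2.13.0+py3.8.12 diamond/2.0.15+py2.7.18
# Resolve the path of the current nr protein database
NR=$(blastdb_path -db nr -dbtype prot)
# Search query.fasta against that database with DIAMOND
diamond blastp --db "$NR" -q query.fasta -o matches.tsv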
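
If blastdb_path cannot find the requested database, the variable in the example above may end up empty and the downstream tool would fail with a less obvious error. The sketch below wraps the lookup in a guard. The function name resolve_blastdb is hypothetical, and because the exit status of blastdb_path for a missing database is not documented here, the sketch checks both the exit status and for empty output.

```shell
# Resolve a BLAST database path, failing loudly if nothing is found.
# Hypothetical helper. Usage: resolve_blastdb <db-name> <nucl|prot>
resolve_blastdb() {
    # Fail if blastdb_path itself reports an error.
    db_path=$(blastdb_path -db "$1" -dbtype "$2") || return 1
    # Also fail if it succeeded but printed nothing.
    if [ -z "$db_path" ]; then
        echo "error: no path found for database '$1' ($2)" >&2
        return 1
    fi
    printf '%s\n' "$db_path"
}

# Example, after loading the blast-plus module:
#   NR=$(resolve_blastdb nr prot) || exit 1
#   echo "Using nr at: $NR"
```

Stopping the job at the lookup step keeps a failed resolution from being mistaken for a search that found no matches.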

Path: /datasets/bio/ncbi-db/
URL: https://ftp.ncbi.nlm.nih.gov/blast/db/
Downloaded: weekly
Cite: https://support.nlm.nih.gov/knowledgebase/article/KA-03391/en-us
Last modified: Sunday, July 21, 2024 at 10:09 PM.