GenBank/RefSeq Download Step

The download step retrieves files from the NCBI ftp area and stores them in the download/ directory.

Algorithm

Note that this process isn't mirroring: it never overwrites existing files, which minimizes the danger of leaving the data in an indeterminate state. Only the required subset of files is downloaded.
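A minimal sketch of this no-overwrite behavior follows. It assumes the caller supplies the remote file list and a `fetch(name)` callable that returns a file's bytes (both hypothetical; real ftp access is left out). The glob-style patterns standing in for "the required subset" and the write-to-temp-then-rename step are illustrative choices, not taken from the actual implementation.

```python
import os
import tempfile
from fnmatch import fnmatch
from typing import Callable, Iterable, List


def select_required(remote_files: Iterable[str], patterns: Iterable[str]) -> List[str]:
    """Keep only remote file names matching one of the (hypothetical) patterns."""
    pats = list(patterns)
    return [name for name in remote_files if any(fnmatch(name, p) for p in pats)]


def download_missing(remote_files: Iterable[str], patterns: Iterable[str],
                     dest_dir: str, fetch: Callable[[str], bytes]) -> List[str]:
    """Download each required file that is not already present in dest_dir.

    Existing files are never overwritten. New data is written to a temporary
    file in the same directory and renamed into place only once complete, so
    an interrupted run leaves no partial file under the final name.
    """
    os.makedirs(dest_dir, exist_ok=True)
    fetched = []
    for name in select_required(remote_files, patterns):
        target = os.path.join(dest_dir, name)
        if os.path.exists(target):
            continue  # already downloaded: leave it untouched
        fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".partial-")
        try:
            with os.fdopen(fd, "wb") as out:
                out.write(fetch(name))
            os.replace(tmp_path, target)  # atomic rename within one filesystem
        except BaseException:
            os.unlink(tmp_path)  # discard the incomplete temporary file
            raise
        fetched.append(name)
    return fetched
```

For example, with `gbpri1.seq.gz` already on disk and the pattern `gbpri*.seq.gz`, only `gbpri2.seq.gz` would be fetched from the list `["gbpri1.seq.gz", "gbpri2.seq.gz", "README.genbank"]`, and the existing file keeps its original contents.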

Directory structure

The directory structure for GenBank and RefSeq is a subset of the directories at the NCBI ftp site. Release version numbers are added to the database directory names so that multiple versions can be kept. Since RefSeq doesn't actually do versioned releases, the GenBank release number is used for RefSeq as well. This implies that a full snapshot of RefSeq is obtained whenever a new GenBank release is made. This fake versioning of RefSeq supports both small, incremental downloads and periodic removal of older files.
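The version tagging and cleanup described above can be sketched as follows. This assumes a hypothetical `<db>.<major>.<minor>` directory naming (for example `genbank.250.0` and `refseq.250.0`); the directory names actually used by this step may differ. `release_dirs` and `removable_versions` are illustrative helpers, not part of the original tooling.

```python
import re
from typing import Dict, List


def release_dirs(genbank_release: str) -> Dict[str, str]:
    """Hypothetical layout: both databases are tagged with the GenBank release."""
    return {db: f"{db}.{genbank_release}" for db in ("genbank", "refseq")}


def removable_versions(dir_names: List[str], db: str, keep: int) -> List[str]:
    """Return directories of `db` older than the newest `keep` releases."""
    pattern = re.compile(rf"^{re.escape(db)}\.(\d+)\.(\d+)$")
    versioned = []
    for name in dir_names:
        m = pattern.match(name)
        if m:
            # Compare release numbers numerically, not as strings.
            versioned.append(((int(m.group(1)), int(m.group(2))), name))
    versioned.sort()
    if keep <= 0:
        return [name for _, name in versioned]
    return [name for _, name in versioned[:-keep]]
```

Versions are compared numerically rather than as strings, so `100.0` correctly sorts after `99.0`. Because a new GenBank release also starts a new RefSeq directory, that directory begins empty and the next download fetches a full RefSeq snapshot into it.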