PDB Beta Archive

Introduction

wwPDB anticipates that four-character PDB accession codes (PDB IDs) will be exhausted by 2028. To accommodate the continuous growth of the PDB archive, wwPDB has revised the PDB accession code format by extending its length and prepending "pdb_" (e.g., "1abc" will become "pdb_00001abc"). This new ID format will make PDB entries easier to detect by text mining of the published literature and will allow revised data files to be delivered in a more informative and transparent way.
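The conversion can be sketched as a small POSIX shell function. This is a minimal sketch, not an official wwPDB tool. The function name to_extended_id is hypothetical, and the rule it applies is inferred from the "1abc" to "pdb_00001abc" example: lowercase the legacy ID, pad it to eight characters with leading zeros, and prefix "pdb_".

```sh
#!/bin/sh
# Sketch: convert a legacy four-character PDB ID to the extended form.
# Assumed rule (inferred from the 1abc -> pdb_00001abc example): lowercase
# the ID, left-pad it with zeros to eight characters, and prefix "pdb_".
to_extended_id() {
  id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case $id in
    pdb_????????)
      # Already an extended ID: pass it through unchanged.
      printf '%s\n' "$id" ;;
    [0-9][a-z0-9][a-z0-9][a-z0-9])
      # Legacy ID: four leading zeros bring it to eight characters.
      printf 'pdb_0000%s\n' "$id" ;;
    *)
      printf 'not a PDB ID: %s\n' "$1" >&2
      return 1 ;;
  esac
}
```

For example, to_extended_id 1ABC prints pdb_00001abc.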

The PDB Beta Archive is provided to help the community adopt extended PDB IDs and the PDBx/mmCIF format during the transition phase. All files in this archive are reorganized at the entry level using extended PDB IDs (in both file names and directory names), mirroring the data organization of the PDB Versioned Archive.

All data files for a particular entry are stored under a single entry directory, located in a two-character hash directory derived from the PDB ID, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_file_names>.

The two-letter hash consists of the two characters immediately preceding the last character of the ID. For example, PDB entry pdb_1abc5678 is stored under /67/. This is consistent with the current PDB archive, where PDB entry 1abc is stored under /ab/.
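The hashing rule can be expressed as a short shell sketch. The function names entry_hash and entry_dir_url are hypothetical, and the sketch assumes that directory names use lowercase IDs, as in the examples on this page.

```sh
#!/bin/sh
# Sketch: locate an entry directory in the PDB Beta Archive.
BASE_URL="https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"

# Hash = the two characters just before the last character of the ID
# (pdb_00001abc -> ab, pdb_1abc5678 -> 67, legacy 1abc -> ab).
entry_hash() {
  printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed 's/.*\(..\).$/\1/'
}

# Entry directory: <base>/<hash>/<extended ID>/ (lowercase IDs assumed).
entry_dir_url() {
  id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  printf '%s/%s/%s/\n' "$BASE_URL" "$(entry_hash "$id")" "$id"
}
```

Because the rule only looks at the end of the ID, the same function yields /ab/ for both 1abc and pdb_00001abc. This is the consistency with the current archive that the paragraph above describes.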

File naming is standardized so that the file type is reflected in the extension. For example, the structure factor file r116dsf.ent.gz becomes pdb_0000116d-sf.cif.gz, and the legacy PDB-format coordinate file pdb318d.ent.gz becomes pdb_0000318d.pdb.gz.
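The two renaming examples can be captured in a shell sketch. It handles only the two legacy patterns shown (r<id>sf.ent.gz and pdb<id>.ent.gz). The function name legacy_to_new_name is hypothetical, and other legacy file types are deliberately rejected rather than guessed.

```sh
#!/bin/sh
# Sketch: map the two legacy file-name patterns from the examples above
# to the new names. Other legacy file types are not covered.
legacy_to_new_name() {
  name=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  case $name in
    r????sf.ent.gz)
      # Structure factors: r<4-char ID>sf.ent.gz -> pdb_0000<ID>-sf.cif.gz
      id=$(printf '%s' "$name" | cut -c2-5)
      printf 'pdb_0000%s-sf.cif.gz\n' "$id" ;;
    pdb????.ent.gz)
      # Legacy PDB-format coordinates: pdb<ID>.ent.gz -> pdb_0000<ID>.pdb.gz
      id=$(printf '%s' "$name" | cut -c4-7)
      printf 'pdb_0000%s.pdb.gz\n' "$id" ;;
    *)
      printf 'unrecognized legacy name: %s\n' "$1" >&2
      return 1 ;;
  esac
}
```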

Once the four-character PDB IDs are exhausted, this PDB Beta Archive will replace the current PDB Archive, and entries issued with extended PDB IDs will not be compatible with the legacy PDB format. Please note that all existing legacy PDB-format files in this PDB Beta Archive will be frozen (they will not be updated). wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and to adopt the new PDB ID format as early as possible.

For more information, see the FAQ.

File Download

The PDB Beta archive is updated every Wednesday at 00:00 UTC.

wwPDB: https://files-beta.wwpdb.org, rsync://rsync.wwpdb.org
RCSB PDB (US): https://files-beta.rcsb.org, rsync://rsync.rcsb.org (see the download protocol below)
PDBe (UK): https://ftp.ebi.ac.uk/pub/databases/wwpdb/
PDBj (Japan): ftp://ftp-beta.pdbj.org, https://files-beta.pdbj.org, rsync://rsync-beta.pdbj.org

New Sequence and InChI

Every Saturday by 3:00 UTC, for every new entry the wwPDB website provides:

Data Structure and Content

Primary data (atomic coordinates and experimental data) are stored at entry level using a hash directory.

Data types, file formats, and locations:

Atomic coordinates
  File formats: PDBx/mmCIF, XML, and PDB
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/

X-ray data: structure factors
  File formats: PDBx/mmCIF
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/

NMR data: restraints and chemical shifts
  File formats: NEF (/nmr_data), NMR-STAR, and native refinement program formats
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/

Small-molecule references: CCD, BIRD, and other derived data (CCD holdings; SMILES, InChI, and InChIKey variants)
  File formats: PDBx/mmCIF, JSON, SDF, smi, inch
  Location: https://files-beta.wwpdb.org/pub/wwpdb/refdata/
    CCD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/chem_comp/[last-character-hash]/[CCD ID]/
    BIRD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird
    Other derived data: https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/
    CCD holdings: a list of released chemical reference entries, their content types (e.g., Chemical Component, BIRD), and the most recent modification date of the reference file. https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/refdata_id_list.json.gz

Assemblies
  File formats: PDBx/mmCIF
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/assemblies/

Archive holdings
  File formats: JSON
  Location: https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings/
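Chemical component files use a last-character hash instead of the two-character entry hash. Here is a minimal sketch of building a CCD directory URL. It assumes that CCD IDs are uppercase and that the hash directory is the last character of the ID in the same case (e.g., ATP under /P/). The function name ccd_dir_url is hypothetical.

```sh
#!/bin/sh
# Sketch: build the CCD directory URL from a CCD ID.
# Assumption: IDs are uppercase and the hash directory is the last
# character of the ID in the same case (ATP -> chem_comp/P/ATP/).
REFDATA_URL="https://files-beta.wwpdb.org/pub/wwpdb/refdata"
ccd_dir_url() {
  [ -n "$1" ] || { printf 'missing CCD ID\n' >&2; return 1; }
  ccd=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
  last=$(printf '%s\n' "$ccd" | sed 's/.*\(.\)$/\1/')
  printf '%s/chem_comp/%s/%s/\n' "$REFDATA_URL" "$last" "$ccd"
}
```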

List of archive holdings

These inventory files at /pub/wwpdb/pdb/holdings/ give a quick overview of the data in the archive.

current_file_holdings.json.gz: a list of released PDB entries and the file types present for each in the PDB Core Archive (e.g. coordinate data, experimental data, validation report).
released_structures_last_modified_dates.json.gz: a list of released PDB entries with the most recent modification date of the PDBx/mmCIF file.
released_experimental_data_last_modified_dates.json.gz: a list of released experimental data files with the most recent modification date.
obsolete_structures_last_modified_dates.json.gz: a list of obsoleted PDB entries with the most recent modification date of the PDBx/mmCIF file.
obsolete_experimental_data_last_modified_dates.json.gz: a list of obsoleted experimental data files with the most recent modification date.
all_removed_entries.json.gz: a list of obsoleted PDB entries, including entry authors, entry title, release date, obsolete date, and superseding PDB ID, if any.
unreleased_entries.json.gz: a list of on-hold PDB entries, their entry status, deposition date, and pre-release sequence information, where available.
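The holdings files can be fetched and decompressed in one step. In this sketch the helper names are hypothetical. Each URL combines the holdings location (https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings/) with one of the file names listed above, and fetch_holdings assumes that curl and gunzip are installed. The sketch makes no assumptions about the JSON structure inside the files; it simply streams the decompressed content to standard output.

```sh
#!/bin/sh
# Sketch: fetch one of the holdings files listed above.
HOLDINGS_URL="https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings"
HOLDINGS_FILES="current_file_holdings released_structures_last_modified_dates
released_experimental_data_last_modified_dates obsolete_structures_last_modified_dates
obsolete_experimental_data_last_modified_dates all_removed_entries unreleased_entries"

# Print the URL for a known holdings file (name without .json.gz).
holdings_url() {
  for f in $HOLDINGS_FILES; do
    if [ "$f" = "$1" ]; then
      printf '%s/%s.json.gz\n' "$HOLDINGS_URL" "$f"
      return 0
    fi
  done
  printf 'unknown holdings file: %s\n' "$1" >&2
  return 1
}

# Download and decompress to stdout (requires network access, curl, gunzip).
fetch_holdings() {
  url=$(holdings_url "$1") || return 1
  curl -fsSL "$url" | gunzip -c
}
```

For example, fetch_holdings unreleased_entries | head -c 500 previews the start of the on-hold list.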

Download Protocols

Every Wednesday from 00:00 UTC, all new and modified entries are updated at each of the wwPDB repository sites. The PDB archive is large, requiring more than 1 TB of storage, and it continues to grow with each weekly update.

All files mentioned above are available via three protocols: FTP, HTTPS, and rsync. We recommend HTTPS for individual file downloads; the FTP protocol will be gradually phased out. For bulk downloads we recommend rsync; see the rsync instructions below.

RCSB PDB:

Using the HTTPS protocol:

Download coordinate files in PDB Exchange Format (mmCIF):

https://files-beta.wwpdb.org/download

          For example, https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz

Download coordinate files in PDBML format:

https://files-beta.wwpdb.org/download

          For example, https://files-beta.wwpdb.org/download/pdb_00001abc.xml.gz

Download EMDB data files:

https://files.rcsb.org/pub/emdb/structures

Download the experimental data files:

https://files-beta.wwpdb.org/download

        For example, https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz (for structure factors)

Download the assembly files:

https://files-beta.wwpdb.org/download

        For example, https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz (for assembly 1 in cif format)

Download the validation report files:

https://files-beta.wwpdb.org/download

        For example, https://files-beta.wwpdb.org/download/pdb_00001abc_validation.pdf.gz
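The example URLs above all follow one pattern: the download endpoint, then the extended PDB ID, then a type-specific suffix. The sketch below builds these URLs. It assumes that the suffixes shown generalize to other entries and, for assemblies, to assembly numbers other than 1. The function name download_url is hypothetical.

```sh
#!/bin/sh
# Sketch: build RCSB PDB download URLs for the file types shown above.
DOWNLOAD_URL="https://files-beta.wwpdb.org/download"
download_url() {
  id=$1 kind=$2
  case $kind in
    cif)        suffix=".cif.gz" ;;              # PDBx/mmCIF coordinates
    xml)        suffix=".xml.gz" ;;              # PDBML coordinates
    sf)         suffix="-sf.cif.gz" ;;           # structure factors
    validation) suffix="_validation.pdf.gz" ;;   # validation report (PDF)
    assembly*)
      # Assumption: assembly<N> generalizes the assembly1 example; N defaults to 1.
      n=${kind#assembly}
      suffix="-assembly${n:-1}.cif.gz" ;;
    *) printf 'unknown file kind: %s\n' "$kind" >&2; return 1 ;;
  esac
  printf '%s/%s%s\n' "$DOWNLOAD_URL" "$id" "$suffix"
}
# Usage (requires network access and curl):
#   curl -fO "$(download_url pdb_00001abc sf)"
```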

Using rsync protocol:

rsync --port=33444 rsync.rcsb.org::
ftp             Top level of PDB ftp tree ( /pub/wwpdb/pdb )
ftp_data        Data directory within PDB ftp archive ( /pub/wwpdb/pdb/data )
ftp_refdata     Small molecule data directory within PDB ftp archive ( /pub/wwpdb/refdata )
emdb            Top level of EMDB ftp tree ( /pub/emdb )

Download coordinate files in PDB Exchange Format (mmCIF):

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF

rsync -rlpt -v -z --delete --port=33444 \
"rsync.rcsb.org::ftp_data/entries/*/*/structures/*.cif.gz" ./mmCIF

Download coordinate files in PDBML Format (xml):

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML

rsync -rlpt -v -z --delete --port=33444 \
"rsync.rcsb.org::ftp_data/entries/*/*/structures/*.xml.gz" ./XML

Download chemical component (CCD) files:

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_refdata/chem_comp/ ./mmCIF

Download EMDB map metadata header files (xml):

rsync -rlpt -v -z --delete --port=33444 --include "emd-*.xml" \
"rsync.rcsb.org::emdb/structures/EMD-*/header/" ./header

Download directories/files for EMDB entry EMD-5001:

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::emdb/structures/EMD-5001/ ./EMD-5001

Download the validation report files:

rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports
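To mirror every file of a single entry instead of one file type, you can synchronize the entry directory itself. In this sketch, that directory is assumed to be exposed by the ftp_data module in the same way as its subdirectories. The function name rcsb_entry_rsync and the DRY_RUN variable are hypothetical; setting DRY_RUN prints the command instead of running it.

```sh
#!/bin/sh
# Sketch: mirror one whole entry directory from RCSB PDB over rsync.
# Assumption: ftp_data/entries/<hash>/<ID>/ is listable as a directory.
# Set DRY_RUN=1 (hypothetical switch) to print the command instead.
rcsb_entry_rsync() {
  id=$(printf '%s' "$1" | tr 'A-Z' 'a-z')
  dest=${2:-./$id}
  hash=$(printf '%s\n' "$id" | sed 's/.*\(..\).$/\1/')
  set -- rsync -rlpt -v -z --delete --port=33444 \
    "rsync.rcsb.org::ftp_data/entries/$hash/$id/" "$dest"
  if [ -n "$DRY_RUN" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```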

Need further help with the US site? Please contact info@rcsb.org if you have any problems with file downloads.

PDBe:

Using the HTTPS protocol:

Download coordinate files in PDB Exchange Format (mmCIF):

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz

Download coordinate files in PDBML format:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz

Download coordinate files in PDB format:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz

Access the full PDB ftp tree:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/

Download EMDB data files:

https://ftp.ebi.ac.uk/pub/databases/emdb/structures

Download the validation report files:

https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports
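Entries with extended PDB IDs are not compatible with the legacy PDB format, so a download script may want to skip the .pdb.gz request for them. The sketch below builds the PDBe coordinate URLs shown above. The function name pdbe_coord_url is hypothetical. It assumes that only IDs of the form pdb_0000xxxx (converted legacy IDs) can have a PDB-format file. That check is necessary but not sufficient, because some legacy entries, such as very large structures, also lack a PDB-format file, so a failed download should still be handled.

```sh
#!/bin/sh
# Sketch: PDBe HTTPS URL for a coordinate file in cif, xml, or pdb format.
PDBE_ENTRIES="https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries"
pdbe_coord_url() {
  id=$(printf '%s' "$1" | tr 'A-Z' 'a-z') fmt=$2
  hash=$(printf '%s\n' "$id" | sed 's/.*\(..\).$/\1/')
  case $fmt in
    cif|xml) ;;
    pdb)
      # Assumption: only converted legacy IDs (pdb_0000xxxx) can have
      # a legacy PDB-format file; other extended IDs never do.
      case $id in
        pdb_0000????) ;;
        *) printf 'no PDB-format file expected for %s\n' "$id" >&2; return 1 ;;
      esac ;;
    *) printf 'unknown format: %s\n' "$fmt" >&2; return 1 ;;
  esac
  printf '%s/%s/%s/structures/%s.%s.gz\n' "$PDBE_ENTRIES" "$hash" "$id" "$id" "$fmt"
}
```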

Using rsync protocol:

rsync rsync.ebi.ac.uk::
pub             ftp.ebi.ac.uk /pub area

Download coordinate files in PDB Exchange Format (mmCIF):

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz \
./mmCIF

Download coordinate files in PDBML Format (xml):

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz \
./XML

Download coordinate files in PDB Format:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz \
./pdb

Download EMDB map metadata header files (xml):

rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-*/header/" ./header

Download directories/files for EMDB entry EMD-1003:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-1003/ ./EMD-1003

Download the validation report files:

rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ \
./validation_reports
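For many entries, you can wrap the per-entry rsync command in a loop that reads extended PDB IDs from standard input. The sketch below uses the PDBe paths above. The function name pdbe_sync_validation, the one-ID-per-line input, and the DRY_RUN switch are assumptions for illustration.

```sh
#!/bin/sh
# Sketch: mirror validation reports from PDBe for a list of entries.
# Assumption: extended PDB IDs arrive on stdin, one per line.
# Set DRY_RUN=1 (hypothetical switch) to print commands instead of running them.
pdbe_sync_validation() {
  dest_root=${1:-./validation_reports}
  while IFS= read -r id; do
    [ -n "$id" ] || continue                      # skip blank lines
    id=$(printf '%s' "$id" | tr 'A-Z' 'a-z')
    hash=$(printf '%s\n' "$id" | sed 's/.*\(..\).$/\1/')
    src="rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/$hash/$id/validation_reports/"
    if [ -n "$DRY_RUN" ]; then
      printf 'rsync -rlpt -v -z --delete %s %s/%s\n' "$src" "$dest_root" "$id"
    else
      # </dev/null keeps rsync from consuming the list of IDs on stdin.
      mkdir -p "$dest_root/$id" &&
        rsync -rlpt -v -z --delete "$src" "$dest_root/$id" </dev/null || return 1
    fi
  done
}
```

For example, printf 'pdb_00001abc\n' | pdbe_sync_validation ./validation_reports mirrors one entry's reports into ./validation_reports/pdb_00001abc.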

Using ftp protocol:

ftp ftp.ebi.ac.uk

will connect to an anonymous ftp server containing the remediated wwPDB repository. Use the user name 'anonymous' when prompted. Alternatively, use lftp:

lftp http://ftp.ebi.ac.uk

The archive files are available in pub/databases/wwpdb:
cd pub/databases/wwpdb

Need further help with the PDBe site? Please contact PDBe (http://www.ebi.ac.uk/pdbe/about/contact or e-mail pdbehelp@ebi.ac.uk) if you have any problems connecting to https://ftp.ebi.ac.uk/.

PDBj:

Using the HTTPS protocol:

Download coordinate files in PDB Exchange Format (mmCIF):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz

Download coordinate files in PDBML format (all):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz

Download coordinate files in PDBML format (no-atom site information):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-noatom.xml.gz

Download coordinate files in PDBML format (atom site information only):

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-exatom.xml.gz

Download coordinate files in PDB format:

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz

Download EMDB data files:

https://files-beta.pdbj.org/pub/emdb/structures

Download the validation report files:

https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/
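The three PDBML variants at PDBj differ only in their file-name suffix. The sketch below selects that suffix. The function name pdbj_pdbml_url is hypothetical, and the sketch assumes that all three variants are stored in the structures/ subdirectory, as in the other PDBj coordinate paths.

```sh
#!/bin/sh
# Sketch: PDBj HTTPS URL for a PDBML variant (all, noatom, exatom).
# Assumption: every variant lives under structures/.
PDBJ_ENTRIES="https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries"
pdbj_pdbml_url() {
  id=$(printf '%s' "$1" | tr 'A-Z' 'a-z') variant=${2:-all}
  hash=$(printf '%s\n' "$id" | sed 's/.*\(..\).$/\1/')
  case $variant in
    all)    suffix=".xml.gz" ;;          # full PDBML
    noatom) suffix="-noatom.xml.gz" ;;   # without atom_site data
    exatom) suffix="-exatom.xml.gz" ;;   # atom_site data only
    *) printf 'unknown PDBML variant: %s\n' "$variant" >&2; return 1 ;;
  esac
  printf '%s/%s/%s/structures/%s%s\n' "$PDBJ_ENTRIES" "$hash" "$id" "$id" "$suffix"
}
```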

Using rsync protocol:

rsync rsync-beta.pdbj.org::

ftp             Top level of PDB ftp tree ( /pub/wwpdb/pdb )
ftp_data        Data directory within PDB ftp archive ( /pub/wwpdb/pdb/data )
ftp_refdata     Small molecule data directory within PDB ftp archive ( /pub/wwpdb/refdata )
emdb            Top level of EMDB ftp tree ( /pub/emdb )

Download coordinate files in PDB Exchange Format (mmCIF):

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF

Download coordinate files in PDBML Format (xml):

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML

Download chemical component (CCD) files:

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_refdata/chem_comp/ ./chem_comp

Download EMDB map metadata header files (xml):

rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync-beta.pdbj.org::emdb/structures/EMD-*/header/" ./header

Download directories/files for EMDB entry EMD-5001:

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::emdb/structures/EMD-5001/ ./EMD-5001

Download the validation report files:

rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports
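The rsync sources at the three sites differ only in host, module path, and, for RCSB PDB, port. The sketch below captures those differences so that a single command works against any site. The function names are hypothetical, and the prefixes are copied from the examples on this page.

```sh
#!/bin/sh
# Sketch: per-site rsync source prefix for the PDB data tree.
rsync_data_prefix() {
  case $1 in
    rcsb) printf '%s\n' "rsync.rcsb.org::ftp_data" ;;
    pdbe) printf '%s\n' "rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data" ;;
    pdbj) printf '%s\n' "rsync-beta.pdbj.org::ftp_data" ;;
    *) printf 'unknown site: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Only the RCSB PDB examples use a non-default rsync port.
rsync_port_opt() {
  case $1 in
    rcsb) printf '%s\n' "--port=33444" ;;
    pdbe|pdbj) ;;
    *) return 1 ;;
  esac
}

# Usage (port option left unquoted so it vanishes when empty):
#   site=pdbj
#   rsync -rlpt -v -z --delete $(rsync_port_opt "$site") \
#     "$(rsync_data_prefix "$site")/entries/ab/pdb_00001abc/structures/" ./mmCIF
```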

Using ftp protocol:

ftp ftp-beta.pdbj.org

will connect to an anonymous ftp server at PDBj containing the remediated wwPDB repository.

Need further help with the PDBj site? Please contact PDBj (https://pdbj.org/contact) if you have any problems with file downloads.

Archive Snapshots

The annual archive snapshots capture the contents of the archive at the start of each year or at selected milestones. They provide a stable set of entries for analysis and let users see the changes introduced by wwPDB remediation efforts.

Access to these snapshots is available through HTTP, rsync, FTP, and AWS sync protocols.

HTTP Protocol

RCSB PDB (US/AWS): AWS S3 Explorer

PDBj (Japan): PDB Snapshot Archive

RSYNC Protocol

PDBj (Japan): rsync -avz snapshots.pdbj.org:: .

FTP Protocol

PDBj (Japan): ftp://snapshots.pdbj.org

AWS SYNC Protocol

RCSB PDB (US/AWS): s3://pdbsnapshots/

AWS SYNC Instructions:

  1. Install the AWS CLI tool.
  2. List all PDB snapshot objects:

       aws s3 ls s3://pdbsnapshots/ --no-sign-request

     A result line such as

       PRE 20250101/

     reads as follows:
     • PRE: indicates that the listed item is a "prefix" (or "folder") in the S3 bucket, not a file object. This is how S3 organizes files logically into folders.
     • 20250101/: the prefix itself. It is not an actual folder in the traditional sense, but rather a common prefix used to group objects.
  3. Sync PDB snapshots:
     • All PDB snapshot objects:

         aws s3 sync s3://pdbsnapshots/ ./local-directory/ --no-sign-request

     • A specific PDB snapshot (e.g., 20250101):

         aws s3 sync s3://pdbsnapshots/20250101/ ./local-directory/20250101 --no-sign-request
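Snapshot prefixes appear to be dates in YYYYMMDD form (e.g., 20250101). The sketch below validates the name before syncing, so that a typo does not start a sync of the wrong prefix. The function names are hypothetical, the YYYYMMDD format is inferred from the 20250101 example, and sync_snapshot assumes that the AWS CLI is installed.

```sh
#!/bin/sh
# Sketch: sync one PDB snapshot from S3 after checking its name.
# Assumption: snapshot prefixes are eight-digit dates (YYYYMMDD).
snapshot_prefix() {
  case $1 in
    [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9])
      printf 's3://pdbsnapshots/%s/\n' "$1" ;;
    *)
      printf 'expected a YYYYMMDD snapshot name, got: %s\n' "$1" >&2
      return 1 ;;
  esac
}

# Requires the AWS CLI; no credentials are needed with --no-sign-request.
sync_snapshot() {
  prefix=$(snapshot_prefix "$1") || return 1
  dest=${2:-./local-directory/$1}
  aws s3 sync "$prefix" "$dest" --no-sign-request
}
```

For example, sync_snapshot 20250101 runs the same aws s3 sync command as the last step above.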