PDB Beta Archive
Introduction
wwPDB anticipates that four-character PDB accession codes (PDB IDs) will be exhausted by 2028. To accommodate the continued growth of the PDB archive, wwPDB has revised the PDB accession code by extending its length and prepending "pdb_" (e.g., "1abc" will become "pdb_00001abc"). This new ID format will enable text-mining detection of PDB entries in the published literature and allow for more informative and transparent delivery of revised data files.
The PDB Beta Archive is provided to help the community adopt the extended PDB ID and the PDBx/mmCIF format during the transition phase. All files in this archive are reorganized by extended PDB ID (in both file names and directories) at the entry level, mirroring the data organization of the PDB Versioned Archive.
All data files for a particular entry are stored in a single directory, which sits under a two-character hash directory derived from the PDB code, i.e., https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/<two-letter-hash>/<pdb_accession_code>/<entry_data_file_names>.
The two-letter hash consists of the second and third characters counted from the end of the PDB ID, that is, the two characters immediately before the last one. For example, PDB entry pdb_1abc5678 will be under /67/. This maintains consistency with the current PDB archive, where PDB entry 1abc is under /ab/.
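The hash rule above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the base URL is taken from the Data Structure and Content table below, and the input validation (4 to 8 alphanumeric characters) is an assumption rather than a documented wwPDB rule.

```python
# Base URL as listed in the "Data Structure and Content" table.
ENTRIES_BASE = "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries"


def to_extended_id(pdb_id: str) -> str:
    """Return the extended form of an ID, e.g. '1abc' -> 'pdb_00001abc'."""
    core = pdb_id.strip().lower()
    if core.startswith("pdb_"):
        core = core[4:]
    # Assumption: the part after 'pdb_' is 4 to 8 ASCII alphanumerics.
    if not (core.isascii() and core.isalnum() and 4 <= len(core) <= 8):
        raise ValueError(f"not a recognizable PDB ID: {pdb_id!r}")
    return "pdb_" + core.zfill(8)


def hash_dir(extended_id: str) -> str:
    """Second and third characters counted from the end of the ID."""
    return extended_id[-3:-1]


def entry_url(pdb_id: str) -> str:
    """Directory URL that holds all files of one entry."""
    ext = to_extended_id(pdb_id)
    return f"{ENTRIES_BASE}/{hash_dir(ext)}/{ext}/"
```

For instance, `hash_dir("pdb_1abc5678")` gives `"67"` and `entry_url("1abc")` ends in `/ab/pdb_00001abc/`, matching the two examples in the text.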
File naming is standardized such that the file type is used for the extension. For example, file naming is changed from r116dsf.ent.gz to pdb_0000116d-sf.cif.gz for the structure factor file and from pdb318d.ent.gz to pdb_0000318d.pdb.gz for the legacy PDB formatted coordinate file.
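As a rough illustration of the renaming, the following Python sketch maps the two legacy name patterns quoted above to their Beta Archive counterparts. The helper name `beta_file_name` is hypothetical, and only the two patterns shown in the text are covered; other legacy file types would need their own rules.

```python
import re

# (legacy pattern, new-name template) for the two examples in the text.
_LEGACY_PATTERNS = [
    # Structure factors: r116dsf.ent.gz -> pdb_0000116d-sf.cif.gz
    (re.compile(r"^r([0-9a-z]{4})sf\.ent\.gz$"), "pdb_0000{id}-sf.cif.gz"),
    # Legacy PDB coordinates: pdb318d.ent.gz -> pdb_0000318d.pdb.gz
    (re.compile(r"^pdb([0-9a-z]{4})\.ent\.gz$"), "pdb_0000{id}.pdb.gz"),
]


def beta_file_name(legacy_name: str) -> str:
    """Translate a legacy archive file name to the extended-ID naming."""
    name = legacy_name.lower()
    for pattern, template in _LEGACY_PATTERNS:
        match = pattern.match(name)
        if match:
            return template.format(id=match.group(1))
    raise ValueError(f"unsupported legacy file name: {legacy_name!r}")
```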
Once the four-character PDB IDs are exhausted, the PDB Beta Archive will replace the current PDB Archive. Entries issued with extended PDB IDs are not compatible with the legacy PDB format. Please note that all existing legacy PDB format files will be frozen (they will not be updated) in the PDB Beta Archive. wwPDB encourages scientific journals, the PDB community, and users to transition to the PDBx/mmCIF format and adopt the new PDB ID format as early as possible.
For more information, see FAQ.
File Download
The PDB Beta archive is updated every Wednesday at 00:00 UTC.
wwPDB: https://files-beta.wwpdb.org, rsync://rsync.wwpdb.org
RCSB PDB (US): https://files-beta.rcsb.org, rsync://rsync.rcsb.org (see the download protocol below)
PDBe (UK): https://ftp.ebi.ac.uk/pub/databases/wwpdb/
PDBj (Japan): ftp://ftp-beta.pdbj.org, https://files-beta.pdbj.org, rsync://rsync-beta.pdbj.org
New Sequence and InChI
Every Saturday by 3:00 UTC, for every new entry the wwPDB website provides:
Data Structure and Content
Primary data (atomic coordinates and experimental data) are stored at entry level using a hash directory.
| Data types | File formats | Location |
|---|---|---|
| Atomic coordinates | PDBx/mmCIF, XML, and PDB | https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/; per entry: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/ |
| X-ray data: Structure Factors | PDBx/mmCIF | https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/; per entry: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/ |
| NMR data: Restraints and chemical shifts | NEF (/nmr_data), NMR-STAR, and native refinement program formats | https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/; per entry: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/structures/ |
| Small molecule references: CCD; BIRD; other derived data (CCD holdings; SMILES, InChI, InChIKey; variants) | PDBx/mmCIF; JSON; SDF, smi, inch | https://files-beta.wwpdb.org/pub/wwpdb/refdata/; CCD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/chem_comp/[last-character-hash]/[CCD ID]/; BIRD: https://files-beta.wwpdb.org/pub/wwpdb/refdata/bird; other derived data: https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/; CCD holdings (a list of released chemical reference entries, their content types, e.g., Chemical Component, BIRD, and the most recent modification date of the reference file): https://files-beta.wwpdb.org/pub/wwpdb/refdata/derived_data/refdata_id_list.json.gz |
| Assemblies | PDBx/mmCIF | https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/; per entry: https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries/[2-letter-hash]/[extended-PDB-ID]/assemblies/ |
| Archive holdings | JSON | https://files-beta.wwpdb.org/pub/wwpdb/pdb/holdings/ |
List of archive holdings
These inventory data files at /pub/wwpdb/pdb/holdings/ offer a quick overview of the data in the archive.
| File | Description |
|---|---|
| current_file_holdings.json.gz | A list of released PDB entries and the file types present for each in the PDB Core Archive (e.g., coordinate data, experimental data, validation report). |
| released_structures_last_modified_dates.json.gz | A list of released PDB entries with the most recent modification date of the PDBx/mmCIF file. |
| released_experimental_data_last_modified_dates.json.gz | A list of released experimental data files with the most recent modification date. |
| obsolete_structures_last_modified_dates.json.gz | A list of obsoleted PDB entries with the most recent modification date of the PDBx/mmCIF file. |
| obsolete_experimental_data_last_modified_dates.json.gz | A list of obsoleted experimental data files with the most recent modification date. |
| all_removed_entries.json.gz | A list of obsoleted PDB entries, including entry authors, entry title, release date, obsolete date, and superseding PDB ID, if any. |
| unreleased_entries.json.gz | A list of on-hold PDB entries, their entry status, deposition date, and pre-release sequence information, where available. |
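Since the holdings files are gzip-compressed JSON, they can be read with the Python standard library alone. The sketch below is a minimal example with hypothetical helper names; it makes no assumption about the internal layout of the JSON beyond the top level being a JSON object or array, because that layout is not described here.

```python
import gzip
import json


def load_holdings(path):
    """Read a gzip-compressed JSON holdings file and return the parsed data."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)


def summarize(holdings):
    """One-line overview; assumes the top level is a JSON object or array."""
    kind = type(holdings).__name__
    return f"{kind} with {len(holdings)} top-level items"
```

After downloading a file such as `current_file_holdings.json.gz` from the holdings directory, `summarize(load_holdings("current_file_holdings.json.gz"))` gives a first impression of its size before any further processing.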
Download Protocols
Every Wednesday from 00:00 UTC, all new and modified entries are updated at each of the wwPDB repository sites. The PDB archive is large, requiring over 1 TB of storage, and continues to grow with each weekly update.
All files mentioned above are available via three protocols: ftp, https, and rsync. For individual file downloads we recommend https; the ftp protocol will be gradually phased out. For bulk file downloads we recommend rsync; see the rsync instructions below.
RCSB PDB:
Using http protocol:
Download coordinate files in PDB Exchange Format (mmCIF):
https://files-beta.wwpdb.org/download
For example, https://files-beta.wwpdb.org/download/pdb_00001abc.cif.gz
Download coordinate files in PDBML format:
https://files-beta.wwpdb.org/download
For example, https://files-beta.wwpdb.org/download/pdb_00001abc.xml.gz
Download EMDB data files:
https://files.rcsb.org/pub/emdb/structures
Download the experimental data files:
https://files-beta.wwpdb.org/download
For example, https://files-beta.wwpdb.org/download/pdb_00001abc-sf.cif.gz (for structure factors)
Download the assembly files:
https://files-beta.wwpdb.org/download
For example, https://files-beta.wwpdb.org/download/pdb_00001abc-assembly1.cif.gz (for assembly 1 in cif format)
Download the validation report files:
https://files-beta.wwpdb.org/download
For example, https://files-beta.wwpdb.org/download/pdb_00001abc_validation.pdf.gz
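The download URLs above all follow one pattern, which a short Python sketch can capture. The suffixes are copied from the examples in this list; the names `download_url`, `fetch`, and the `kind` keys are hypothetical labels chosen for illustration.

```python
import shutil
import urllib.request

DOWNLOAD_BASE = "https://files-beta.wwpdb.org/download"

# File-name suffixes taken from the examples above.
SUFFIXES = {
    "mmcif": ".cif.gz",
    "pdbml": ".xml.gz",
    "structure_factors": "-sf.cif.gz",
    "validation_pdf": "_validation.pdf.gz",
}


def download_url(extended_id, kind, assembly=1):
    """Build the download URL for one file of an entry."""
    if kind == "assembly":
        suffix = f"-assembly{assembly}.cif.gz"
    else:
        suffix = SUFFIXES[kind]
    return f"{DOWNLOAD_BASE}/{extended_id}{suffix}"


def fetch(url, dest_path):
    """Stream a URL to a local file (requires network access)."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        shutil.copyfileobj(response, out)
```

A call such as `fetch(download_url("pdb_00001abc", "mmcif"), "pdb_00001abc.cif.gz")` would then retrieve the coordinate file, provided the entry exists on the server.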
Using rsync protocol:
rsync --port=33444 rsync.rcsb.org::
ftp Top level of PDB ftp tree ( /pub/wwpdb/pdb )
ftp_data Data directory within PDB ftp archive ( /pub/wwpdb/pdb/data )
ftp_refdata Small molecule data directory within PDB ftp archive ( /pub/wwpdb/refdata )
emdb Top level of EMDB ftp tree ( /pub/emdb )
Download coordinate files in PDB Exchange Format (mmCIF):
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF
rsync -rlpt -v -z --delete --port=33444 \
"rsync.rcsb.org::ftp_data/entries/*/*/structures/*.cif.gz" ./mmCIF
Download coordinate files in PDBML Format (xml):
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML
rsync -rlpt -v -z --delete --port=33444 \
"rsync.rcsb.org::ftp_data/entries/*/*/structures/*.xml.gz" ./XML
Download chemical component (CCD) files:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_refdata/chem_comp/ ./chem_comp
Download EMDB metadata map header files (xml):
rsync -rlpt -v -z --delete --port=33444 --include "emd-*.xml" \
"rsync.rcsb.org::emdb/structures/EMD-*/header/" ./header
Download directories/files for EMDB entry EMD-5001:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::emdb/structures/EMD-5001/ ./EMD-5001
Download the validation report files:
rsync -rlpt -v -z --delete --port=33444 \
rsync.rcsb.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports
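For scripted downloads, the bracketed placeholders in the rsync commands above can be filled in programmatically. The Python sketch below assembles the argument list for a single entry at the RCSB PDB site; the function name and its defaults are hypothetical, while the flags, port, and module path are the ones shown above.

```python
import shlex

# Flags used by the rsync examples in this section.
RSYNC_FLAGS = ["-rlpt", "-v", "-z", "--delete"]


def rcsb_rsync_args(extended_id, dest="./mmCIF", suffix=".cif.gz"):
    """Argument list for fetching one structure file from rsync.rcsb.org."""
    # Hash directory: second and third characters from the end of the ID.
    hash_dir = extended_id[-3:-1]
    source = (
        f"rsync.rcsb.org::ftp_data/entries/{hash_dir}/{extended_id}"
        f"/structures/{extended_id}{suffix}"
    )
    return ["rsync", *RSYNC_FLAGS, "--port=33444", source, dest]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(rcsb_rsync_args("pdb_00001abc")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with the wildcard-free source path.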
Need further help with the US site: Please contact info@rcsb.org if you have any problems with file download.
PDBe:
Using http protocol:
Download coordinate files in PDB Exchange Format (mmCIF):
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz
Download coordinate files in PDBML format:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz
Download coordinate files in PDB format:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz
Access the full PDB ftp tree:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/
Download EMDB data files:
https://ftp.ebi.ac.uk/pub/databases/emdb/structures
Download the validation report files:
https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports
Using rsync protocol:
rsync rsync.ebi.ac.uk::
pub ftp.ebi.ac.uk /pub area
Download coordinate files in PDB Exchange Format (mmCIF):
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz \
./mmCIF
Download coordinate files in PDBML Format (xml):
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz \
./XML
Download coordinate files in PDB Format:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz \
./pdb
Download EMDB map metadata header files (xml):
rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-*/header/" ./header
Download directories/files for EMDB entry EMD-1003:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/emdb/structures/EMD-1003/ ./EMD-1003
Download the validation report files:
rsync -rlpt -v -z --delete \
rsync.ebi.ac.uk::pub/databases/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ \
./validation_reports
Using ftp protocol:
ftp ftp.ebi.ac.uk
will connect to an anonymous ftp server containing the remediated wwPDB repository. Use the user name 'anonymous' when prompted. Alternatively, use lftp as below:
lftp http://ftp.ebi.ac.uk
The archive files are available in pub/databases/wwpdb
cd pub/databases/wwpdb
Need further help with the PDBe site: Please contact PDBe (http://www.ebi.ac.uk/pdbe/about/contact or e-mail pdbehelp@ebi.ac.uk) if you have any problems connecting to the PDBe file server.
PDBj:
Using http protocol:
Download coordinate files in PDB Exchange Format (mmCIF):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz
Download coordinate files in PDBML format (all):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz
Download coordinate files in PDBML format (no-atom site information):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-noatom.xml.gz
Download coordinate files in PDBML format (atom site information only):
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID]-exatom.xml.gz
Download coordinate files in PDB format:
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].pdb.gz
Download EMDB data files:
https://files-beta.pdbj.org/pub/emdb/structures
Download the validation report files:
https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/
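Because the sites share the same entry-level layout, one relative path can be resolved against several base URLs. The following Python sketch illustrates this with the base URLs quoted in this document; the `MIRRORS` keys and function names are hypothetical, and it assumes that every structure file sits under `structures/` with the same file name at each site.

```python
# Entry-level base URLs as quoted in this document.
MIRRORS = {
    "wwpdb": "https://files-beta.wwpdb.org/pub/wwpdb/pdb/data/entries",
    "pdbe": "https://ftp.ebi.ac.uk/pub/databases/wwpdb/pdb/data/entries",
    "pdbj": "https://files-beta.pdbj.org/pub/wwpdb/pdb/data/entries",
}


def mirror_url(site, extended_id, suffix=".cif.gz"):
    """URL of one structure file at the given site."""
    # Hash directory: second and third characters from the end of the ID.
    hash_dir = extended_id[-3:-1]
    return (
        f"{MIRRORS[site]}/{hash_dir}/{extended_id}"
        f"/structures/{extended_id}{suffix}"
    )


def candidate_urls(extended_id, suffix=".cif.gz", order=("pdbj", "pdbe", "wwpdb")):
    """URLs for the same file at several sites, in order of preference."""
    return [mirror_url(site, extended_id, suffix) for site in order]
```

Passing `suffix="-noatom.xml.gz"` or `suffix="-exatom.xml.gz"` with `site="pdbj"` yields the PDBj-specific PDBML variants listed above.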
Using rsync protocol:
rsync rsync-beta.pdbj.org::
ftp Top level of PDB ftp tree ( /pub/wwpdb/pdb )
ftp_data Data directory within PDB ftp archive ( /pub/wwpdb/pdb/data )
ftp_refdata Small molecule data directory within PDB ftp archive ( /pub/wwpdb/refdata )
emdb Top level of EMDB ftp tree ( /pub/emdb )
Download coordinate files in PDB Exchange Format (mmCIF):
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].cif.gz ./mmCIF
Download coordinate files in PDBML Format (xml):
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/structures/[extended PDB ID].xml.gz ./XML
Download chemical component (CCD) files:
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_refdata/chem_comp/ ./chem_comp
Download EMDB map metadata header files (xml):
rsync -rlpt -v -z --delete --include "emd-*.xml" \
"rsync-beta.pdbj.org::emdb/structures/EMD-*/header/" ./header
Download directories/files for EMDB entry EMD-5001:
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::emdb/structures/EMD-5001/ ./EMD-5001
Download the validation report files:
rsync -rlpt -v -z --delete \
rsync-beta.pdbj.org::ftp_data/entries/[2-letter hash]/[extended PDB ID]/validation_reports/ ./validation_reports
Using ftp protocol:
ftp ftp-beta.pdbj.org
will connect to an anonymous ftp server at PDBj containing the remediated wwPDB repository.
Need further help with the PDBj site: Please contact PDBj (https://pdbj.org/contact) if you have any problems with file download.
Archive Snapshots
The annual archive snapshots provide the data in the archive at the start of each year or at selected milestone moments. These data may be used to provide a stable set of entries for analysis and allow users to see changes introduced due to remediation efforts by wwPDB.
Access to these snapshots is available through HTTP, rsync, FTP, and AWS sync protocols.
HTTP Protocol
RCSB PDB (US/AWS): AWS S3 Explorer
PDBj (Japan): PDB Snapshot Archive
RSYNC Protocol
PDBj (Japan): rsync -avz snapshots.pdbj.org:: .
FTP Protocol
PDBj (Japan): ftp://snapshots.pdbj.org
AWS SYNC Protocol
RCSB PDB (US/AWS): s3://pdbsnapshots/
AWS SYNC Instruction:
- Install AWS CLI tool