Databases and websites useful for REFOLD
DATABASES AND WEBSITES
Websites directly relevant to REFOLD: UniProt database, SCOP database, PubMed database
Useful websites to assist REFOLD data entry: ExPASy website, PDB database, Pfam database
The Universal Protein Resource (UniProt) database. Each protein in the UniProt database has its own unique ID, based on
the protein and the organism in which it is naturally expressed. Each entry provides information on relevant publications,
sequence, function, secondary structure, molecular weight and length, as well as links to other databases and websites, including Pfam, PubMed and ExPASy. You
can search UniProt using the protein name and the organism from which it comes (e.g. antitrypsin Homo sapiens OR antitrypsin human). Once the query results
appear, you may need to browse them further to pinpoint exactly which entry pertains to your protein.
You can view an entry in more detail by selecting the ID/Accession no. on the left-hand side of the table.
The UniProt ID is the first ID listed in this field (e.g. A1AT_HUMAN).
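UniProt uses two kinds of identifier: accession numbers (e.g. P01009) and entry names such as A1AT_HUMAN, which combine a protein mnemonic with a species mnemonic. The minimal sketch below, which is not part of REFOLD, shows how an identifier copied from UniProt might be checked before it is entered. The accession pattern follows the regular expression published in UniProt's documentation; the entry-name check is a simplified assumption, and the function names are hypothetical.

```python
import re

# Accession-number pattern as published in the UniProt documentation.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified (assumed) entry-name pattern: <protein mnemonic>_<species mnemonic>.
ENTRY_NAME_RE = re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")


def classify_uniprot_identifier(text: str) -> str:
    """Return 'accession', 'entry name' or 'unknown' for a pasted identifier."""
    token = text.strip().upper()
    if ACCESSION_RE.fullmatch(token):
        return "accession"
    if ENTRY_NAME_RE.fullmatch(token):
        return "entry name"
    return "unknown"


def split_entry_name(entry_name: str) -> tuple[str, str]:
    """Split an entry name such as A1AT_HUMAN into (protein, species) mnemonics."""
    protein, species = entry_name.strip().upper().split("_", 1)
    return protein, species
```

For example, `classify_uniprot_identifier("A1AT_human")` treats the lower-case form as the entry name A1AT_HUMAN, while a free-text search term such as "antitrypsin" is reported as unknown.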
The Structural Classification of Proteins (SCOP) database. Proteins whose structures have been
solved are grouped into families with their homologues and with other proteins of similar structure. Proteins in the SCOP database are sorted at several
hierarchical levels:
Class: refers to the general structural classification of the protein, according to its dominant secondary structural components
(e.g. alpha, beta, alpha/beta, small proteins, multi-domain proteins)
Fold: refers to the major tertiary fold, i.e. the arrangement of secondary structural components
within the protein (e.g. β-sheets, barrels)
Superfamily: relates proteins which have a “probable common evolutionary origin”; they share the same fold and
similar functional and structural features, but low sequence identity
Family: relates proteins with a “clear evolutionary relationship”. Generally this means that they share
relatively high sequence identity and/or very similar functions and structures
Protein: refers to the individual proteins and structures, taking into account the organism of origin and specific sequence
Each individual grouping at the Class, Fold, Superfamily, Family and Protein levels is allocated its own unique ID,
which is listed in square brackets [ ] after the grouping name.
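If you copy a SCOP lineage as text, the "name [ID]" convention makes it easy to pull the five levels apart. The sketch below is illustrative only: it assumes each label ends in a numeric ID in square brackets and that the lineage is given from Class down to Protein. The function names and the example labels are hypothetical placeholders, not real SCOP entries.

```python
import re

# The five SCOP levels described above, from most general to most specific.
SCOP_LEVELS = ("Class", "Fold", "Superfamily", "Family", "Protein")

# "Grouping name [12345]": the name, then a numeric ID in square brackets (assumed numeric).
LABEL_RE = re.compile(r"\s*(?P<name>.+?)\s*\[(?P<id>\d+)\]\s*")


def parse_scop_label(label: str) -> tuple[str, int]:
    """Split one 'name [ID]' label into its name and integer ID."""
    match = LABEL_RE.fullmatch(label)
    if match is None:
        raise ValueError(f"no bracketed ID found in {label!r}")
    return match.group("name"), int(match.group("id"))


def parse_lineage(labels: list[str]) -> dict[str, tuple[str, int]]:
    """Map each SCOP level to its (name, ID), given labels ordered Class -> Protein."""
    if len(labels) != len(SCOP_LEVELS):
        raise ValueError(f"expected {len(SCOP_LEVELS)} labels, got {len(labels)}")
    return {level: parse_scop_label(label) for level, label in zip(SCOP_LEVELS, labels)}


if __name__ == "__main__":
    example = [  # placeholder labels, not real SCOP data
        "Example class [101]", "Example fold [102]", "Example superfamily [103]",
        "Example family [104]", "Example protein [105]",
    ]
    for level, (name, scop_id) in parse_lineage(example).items():
        print(f"{level}: {name} (ID {scop_id})")
```

The name is matched non-greedily, so names that themselves contain parentheses or brackets still parse, as long as the last bracketed number is the ID.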
When searching the SCOP database, it is best to search on single-word terms (e.g. “sapiens” rather than “Homo sapiens”),
then select the appropriate choice from the list provided. The SCOP database may also be searched using entry IDs
from the PDB database (see PDB database for more details). Note that some proteins have not been structurally
characterized, in which case the SCOP class and family should be listed as “unknown”.
The ExPASy website has many useful tools and links for protein and DNA analysis. For REFOLD, you will mainly
need the “Primary structural analysis” tools found in the “Tools and software packages” box. You can select
either “ProtParam” or the “pI/MW” tool to calculate the theoretical pI and/or molecular weight of your protein. After
selecting the tool, paste your protein sequence into the box provided (there is no need to remove numbers; the program
ignores them automatically). Click the appropriate option and the program will calculate the relevant data for you.
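To see roughly what these tools compute, here is a minimal sketch, not ExPASy's implementation. It strips numbers and spaces from a pasted sequence, sums average residue masses plus one water to get the molecular weight, and finds the pI by bisection as the pH where the estimated net charge is zero. The pKa values are approximate textbook figures and are an assumption; ProtParam uses its own mass table and a different pKa set, so expect small MW differences and larger pI differences. All function names are hypothetical.

```python
import re

# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # added once for the free N- and C-termini

# Approximate textbook pKa values (assumption; not ProtParam's values).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def clean_sequence(raw: str) -> str:
    """Keep only letters, upper-cased, so pasted numbering and spaces are ignored."""
    return re.sub(r"[^A-Za-z]", "", raw).upper()


def molecular_weight(seq: str) -> float:
    """Average molecular weight of an unmodified peptide chain, in daltons."""
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch estimate of net charge (all Cys treated as free thiols)."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect pH 0-14 for zero net charge (charge falls steadily as pH rises)."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid   # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

A typical use would be `seq = clean_sequence(pasted_text)` followed by `molecular_weight(seq)` and `isoelectric_point(seq)`; for REFOLD entries, the values reported by ExPASy should still be the ones recorded.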
The Protein Data Bank (PDB) database is an archive of experimentally determined structures of proteins and other macromolecules. You can
search on either protein or organism names. As with the SCOP database, be careful when searching on multiple terms: specifying
too many terms can produce no results at all, so it is often more productive to search on a single term and then browse through the query results.
Individual entries in the PDB database may be viewed by clicking on the unique PDB entry ID on the left-hand side of each
listing (4-character ID). Once you have identified a structure as being relevant to your protein, you can search the SCOP
database using the 4-character PDB entry ID, and this should produce the relevant SCOP entry for the protein. Alternatively,
a direct link to the relevant entry in the SCOP database is provided at the bottom of each PDB entry (in the summary information).
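Because the SCOP search relies on the exact 4-character PDB entry ID, it can help to check the ID before searching. Classic PDB IDs consist of a digit from 1 to 9 followed by three letters or digits. The sketch below is a hypothetical helper that enforces that pattern and normalises the case; "1abc" is a made-up example, not a specific structure.

```python
import re

# Classic PDB entry IDs: a digit 1-9 followed by three letters or digits (e.g. 1ABC).
PDB_ID_RE = re.compile(r"[1-9][A-Za-z0-9]{3}")


def normalise_pdb_id(text: str) -> str:
    """Trim whitespace, check the 4-character pattern and return the ID in upper case."""
    token = text.strip()
    if not PDB_ID_RE.fullmatch(token):
        raise ValueError(f"{text!r} is not a 4-character PDB entry ID")
    return token.upper()
```

Searching PDB and SCOP is case-insensitive in practice, so upper-casing is only there to keep recorded IDs consistent.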
The Protein Families (Pfam) database groups proteins into families, each represented by a multiple sequence alignment, linking related proteins together.
Pfam also contains information about domain boundaries and disulfide bonds, and links to the PDB and UniProt databases.
You can search Pfam for specific proteins using UniProt ID numbers (search by “Protein name or sequence”).
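Domain boundaries are usually quoted as residue numbers, and a common slip is treating them as zero-based positions. This minimal sketch assumes the boundaries are 1-based and inclusive (residue 1 is the first residue, and the end residue is part of the domain), which is how sequence databases typically number residues; the helper name and the example sequence are hypothetical.

```python
def domain_subsequence(sequence: str, start: int, end: int) -> str:
    """Return residues start..end of a sequence, using 1-based inclusive numbering."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(
            f"boundaries {start}-{end} fall outside a sequence of length {len(sequence)}"
        )
    # Python slices are 0-based and exclude the end index, hence start - 1 and end.
    return sequence[start - 1:end]


if __name__ == "__main__":
    made_up = "MKTAYIAK"  # illustrative sequence only
    domain = domain_subsequence(made_up, 2, 4)
    print(domain, len(domain))  # the domain length is end - start + 1
```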