Laboratory of Mass Spectrometry and Gaseous Ion Chemistry
Chait Lab

The Rockefeller University
1230 York Avenue,
New York, NY 10021
(212) 327-8000


National Center of Research Resources
National Resource
for the Mass Spectrometric
Analysis of Biological
Macromolecules

ProFound - Help

 

 

Average Masses

An average mass is obtained when a peak in the mass spectrum is not isotopically resolved (i.e., in a low-resolution mass measurement), or when the A0 component of the isotopic distribution cannot be unambiguously identified in a high-resolution measurement. When only some of the peaks in the mass spectrum are isotopically resolved, the data consist of both average and monoisotopic masses. List average masses in the window for "Average Masses" and monoisotopic masses in the window for "Monoisotopic Masses".

Candidate Rank

Candidate proteins are ranked by the probability that "the candidate protein is the sample protein". Similar proteins are assigned the same rank and are not treated as separate candidates.

Charge State

Specify the type of peptide masses: M for neutral peptides and MH+ for singly protonated peptides.
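The two conventions differ by the mass of one proton, so a value entered under one setting can be converted to the other. The following is a minimal sketch, not ProFound's own code; the function names are hypothetical, and the constant is the monoisotopic proton mass.

```python
# Illustrative only: these helper names are hypothetical, not part of ProFound.
PROTON_MASS = 1.007276  # Da, mass of a proton (hydrogen atom minus one electron)

def mh_to_m(mh_plus: float) -> float:
    """Convert a singly protonated mass (MH+) to the neutral mass (M)."""
    return mh_plus - PROTON_MASS

def m_to_mh(neutral: float) -> float:
    """Convert a neutral mass (M) to the singly protonated mass (MH+)."""
    return neutral + PROTON_MASS
```

Choosing the wrong setting shifts every entered mass by about 1 Da, which is usually larger than the mass tolerance.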

Chemical Formula for Modifications

Specify the chemical formula for the modification. Only plus (+), minus (-), digits (2-9), and letters (A-Z, a-z) are recognized. The format for the formula is:

    [+|-]element[[#occurrence]..[element[#occurrence]]]

An example is: -CH4S (remove one carbon, four hydrogens, and one sulfur)
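To illustrate how a formula in this format translates into a mass shift, here is a minimal sketch. It is not ProFound's parser. The function names and the small element table are assumptions, the leading sign is assumed to apply to the whole formula, and any digit run is accepted as an occurrence count.

```python
import re

# Monoisotopic masses (Da) of a few common elements; extend as needed.
MONO_MASS = {
    "H": 1.007825,
    "C": 12.000000,
    "N": 14.003074,
    "O": 15.994915,
    "P": 30.973762,
    "S": 31.972071,
}

def parse_modification(formula: str) -> dict:
    """Parse e.g. '-CH4S' into signed element counts: {'C': -1, 'H': -4, 'S': -1}."""
    match = re.fullmatch(r"([+-]?)((?:[A-Z][a-z]?\d*)+)", formula.strip())
    if match is None:
        raise ValueError(f"unrecognized formula: {formula!r}")
    # Assumption: a missing sign means addition.
    sign = -1 if match.group(1) == "-" else 1
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
        counts[element] = counts.get(element, 0) + sign * int(number or 1)
    return counts

def mass_shift(formula: str) -> float:
    """Monoisotopic mass change (Da) caused by the modification."""
    return sum(MONO_MASS[el] * n for el, n in parse_modification(formula).items())
```

With these values, -CH4S corresponds to a shift of about -48.003 Da and +O to about +15.995 Da.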

Cleavage Coupled Modification

Specify the modification associated with the specified cleavage. For instance, the modification -CH4S is associated with cyanogen bromide (CNBr) cleavage.

Cleavage Site

Specify the amino acid at which cleavage occurs for a "user-defined" cleavage. Multiple cleavage sites can be specified. To select multiple cleavage sites, hold down the Ctrl key while clicking on the intended amino acids.

Complete Modifications

Choose complete modifications when the modification goes to completion at all susceptible sites (amino acids).

Coverage

Coverage is defined as the ratio of the portion of the protein sequence covered by matched peptides to the full length of the protein sequence. Hyperlinked output pages show graphical and text representations of the matched peptides from the protein candidates. For each candidate protein, graphs are provided to allow the user to quickly assess the experimental peptide mass coverage of the protein and the mass measurement errors.
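The ratio can be sketched as follows. This is an illustration of the definition only, not ProFound's implementation; the function name and the (start, end) representation of matched peptides are assumptions.

```python
def sequence_coverage(protein_length: int, matched: list[tuple[int, int]]) -> float:
    """Fraction of residues covered by matched peptides.

    `matched` holds (start, end) residue positions, 1-based and inclusive.
    Residues covered by overlapping peptides are counted once.
    """
    covered = set()
    for start, end in matched:
        covered.update(range(start, end + 1))
    return len(covered) / protein_length
```

For a 100-residue protein with matched peptides at positions 1-10, 5-20 and 91-100, 30 distinct residues are covered, giving a coverage of 0.30.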

Database

Specify the sequence database to be searched. Currently, only NCBI's nr database is available.

DIGEST CHEMISTRY

Specify the cleavage either by choosing the enzyme or chemical from the "pre-defined" list or by setting up a "user-defined" cleavage (see User Defined Cleavage). To set up and use a "user-defined" cleavage, you must explicitly select the radio button next to "User-defined".

Excluded Cleavage

Specify the amino acid(s) next to which a cleavage is excluded. To select multiple amino acids, hold down the Ctrl key while clicking on the intended amino acids.

Extra Settings

Extra settings allow the user to specify the number of particular amino acids in given peptides, if known. For example, if the total number of Y or W is known to be 3 for a mass, specify "2+" in the Y/W column for the mass. When such information is not available, specify "n/a".

Knowledge of the presence (or absence) and the number of particular amino acids within a given peptide provides constraints in database searching. These constraints reduce the number of database peptides that randomly match the experimental mass spectral data, thereby improving the confidence level of identifications. Experimentally, such information can be obtained in a number of ways. For example, the number of methionine residues in a peptide can be inferred from pairs of peaks separated by the mass of oxygen (because methionine residues in proteolytic peptides are frequently found to be partially oxidized).
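The methionine example can be sketched as a search for peak pairs that differ by one oxygen mass. This is only an illustration; the function name and the default tolerance are hypothetical, and a matching pair is a hint of a partially oxidized methionine, not proof of one.

```python
OXYGEN_MONO = 15.994915  # Da, monoisotopic mass of oxygen

def oxidation_pairs(masses, tolerance_da=0.02):
    """Return (lighter, heavier) mass pairs whose difference is one oxygen mass."""
    ordered = sorted(masses)
    pairs = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            # Compare the observed spacing with the oxygen mass.
            if abs((high - low) - OXYGEN_MONO) <= tolerance_da:
                pairs.append((low, high))
    return pairs
```

For the masses 1000.50, 1016.495 and 1200.0, the function reports the single pair (1000.50, 1016.495).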

Expectation Value

The number that is entered into this box represents the maximum expectation value allowed for sequences in the displayed results.

ProFound does not use raw search engine scores to rate matches between sequences and spectra. Instead, it uses expectation values calculated from these scores. The simple interpretation of an expectation value is the number of matches that would be expected to have a particular score if the matches were completely random. Therefore, the smaller the expectation value, the more likely it is that a particular match is a true match rather than a random one.

For example, an expectation value of 1 means that, on average, one similar match would be expected when searching a database that did not include the protein sequence that truly matches your MS data. An expectation value of 0.0001 means that a similar match would be found approximately once in every 10,000 similarly sized databases that did not contain the sequence that truly matches your MS data.

This system of estimating the risk of a random match versus a true match is used in most conventional sequence homology matching systems, such as BLAST. It has the distinct advantage of being independent of the scoring system: the expectation value is calculated from a distribution of scored sequences rather than from a particular result.

GENERAL

This section specifies "general" information about a search. Please see specific parameters for details.

Grouping Similar Proteins

The program treats proteins with similar sequences as a single candidate in the probability calculation.

When a group of proteins is found to be similar, the proteins are grouped together under a single rank, marked with a "+" sign. Currently, Internet Explorer users can expand or contract the display of the protein group by clicking on the "+" sign.

Mass Tolerance

Specify the error of peptide mass measurement. The errors for average and monoisotopic masses are specified independently.

Mass Tolerance Unit

Specify the type of mass tolerance: Dalton (Da) for absolute mass error, and percentage (%) or parts per million (ppm) for relative error.
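The three units can be related to each other by converting each to an absolute window in Da. The following is a minimal sketch under stated assumptions: the function names are hypothetical, and the relative units are taken relative to the theoretical mass, which may differ from what ProFound does internally.

```python
def tolerance_in_da(mass: float, tolerance: float, unit: str) -> float:
    """Convert a tolerance to an absolute window (Da) around `mass`."""
    if unit == "Da":
        return tolerance
    if unit == "%":
        return mass * tolerance / 100.0
    if unit == "ppm":
        return mass * tolerance / 1_000_000.0
    raise ValueError(f"unknown unit: {unit!r}")

def within_tolerance(measured: float, theoretical: float,
                     tolerance: float, unit: str) -> bool:
    """True if the measured mass lies inside the window around the theoretical mass."""
    window = tolerance_in_da(theoretical, tolerance, unit)
    return abs(measured - theoretical) <= window
```

For a peptide of mass 2000 Da, a tolerance of 50 ppm corresponds to 0.1 Da, and a tolerance of 0.01% corresponds to 0.2 Da.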

MASSES

This is the place to enter measured proteolytic peptide masses. There are three alternative methods for specifying the masses of peptides used to search the database: average mass, monoisotopic mass, and a combination of average and monoisotopic masses (the last alternative is useful when only some of the peaks in the mass spectrum are isotopically resolved). For a given peptide, enter either the average or the monoisotopic mass. Only numbers and decimal points are recognized; other characters are discarded.

Max Number of Missed Cleavage Sites

Specify the maximum number of missed cleavage sites within a peptide (yielding incomplete cleavage peptides). Unless you have a good reason, do not use a large number for this parameter. A large number may degrade the quality of the search result.
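The effect of this parameter can be seen in a small in-silico digest. The sketch below is not ProFound's code: the function name and arguments are hypothetical, the defaults follow the trypsin rule (cleave C-terminal to K or R), and "don't cleave next to P" is interpreted as "do not cleave when the following residue is P".

```python
def digest(sequence: str, cleave_after: str = "KR", not_before: str = "P",
           max_missed: int = 0) -> list[str]:
    """Cleave C-terminal to residues in `cleave_after`, unless the next residue
    is in `not_before`; allow up to `max_missed` missed cleavage sites."""
    # Cut positions: index of the first residue of each fragment, plus the end.
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in cleave_after and sequence[i + 1] not in not_before:
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for i in range(len(cuts) - 1):
        # Joining j + 1 adjacent fragments corresponds to j missed cleavages.
        for j in range(max_missed + 1):
            if i + j + 1 < len(cuts):
                peptides.append(sequence[cuts[i]:cuts[i + j + 1]])
    return peptides
```

For the sequence AKRPGKLR, no missed cleavages gives three peptides (AK, RPGK, LR), while allowing one missed cleavage gives five. Every additional candidate peptide is another chance for a random mass match, which is why large values of this parameter can degrade the search result.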

Modification Site

Specify the amino acid at which a modification occurs. Modifications with the same chemical formula can be specified for multiple amino acids. To select multiple amino acids, hold down the Ctrl key while clicking on the intended amino acids. The chemical formula of the associated modification must be entered explicitly (see Chemical Formula for Modifications). Two "user-defined" modifications (Modification 1 and Modification 2) with different chemical formulas can be specified.

MODIFICATIONS

Specify modification(s) either by choosing from the "pre-defined" list or by setting up "user-defined" modifications. The modifications can be set either as complete or as partial. Setting up partial modifications should be done with caution and is highly discouraged. In most cases, choosing partial modifications only increases random matches and thus degrades the quality of the search result.

Monoisotopic Masses

A monoisotopic mass is obtained in high-resolution mass measurement, when the A0 component of the isotopic distribution can be unambiguously identified. When only some of the peaks in the mass spectrum are isotopically resolved, data consist of both average and monoisotopic masses. List average masses in the window for "Average Masses" and monoisotopic masses in the window for "Monoisotopic Masses".

Partial Modifications

Partial modifications (or incomplete modifications) are modifications that do not always occur at susceptible sites (amino acids). Setting up partial modifications should be done with caution and is highly discouraged. In most cases, choosing partial modifications only increases random matches and thus degrades the quality of the search result.

Pre-defined Cleavage

The following pre-defined cleavages are supported in the current version of ProFound:

  Name                   Cleave  Don't Cleave  Cleave At   Modification
                                   Next To
  Endoproteinase Arg C      R          P       C-terminal
  Endoproteinase Asp N      D                  N-terminal
  Endoproteinase Glu C      E          P       C-terminal
  Endoproteinase Lys C      K          P       C-terminal
  CNBr                      M                  C-terminal     -CH4S
  Trypsin                  KR          P       C-terminal
  V8 (D,E)                 DE          P       C-terminal
  V8 (E)                    E          P       C-terminal

Pre-defined Modifications

The following pre-defined modifications are supported in the current version of ProFound:

Complete Modifications:
  4-vinyl-pyridine (C)
  Acrylamide (C)
  Iodoacetamide (C)
  Iodoacetic acid (C)
  Performic acid (C+O3)
  Performic acid (M+O2)
  Performic acid (M+O)
 
Partial Modifications:
  4-vinyl-pyridine (C)
  Acrylamide (C)
  Iodoacetamide (C)
  Iodoacetic acid (C)
  Nitration (Y)
  Oxidation (M)
  Performic acid (C+O3)
  Performic acid (M+O2)
  Performic acid (M+O)
  Phosphorylation (S,T,Y)
  Phosphorylation (S,T)
  Phosphorylation (Y)

Probability

ProFound computes the normalized probability that a protein in a database is the protein being analyzed based on data, experimental conditions and other background information.

Protein Information and Sequence Analysis Tools

Click on the T symbol to use the sequence analysis tools contained in PROWL to further analyse the candidate protein sequence. Click on the protein name to retrieve protein information from the NCBI web site.

Protein Mass Range

Specify the estimated protein mass range of the sample protein. A narrower range provides a constraint to a search. When the molecular weight information is correct, it often improves the performance of the search.

Protein pI Range

Specify the estimated protein pI range of the sample protein. A narrower pI range provides a constraint to a search. When the pI information is correct, it is helpful to a search.

Sample ID

This is the place for you to put a note for the search.

Search Again Using Same Conditions with Unmatched Masses

Click on the ® symbol to identify further proteins using the same set of search parameters with only the unmatched masses. This "subtraction method" is another way to search for multiple protein components in a sample (see also ProFound's mixture search).

Search Mode of Protein Components

Frequently, a protein sample contains a mixture of proteins. ProFound provides a variety of methods for the identification of the protein components in a mixture. These methods can be used in combination in searches.

Method A: ProFound can be explicitly set to identify a single protein or multiple protein components in a single search. The current version of ProFound allows simultaneous identification of up to four protein components in a mixture.

Method B: After a search, ProFound can search for additional protein component(s) using the unmatched peptide masses.

Taxonomic Category

A representation of a phylogenetic tree is provided through which the user can specify the origin of the sample protein, if known. The taxonomic categories are based on NCBI's tree. Use of the correct taxonomic information can increase search speed and make the search result more reliable. The following taxonomic categories are implemented.

    All taxa
      Archaea (Archaebacteria)
      Bacteria (Eubacteria)
        Firmicutes (gram-positive bacteria)
          Bacillus subtilis
          Mycoplasma
          Other Firmicutes (gram-positive bacteria)
        Proteobacteria (purple non-sulfur bacteria)
          Enterobacteria
            Escherichia coli
            Other Enterobacteria
          Other Proteobacteria (purple non-sulfur bacteria)
        Other Bacteria
      Eukaryota (eucaryotes)
        Dictyostelium discoideum
        Fungi
          Pneumocystis carinii
          Saccharomyces cerevisiae (baker's yeast)
          Schizosaccharomyces pombe (fission yeast)
          Other Fungi
        Metazoa (animal)
          Caenorhabditis elegans
          Chordata (chordates)
            Fugu rubripes (pufferfish)
            Danio rerio (zebrafish)
            Mammalia (mammals)
              Primates
                Homo sapiens (human)
                Other Primates
              Rodentia (rodents)
                Mus musculus (house mouse)
                Rattus
                Other Rodentia (rodents)
              Other Mammalia (mammals)
            Xenopus laevis (African clawed frog)
            Other Chordata (chordates)
          Drosophila (fruit flies)
          Other Metazoa (animal)
        Plasmodium falciparum (malaria parasite P. falciparum)
        Viridiplantae (green plants)
          Arabidopsis thaliana (thale cress)
          Oryza sativa (rice)
          Other Viridiplantae (green plants)
        Other Eukaryotes (eucaryotes)
      Viroids
      Viruses
        Hepatitis C virus
        Other Viruses
      Others
      Unclassified

Top Candidates

Specify the number of top candidate proteins in the output display.

User Defined Cleavage

When the cleavage applied is not on the list of "pre-defined" cleavages, you can set up a user-defined cleavage. To do so, you must explicitly select the radio button next to "User-defined". Then choose the cleavage site(s) (amino acids) and the position (N- or C-terminal). You can also set site(s) next to which a cleavage should NOT occur. Finally, you can set a modification that is associated with this cleavage by typing in the chemical formula for the modification (see Chemical Formula for Modifications).

User Defined Modifications

Up to two "user-defined" modifications with different chemical formulas can be set as a supplement to the "pre-defined" modifications. A "user-defined" modification includes modification site(s), a modification formula, and a flag to indicate whether it is a complete or partial modification. At most four (4) partial modifications can be set in total, counting both "pre-defined" and "user-defined" partial modifications.

Z Score

ProFound calculates the probability that a candidate in a database search is the protein being analyzed. However, it is not easy to cast the calculated probability into the common language of traditional statistics. Therefore, as an indicator of the quality of the search result, a Z score is estimated by comparing the search result against an estimated random match population. The Z score is the distance from the population mean in units of standard deviation. It also corresponds to the percentile of the search in the random match population. For instance, a Z score of 1.65 for a search means that the search is in the 95th percentile. In other words, about 5% of random matches could yield higher Z scores than this search. Conceptually, this “95th percentile” is different from “95% confidence” that the search is a correct identification.

The following is a list for Z score and its corresponding percentile in an estimated random match population:

       Z    percentile
     1.282     90.0
     1.645     95.0
     2.326     99.0
     3.090     99.9
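The values in this list agree with the cumulative distribution of a standard normal population. Assuming the random match population is treated as normal (an assumption consistent with the list, but not stated explicitly), the percentile for any Z score can be sketched as follows; the function name is hypothetical.

```python
import math

def z_to_percentile(z: float) -> float:
    """Percentile of a standard normal distribution that lies below z."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A Z score of 0 corresponds to the 50th percentile, i.e. a search result no better than the average random match.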
 

 
