NIST Peptide Mass Spectral Libraries

Biochemical Reference Data for Protein and Proteome Analysis



Goal of This Project

Purpose of This Site

  1. Provide download access to the libraries.
  2. Provide download access to the latest NIST software for use with the libraries.
  3. Provide links to other related projects and collaborators.

Project Description

Peptide fragmentation mass spectra are used to identify peptides and, by inference, proteins in biological samples. Researchers in the field of proteomics commonly produce these spectra in large numbers, using mass spectrometers to catalog (or sequence) the proteins present in a sample. Equally common are experiments designed to compare two or more samples for differentially expressed proteins. An example of the latter is a so-called "biomarker discovery" experiment, in which the differentially expressed proteins (e.g., expressed in a cancer sample but not in a healthy control) might be pursued as candidates with diagnostic or therapeutic potential.

Currently, interpretation of peptide fragmentation mass spectra is accomplished using numerous computer algorithms which rely on protein sequence to generate a theoretical spectrum for comparison. While this approach has worked well in the absence of libraries of empirical spectra, the goal of this project is to develop such libraries for all biologically relevant peptides, thereby providing a common basis for interpretation of unknown peptide mass spectra and improving the reproducibility of those interpretations.
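As a rough illustration of how a sequence-based algorithm can derive a theoretical spectrum from a peptide sequence, the sketch below computes singly charged b- and y-ion m/z values from standard monoisotopic residue masses. The function name `fragment_mz` is hypothetical, and the sketch assumes unmodified peptides and ignores intensities, higher charge states, and neutral losses. It is not the method used by any particular search engine.

```python
# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # mass of a proton


def fragment_mz(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i] is b(i+1), built from the N-terminus;
    y_ions[i] is y(i+1), built from the C-terminus.
    Hypothetical helper for illustration only.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []

    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1)
        running += m
        b_ions.append(running + PROTON)

    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1)
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_mz("PEPTIDE")
    for i, (bm, ym) in enumerate(zip(b, y), start=1):
        print(f"b{i} = {bm:.4f}   y{i} = {ym:.4f}")
```

A sequence search engine compares a list like this against the observed peaks. A spectral library instead stores the measured spectrum, including the peak intensities that a simple calculation of this kind does not predict.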

Background Summary

This project was initiated in 2004 as an offshoot of the NIST/EPA/NIH Mass Spectral Library, a widely used product for the identification of small molecules by GC-MS analysis. The underlying parallels between an electron ionization (EI) library and a peptide MS/MS library make them amenable to similar data analysis methods and workflows. The contents of the peptide libraries have been derived from the interpretation of millions of peptide tandem mass spectra from many sources.

The primary motivation for this project was to bring the advantages of spectral library searching to peptide identification (see advantages below). It is hypothesized that improving the sensitivity and robustness of peptide spectrum interpretation by computer algorithms will reduce variability, leading to a reduction in both false positive and false negative identifications.

The rapid development of the peptide libraries is enabled by:
  1. The predictable fragmentation of peptides in a mass spectrometer.
  2. The availability of peptide tandem mass spectra from community and internal sources.
  3. Computer algorithms that automate the interpretation of the mass spectra.

The use of a spectral library for peptide identification has two advantages over current sequence-based search methods:
  1. Speed. The reference spectra are known a priori, which removes the need to calculate them from protein sequences before matching. The search space is also limited to previously identified peptides, which reduces the number of candidate spectra per query.
  2. Discrimination. 'Spectrum-spectrum' comparisons can be more discriminating, leading to improved separation of true and false matches. This is due in part to the use of intensity values (i.e., peak heights), the occurrence of non-canonical peaks, and the use of spectrum annotations during scoring.
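
The role of peak intensities in a spectrum-spectrum comparison can be illustrated with a normalized dot product, a common similarity measure in spectral library searching. This is a minimal sketch, not the scoring function of any NIST software. The function names, the 1.0 m/z bin width, and the square-root intensity scaling are all assumptions chosen for illustration.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum intensities into fixed-width m/z bins, then take the square
    root of each bin so that a few very tall peaks do not dominate.
    `peaks` is a list of (mz, intensity) pairs. Hypothetical helper.
    """
    binned = {}
    for mz, intensity in peaks:
        key = int(mz / bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return {k: math.sqrt(v) for k, v in binned.items()}


def dot_score(query, reference, bin_width=1.0):
    """Normalized dot product (cosine similarity) between two spectra.

    Returns a value in [0, 1]; 1 means identical binned intensity
    patterns, 0 means no shared bins (or an empty spectrum).
    """
    q = bin_spectrum(query, bin_width)
    r = bin_spectrum(reference, bin_width)
    numerator = sum(v * r[k] for k, v in q.items() if k in r)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_r = math.sqrt(sum(v * v for v in r.values()))
    denominator = norm_q * norm_r
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    library = [(175.1, 40.0), (304.2, 100.0), (433.2, 60.0)]
    same_peaks_other_heights = [(175.1, 100.0), (304.2, 10.0), (433.2, 5.0)]
    print(dot_score(library, library))
    print(dot_score(same_peaks_other_heights, library))
```

Two spectra with peaks at the same m/z values but different relative heights score below 1. A comparison that considers only peak positions cannot make that distinction.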

Intended Use

This collection is intended primarily to demonstrate the utility of peptide ion fragmentation libraries by enabling the development of software applications that use them. However, in some cases (E. coli, human and yeast) the libraries are extensive enough to be employed for practical use. Emphasis has been placed on quality control, since an error in a library can be widely propagated. For example, library spectra containing significant impurity peaks can falsely identify the impurity in an analysis as the library peptide. On the other hand, to maximize completeness and minimize false negative results (peptide ions not in library), extensive extraction methods were employed. These methods combined results of several sequence search engines for initial peptide identification. Spectra were annotated in detail to aid spectrum library scoring as well as to document the origin of the spectrum. It is important to note that none of these collections are complete. Additional measurements are needed to increase both the coverage and the quality of the library.


Get involved!

This is a community-based program, and all of the peptide libraries are distributed free of charge. If you wish to contribute spectra to an existing library or suggest we build a new library, drop us a line. At the moment, we are seeking donations of raw data from all human sample types, as well as data from model organisms.


Peptide Mass Spectral Libraries (Rel. 3.0, Latest)

Be sure to download the library corresponding to the instrument type you are using.

Copyright information: "These peptide mass spectral libraries are protected by copyright law and may not be re-distributed without a valid Distribution Agreement. To receive such an agreement, contact the Standard Reference Data Program at the National Institute of Standards and Technology by email data at nist.gov or call 301-975-2008."

How to reference these libraries.

Build Date | Download* | Seq. File† | Library Browser | Library | Organism | Instrument | Exps.‡ | Spectra | Peptides | Coverage
--- | --- | --- | --- | --- | --- | --- | --- | --- | --- | ---
By species
Feb. 04, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | human | H. sapiens | it | 348 | 261,778 | 158,522 | 16%
Jul. 22, 2008 | ASCII, NIST, SpectraST | FASTA | Browse | human | H. sapiens | qtof | 61 | 12,473 | 10,139 | 1%
Apr. 29, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | mouse | M. musculus | it | 117 | 131,628 | 78,613 | 8%
Jul. 14, 2008 | ASCII, NIST, SpectraST | FASTA | Browse | drosophila | D. melanogaster | it | 97 | 96,542 | 62,162 | 13%
Jun. 26, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | C. elegans | C. elegans | it | 2 | 81,177 | 49,754 | 12%
May 04, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | yeast | S. cerevisiae | it | 63 | 87,676 | 52,076 | 23%
May 06, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | yeast | S. cerevisiae | qtof | 5 | 3,176 | 2,960 | 2%
May 21, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | E. coli | E. coli | it | 42 | 54,479 | 32,480 | 23%
May 21, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | rat | R. norvegicus | it | 25 | 20,992 | 15,206 | 2%
Individual Protein Standards and Mixtures
Sept. 26, 2008 | ASCII, NIST, SpectraST | FASTA | Browse | NCI20 | Std. Proteins | it | 301 | 3,903 | 1,892 | 35%
Apr. 15, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | chicken (egg) | G. gallus | it | 42 | 4,472 | 2,387 | 0%
Jul. 09, 2008 | ASCII, NIST, SpectraST | FASTA | Browse | Sigma UPS1 | Std. Proteins | it | 20 | 3,542 | 1,838 | 89%
May 06, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | serum albumin | H. sapiens | it | 26 | 2,487 | 1,099 | 100%
May 06, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | BSA | B. taurus | it | 8 | 969 | 458 | 99%
May 06, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | beta-2-microglobulin | H. sapiens | it | 2 | 179 | 122 | 100%
Jul. 15, 2008 | ASCII, NIST, SpectraST | FASTA | Browse | aurum | Std. Proteins | tof-tof | 1 | 596 | 545 | 9%
May 06, 2009 | ASCII, NIST, SpectraST | FASTA | Browse | c-reactive protein | H. sapiens | it | 1 | 94 | 57 | 77%

*See the 'Search Engines' section below for selecting the appropriate format for your software.
†FASTA (protein sequence) files used to construct the spectral libraries. These are for REFERENCE ONLY.
‡An experiment is often represented by many LC-MS/MS data files.

A note on modifications: Post-translational modifications (or biological PTMs) and other chemical derivatives of peptides have been limited to the following (PSI-MS names) in the above mass spectral libraries:

We are currently evaluating candidate peptide spectra containing additional mods for the libraries.

Documentation and Technical Information


Search Engines and Software


About Us




Inquiries concerning this site and/or its contents should be addressed to paul.rudnick@nist.gov.