NIST Libraries of Peptide Tandem Mass Spectra

  Frequently Asked Questions (FAQs)

  1. I already use Mascot, SEQUEST, OMSSA, X!Tandem, etc. Why should I be interested in spectral libraries?
  2. Does your website allow me to search the libraries with my unidentified spectra?
  3. I already know the peptide I am interested in. Can you show me the spectrum so I can design a targeted assay?
  4. What do I need to search the library once I've downloaded one?
  5. Where do you get the data to build the libraries?
  6. How do I calculate a false discovery rate (FDR) for my set of identifications by spectral library searching?
  7. I downloaded MSPepSearch and am looking at my first result set. Since I haven't yet done an FDR analysis, what is a reasonable Score threshold?
  8. What is the coverage of your libraries?
  9. Your human library has only 22% coverage and I'm doing biomarker discovery. How do I know my peptides are in your library?
  10. I want to donate my spectra in support of more complete libraries. What do I need to do?

  1. I already use Mascot, SEQUEST, OMSSA, X!Tandem, etc. Why should I be interested in spectral libraries?

    Spectral library searching uses an algorithm that can take advantage of measured peak heights (empirical data). The spectral libraries are also smaller than an in silico digest of a whole proteome sequence file. These factors may make spectral library searching more sensitive and much faster than traditional sequence searching for your particular applications. Mass spectral library searching has also been around longer than proteomics and is the "gold standard" for interpreting unknown spectra in other fields of mass spectrometry (e.g., GC-MS); only recently has enough data been compiled and analyzed to generate reference libraries for peptides.
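The advantage of using measured peak heights can be illustrated with a normalized dot product (cosine similarity), a widely used way to compare a query spectrum with a library spectrum. This is a minimal sketch, not the scoring function of MSPepSearch or SpectraST; the function name dot_product_score, the 0.5 m/z pairing tolerance, and the square-root intensity weighting are assumptions chosen for illustration.

```python
import math


def dot_product_score(query, library, tol=0.5):
    """Normalized dot product (cosine similarity) of two peak lists.

    query, library: lists of (mz, intensity) tuples.
    tol: m/z tolerance for pairing peaks (assumed value).
    Returns a value in [0, 1]; 1 means identical relative peak heights.
    """
    # Square-root intensity weighting (an assumption) damps dominant peaks.
    q = [(mz, math.sqrt(inten)) for mz, inten in query]
    lib = [(mz, math.sqrt(inten)) for mz, inten in library]
    used = set()
    dot = 0.0
    for mz_q, w_q in q:
        # Pair each query peak with the closest unused library peak within tol.
        best = None
        for j, (mz_l, _) in enumerate(lib):
            if j in used or abs(mz_l - mz_q) > tol:
                continue
            if best is None or abs(mz_l - mz_q) < abs(lib[best][0] - mz_q):
                best = j
        if best is not None:
            used.add(best)
            dot += w_q * lib[best][1]
    norm_q = math.sqrt(sum(w * w for _, w in q))
    norm_l = math.sqrt(sum(w * w for _, w in lib))
    if norm_q == 0.0 or norm_l == 0.0:
        return 0.0
    return dot / (norm_q * norm_l)
```

Two spectra with peaks at the same m/z values but swapped relative heights score below 1, which is exactly the information a sequence search that ignores peak heights cannot use.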

  2. Does your website allow me to search the libraries with my unidentified spectra?

    No. You will need to download a library search software package to do that. Look here for some choices. The website offers download and browsing of the libraries by protein or peptide but does not allow searching of spectra. The SpectraST site at the Institute for Systems Biology does allow you to do that.

  3. I already know the peptide I am interested in. Can you show me the spectrum so I can design a targeted assay?

    Yes, by using the web-based library browser utility for the organism and instrument type you are interested in (e.g., human [ion trap]).

  4. What do I need to search the library once I've downloaded one?

    You will additionally need a spectral library search engine; a few are listed here. We'd recommend either MSPepSearch (NIST) or SpectraST (ISB), since these were designed for use with the NIST libraries. These programs take a large set of MS/MS spectra (e.g., in mzXML or MGF format), such as those generated by LC-MS/MS analysis of a tryptic digest, and score them against the library spectra in "batch" mode, returning zero or more matches per unknown spectrum. These tools typically run from the command line, which makes them well suited to integration into data analysis pipelines; some have also been wrapped in web-based or other graphical user interfaces.
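The batch workflow described above can be sketched in a few lines: each unknown spectrum is compared only with library spectra whose precursor m/z lies within a tolerance, each candidate is scored, and hits above a minimum score are kept, so a query may return zero or more matches. This is a hypothetical illustration, not how MSPepSearch or SpectraST are implemented; the function names, the 1 m/z bin width, the 2.0 m/z precursor tolerance, and the 0.5 minimum score are all assumptions.

```python
import math
from collections import defaultdict


def binned(peaks, bin_width=1.0):
    """Bin (mz, intensity) peaks into {bin_index: summed intensity}."""
    vec = defaultdict(float)
    for mz, inten in peaks:
        vec[int(mz // bin_width)] += inten
    return vec


def cosine(a, b):
    """Cosine similarity of two sparse binned spectra."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def batch_search(queries, library, precursor_tol=2.0, min_score=0.5):
    """Return {query_id: [(score, peptide), ...]}, best score first.

    queries: {query_id: (precursor_mz, peaks)}
    library: list of (peptide, precursor_mz, peaks)
    A query with no candidate at or above min_score gets an empty list.
    """
    # Pre-bin the library once so each query only pays for scoring.
    lib_vecs = [(pep, pmz, binned(peaks)) for pep, pmz, peaks in library]
    results = {}
    for qid, (qmz, qpeaks) in queries.items():
        qvec = binned(qpeaks)
        hits = []
        for pep, pmz, lvec in lib_vecs:
            if abs(pmz - qmz) > precursor_tol:
                continue  # precursor filter: skip spectra that cannot match
            score = cosine(qvec, lvec)
            if score >= min_score:
                hits.append((score, pep))
        hits.sort(reverse=True)
        results[qid] = hits
    return results
```

The precursor filter is what keeps library searching fast: only a small fraction of library spectra are ever scored against a given unknown.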

  5. Where do you get the data to build the libraries?

    We collect data files from both internal and external sources. The libraries were built from a large collection of >30 million spectra acquired by analyzing a wide range of sample types. We annotate each dataset and store the information in a database and within data records to maintain traceability.

    You can view selected annotation information by clicking on the Exps./Samples link on the main page for the library of interest. For example, here is the human ion trap library annotation page. You can also download text files with this information from our ftp site.

  6. How do I calculate a false discovery rate (FDR) for my set of identifications by spectral library searching?

    You may use a "target-decoy" approach similar to the one used for sequence searching (Elias and Gygi, Nat. Methods, Mar. 2007). Generating decoy spectra for spectral library searching has been demonstrated by Lam et al., J. Proteome Res., Jan. 2010. Any set of non-overlapping spectra (e.g., from another organism) may also be used as decoys, provided you adjust for any "target-decoy" bias. To generate the figure below, a set of human spectra was searched against a combined library of human spectra and non-overlapping, non-human spectra chosen at random from other libraries. The black bars represent the fraction of matches to the human spectra and the red bars the fraction of matches to decoy spectra. At ranks >3 for any set of matches, the matches are expected to be near random, so any deviation from 50:50 can be described as a "target-decoy" bias. In the example below, the bias factor would be roughly 62/38, or 1.6, in favor of the target spectra. This value can then be used to scale the number of decoy (false positive) matches when calculating an FDR.
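The bias-corrected calculation described above can be sketched as follows. This is one plausible reading, assuming a concatenated target-decoy search with one top hit per spectrum and an estimated FDR of bias × (decoys above threshold) / (targets above threshold); the function names are hypothetical and not part of any NIST tool.

```python
def bias_factor(target_count, decoy_count):
    """Target-decoy bias from low-ranked (near-random) matches, e.g. rank > 3.

    With 62 target and 38 decoy matches this gives 62/38, about 1.6.
    """
    if decoy_count == 0:
        raise ValueError("need at least one decoy match to estimate bias")
    return target_count / decoy_count


def estimated_fdr(scores, is_decoy, threshold, bias=1.0):
    """Estimate FDR among target matches scoring >= threshold.

    scores: top-hit scores; is_decoy: parallel list of booleans.
    The decoy count is scaled by the bias factor to approximate the
    number of false target matches above the threshold.
    """
    targets = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and not d)
    decoys = sum(1 for s, d in zip(scores, is_decoy) if s >= threshold and d)
    if targets == 0:
        return 0.0  # no target identifications, so nothing to be false
    return min(1.0, decoys * bias / targets)
```

Passing bias=bias_factor(62, 38) raises the estimate by about 1.6-fold compared with the uncorrected decoys/targets ratio.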

  7. I downloaded MSPepSearch and am looking at my first result set. Since I haven't yet done an FDR analysis, what is a reasonable Score threshold?

    The appropriate Score threshold depends on the size and quality of your dataset, but a Score of 450 is a reasonable starting point. This value frequently approximates a false discovery rate of ~1% for a routine shotgun analysis of tryptic peptides on an ion trap mass spectrometer.
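Once decoy matches are available, the threshold can be chosen from your own data instead of fixed at 450: scan candidate thresholds from high to low and keep the lowest one whose estimated FDR stays at or below the target (1% by default). This is a hedged sketch, assuming a concatenated target-decoy search with one top hit per spectrum; the function name threshold_for_fdr is hypothetical, and bias is the target-decoy bias factor described in the FDR answer.

```python
def threshold_for_fdr(scores, is_decoy, max_fdr=0.01, bias=1.0):
    """Lowest score threshold whose estimated FDR <= max_fdr, else None.

    Estimated FDR at threshold t = bias * (#decoys >= t) / (#targets >= t).
    """
    hits = sorted(zip(scores, is_decoy), key=lambda h: h[0], reverse=True)
    targets = decoys = 0
    best = None
    i = 0
    while i < len(hits):
        score = hits[i][0]
        # Count every hit tied at this score before evaluating the threshold.
        while i < len(hits) and hits[i][0] == score:
            if hits[i][1]:
                decoys += 1
            else:
                targets += 1
            i += 1
        if targets and bias * decoys / targets <= max_fdr:
            best = score
    return best
```

On a small result set the estimate is noisy, which is one reason the default of 450 is only a starting point.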

  8. What is the coverage of your libraries (for a given whole-proteome FASTA file)?

    The coverage, calculated by exhaustively mapping all of the library peptides to the FASTA file used to build the library, can be viewed by clicking 'Library statistics' on the upper navigation bar of the online browser.
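As a rough illustration of what exhaustively mapping peptides to a FASTA file involves, the sketch below finds every occurrence of each library peptide in each protein sequence. The page does not say whether the reported coverage is protein-level or residue-level, so this hypothetical coverage function returns both; neither is claimed to be NIST's exact definition, and details such as I/L ambiguity are ignored.

```python
def coverage(proteins, peptides):
    """Map peptides exhaustively onto protein sequences.

    proteins: {accession: sequence}; peptides: iterable of sequences.
    Returns (fraction of proteins hit by >= 1 peptide,
             fraction of all residues covered by >= 1 peptide).
    """
    peptides = set(peptides)
    proteins_hit = 0
    covered_residues = 0
    total_residues = 0
    for seq in proteins.values():
        covered = [False] * len(seq)
        for pep in peptides:
            start = seq.find(pep)
            while start != -1:  # every occurrence, not just the first
                for k in range(start, start + len(pep)):
                    covered[k] = True
                start = seq.find(pep, start + 1)
        if any(covered):
            proteins_hit += 1
        covered_residues += sum(covered)
        total_residues += len(seq)
    n = len(proteins)
    return (proteins_hit / n if n else 0.0,
            covered_residues / total_residues if total_residues else 0.0)
```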

  9. Your human library has only 22% coverage and I'm doing biomarker discovery. How do I know my peptides are in your library?

    There is no guarantee that a particular peptide is in the library (and a peptide that is absent cannot be found by library searching), but the libraries have been populated from many (>500) mass spec experiments. And while the coverage seems "low" relative to all protein sequences, the current bottleneck is the limited ability of mass spectrometry to sample low-abundance peptides. It may also be that your data represent a tissue for which we have little or no data; in that case, tissue-specific peptides may not be in the library. We are working hard to collect data from a variety of new sources to remedy this "false negative" problem.

    However, for routine 1D or 2D analyses of human plasma, yeast or E. coli (well-populated libraries), you will likely find very few peptides that are not in the library, and you may be surprised to find more peptides than with traditional sequence searching because of the differing search methods. To be safe, we recommend combining library searching with sequence searching until you can evaluate the performance of library searching in your experimental workflow.

  10. I want to donate my spectra in support of more complete libraries. What do I need to do?

    Peptide tandem mass spectra can be donated in RAW or peaklist format for almost any instrument type and for many organisms. All you have to do is contact us, and we'll send you simple instructions and a few brief questions. Your answers will help us search your data. We will be glad to reference your contribution in the next edition of the library, or, if you prefer, you may keep your name and sample information anonymous.

