We describe a strategy for de novo peptide sequencing based on matched pairs of tandem mass spectra (MS/MS) obtained by collision induced dissociation (CID) and 351 nm ultraviolet photodissociation (UVPD). Each precursor ion is isolated twice with the mass spectrometer switching between CID and UVPD activation modes to obtain a complementary MS/MS pair. To interpret these paired spectra, we modified the UVnovo de novo sequencing software to automatically learn from and interpret fragmentation spectra, provided a representative set of training data. This machine learning procedure, using random forests, synthesizes information from one or multiple complementary spectra, such as the CID/UVPD pairs, into peptide fragmentation site predictions. In doing so, the burden of fragmentation model definition shifts from programmer to machine and opens up the model parameter space for inclusion of nonobvious features and interactions. This spectral synthesis also serves to transform distinct types of spectra into a common representation for subsequent activation-independent processing steps. Then, independent from precursor activation constraints, UVnovo's de novo sequencing procedure generates and scores sequence candidates for each precursor. We demonstrate the combined experimental and computational approach for de novo sequencing using whole cell E. coli lysate. In benchmarks on the CID/UVPD data, UVnovo assigned correct full-length sequences to 83% of the spectral pairs of doubly charged ions with high-confidence database identifications. Considering only top-ranked de novo predictions, 70% of the pairs were deciphered correctly. This de novo sequencing performance exceeds that of PEAKS and PepNovo on the CID spectra and that of UVnovo on CID or UVPD spectra alone. As presented here, the methods for paired CID/UVPD spectral acquisition and interpretation constitute a powerful workflow for high-throughput and accurate de novo peptide sequencing. (Figure Presented).
ASJC Scopus subject areas
- Analytical Chemistry