Proteomics produces enormous amounts of data that can be very complex to analyze and interpret. Over the past decade, the free software platform MaxQuant has proven invaluable for analyzing shotgun proteomics data. Now, Jürgen Cox, group leader at the Max Planck Institute of Biochemistry, and his team present version 2.0. It provides an improved computational workflow for data-independent acquisition (DIA) proteomics, called MaxDIA, which supports both library-based and library-free DIA analysis and enables highly sensitive and accurate results. By uniting data-dependent and data-independent acquisition in a single framework, MaxQuant 2.0 is a major step toward applications in personalized medicine.
Proteins are essential for our cells to function, yet many questions about their synthesis, abundance, functions, and defects remain unanswered. High-throughput techniques can help improve our understanding of these molecules. In the approach known as "shotgun proteomics", proteins are first broken down into smaller peptides, which are then separated by liquid chromatography and analyzed by mass spectrometry (MS). The mass spectrometer measures the mass-to-charge ratio (m/z) of these peptides and produces MS spectra, from which the identities of the original proteins can be reconstructed. However, the sheer volume and complexity of the data make analysis and interpretation challenging.
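To make the digestion and mass-to-charge steps concrete, here is a minimal Python sketch. It is not MaxQuant code: the function names are hypothetical, digestion follows the simplified textbook trypsin rule (cleave after lysine K or arginine R, but not before proline P), and modifications, missed cleavages and isotopes are ignored. The m/z of a peptide ion is its neutral mass plus one proton per charge, divided by the charge.

```python
# Standard monoisotopic residue masses in daltons (unmodified amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one water per peptide for the free N- and C-termini
PROTON = 1.007276  # each positive charge comes from one added proton


def tryptic_digest(protein: str) -> list[str]:
    """Hypothetical helper: cleave after K or R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # keep a C-terminal piece that does not end in K/R
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide: str, charge: int) -> float:
    """Hypothetical helper: m/z of the [M + zH]z+ ion of an unmodified peptide."""
    neutral_mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral_mass + charge * PROTON) / charge


print(tryptic_digest("MKWVTFISLLRPEPTIDEK"))  # ['MK', 'WVTFISLLRPEPTIDEK']
print(round(peptide_mz("PEPTIDE", 2), 3))     # 400.687
```

The R followed by P in the example sequence is deliberately left uncut, which is why the second peptide is long.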
Two ways to analyze proteins with mass spectrometry
Two main methods are used in shotgun proteomics: data-dependent acquisition (DDA) and data-independent acquisition (DIA). In DDA, only the most abundant peptides of a sample are preselected for fragmentation and measurement. This makes it possible to reconstruct the sequences of these few preselected peptides, which keeps the analysis simpler and faster. However, it biases the measurement toward highly abundant peptides. In DIA, by contrast, all peptides within a given mass range are fragmented and measured together, without preselection by abundance, which makes the method more robust and sensitive.
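The difference between the two acquisition strategies can be illustrated with a toy model. This Python sketch is hypothetical and greatly simplified: real instruments schedule scans over chromatographic time, and top-N values and window widths vary from method to method. DDA keeps only the N most intense precursors from a survey scan, whereas DIA tiles the m/z range with fixed isolation windows and fragments everything inside each one.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    mz: float         # m/z of the intact peptide ion
    intensity: float  # signal in the survey (MS1) scan


def dda_select(precursors: list[Precursor], top_n: int) -> list[Precursor]:
    """DDA: fragment only the top_n most intense precursors."""
    return sorted(precursors, key=lambda p: p.intensity, reverse=True)[:top_n]


def dia_windows(precursors: list[Precursor], mz_start: float, mz_end: float,
                width: float) -> dict[tuple[float, float], list[Precursor]]:
    """DIA: tile [mz_start, mz_end) with fixed-width isolation windows and
    fragment every precursor inside each window, regardless of intensity."""
    windows = {}
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows[(low, high)] = [p for p in precursors if low <= p.mz < high]
        low = high
    return windows


sample = [Precursor(410.2, 5e6), Precursor(415.7, 2e4),
          Precursor(455.3, 8e5), Precursor(480.9, 1e3)]
print([p.mz for p in dda_select(sample, top_n=2)])  # [410.2, 455.3]
print({w: [p.mz for p in ps] for w, ps in dia_windows(sample, 400.0, 500.0, 25.0).items()})
```

The faint precursors at m/z 415.7 and 480.9 are never fragmented in the DDA case but do fall into DIA windows, which is the abundance bias the text describes. The price is that each DIA window mixes fragments from several peptides.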
Because DIA fragments all of these peptides at once, it generates large amounts of data, and the information it yields is considerably more complex, since fragments from many peptides are recorded in the same spectrum. Until now, identifying the original proteins has largely relied on matching the newly measured spectra against spectral libraries, that is, collections of previously measured spectra.
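Library matching can be sketched as a similarity search: each measured spectrum is compared with stored reference spectra, and the best-scoring entry is reported. The Python below uses a binned cosine (normalized dot product) score, a common textbook choice. It is an illustrative assumption, not the scoring used by MaxQuant or MaxDIA, and it ignores retention time, ion mobility, peaks that straddle bin edges, and the mixed spectra that make real DIA data hard. The function names are hypothetical.

```python
import math


def binned(spectrum: list[tuple[float, float]], bin_width: float) -> dict[int, float]:
    """Sum (m/z, intensity) peaks into fixed-width m/z bins."""
    bins: dict[int, float] = {}
    for mz, intensity in spectrum:
        key = int(mz // bin_width)
        bins[key] = bins.get(key, 0.0) + intensity
    return bins


def cosine_score(a, b, bin_width: float = 0.02) -> float:
    """Normalized dot product of two binned spectra: 1.0 means identical relative intensities."""
    va, vb = binned(a, bin_width), binned(b, bin_width)
    dot = sum(va[k] * vb[k] for k in va.keys() & vb.keys())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def best_library_match(query, library: dict, bin_width: float = 0.02) -> tuple[str, float]:
    """Score the query against every library entry and return (peptide, score) of the best."""
    scores = ((peptide, cosine_score(query, spectrum, bin_width))
              for peptide, spectrum in library.items())
    return max(scores, key=lambda item: item[1])
```

Because the score depends only on relative intensities, a query recorded at half the signal strength of its library entry still scores 1.0.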
Combining DDA and DIA in one framework
Jürgen Cox and his team have now developed software that provides a complete computational workflow for DIA data. For the first time, it allows the same algorithms to be applied to DDA and DIA data. As a result, studies based on either DDA or DIA will become easier to compare. MaxDIA analyzes proteomics data both with and without spectral libraries. Using machine learning, the software predicts peptide fragmentation and spectral intensities and uses these predictions to build precise MS spectral libraries in silico. This gives MaxDIA a library-free discovery mode with reliable control of false-positive protein identifications.
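The text does not say how MaxDIA controls false positives, but a standard technique in proteomics is the target-decoy approach: spectra are searched against real (target) sequences plus decoy sequences, for example reversed ones, that cannot be genuine matches, and the number of decoy hits above a score cutoff estimates the number of false target hits. The Python sketch below shows only this generic idea; the function name and the 1% default are illustrative assumptions, not MaxDIA's implementation.

```python
from typing import Optional


def fdr_cutoff(hits: list[tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score at which the estimated FDR is still <= max_fdr.

    hits: (score, is_decoy) pairs from a combined target + decoy search.
    The FDR among hits scoring at or above a cutoff is estimated as
    decoys / targets. Returns None if no cutoff meets the limit.
    Ties in score are not handled specially in this sketch.
    """
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in sorted(hits, key=lambda h: h[0], reverse=True):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = score  # every hit down to this score passes the FDR limit
    return cutoff


hits = [(10.0, False), (9.0, False), (8.0, False), (7.0, True),
        (6.0, False), (5.0, True), (4.0, True)]
print(fdr_cutoff(hits, max_fdr=0.01))  # 8.0
print(fdr_cutoff(hits, max_fdr=0.34))  # 6.0
```

Keeping the lowest qualifying score, rather than stopping at the first violation, means a short run of decoys does not prematurely cut off well-supported hits below it.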
MaxDIA also supports newer acquisition technologies such as bootstrap DIA, BoxCar DIA and trapped ion mobility spectrometry (TIMS) DIA. What are the next steps? The team is already working on further improvements, and several extensions are under development, for example to improve the analysis of post-translational modifications and the identification of cross-linked peptides.
Enabling researchers to conduct complex proteomics data analysis
MaxDIA is freely available to scientists all over the world and is embedded in the established MaxQuant software environment. "We would like to make proteomics data analysis accessible to all researchers", says Pavel Sinitcyn, first author of the paper that introduces MaxDIA. To that end, Cox and his team offer hands-on training in the software to all interested researchers at the MaxQuant summer school, helping to bridge the gap between wet-lab work and complex data analysis.
Sinitcyn states that the aim is to "bring mass spectrometry from the Max Planck Institute of Biochemistry to the clinics". Rather than measuring only a few proteins, researchers can now measure and analyze thousands. This opens up new possibilities for medical applications, especially in personalized medicine.