Proteomics data analysis using R data.table

In a recent Twitter poll, the Young Proteomics Investigator Club (YPIC) asked early career researchers which topic they wanted to see at a proteomics educational day, and the answer was clear: an introduction to data analysis in R.

Of course, this was by no means a representative survey, and yet it confirmed that there’s an explosion of interest in how to analyze data using powerful scripting and coding languages such as R and Python. The reasons are manifold. In contrast to Excel, Perseus or other GUI-based analysis software, data analysis in R allows users to:

  • Generate reproducible code that can easily be rerun after slight adjustments of input data or upstream processing steps
  • Gain flexibility in their analysis tools, whether by using one of the many powerful packages, by adapting code shared online, or by writing their own scripts
  • Process large datasets that cause other software to crash or that cannot even be opened in it

In the end, these were all reasons why I finally forced myself to get familiar with R in the second year of my PhD. Until then, I had fared well with the fantastic statistical analysis software Perseus, developed by the Cox lab specifically for analyzing proteomics data. But after countless rounds of redoing the same analysis steps because a single parameter had changed, after countless screams of fury when too many chained calculations caused the software to crash, and after finally realizing that I needed to do linear modeling for a project, I knew the time had come. I signed up for an intensive, example-oriented course on statistical analysis and visualization in R. Little did I know that this decision would lock me onto a path of becoming a pawn in a battle of epic proportions…

Going forward, I hope to share insights here from my journey into proteomics analysis with R, with a specific focus on data.table. Are there specific topics you would like to see addressed? Drop me a message via the contact form or on Twitter. And for now: stay tuned!
