Programming tools: Adventures with R
- sangwoo74
- 4 January 2015
- 3 min read
A guide to the popular, free statistics and visualization software that gives scientists control of their own data analysis. (http://www.nature.com/news/programming-tools-adventures-with-r-1.16609)

For years, geneticist Helene Royo used commercial software to analyse her work. She would extract DNA from the developing sperm cells of mice, send it for analysis and then fire up a package called GeneSpring to study the results. “As a scientist, I wanted to understand everything I was doing,” she says. “But this kind of analysis didn’t allow that: I just pressed buttons and got answers.” And as Royo’s studies comparing genetic activity on different chromosomes became more involved, she realized that the commercial tool could not keep up with her data-processing demands.
With the results of her first genomic sequencing experiments in hand at the start of a new postdoc, Royo had a choice: pass the sequences over to the experts or learn to analyse the data herself. She took the plunge, and began learning how to parse data in the free, open-source software package R. It helped that the centre she had joined — the Friedrich Miescher Institute for Biomedical Research in Basel, Switzerland — ran regular courses on the software. But she was also following a wider trend: for many academics seeking to wean themselves off commercial software, R is the data-analysis tool of choice.
Besides being free, R is popular partly because it presents different faces to different users. It is, first and foremost, a programming language that requires input through a command line, which may seem forbidding to non-coders. But beginners can surf over the complexities and call up preset software packages, which come ready-made with commands for statistical analysis and data visualization. These packages create a welcoming middle ground between the comfort of commercial ‘black-box’ solutions and the expert world of code. “R made it very easy,” says Royo. “It did everything for me.”
That, indeed, is what R’s developers intended when they designed it in the 1990s. Ross Ihaka and Robert Gentleman, statisticians at the University of Auckland in New Zealand, had an interest in computing but lacked practical software for their needs. So they developed a programming language with which they could perform data analysis themselves. R got its name in part from its developers’ initials, although it was also a reference to S, a widely used statistical programming language at the time.
.............................
“It’s common to see someone post a question and the person who developed the package answer within half an hour,” he says. This rapid response is key for scientists in basic research. “I can find an answer to almost any question online,” says Royo. She can confidently do most of her day-to-day data analysis herself, and she helps out less proficient colleagues. Still, “I google things every day”, she adds. Learning R, says Royo, has not only taught her coding skills, but has also made her more critical about other scientists’ analyses.
Not every scientist is enthusiastic about learning the necessary programming — even though, says Ram, R is less intimidating than languages such as Python (let alone Perl or C). “There are going to be far more scientists that will be comfortable with click-and-drop interfaces than will ever learn to program at any time,” Muenchen says. Geneticist Rabih Murr, for example, took the same R course as Royo when he was a postdoc, but he did not invest as much time in practising. To get started and develop research-specific skills in R definitely requires a commitment: “It’s a matter of priorities,” he says. But after becoming a lab head at the University of Geneva in Switzerland this year, he is planning to hire someone with R experience.
Like any other skill, learning R cannot be done overnight. But Jennings says that it is worth it. “Make that time. Set it aside as an investment: for saving time later, and for building skills that can be used across multiple problems we face as scientists.”
Nature 517, 109–110 (01 January 2015) doi:10.1038/517109a