Introducing Renku: a platform for reproducible, reusable and collaborative data science
Exactly recreating the result of a data analysis should be easy because, after all, it has been done once before, by a computer! In reality, those of us who work with data often cannot replicate even our own results several months later, much less reproduce work done by others. The reason is clear: repeating an analysis requires keeping careful records of the steps and the exact execution environment.
This record keeping can be tedious if done manually, but in recent years a rich ecosystem of freely available tools facilitating reproducibility has flourished. Adoption of these tools has been slow, however, because using them requires a particular technical skill set, and for many scientists and data analysts the time investment is simply too large.
At the Swiss Data Science Center, we are building a platform called Renku that aims to lower the barriers to leveraging this ecosystem. Our vision is to provide a platform where data analysis projects can be discussed, repeated, and verified, and split up into components that can be individually shared, reused, and recombined. We aim to do all this while disrupting the user’s established way of working as little as possible. We will try to illuminate the concepts and technology behind Renku and demonstrate its use through a series of articles, starting with this one.