I tried opening an RStudio project from 2017 for which I used {packrat}, and I was disappointed—but not surprised—that it didn't work. I'm becoming convinced that "reproducible analyses" are a pipe dream.

Before anybody recommends {renv}, I'm going to save you the energy: I can't even get that package working on a brand-new project right now. I can't begin to imagine an analysis built on it still working five years later.
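For what it's worth, the workflow I couldn't get going is tiny on paper. A hedged sketch using {renv}'s documented entry points (assuming the package itself installs cleanly, which is exactly the step that failed for me):

```r
# One-time setup in a fresh project: creates a private project library
# and writes renv.lock.
renv::init()

# After installing or updating packages, record the exact versions
# currently in use into renv.lock.
renv::snapshot()

# Years later, on another machine: reinstall the recorded versions.
# This restore step is where old projects tend to break -- packages
# archived off CRAN, changed repositories, incompatible R versions.
renv::restore()
```

Three calls, and every one of them depends on infrastructure outside your control staying put.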

@eamon are we really gonna end up Dockerizing every scientific computation. Is that where we're headed

@clayote Yes, we are, with all the bloat and friction that entails.

And then we're going to deal with all of the ways the average Dockerfile blows up (if I had a nickel for every bare "pip install" I've seen inside one of those, I'd be able to buy a cup of coffee at today's prices).
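To make that failure mode concrete — a hedged sketch, with illustrative image tags and package versions (not recommendations): an unpinned install resolves to whatever is current at build time, so the "same" Dockerfile produces different images years apart.

```dockerfile
# Fragile: the base tag and package versions float, so a rebuild in
# five years pulls different software than the original build did.
FROM python:3
RUN pip install numpy pandas

# Sturdier: pin the base image digest-or-tag and every package version.
# (Versions here are illustrative.)
FROM python:3.11-slim
RUN pip install numpy==1.25.2 pandas==2.0.3
# Better still: install from a requirements file with hashes --
# though even pinned wheels can vanish from the index.
```

Pinning buys you repeatable builds only for as long as the registry and the package index keep serving those exact artifacts.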

@eamon @clayote Not containers if I have a say in the matter!

I encourage folks to distinguish analysis scripts (preprocess a particular dataset, make a graph) from software (general purpose, many users). Software can be fine in a container: it's used as-is, not altered each time. Scripts are specialized, and so should be transparent and stable (base R, minimal dependencies). Ideally the key parts of a script (e.g., a model specification) can be found and understood by others, even without running the script.
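A hedged sketch of what that looks like in practice — the file, variable names, and model below are all made up for illustration — where the whole analysis is base R and the model specification is one greppable line:

```r
# analysis.R -- illustrative minimal-dependency script (base R only).
# Dataset and variable names are hypothetical.

dat <- read.csv("trial_data.csv")

# Preprocessing: keep complete cases, log-transform the response.
dat <- dat[complete.cases(dat), ]
dat$log_outcome <- log(dat$outcome)

# The key part -- the model specification -- is readable
# without executing anything:
fit <- lm(log_outcome ~ treatment + age + site, data = dat)

summary(fit)
```

No library() calls, no lockfile, nothing to restore: a reader in 2035 can open the file and see exactly what model was fit.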

@JosetAEtzel @eamon @clayote If only there were ISO- or ANSI-standard compiled languages in wide usage for scientific computing... :/

Configuration management and managing the software lifecycle are not easy or particularly fun, especially for subject-matter experts who write code only as a means to an end. It's worse still when nobody in an organization has any experience with formal V&V or code qualification. I don't have a solution that academics, researchers, and engineers will pay attention to, because it requires rigor in an area they just don't care that much about.


@arclight @JosetAEtzel @eamon @clayote Could you explain why you think the language is the problem? I think it's the need to rely on large amounts of ever-changing packages, which I don't think is something we can realistically change. Keeping dependencies to a minimum is all well and good until you can't anymore, because otherwise your project will never get finished.

@nicolaromano @arclight @JosetAEtzel @clayote 100%. I think we should approach this like good engineers and acknowledge that there is a tradeoff being made: by delegating the uninteresting parts of our analyses to packages, we can do more complex things more quickly, but this comes at the cost of robustness. Web people are making the same decision when they adopt large JS frameworks and a bunch of stuff from npm.

@nicolaromano @arclight @JosetAEtzel @clayote there's nothing wrong with prioritizing speed of development or robustness, they're both right for certain situations. We only really get ourselves into trouble when we "choose" one path without acknowledging that we've actually made a choice.

@nicolaromano @arclight @JosetAEtzel @clayote and I think that's my issue with packrat, renv, conda, and even Docker: they're all promising shortcuts, telling you that you can have everything and get it for free. But there is always a tradeoff. It might be an abstraction that leaks in inconvenient ways. It might just be a performance penalty. Just make sure you understand the cost before you sign the contract.

Qoto Mastodon