@eamon are we really gonna end up Dockerizing every scientific computation? Is that where we're headed?
@clayote Yes, we are, with all the bloat and friction that entails.
And then, we're going to deal with all of the ways that the average Dockerfile blows up (if I had a nickel for every "pip install" I've seen inside one of those I'd be able to buy a cup of coffee at today's prices).
@eamon @clayote Not containers if I have a say in the matter!
I encourage folks to distinguish between analysis scripts (preprocess a particular dataset, make a graph) and software (general purpose, users). Software can be fine in a container: it's used as-is, not altered each time. Scripts are specialized, so should be transparent and stable (base R, minimal dependencies). Ideally the key parts of scripts (eg, a model specification) can be found & understood by others, even without running the script.
@arclight @JosetAEtzel @eamon @clayote Could you explain why you think language is a problem? I think it's the need to rely on large numbers of ever-changing packages, which I don't think is something we can realistically change. It's all fun and nice to keep dependencies to a minimum until you can't anymore because otherwise your project won't be finished.
@nicolaromano @arclight @JosetAEtzel @clayote there's nothing wrong with prioritizing speed of development or robustness; they're both right for certain situations. We only really get ourselves into trouble when we "choose" one path without acknowledging that we've actually made a choice.
@nicolaromano @arclight @JosetAEtzel @clayote and I think that's my issue with packrat, renv, conda, and even Docker: they're all promising shortcuts, telling you that you can have everything, and you can get it for free. But there is always a tradeoff. It might introduce an abstraction that leaks in inconvenient ways. It might just be a performance penalty. Just make sure you understand the cost before you sign a contract.
@nicolaromano @arclight @JosetAEtzel @clayote 100%. I think we should approach this like good engineers and acknowledge that there is a tradeoff being made: by delegating the uninteresting parts of our analyses to packages, we can do more complex things more quickly, but this comes at the cost of robustness. Web people are making the same decision when they adopt large JS frameworks and a bunch of stuff from npm.