I'm pleased to share a beta version of SciDataFlow, a command-line tool to track changes to data in research projects, push and pull data to remote repositories like Zenodo and FigShare, concurrently download tons of data, and more! github.com/vsbuffalo/scidatafl


@vsbuffalo This is such a cool idea; I'm looking into getting it going on my current project!

I was wondering how you would recommend treating symbolic links to data. I have some large, locally stored datasets that I use across different projects. To save disk space while making it clear which datasets each project uses, I put symbolic links to each dataset into a data subdirectory of the project where the data is needed.

I'm guessing that I should add the original locations of the data to each SDF repository that uses the data? I am also thinking that SDF will help a lot with this workflow!
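For context, a minimal sketch of the symlink layout described above (all paths and dataset names are illustrative, not from the actual projects):

```shell
#!/bin/sh
# Shared datasets live in one location on disk.
mkdir -p /tmp/shared-datasets/genome_v1 /tmp/projectA/data
echo "example reads" > /tmp/shared-datasets/genome_v1/reads.fastq

# Each project links the datasets it uses into its own data/ subdirectory,
# so the data is visible per-project without being duplicated.
ln -sfn /tmp/shared-datasets/genome_v1 /tmp/projectA/data/genome_v1

# The project sees the shared files through the link:
ls -l /tmp/projectA/data/genome_v1/
```

This keeps one physical copy of each dataset while making per-project usage explicit in the directory tree.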

@askennard one feature I want to add is local caching for shared servers! I think it would handle this issue well. It's a bit of work to add this feature though, so it may be some time before I get to it.

Qoto Mastodon