#surveycomparison #representationbias
New #R-package out now!
"sampcompR" provides functions to easily compare surveys against benchmark surveys (e.g. for bias estimation) on a univariate, bivariate, and multivariate level.
By Björn Rohr & Barbara Felderer
https://bjoernrohr.github.io/sampcompR/
Confidence intervals and p-values can be calculated either with standard parametric methods or via bootstrapping, and can additionally be adjusted for multiple testing.
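For a flavour of what such a comparison involves, here's a minimal base-R sketch of the underlying idea (this is not sampcompR's own API, and the data and p-values are made up; see the package docs for the real functions):

```r
# Minimal base-R sketch of a univariate survey-vs-benchmark comparison
# (illustration only -- not sampcompR's actual API).
set.seed(1)
survey    <- rnorm(500,  mean = 5.2)  # made-up survey variable
benchmark <- rnorm(5000, mean = 5.0)  # made-up benchmark variable

# Bootstrap CI for the estimated bias (difference in means):
boot_diff <- replicate(2000,
  mean(sample(survey, replace = TRUE)) - mean(sample(benchmark, replace = TRUE)))
quantile(boot_diff, c(0.025, 0.975))  # 95% bootstrap CI

# With many variables, one p-value per comparison, adjusted for multiple testing:
pvals <- c(age = 0.01, income = 0.20, education = 0.03)  # made-up p-values
p.adjust(pvals, method = "holm")
```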
@tehstu I don't know how to get Microsoft to fix that, but I *do* know my solution to that whole fiasco last year... was to install Linux, and I haven't looked back since.
@jpeelle The Outlook mail REST API is probably a good place to start
https://learn.microsoft.com/en-us/graph/api/resources/mail-api-overview?view=graph-rest-1.0
Edit: sorry, I didn't notice you were looking for premade software... this requires a bit of programming, but it shouldn't be overly complex.
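To sketch the idea in R (httr2; this assumes you've already obtained an OAuth2 access token with the Mail.Read scope via an Azure app registration, which is the fiddly part):

```r
# Sketch: list the newest messages via the Microsoft Graph mail API.
library(httr2)

msgs <- request("https://graph.microsoft.com/v1.0/me/messages") |>
  req_url_query(`$top` = 10, `$select` = "subject,from,receivedDateTime") |>
  req_auth_bearer_token(Sys.getenv("GRAPH_TOKEN")) |>  # token from your env
  req_perform() |>
  resp_body_json()

vapply(msgs$value, function(m) m$subject, character(1))  # subjects only
```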
A few weeks ago I asked everyone for some examples of fictional maths teachers 👩🏫✖️➗
Earlier this week we released a @sci_burst episode all about maths anxiety & maths teachers in #popculture, touching on some of your suggestions!
✨ Link in replies ✨
How a misinterpretation of a BMJ publication from 1996 caused (and yes, I do mean caused) the explosion in rates of peanut allergy.
In a nutshell: for decades the guidance issued in the USA and UK was the opposite of what it should have been and left a generation with preventable, life-threatening allergies.
https://news.harvard.edu/gazette/story/2024/10/excerpt-from-blind-spots-by-marty-makary/
@pyconasia @mariatta
I was told that being a woman is just being a bride.
But I was rebellious and I went to study.
A question about data management during (bioinformatics) analysis:
do you use any of:
- git annex
- DataLad
- something else? (git LFS...)
Why?
I see several hurdles with directly using git-annex, such as the need to unlock files before modifying them (useful for raw data, inconvenient for intermediate data).
@janeadams I disagree with that (to a point). Maybe it doesn't count towards your graduation requirements, but doing a PhD is first of all about growing as an independent researcher, developing your critical thinking and finding your way in (work) life.
Requirements are there and you should of course complete them, but I suspect those other things will be helpful in the future (of course, if you spend 90% of your time on things outside your PhD project, maybe you should change your PhD topic!)
@michaele @amckinstry Those machines also have a phone number you can call to pay if you don't want to use an app.
Which scientific publishers/journals are worst affected by fraudulent or dubious research papers, and which have done least to clean up their portfolio?
A science-integrity startup called Argos says it has answers.
There are quite a few integrity tools now that look for red flags in papers, but this is the first to go public with what it's finding across journals and publishers.
Here's my exclusive look at their figures.
https://www.nature.com/articles/d41586-024-03427-w
“tinytable is a small but powerful R package to draw beautiful tables in a variety of formats: HTML, LaTeX, Word, PDF, PNG, Markdown, and Typst. The user interface is minimalist and easy to learn, while giving users access to powerful frameworks to create endlessly customizable tables.” - @vincentab
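A quick taste, based on my reading of the docs (check the site for the full API; tt() builds the table and save_tt() exports it):

```r
# tinytable in two calls: build, then export to whatever format you need.
library(tinytable)

tab <- tt(mtcars[1:5, 1:4], caption = "First rows of mtcars")
tab                          # renders as HTML/LaTeX/etc. depending on context
save_tt(tab, "mtcars.pdf")   # or .html, .docx, .png, .md, .typ ...
```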
@terence There's quite a lot of data available in the wild, for example the data from this paper https://pmc.ncbi.nlm.nih.gov/articles/PMC10088239/ which you can find here https://zenodo.org/records/6025935
[New paper]: Two subtle problems with over-representation analysis.
ORA is a type of enrichment analysis that identifies over-represented functional categories in gene lists. These tools have accumulated ~190k citations, but they have subtly different behaviours. Here we unpack the differences and investigate two subtle problems in some implementations, which may have negatively impacted those ~190k research papers.
https://doi.org/10.1093/bioadv/vbae159
#genomics #bioinformatics
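For context on the post above: classic ORA reduces to a hypergeometric test. Here's a minimal sketch (my own illustration, not the paper's code) showing how one implementation detail, the choice of background, shifts the result:

```r
# ORA as a hypergeometric test: N = background genes, K = category members
# in the background, n = genes in your list, k = overlap.
ora_p <- function(k, n, K, N) {
  phyper(k - 1, K, N - K, n, lower.tail = FALSE)  # P(overlap >= k)
}

# The background matters: whole genome vs. only genes detected in the assay
# (numbers are made up to illustrate the shift).
ora_p(k = 30, n = 200, K = 400, N = 20000)  # whole-genome background
ora_p(k = 30, n = 200, K = 300, N = 12000)  # detected-genes background
```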
I’m a software developer with a bunch of industry experience. I’m also a comp sci professor, and whenever a CS alum working in industry comes to talk to the students, I always like to ask, “What do you wish you’d taken more of in college?”
Almost without exception, they answer, “Writing.”
One of them said, “I do more writing at Google now than I did when I was in college.”
I am therefore begging, begging you to listen to @stephstephking: https://mstdn.social/@stephstephking/113336270193370876
I've created a little schematic on basic Git/GitHub usage.
Feel free to reuse! (CC-BY-NC-4.0)
Six tips for going public with your lab’s software: https://www.nature.com/articles/d41586-024-03344-y
1) make time for maintenance
2) simplify installation
3) add a GUI or good CLI
4) good documentation
5) use Git/GitHub
6) automated testing
Any other tips people have? #SoftwareEngineering #opensource #openscience
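On tip 6, a minimal sketch of what automated testing can look like in R (testthat is one common choice; normalise() is a made-up function for illustration):

```r
# A minimal testthat test; usethis::use_testthat() sets up the scaffolding
# in a package.
library(testthat)

normalise <- function(x) (x - min(x)) / (max(x) - min(x))

test_that("normalise maps values onto [0, 1]", {
  out <- normalise(c(2, 4, 6))
  expect_equal(range(out), c(0, 1))
  expect_equal(out, c(0, 0.5, 1))
})
```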
@computingnature
These cover a lot of common problems I have with scientific code. One minor tip to add is "build more, smaller things": splitting a very large monolithic code base into several smaller pieces works with all the above tips to make the software much more useful over time. Being able to reuse pieces between projects without making future projects direct dependents of humongous prior packages is a huge deal for labs that make many tools. (A toy sketch of the idea follows below.)
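A toy illustration of "build more, smaller things" in R (file and function names made up; at larger scale the small files would become small packages):

```r
# io_helpers.R -- reusable across projects
read_counts <- function(path) read.csv(path, row.names = 1)

# stats_helpers.R -- reusable across projects
cpm <- function(counts) t(t(counts) / colSums(counts)) * 1e6

# analysis.R -- the project-specific script just composes the pieces:
source("io_helpers.R")
source("stats_helpers.R")
counts <- read_counts("counts.csv")
normalised <- cpm(counts)
```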
@mccarthymg @jimbob @djnavarro Yes, I teach a lot of biologists who are completely scared of programming. A student the other day told me they get anxious whenever a red line appears in the console in RStudio (in that particular instance, we were actually installing a package, and all these red lines kept popping up even though there was no error... grrrr).
A great thing to do when dealing with non-technical audiences is to make coding mistakes on purpose so that you can analyse the resulting error messages with them. You demystify the error message, and they start to be a bit more comfortable with the whole programming thing. And then you get them to read the manual, or Google the error code, or, hell, even use ChatGPT if they're into that. But the important thing is that you have them engage with their code.
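For example, two mistakes worth making on purpose in R (messages paraphrased from what R actually prints):

```r
log(-1)
#> Warning message: NaNs produced
# A warning: the code still ran, but the result (NaN) needs a closer look.

read.csv("no_such_file.csv")
#> Error in file(file, "rt") : cannot open file 'no_such_file.csv': No such file or directory
# An error: nothing was produced, and the message names the file it couldn't find.
```

Dissecting the difference between a warning and an error goes a long way towards defusing the red-text anxiety.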
Another thing I often see in my field is that people have complex data and want to analyse it and get results out of it. They don't care about programming details, or how the underlying algorithms work, and in a sense I understand them. However, you need a good balance between the two if you are to trust your own results.
#academicchatter I read a published paper the other day and noticed that a panel in one of the figures was clearly duplicated, by mistake. I knew this because the authors had uploaded the raw data as a table, and the table has a complete set of numbers that doesn't match the figure panel. Importantly, the correct data did not change the conclusion. This #openscience approach, whether enforced by publishers or volunteered by authors, maintained the trust I have in the findings they reported.
"My paper was proved wrong. After a sleepless night, here’s what I did next"
column by @oaggimenez describing how to gracefully react when mistakes are discovered in one's published papers.
Senior lecturer at the Zhejiang-Edinburgh Joint Institute (ZJE) and Edinburgh University.
Undergraduate Programme Director, Biomedical Informatics at ZJE.
I teach #imageanalysis & #dataanalysis with #RStats & #python. I study #heterogeneity in #pituitary (and other) cells.
I'm also very interested in #reproducibility and #openscience.