@andrew
I noticed you're a fan of R, and I have to ask if you've played with Julia at all (the programming language, not some person 😂).

I teach a course that's mostly on R, and I wrote my first thesis in R plus RMarkdown. Frankly, of the big data science/stats languages (Python, R, Julia), I'd say Julia gets it the "most right" in most cases, though R is a decent alternative in some.

The syntactic sugar and macros alone make Julia super nice, and there are so many other handy tools, packages, and integrations (including with R and Python!) that I plug it to anyone in the stats space. If you want any more details, just let me know!

@johnabs @andrew What do you mean by Julia getting it the "most right"?

Long post ahead 

@daeyoung @andrew "Right" in terms of how it interacts with both computers and data, plus package management. On all three it beats R and Python, IMO.

TL;DR up front: I've found I can accomplish much more with much less, and in a much less convoluted way, with Julia. For context, I've used R for 8 years, Julia for 3-4, and Python intermittently over that time. Read on for why I like it.

Some simple syntax examples:

No tabs/spaces issues like in Python: every block starts with a keyword like function, if, for, etc. and ends with "end". This makes it very easy to track block boundaries and catch unclosed forms.

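**Broadcasting functions** (write a function that operates on a scalar, then add a dot to apply it elementwise)
f(x::Number)=2x
f(2) -> 4
f(1:2) -> MethodError, since f only accepts numbers (without the ::Number annotation, 2x would also work on a range)
f.(1:2) -> [2,4]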

**Broadcasting the broadcast:** if you want to broadcast a more complex expression built from several functions and operators, there's a macro for it:

y=[1,2]
(f(y)+12)/37 -> fails
(f.(y).+12)./37 -> works
@. (f(y)+12)/37 -> works

So you can write expressions to operate on single values, test them accordingly, and have guarantees they'll apply correctly in an n-tensor context.
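A minimal sketch of the broadcasting pattern above, runnable in a plain Julia session. The name `double` and the `::Number` annotation are assumptions made for illustration; the annotation is what makes the non-dotted call on a vector fail outright.

```julia
# Scalar-only function; the ::Number annotation is an illustrative assumption
double(x::Number) = 2x

y = [1, 2]

scalar_result = double(2)   # works on a single value
dotted = double.(y)         # dot syntax applies it elementwise

# Calling the scalar method directly on a vector raises a MethodError
fails_on_vector = try
    double(y)
    false
catch err
    err isa MethodError
end

# Two equivalent ways to broadcast a whole expression
explicit = (double.(y) .+ 12) ./ 37
with_macro = @. (double(y) + 12) / 37
```

The `@.` form just inserts the dots for every call and operator in the expression, so `explicit` and `with_macro` hold the same values.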

Oh, and speaking of n-tensors, you can easily replicate R's outer product function with broadcasting, and you can still use any binary function, as you can in R:

Julia:
x=[1,2,3,4]
x.*x' returns the product matrix
vcat.(x,x') returns a matrix of paired x values

R:
x=c(1,2,3,4)
outer(x,x)
The second example doesn't work in R as outer(x,x,c); that data structure seems to need manual construction with the matrix function (and it behaves oddly anyway).
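For anyone who wants to try the Julia side, here is a small sketch of the outer-product trick. The helper `dist` is a hypothetical function added only to show that any binary function broadcasts the same way.

```julia
x = [1, 2, 3, 4]

# x is a column and x' is a row, so broadcasting expands them to a 4×4 grid
products = x .* x'        # products[i, j] == x[i] * x[j]
paired   = vcat.(x, x')   # paired[i, j] == [x[i], x[j]]

# Any binary function works the same way (dist is a hypothetical helper)
dist(a, b) = abs(a - b)
distances = dist.(x, x')
```

Here `paired` is a matrix whose elements are themselves two-element vectors, which is the structure that is awkward to build with `outer` in R.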

**Mixed iterator loops:**
Instead of:

for i in 1:10
    for j in i+1:11
        # do something with i and j
    end
end

We do:
for i in 1:10, j in i+1:11
    # do something with i and j
end

And you can add any number of iterators to the flattened statement, which makes it clear where the logic happens and which iterators belong to which level of the nested loops.
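A runnable sketch comparing the two forms; the ranges are shortened and the pair-collecting body is made up for illustration.

```julia
# Collect every pair (i, j) with i < j using a single flattened loop header
flat = Tuple{Int,Int}[]
for i in 1:3, j in i+1:4
    push!(flat, (i, j))
end

# The equivalent nested form, for comparison
nested = Tuple{Int,Int}[]
for i in 1:3
    for j in i+1:4
        push!(nested, (i, j))
    end
end
```

Both loops visit the same pairs in the same order. One difference worth knowing: a `break` inside the flattened form leaves the whole loop nest, not just the innermost loop.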

**First class functions:** closures are easy to construct, and lambdas are cleaner:

Just lambdas:
R: function(x) x + 2 (or \(x) x + 2 since R 4.1)
Python: lambda x: x + 2
Julia: x -> x + 2

And lambdas can be nested for nested higher-order functions, like so:

map(df->filter(row->row[1]!=2,df),data_frame_list) (or something like this, assuming DataFrames.jl)
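A self-contained sketch of both ideas, using plain vectors of tuples instead of data frames so it runs without extra packages. The names `make_adder` and `tables` are hypothetical.

```julia
# A closure: the returned lambda captures n from the enclosing call
make_adder(n) = x -> x + n   # make_adder is a hypothetical name
add2 = make_adder(2)

# Stand-in for a list of data frames: each "table" is a vector of rows
tables = [
    [(1, "a"), (2, "b"), (3, "c")],
    [(2, "d"), (4, "e")],
]

# Nested lambdas: for each table, keep the rows whose first entry is not 2
filtered = map(t -> filter(row -> row[1] != 2, t), tables)
```

The outer lambda handles one table at a time and the inner one handles one row at a time, which is the same shape as the data frame version.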

**Making code parallelized** is trivial in most cases. Write a function, define it on every worker with the @everywhere macro, and pass it to pmap. The same function can often be compiled for the GPU with the @cuda macro from CUDA.jl, since almost all of Julia and its packages are written in pure Julia. (Side effect: no need to be a C++ expert for performance, and modifying dependencies for your work is easy.)
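A minimal sketch of the @everywhere plus pmap workflow with the Distributed standard library. The worker count and the `slow_square` workload are assumptions picked for illustration; the GPU side is left out because it needs CUDA hardware.

```julia
using Distributed

# Start two local worker processes if none are running yet (count is arbitrary)
nprocs() == 1 && addprocs(2)

# @everywhere defines the function on every process, not just the main one
@everywhere slow_square(x) = (sleep(0.1); x^2)   # hypothetical workload

# pmap hands each input to a free worker and returns results in input order
results = pmap(slow_square, 1:8)
```

Without the `@everywhere`, the workers would not know about `slow_square` and the `pmap` call would fail on them.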

**Bonuses:**

Package management, package creation, and code-sharing between machines are trivial with per-project manifests and helpers like DrWatson, which is excellent for academic users. (And no pip/conda nightmares.)

Dataframe support is excellent, with melt/cast-style reshaping built in (stack and unstack in DataFrames.jl). On top of that, the indexing rules make sense, compared to R's [[]] vs [] and the differences between $name and [,col] when working with tibbles.

Gadfly more or less == ggplot2 but with more features, and different back-ends like plotly are also available for plots.

Finally, multiple dispatch is pretty dang neat, and the type system doesn't "try to help" like it does in R, which leads to all sorts of wacky bugs that are difficult to locate (like when R tries to help with lists, vectors, matrices, etc.).

e.g.
sq(x)=x^2 gets compiled, on first use with each argument type, to specialized versions such as
sq(x::Int64)
sq(x::Float64)
sq(x::Rational{Int64})
etc.

But you can annotate the argument, e.g. sq(x::Int64)=x^2, and then only that method exists, which also cuts down on how much gets compiled.

If you decide you want all of them, the appropriate one is dispatched when required, and this doesn't have the same limitations as traditional overloading, as explained better [here](discourse.julialang.org/t/is-m).
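A small sketch of what dispatch looks like in practice. All function names here (`describe`, `combine`, `square_int`) are made up for illustration.

```julia
# One generic function, several methods; the most specific match wins
describe(x::Integer)       = "integer"
describe(x::AbstractFloat) = "float"
describe(x::Real)          = "some other real"

# Dispatch considers the types of all arguments, not just the first
combine(a::Integer, b::Integer)       = "int-int"
combine(a::Integer, b::AbstractFloat) = "int-float"
combine(a::Real, b::Real)             = "real-real"

# The method is chosen from the runtime types, even though the
# container's element type is only the abstract Real
mixed = Real[1, 2.0]
runtime_choice = combine(mixed[1], mixed[2])

# Restricting to a single type: only this method exists
square_int(x::Int) = x^2
rejects_float = try
    square_int(2.0)
    false
catch err
    err isa MethodError
end
```

The `mixed` example is the part that differs from compile-time overloading in languages like C++ or Java, where the declared type would pick the method.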

**End**

I hope that helps explain my position, but much of this may just be preferences on my end in terms of how I think about solving problems, and how well this language comports with that. I do suggest giving it a try though, you may like what you find after getting over the initial friction of switching languages. 😄

@johnabs @andrew I love your detailed explanation. I last used Julia >5 years ago and it seems to have changed a lot in the meantime. I love the idea of just-in-time compiling and not having to default to a lower-level language, but R has surely made me love C++ and now I use R as if it’s a C++ interpreter lol. I'd love to see Julia grow and more people use it, but R still seems too influential in statistics, for better or worse.

@daeyoung @andrew I'm glad you liked it! I had it pegged at 35% of people liking it and 65% of it being ignored 😂

Personally I find C++ really really gross; hence my desire to use something that can provide near equivalent performance in certain contexts, but without the icky syntax, header files, and focus on OOP (which I try to avoid due to the issues with finding state-induced, silent, runtime bugs).

One minor point I will add: RCall and PyCall are very mature, so if you want the best of both/all worlds, you can use Julia or R for the bulk of the code, and either integrate R into Julia for ease of use and more familiar libraries, or integrate performant Julia into R so you can start bypassing the horrors of C++ 😱

@johnabs @andrew omg yes, the silent runtime errors when using OOP! This irks me a lot and hence I don’t really use classes. Also C++ classes are all about ownership and it becomes one hell of a mess when I want to do stuff with pointers and I can’t for the life of me bother learning smart pointers. That’s the line I draw ✍️

Hm maybe it’s time for me to play around more with Julia

@daeyoung @andrew My first programming language was Java...I have a special hatred in my heart for OOP, and I'm glad things like Scala/Clojure etc. exist on the JVM to make it actually useful 😂

My number one recommendation before diving in is installing [DrWatson](juliadynamics.github.io/DrWats) to make package management easier and to prevent dependencies from clogging up your main environment, but reach out if you'd like more! I'm always happy to make conver- I mean, umm...show people alternatives...MUHAHAHAHAHAHA 😂
