@andrew
I noticed you're a fan of R, and I have to ask if you've played with Julia at all (the programming language, not some person 😂).

I teach a course that's mostly on R and wrote my first thesis in it+RMarkdown. Frankly, I have to say of the big data science/stats languages (python, R, julia) that julia really gets it the "most right" in most cases, though R is a decent alternative in certain cases.

The syntactic sugar and macros alone make julia super nice, and there are so many other handy tools, packages, and integrations (including with R and Python!) that I plug it to anyone else in the stats/etc space. If you want any more details, just let me know!

@johnabs @andrew what do you mean by Julia getting it the "most right?"

Long post ahead 

@daeyoung @andrew “Right” in terms of interacting with both computers and data (and package management); on those fronts it beats R and python IMO.

TL;DR up front: I’ve found I can accomplish much more with much less, and in a much less convoluted way with julia, and I’ve used R for 8 years, and julia for 3-4, and python intermittently in that time as well. Read on for why I like it.

Some simple syntax examples:

No tabs/spaces issues like python, every block starts with a signifier like function, if, else, etc. and ends with “end”. This makes it very easy to track “delimiter” position and catch unclosed forms.

Broadcasting functions (write a function to operate on a scalar, then apply the following)
f(x)=x^2
f(2)->4
f(1:2)-> MethodError, ^ isn’t defined for a range
f.(1:2)->[1,4]

Broadcasting the broadcast: if you want to apply other broadcast operators with more complex functions, there’s a macro for it:

y=[1,2]
(f(y)+12)/37 -> fails
(f.(y).+12)./37 -> works
@. (f(y)+12)/37 -> works

So you can write expressions to operate on single values, test them accordingly, and have guarantees they’ll apply correctly in an n-tensor context.

Oh, and speaking of n-tensor, you can easily replicate the outer product function in R with broadcasting, and you can still use any binary function as you can in R:

julia:
x=[1,2,3,4]
x.*x' returns a product matrix
vcat.(x,x') returns a matrix of paired x values.

R:
x=c(1,2,3,4)
outer(x,x)
The second example doesn’t work in R with outer(x,x,c); that data structure seems to need manual creation with the matrix function (and it behaves weirdly anyway).

Mixed iterator loops:
Instead of:

for i in 1:10
    for j in i+1:11
        do something on i and j
    end
end

We do:
for i in 1:10, j in i+1:11
do something on i and j
end

And you can add arbitrary amounts of iterators into the flattened statement to make it clear where the logic is happening, and which iterators belong where in nested loops.
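
For a concrete version of the flattened loop, here is a minimal sketch that collects every (i, j) pair with i < j. The `pairs_found` name is made up for illustration, and the ranges are shortened so the output stays small. Note that `i+1:4` parses as `(i+1):4`.

```julia
# Collect every (i, j) pair with i < j, using one flattened loop header.
pairs_found = Tuple{Int,Int}[]

for i in 1:3, j in i+1:4      # the range for j can depend on i
    push!(pairs_found, (i, j))
end

println(pairs_found)
```

One difference from the nested form is worth knowing: a `break` inside the flattened loop exits the whole nest, not just the innermost iterator.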

First class functions: ease of closure construction, and lambdas are cleaner:

just lambdas:
R: function(x){return(x+2)}
python: (lambda x: x + 2)
julia: (x->x+2)

And lambdas can be nested for nested higher order functions. Like so:

map(df->filter(row->row[1]!=2,df),data_frame_list) (or something like this)
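
On the closure side, a minimal sketch of what "ease of closure construction" can look like. The names `make_adder` and `add2` are hypothetical, picked only for this example.

```julia
# make_adder returns an anonymous function that captures (closes over) n.
make_adder(n) = x -> x + n

add2 = make_adder(2)          # a closure with n fixed to 2
shifted = map(add2, [1, 2, 3])

println(shifted)
```

The returned lambda keeps its own copy of the binding for `n`, so `make_adder(2)` and `make_adder(10)` give independent functions.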

Making code parallelized is trivial in most cases. Write a function, define it on all workers with the @everywhere macro, and put it in pmap. The same function can often be run on the GPU with little change via CUDA.jl (its @cuda macro, or by broadcasting over GPU arrays), since almost all of julia and its packages are written in pure julia. (Side effect: no need to be a C++ expert for performance, and modifying dependencies for your work is easy.)
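
A minimal sketch of the @everywhere + pmap pattern, using only the Distributed standard library. The function name `slow_square`, the worker count, and the `sleep` call are assumptions added for illustration; the sleep just stands in for real work.

```julia
using Distributed

addprocs(2)                      # start two local worker processes (count is arbitrary)

# @everywhere defines the function on the main process and on every worker.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap hands the elements of 1:8 to the workers and returns results in input order.
results = pmap(slow_square, 1:8)
println(results)

rmprocs(workers())               # shut the workers down again
```

For very cheap functions the communication overhead can outweigh the gain, so this pays off mainly when each call does a meaningful amount of work.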

Bonuses:

Package management, creation, and code-sharing between machines are trivial with per-project manifests and helpers like DrWatson, which is excellent for academic users. (And no pip/conda nightmares.)

Dataframe support is excellent, and has similar functions to melt/cast built in (stack/unstack). Adding to this, the indexing rules make sense compared to R’s [[]], [], and the differences between $name and [,col] when working with tibbles.

Gadfly more or less == ggplot2 but with more features, and different back-ends like plotly are also available for plots.

Finally, multiple dispatch is pretty dang neat, and the type system doesn’t “try to help” like it does in R, which leads to all sorts of wacky bugs that are difficult to locate (like when R tries to help with lists/vecs/matrices, etc.).

e.g.
f(x)=x^2 gets compiled (on first use with each argument type) to specialized versions like
f(x::Int64)
f(x::Float64)
f(x::ComplexF64)
etc.

But you can annotate the argument with a single type (e.g. f(x::Int)=x^2) and you’ll only get that one method, and it also speeds up compilation to a degree.

If you decide you want all of them, the appropriate one is dispatched when required, and this doesn’t have the same limitations of traditional overloading. As explained better here.
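
To make the dispatch point concrete, a minimal sketch with two made-up functions, `kind` and `combine` (neither is from any package). The second one shows the part that differs from single-argument overloading: the method is chosen from the types of all arguments.

```julia
# One generic function, several methods; Julia picks the most specific match.
kind(x::Int)           = "Int"
kind(x::AbstractFloat) = "float"
kind(x::Real)          = "some other Real"

# Dispatch considers the types of *all* arguments, not just the first.
combine(a::Number, b::Number) = a + b
combine(a::String, b::String) = a * b          # * concatenates strings
combine(a::String, b::Number) = a * string(b)

println(kind(1), ", ", kind(1.5), ", ", kind(1//2))
println(combine(1, 2.5), " ", combine("a", "b"), " ", combine("n=", 3))
```

Calling `combine(3, "n=")` raises a MethodError, because no method was defined for that combination of argument types.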

End

I hope that helps explain my position, but much of this may just be preferences on my end in terms of how I think about solving problems, and how well this language comports with that. I do suggest giving it a try though, you may like what you find after getting over the initial friction of switching languages. 😄

@johnabs @andrew I love your detailed explanation. I used Julia for the last time >5 years ago and it seems to have changed a lot in the meantime. I love the idea of just-in-time compiling and not having to default to a lower-level language, but R has surely made me love C++ and now I use R as if it’s a C++ interpreter lol. I’d love to see Julia grow and more people use it, but R still seems too influential in statistics, for better or worse.

@daeyoung @andrew I'm glad you liked it! I had it pegged at 35% of people liking it and 65% of it being ignored 😂

Personally I find C++ really really gross; hence my desire to use something that can provide near equivalent performance in certain contexts, but without the icky syntax, header files, and focus on OOP (which I try to avoid due to the issues with finding state-induced, silent, runtime bugs).

One minor point I will add: RCall and PyCall are very mature, so if you want the best of both/all worlds, you can use Julia or R for the bulk of the code, then either integrate R into Julia for ease of use and more familiar libraries, or integrate performant Julia into R so you can start bypassing the horrors of C++ 😱

@johnabs @andrew omg yes, the silent runtime errors when using OOP! This irks me a lot and hence I don’t really use classes. Also C++ classes are all about ownership and it becomes one hell of a mess when I want to do stuff with pointers and I can’t for the life of me bother learning smart pointers. That’s the line I draw ✍️

Hm maybe it’s time for me to play around more with Julia

@daeyoung @andrew My first programming language was java…I have a special hatred in my heart for OOP, and I’m glad things like scala/clojure etc. exist on the JVM to make it actually useful 😂

My number one recommendation before diving in is installing DrWatson to make package management easier and to prevent dependencies from clogging up your main environment, but reach out if you’d like more! I’m always happy to make conver- I mean, umm…show people alternatives…MUHAHAHAHAHAHA 😂
