@andrew
I noticed you're a fan of R, and I have to ask if you've played with Julia at all (the programming language, not some person 😂).
I teach a course that's mostly on R and wrote my first thesis in it+RMarkdown. Frankly, I have to say of the big data science/stats languages (python, R, julia) that julia really gets it the "most right" in most cases, though R is a decent alternative in certain cases.
The syntactic sugar and macros alone make julia super nice, and there are so many other handy tools, packages, and integrations (including with R and Python!) that I plug it to anyone else in the stats/etc space. If you want any more details, just let me know!
Long post ahead
@daeyoung @andrew "Right" in terms of interacting with both computers and data (and package management): there it beats R and python IMO.
TL;DR up front: I've found I can accomplish much more with much less code, and in a much less convoluted way, with julia. For context, I've used R for 8 years, julia for 3-4, and python intermittently in that time as well. Read on for why I like it.
Some simple syntax examples:
No tabs/spaces issues like python; every block starts with a signifier like function, if, else, etc. and ends with "end". This makes it very easy to track "delimiter" position and catch unclosed forms.
**Broadcasting functions** (write a function to operate on a scalar, then apply it elementwise by adding a dot)
f(x)=x^2
f(2)->4
f(1:2)-> Error, x^2 isn't defined for ranges/vectors
f.(1:2)->[1,4]
**Broadcasting the broadcast:** if you want to apply other broadcast operators with more complex functions, there's a macro for it:
y=[1,2]
(f(y)+12)/37 -> fails
(f.(y).+12)./37-> works
@.((f(y)+12)/37)-> works
So you can write expressions to operate on single values, test them accordingly, and have guarantees they'll apply correctly in an n-tensor context.
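A minimal sketch of the dot and @. forms side by side. The function name `g` and the sample vector are made up for illustration, and `g` is restricted to `Number` on purpose so that calling it on a vector fails loudly:

```julia
# Hypothetical scalar function; the ::Number annotation means it is
# only defined for single numbers, not for vectors.
g(x::Number) = x^2 + 1

y = [1, 2, 3]

# Explicit dots on every call and operator.
a = (g.(y) .+ 12) ./ 37

# The @. macro adds the dots to every call and operator in the expression.
b = @. (g(y) + 12) / 37

# Calling the scalar function directly on a vector is a MethodError.
failed = try
    g(y)
    false
catch err
    err isa MethodError
end

println(a == b)
println(failed)
```

Both forms compute the same elementwise result, so the scalar version can be tested on its own and then dotted when it is needed on arrays.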
Oh, and speaking of n-tensor, you can easily replicate the outer product function in R with broadcasting, and you can still use any binary function as you can in R:
julia:
x=[1,2,3,4]
x.*x' returns a product matrix
vcat.(x,x') returns a matrix of paired x values.
R:
x=c(1,2,3,4)
outer(x,x)
The second example doesn't work in R with outer(x,x,c), since outer expects a vectorized function that returns one value per pair. That data structure seems to need manual creation with the matrix function (and it behaves weirdly anyway).
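Here is a small sketch of the outer-product idea with three different binary functions. The variable names are only for illustration:

```julia
x = [1, 2, 3, 4]

# x is a column, x' is a row, so broadcasting pairs every x[i] with every x[j].
products = x .* x'       # 4×4 matrix with products[i, j] == x[i] * x[j]
paired   = tuple.(x, x') # 4×4 matrix of (x[i], x[j]) tuples
vecs     = vcat.(x, x')  # 4×4 matrix whose entries are the vectors [x[i], x[j]]

println(size(products))
println(paired[2, 3])
println(vecs[4, 1])
```

Any two-argument function can take the place of `*`, `tuple`, or `vcat` here, which is the same role the third argument plays in R's outer.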
**Mixed iterator loops:**
Instead of:
for i in 1:10
for j in i+1:11
do something on i and j
end
end
We do:
for i in 1:10, j in i+1:11
do something on i and j
end
And you can add arbitrary amounts of iterators into the flattened statement to make it clear where the logic is happening, and which iterators belong where in nested loops.
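A runnable version of the flattened loop, as a sketch. The helper name `upper_pairs` is hypothetical; it just collects the (i, j) pairs so the result can be inspected:

```julia
# Collect every (i, j) pair visited by the flattened loop header.
function upper_pairs(n)
    out = Tuple{Int,Int}[]
    # i+1:n+1 parses as (i+1):(n+1); the inner range may depend on i.
    for i in 1:n, j in i+1:n+1
        push!(out, (i, j))
    end
    return out
end

ps = upper_pairs(10)
println(length(ps))
println(first(ps), " ", last(ps))
```

One thing to keep in mind: in the flattened form, `break` leaves the whole nest at once, whereas in the explicitly nested form it only leaves the innermost loop.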
**First class functions:** ease of closure construction, and lambdas are cleaner:
just lambdas:
R: (function(x) x + 2)
python: (lambda x: x + 2)
julia: (x->x+2)
And lambdas can be nested for nested higher order functions. Like so:
map(df->filter(row->row[1]!=2,df),data_frame_list) (or something like this)
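To keep this runnable without any packages, here is the same nesting pattern on plain vectors instead of data frames (the data is made up for illustration):

```julia
groups = [[1, 2, 3], [2, 2, 4], [5, 6]]

# Outer lambda runs once per group, inner lambda once per element.
cleaned = map(v -> filter(y -> y != 2, v), groups)

# The do-block form passes the same outer lambda, which reads better
# once the body grows past one line.
cleaned_do = map(groups) do v
    filter(y -> y != 2, v)
end

println(cleaned)
println(cleaned == cleaned_do)
```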
**Making code parallelized** is trivial in most cases. Write a function, define it on all workers with the @everywhere macro, and put it in pmap. The same function can often be run on a GPU with CUDA.jl (e.g. by broadcasting it over a CuArray, or by launching a kernel with the @cuda macro), since almost all of julia and its packages are written in pure julia. (Side effect: no need to be a C++ expert for performance, and modifying dependencies for your work is easy.)
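A minimal sketch of the pmap workflow using the Distributed standard library. The worker count and the function `slow_square` are arbitrary choices for this example, and the sleep only stands in for real work:

```julia
using Distributed

# Start two local worker processes (an arbitrary number for this sketch).
addprocs(2)

# @everywhere defines the function on the main process and on every worker.
@everywhere slow_square(x) = (sleep(0.1); x^2)

# pmap hands elements out to the workers and returns results in input order.
results = pmap(slow_square, 1:8)
println(results)

# Shut the workers down again.
rmprocs(workers())
```

This only covers the multi-process case; GPU execution needs CUDA.jl and suitable hardware, so it is left out here.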
**Bonuses:**
Package management, creation, and code-sharing between machines is trivial with helpers like DrWatson, which is excellent for academic users, and per-package manifests. (And no pip/conda nightmares)
Dataframe support is excellent, and has similar functions to melt/cast built in. Adding to this, indexing rules make sense as compared to R's [[]], [], and the differences between $name and [,col] when working with tibbles.
Gadfly more or less == ggplot2 but with more features, and different back-ends like plotly are also available for plots.
Finally, multiple dispatch is pretty dang neat, and the type system doesn't "try to help" like it does in R which leads to all sorts of wacky bugs that are difficult to locate (like when R tries to help with lists/vecs/matrices,etc).
e.g.
f(x)=x^2 effectively gets compiled to a specialized version for each argument type it is called with:
f(x::Int)
f(x::Float64)
f(x::Rational{Int})
etc.
But you can specify you only want one and you'll only get that one, and it also speeds up compilation to a degree.
If you decide you want all of them, the appropriate one is dispatched when required, and this doesn't have the same limitations as traditional overloading, as explained better [here](https://discourse.julialang.org/t/is-multiple-dispatch-the-same-as-function-overloading/4145/5).
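A small sketch of what that looks like in practice. The function names `describe` and `only_int` are hypothetical and exist only for this example:

```julia
# Several methods of one function; Julia picks the most specific match
# based on the runtime types of ALL arguments.
describe(x::Int) = "integer"
describe(x::Float64) = "float"
describe(x::Real) = "some other real"
describe(x, y) = "two arguments"
describe(x::Int, y::String) = "int and string"

println(describe(1))
println(describe(1.0))
println(describe(1//2))
println(describe(1, "a"))
println(describe("a", 1))

# Restricting a function to a single type: anything else is a MethodError
# instead of a silent conversion.
only_int(x::Int) = x^2

rejected = try
    only_int(2.0)
    false
catch err
    err isa MethodError
end
println(rejected)
```

The last part is the "doesn't try to help" behaviour: the Float64 argument is rejected outright rather than being coerced.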
**End**
I hope that helps explain my position, but much of this may just be preferences on my end in terms of how I think about solving problems, and how well this language comports with that. I do suggest giving it a try though, you may like what you find after getting over the initial friction of switching languages. 😄
@daeyoung @andrew I'm glad you liked it! I had it pegged at 35% of people liking it and 65% of it being ignored 😂
Personally I find C++ really really gross; hence my desire to use something that can provide near equivalent performance in certain contexts, but without the icky syntax, header files, and focus on OOP (which I try to avoid due to the issues with finding state-induced, silent, runtime bugs).
One minor point I will add: RCall and PyCall are very mature, so if you want the best of both/all worlds, you can use Julia or R for the bulk of the code, and either integrate R into Julia for ease of use and more familiar libraries, or integrate performant Julia into R so you can start bypassing the horrors of C++ 😱
@johnabs @andrew omg yes, the silent runtime errors when using OOP! This irks me a lot and hence I don’t really use classes. Also C++ classes are all about ownership and it becomes one hell of a mess when I want to do stuff with pointers and I can’t for the life of me bother learning smart pointers. That’s the line I draw ✍️
Hm maybe it’s time for me to play around more with Julia
@daeyoung @andrew My first programming language was java...I have a special hatred in my heart for OOP, and I'm glad things like scala/clojure etc. exist on the JVM to make it actually useful 😂
My number one recommendation before diving in is installing [DrWatson](https://juliadynamics.github.io/DrWatson.jl/dev/) to make package management easier and to prevent dependencies from clogging up your main environment, but reach out if you'd like more! I'm always happy to make conver- I mean, umm...show people alternatives...MUHAHAHAHAHAHA 😂
@johnabs @andrew I love your detailed explanation. I used Julia for the last time >5 years ago and it seems to have changed a lot in the meantime. I love the idea of just-in-time compiling and not having to default to a lower-level language, but R has surely made me love C++ and now I use R as if it’s a C++ interpreter lol. I'd love to see Julia grow and more people use it, but R still seems too influential in statistics, for better or worse.