It's not a quick shift; it has been going on for a while. I'm busy with my day job as a developer, I study here and there, and I have historically been more involved with R. On top of that, I should see people now and then to avoid getting shot by a Walking Dead fan.
Someone recently told me that my preference for Python is not generally justified when it comes to analytics. To my surprise, the alternative he mentioned was Java. Needless to say, I was looking at things from the perspective of an analyst, someone who applies various methods to extract insight from data.
So... is Java Taking Over?
I wanted to double-check, but I had to realize that it's non-trivial to find (or I was unlucky in finding) a relevant and current ranking or, more interestingly, a useful visualisation of how languages perform in the analytics segment.
The other thing I realised was that it wasn't that difficult to get a grip on the problem, even if it's a less reliable grip than one stemming from a more complex methodology.
Digging on GitHub
A small script pulled a lot of data from GitHub, which I then cross-checked against Stack Overflow data.
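For illustration only, here is a minimal sketch of how such a pull could look. It is not the script behind the charts. It assumes GitHub's public repository search endpoint and the `total_count` field of its JSON response; the function names (`build_query_url`, `fetch_repo_count`, `language_shares`) and the example keyword are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: GitHub's public repository search endpoint.
SEARCH_URL = "https://api.github.com/search/repositories"


def build_query_url(keyword, language):
    """Build a repository-search URL for one keyword/language pair."""
    query = f"{keyword} language:{language}"
    # per_page=1 keeps the response small; only the total is needed.
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "per_page": 1})


def fetch_repo_count(keyword, language):
    """Return the number of matching repositories (performs a network call)."""
    request = urllib.request.Request(
        build_query_url(keyword, language),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["total_count"]


def language_shares(counts):
    """Turn raw repository counts into fractional shares per language."""
    total = sum(counts.values())
    if total == 0:
        return {language: 0.0 for language in counts}
    return {language: count / total for language, count in counts.items()}
```

A call such as `fetch_repo_count("machine learning", "python")` would give one data point; collecting it for several languages and passing the resulting dictionary to `language_shares` gives a share per keyword. Unauthenticated search requests are rate limited, so a real run needs pauses or a token.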
The (privately drawn) conclusion suggested by the charts is that Python is currently the best all-purpose choice for analytics. Java is coming up fast, but it has a huge market share anyway, so it is possibly just getting soaked in analytics like everything else, without really driving the change from an end user's (analyst's) perspective. R appears to be falling from grace slowly but steadily, with handicaps in the machine learning area.
The most recent version of the full document/source code is available here, or by clicking the miniature. At the time of reading, I'm probably still in the process of making improvements to these.
So the conclusion may well change, but odds are it's accurate. These findings should primarily predict the future share of languages among today's aspiring analysts, and represent the present only to a lesser extent.
The Winner - in My Opinion
A summary of my considerations about Python's performance is presented below.
Aspect | Quick Assessment |
Analysis | Prime Language, Increasing Share |
Spark | Significant, Reducing Share |
Machine Learning | Prime Language |
Deep Learning | Prime Language |
Big Data | Strong Choice |
AWS | Strong Choice, Increasing Share |
Data Science | Prime Language / Strong Choice, Increasing Share |
Mining | Strong Choice |
Visualization | Prime Language / Strong Choice, Increasing Share |
Chart | Good Enough Choice from a Diverse Competition |
Kaggle | Prime Language, Increasing Share (possibly due to R's demise <sniff/>) |
For those who have tried to follow events in the analytics area, these segments (or the segments these keywords stand in for) may be familiar and of high importance.
For everyone else: machine learning and deep learning are expected to be the key players in processing data produced and archived at big data scale, and to be capable of modelling systems as complex as human thinking.
Different Views
As a final thought, the situation probably changes abruptly once it is about architects choosing an implementation language for software they plan to provide to analysts. However, Python's versatility can possibly counterbalance the efficiency trade-offs it brings to development.
Also mind that GitHub, the most used source code hosting option at the time of writing, is the vehicle for many courses on Coursera, which in turn is/was the most popular MOOC platform. That coursework heavily affects the numbers even on a platform as massive as GitHub. Gathering data about this is in progress, but it is already getting clear that these courses are close to the heart of R's popularity on GitHub.
For those interested, I also had a quick peek at server-side languages, with the results summarized in this (continuously updated) document.