Tuesday, 10 October 2017

Wednesday, 27 September 2017

Some things I like about >>>this

It may be beginner's Python stuff, but there are still some very important concepts wrapped up in this ...

>>> import this
The Zen of Python, by Tim Peters
Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!

(... and some things I'd be only too happy to debate or discuss. For example, "hard to explain" is very subjective: a picture sometimes speaks a thousand words, and what if it's a sound or a sculpture? Prior education, a shared interest that comes to the rescue, you name it. Rules are not really rules, but guidelines for the here, the now and us. But let's not get lost in the details, and rather get things right ... good thing.)
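A small curiosity for a text that prefers explicit over implicit: in CPython, the `this` module keeps the Zen ROT13-encoded in `this.s` and decodes it at import time with a letter table `this.d`. These are implementation details rather than a documented API, so treat this as a sketch that may not hold on every interpreter or version. It decodes the text explicitly with the standard `codecs` module:

```python
import codecs
import contextlib
import io

# Importing `this` prints the Zen as a side effect; capture that output
# instead of letting it go to the console.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    import this  # CPython stores the text ROT13-encoded in this.s

# Decode the ROT13-encoded source text explicitly via the codecs registry.
zen = codecs.decode(this.s, "rot13")

# this.d is the letter-substitution table the module itself uses.
zen_via_table = "".join(this.d.get(c, c) for c in this.s)

print(zen.splitlines()[0])
print(zen == zen_via_table)
```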

Monday, 11 September 2017

The irony ("life always finds a way")

Fixding. So typocal :) Though I guess it's just appreciable self-irony.



Anyway, you can't escape fate - "if they cut off one head, two more shall take its place".


Monday, 3 July 2017

Out of memory again? Cut me some ...

The recent purchase of a new 5-year-old laptop has made things run smoothly again work-wise.

But then, when I got down to some exploratory data processing, deliberately kept to a feasible size, R started running out of memory, which seemed unreasonable at first.

The bad boy seems to be the Slack client version installed via Ubuntu's Aptitude. There's an interesting-looking article (which I have yet to read in its full glory) about the likely difficulties of relying on web UIs on the desktop, which says:
Work is underway to fix the underlying factors affecting client memory consumption, but in the meantime we’ve built a tiny new Slack client to help address this issue.
... which is still giving me some daunting moments when installing the old client.

However, I am already on 6.2.3, the latest 64-bit Ubuntu beta version :S, and the changelog does mention some memory footprint improvements. Hopefully those weren't the "tiny client" improvements, and the real ones are still to come soon ...

Saturday, 17 June 2017

Me and nudging

I just read about nudging ... it does happen everywhere, and the funny thing is that I'd never have thought someone would come up with a new name for this relatively intuitive behaviour.

However, I'd certainly question whether it's a nice thing to do. I guess it's a step on a ladder of questionable ethics. Point being: handle with care - empowered teams are meant to be competent - maybe (God forgive) even smart!

<3 Dishonesty

First of all: this is just that. You, my big friend and mate, are in full control, but I do get you to do what I want you to do. Are you, my mate, actually in full control then? Or is it just a plain illusion of freedom?

<3 Underestimation

I mean, imagining that smart people won't notice being nudged is ... well, it implies they are thought of as "not smart" enough. And then, in their minds, how is being noticeably influenced not very similar to being controlled?

But how could we even imagine living without degrading everyone around us, at least in our minds? We all want to feel exceptional ... or at least the over 90% of people who think they are smarter than average, or that they have a 100+ IQ, do. I hope I'm not amongst them; it's up to the reader to work out in which way.

<3 Works?

Hm ... hm ... maybe it doesn't. Maybe it's just not any better. Would you wholeheartedly do what someone, an Interpersonal Expert who is less than capable of appreciating or assessing you, wants you to do? Hm... Nudge-nudge.

Conclusion!

Because every South Park episode should have one (except for the multi-episode stories). One exception: what if the nudging is done openly? Then it's nothing but sharing intentions and techniques, and helping others to develop themselves. I have also heard that "good management does not tell you what to do, it only removes obstacles". That approach is closer to my heart, too.

Tuesday, 16 May 2017

About the Outstanding Hungarian GDP Growth [Draft]

While government propaganda will be keen to emphasize the extraordinary growth rate in my home country and label it a huge success, a slightly deeper, quick (and dirty) look behind the headline numbers shows how this may all be misinterpreted, and which country deserves credit for truly extreme success.

As I found in a Hungarian article on portfolio.hu, Hungary has climbed quite a bit in the ranking of EU countries by GDP growth, achieving 1.3% in 2017 Q1, which puts it joint 5th-6th with the Czech Republic.

Browsing through the chart provided there, it becomes quite clear that the high-income Western countries, such as the UK, didn't perform too well. It is easy to come up with some intuition for this, though! Perhaps the further a country's absolute performance has risen above the mean absolute performance of the others, the more difficult it becomes to maintain a steady relative growth rate?

Similarly, it has crossed more than one mind already that any relative growth figure over 100% is, even in theory, unsustainable on a finite-sized Earth. But instead of philosophising, for now let's just examine what those slow-growing countries had already achieved before they seemingly got stuck, and contrast that with the recent relative GDP growth numbers (source).


This chart already reveals a definite downward trend - the more you earn, the harder it becomes to stretch it further, at least proportionately.

The second chart further underlines that the (estimated) absolute GDP change is almost uniform.
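The arithmetic behind "the more you earn, the harder it is to stretch it proportionately" can be sketched in a few lines: the estimated absolute change is simply the output level times the relative growth rate. The countries, GDP-per-capita levels and growth rates below are entirely made up for illustration; they are NOT the figures behind the charts.

```python
# Hypothetical illustration only: these are NOT the figures behind the charts.
# level = GDP per capita in EUR (made up); rate = relative growth in percent (made up).
countries = {
    "Country A (lower income)": {"level": 12_000, "rate": 1.5},
    "Country B (middle income)": {"level": 24_000, "rate": 0.75},
    "Country C (higher income)": {"level": 36_000, "rate": 0.5},
}


def absolute_change(level: float, rate_percent: float) -> float:
    """Estimated absolute change = level x relative growth rate."""
    return level * rate_percent / 100


def rate_for_gain(level: float, gain: float) -> float:
    """Relative growth (%) that a given absolute gain represents at this level."""
    return gain / level * 100


for name, c in countries.items():
    change = absolute_change(c["level"], c["rate"])
    print(f"{name}: {c['rate']}% of {c['level']:,} EUR = {change:.0f} EUR")
```

With these invented numbers every country gains the same 180 EUR per head, yet the relative rate falls from 1.5% to 0.5% as the base triples: a uniform absolute change shows up as a downward-sloping relative growth curve.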

The two key takeaways are:
  • the relative growth of the Hungarian economy is apparently nothing out of the ordinary for its category
  • the amazing Finnish performance dwarfs that of all the other top-ranked countries

I'd say it's highly advisable to keep an eye on how Finland achieved this performance.
Their progressive educational system is already famous - otherwise blame Salmiakki?

(Finland for president! The workbook for the charts can be found here.)


Wednesday, 15 March 2017

Python

Python declares a war on declarations with each variable left undeclared.

I spoke. Good night.
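For anyone new to Python, a small sketch of what the joke refers to: a name springs into existence on its first assignment, with no declaration, and reading it too early fails at runtime. Ironically, Python does keep two declaration statements, `global` and `nonlocal`, for rebinding names in an outer scope. All function names below are hypothetical, chosen only for illustration.

```python
counter = 0  # module-level name, created by assignment alone


def undeclared_demo() -> str:
    # No declaration needed: `greeting` exists from this assignment on.
    greeting = "hello"
    return greeting


def read_before_assign() -> str:
    # Assigning to `counter` anywhere in a function makes it local to that
    # function, so reading it first raises UnboundLocalError.
    try:
        counter += 1
    except UnboundLocalError as exc:
        return type(exc).__name__
    return "no error"


def increment_global() -> int:
    global counter  # one of Python's few declarations
    counter += 1
    return counter


def make_accumulator():
    total = 0

    def add(x: int) -> int:
        nonlocal total  # the other one: rebind a name in the enclosing scope
        total += x
        return total

    return add
```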

Saturday, 4 March 2017

No easy way to Big Query from Europe for individuals

Is this real?
It seems Google Big Query can only be used by businesses in the European Union.

e.g.

"Effective December 7, 2016, Google Cloud Platform, Firebase, and API services in European countries will be used only for business and commercial purposes only."

Well, really kind, but thanks then, not just yet.


Saturday, 14 January 2017

About a 40 Character Compression

"SHA-1 is an algorithm and what it does is: it takes some data as input and generates a unique 40 character string from it." (source)

That is so horribly incorrect that I had to make a note: a hash function does not, in general, return unique results.

Wikipedia says "A hash function is any function that can be used to map data of arbitrary size to data of fixed size."
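The "40 character string" is just the hexadecimal spelling of SHA-1's fixed 160-bit (20-byte) digest, whatever the input size. A quick check with Python's standard `hashlib` (the sample inputs are arbitrary):

```python
import hashlib

# Inputs of wildly different sizes all map to a digest of the same fixed size.
for data in (b"", b"a", b"x" * 1_000_000):
    h = hashlib.sha1(data)
    # digest_size is in bytes: 20 bytes = 160 bits = 40 hex characters.
    print(len(data), h.digest_size, len(h.hexdigest()))
```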

By this definition, the function has no chance of producing a unique result for every input.

Just as you couldn't give everyone in a sizeable country a unique 8-bit ID (only 2^8 = 256 possible values), you cannot assign a unique 160-bit value to every possible input when inputs routinely run to well over 160 bytes, let alone 160 bits.
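The 8-bit ID analogy is easy to try out with a deliberately shrunken hash. This sketch keeps only the first byte of SHA-1, a hypothetical 8-bit "hash" made up for illustration (not any real scheme), and hashes 257 distinct inputs; since there are only 256 possible values, at least two of them must collide.

```python
import hashlib


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the SHA-1 digest."""
    return hashlib.sha1(data).digest()[0]


def find_collision(n_inputs: int = 257):
    """Hash n distinct inputs; return the first pair sharing a hash value."""
    seen = {}
    for i in range(n_inputs):
        data = str(i).encode()
        value = tiny_hash(data)
        if value in seen:
            return seen[value], data, value
        seen[value] = data
    return None  # impossible for n_inputs > 256 (pigeonhole principle)


print(find_collision())
```

In practice a collision turns up far earlier than the 257th input; the pigeonhole principle merely guarantees one by then.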

Obviously, if one could assign a unique 160-bit identifier to data of any size, that would amount to a universal 160-bit (40-character) compression. Decompression might take a huge number of steps, but an algorithm could simply step through all valid inputs, spiralling through 1-bit, 2-bit, etc. candidates, calculating the hash of each and checking whether it matches the desired result.
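The brute-force "decompression" can be simulated with the same kind of shrunken hash (again a hypothetical 8-bit truncation of SHA-1, and walking byte by byte rather than bit by bit, for simplicity). The search does find an input with the right hash, but it is typically a short one rather than the original message, which is exactly why such a "universal compression" cannot work.

```python
import hashlib
from itertools import product
from typing import Optional


def tiny_hash(data: bytes) -> int:
    """Hypothetical 8-bit hash: the first byte of the SHA-1 digest."""
    return hashlib.sha1(data).digest()[0]


def brute_force_preimage(target: int, max_len: int = 2) -> Optional[bytes]:
    """Try all byte strings of length 0, 1, ..., max_len; return the first match."""
    for length in range(max_len + 1):
        for combo in product(range(256), repeat=length):
            candidate = bytes(combo)
            if tiny_hash(candidate) == target:
                return candidate
    return None


original = b"Beautiful is better than ugly."
recovered = brute_force_preimage(tiny_hash(original))
print(recovered, recovered == original)
```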

Put another way, it would require a one-to-one mapping between sets with different numbers of elements: infinitely many possible inputs versus only 2^160 possible digests.
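The counting argument does not even need infinity. Counting only the bit strings of length at most 160 (the empty string included) already gives more candidates than there are 160-bit digests:

```latex
\sum_{k=0}^{160} 2^{k} \;=\; 2^{161} - 1 \;>\; 2^{160} \;=\; \bigl|\{0,1\}^{160}\bigr|
```

So by the pigeonhole principle, no function from those short strings to 160-bit values can be one-to-one, let alone one defined on inputs of any length.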

Anyway, the conclusion is that even Git doesn't have the power to make SHA-1 do the job. OMG :)

Monday, 2 January 2017

Which Programming Language for Analytics?

I am in the process of turning towards Python for analytics.

It's not a quick shift; it has been going on for a while. I'm busy with my day job as a developer, studying here and there, and historically I have been more involved with R. Then I should also see people now and then, to avoid getting shot by a Walking Dead fan.

Someone recently told me that my preference for Python is not generally justified when it comes to analytics; to my surprise, the alternative he mentioned was Java. Needless to say, I was looking at things from the perspective of an analyst, someone who applies various methods to extract insight from data.

So... is Java Taking Over?


I wanted to double-check, but I had to realise it's non-trivial to find (or I was unlucky in finding) a relevant, current ranking or, more interestingly, a useful visualisation of how languages perform in the analytics segment.

The other thing I realised was that it wasn't that difficult to get a grip on the problem, even if it's a less reliable grip than one stemming from a more complex methodology.

Digging on GitHub


A small piece of code pulled a lot of data from GitHub, which was then cross-checked against StackOverflow data.
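The original script isn't reproduced here, so what follows is only a hypothetical sketch of the idea, not the code behind the charts. GitHub's public search API (`GET https://api.github.com/search/repositories`) reports a `total_count` for a query such as `machine learning language:python`, and per-language counts can then be turned into shares. The keyword, the language list and all function names are illustrative assumptions; unauthenticated search requests are also tightly rate-limited.

```python
import json
import urllib.parse
import urllib.request

SEARCH_URL = "https://api.github.com/search/repositories"


def build_query_url(keyword: str, language: str) -> str:
    """URL for repositories matching `keyword`, restricted to `language`."""
    query = f"{keyword} language:{language}"
    # per_page=1: we only need total_count, not the repository list itself.
    return SEARCH_URL + "?" + urllib.parse.urlencode({"q": query, "per_page": 1})


def fetch_total_count(url: str) -> int:
    """Call the GitHub search API and return its total_count (needs network)."""
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["total_count"]


def language_shares(counts: dict) -> dict:
    """Turn raw repository counts per language into shares of the total."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()} if total else {}


# Example usage (performs live HTTP requests, so it is left commented out):
# counts = {lang: fetch_total_count(build_query_url("machine learning", lang))
#           for lang in ("python", "r", "java")}
# print(language_shares(counts))
```

Keeping the network call separate from the URL building and the share computation makes the latter two easy to check offline.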

The (privately drawn) conclusion suggested by the charts is that Python is currently essentially the best all-purpose choice for analytics. Java is coming up fast, but it has a huge overall market share anyway, so it is possibly just being drawn into analytics like everything else, rather than really driving the change from an end user's (analyst's) perspective. R appears to be falling out of favour slowly but steadily, with handicaps in the Machine Learning area.

The most recent version of the full document/source code is available here, or by clicking the thumbnail. At the time of reading, I'm probably still in the process of improving them.
So the conclusion may well change, but odds are it's accurate. These findings should primarily predict the future share of languages among today's wannabe analysts, and represent the present only to a lesser extent.


Picking a Language for Analytics and Machine Learning


The Winner - in My Opinion


A summary of my considerations about Python's performance is presented below.

Aspect           | Quick Assessment
-----------------+----------------------------------------------------------------------
Analysis         | Prime Language, Increasing Share
Spark            | Significant, Reducing Share
Machine Learning | Prime Language
Deep Learning    | Prime Language
Big Data         | Strong Choice
AWS              | Strong Choice, Increasing Share
Data Science     | Prime Language / Strong Choice, Increasing Share
Mining           | Strong Choice
Visualization    | Prime Language / Strong Choice, Increasing Share
Chart            | Good Enough Choice from a Diverse Competition
Kaggle           | Prime Language, Increasing Share (possibly due to R's demise <sniff/>)

For those who have followed developments in the analytics area, these segments (or the segments these keywords serve as proxies for) will likely be familiar and of high importance.

For others: machine learning and deep learning are expected to be the key players in processing data produced and archived at big data scale, and to be able to encompass models of systems as complex as human thinking.

Different Views


As a final thought, the situation probably changes abruptly when it comes to architects choosing an implementation language for software they plan to provide to analysts. Even so, Python's versatility may counterbalance the efficiency trade-offs it brings into development.

Also bear in mind that GitHub, the most used source code hosting option at the time of writing, is the vehicle for many courses on Coursera, which was then the most popular MOOC platform. As such, Coursera heavily affects even a massive platform like GitHub (gathering data about this is in progress, but it is already becoming clear that these courses are close to the heart of R's popularity on GitHub).

For those interested, I also had a quick peek at server-side languages, with the results summarized in this (continuously updated) document.