Sunday, 16 October 2016

Idle priority: data project contributions

I will mention two here so that I don't forget them. One is new, which makes it sound more interesting (of course, new things always do ...)

Amnesty Decoders (update: false trail...)

"Join a global network of digital volunteers helping us research and expose human rights violations."


This is actually something that deceived me big time:


They want you! To click on sections of satellite imagery where villages (artificial structures) are present. Hm... I would have thought that people (with some proficiency) could contribute to the image analysis with code :(
I wonder why these projects don't end up on Kaggle. Or do they?
This task is absolutely crying out for automation ...



... and another one, from the past, which I actually started doing something with:

Open Corporates



It is a large database, sold commercially to companies - that's how they make their living. It is backed by paid coders as well as enthusiasts, and they do (or did) provide free means of accessing the data as well, especially for contributors. Their meetup, if someone wants a bit of hacking for good, is at:


I never found the time again, so my bot is still 'booting', but it would have been nice to. I should.

The task is to create bots, in Python, which do the web scraping (in my day, this was done with BeautifulSoup) and break down the data (through multiple steps) to extract the relevant attributes.
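To give a feel for the scrape-then-extract steps, here is a minimal sketch. It is not the project's actual bot framework: the HTML snippet, the column names and the function names are all made up for illustration, and the standard library's html.parser stands in for BeautifulSoup so that the example runs without extra installs.

```python
from html.parser import HTMLParser

# Hypothetical register page; a real bot would download this first.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Number</th><th>Status</th></tr>
  <tr><td>Example Ltd</td><td>01234567</td><td>Active</td></tr>
  <tr><td>Sample Holdings</td><td>07654321</td><td>Dissolved</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Step 1: collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def scrape_companies(html):
    """Step 2: turn the raw rows into one dict of attributes per company."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    keys = [h.lower() for h in header]
    return [dict(zip(keys, row)) for row in body]


companies = scrape_companies(SAMPLE_HTML)
for company in companies:
    print(company)
```

A real bot would add the download step, error handling and whatever output format the project expects; none of that is shown here.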

Their stuff is helpful for those doing data journalism, and there are hopes it can help to cut back and/or eliminate corruption hidden behind the ownership graphs.

Friday, 7 October 2016

Why or why not Go [draft]

Gnack, language is a matter of choice, but the choice is a matter of the market, etc.; so this year as usual:
-- Away from me, Pascal!

So what language?


It is becoming apparent that some companies do choose newer technologies, despite the masses playing it safe, although it's not always obvious when looking at the overall trend. Does workforce performance (I mean code * quality / dev_hours or something similar) increase? I wonder. Secrets, never told.

One example: I've been trying to find the reasons for and against learning Go, as I'm not a massive C++ fan, but I know I like Python and, after a quick assessment, Go didn't seem to add a lot over it. Still, I doubted my judgement on this (being a master of neither of the two), so I looked at Google Trends and didn't find anything promising. Even Google didn't market it too well :)



However, just today a lecture reminded me of ITJobsWatch's charts, and look at these UK-wide statistics! (The numbers at the front are approximately correct as of 8/10/2016)

1%: Go on ITJobsWatch

That's a small outbreak :) Mind you, my earlier investments, R and Python, have gone way, way up too.

1.5%: R on ITJobsWatch

14%: Python on ITJobsWatch

This is how it works! Or rather when it works.

Then add that Docker is written in Go, and that CloudFoundry, RabbitMQ and other things that people write for people to use have started to recognize the power of this language, and the picture starts to look much nicer.

UPDATE:
On Quora they found that the search statistics for the expression "Golang" show a nice, steep upward Google trend. I'd add that this is never guaranteed to be a sign of success: as people realize that this expression works, the apparent explosion is possibly just a manifestation of it taking over search numbers from other, related expressions (a sort of cannibalisation). To date, "Golang" is backed by a fraction of the searches for "Go (programming language)" and "Google go".

The Bug #1: Windows doesn't love Go that much


However, there's this little bug, which still prevents (at the time of writing) building Windows DLLs...

So it doesn't seem like a "platform-neutral" development attempt; rather, it is something whose brand "accidentally" starts like Google's and which, seemingly forever, doesn't support full-scale Windows development. Hm :) Me? Not suspecting a thing. Good question: who knows when it will be fixed?

TODO: Would be nice to have a chart of time vs. number of comments/total length of comments :) When the hell is it going to get closed?

Update: The Bug #2 - unfriendly again, Python would love GoLang if...


So as "Bug #1" says, you'll have to think twice before building Windows DLLs with Go. From the epic discussion on the bug page, though, it seems it only affects multi-threaded code via the Windows TLS support. However, something like that should work, I guess ... especially if you use Go, which prides itself on its multi-threading support.

So (or perhaps without noticing why) on SO they settle for building .so files for Linux.
http://stackoverflow.com/questions/12443203/writing-a-python-extension-in-go-golang

But who really wants to create a *nix-only Python package, or one that may not be extensible at some point beyond internal use? (Think of Anaconda - I guess it's a blocker for the more official Python distros.)

gopy is also mentioned there, as a way of making the importing trivial. However, it's still not compatible with Go >= 1.6, even though becoming an extension language first would be a very good way to creep into commonplace use.

And Linux is still not even nearly everything.

So it seems there's a little while longer to wait for the ecosystem to get ready for broader market penetration. Exciting moments anyway.

Surely more promising already on the server side!

TODO: Mention fun language - intrinsic motivation - creativity association, weakness in analytics.
TODO: Mention recently created Go on GitHub charts? Create new ones?

Thursday, 6 October 2016

An entropy paradox

Entropy/diversity is one of my personal favourite brain-wasting topics (I have my reasons for that, good ones :) ). Here's a short line of thought that illustrates the tricky nature of these notions. At the end, I'll probably not connect this back to the naive everyday thinking which everyone takes for granted, as that would be either embarrassing for many or, even worse (in case I'm wrong), very embarrassing, but for me alone :) But beware, I think I could (make a fool of myself)...

So the opening thought is that when we are young, we are similar to both of our parents, and as we grow older, we exhibit more of the traits of the same-gender parent, i.e. we specialize.

Now let's describe this by something that behaves like the Herfindahl index (any entropy measure does the job in one way or another) over a pair of similarity metrics. We'll find that the individual's stats start from a pair (mom_similarity, dad_similarity) close to (0.5, 0.5) and move closer to either (1.0, 0.0) or (0.0, 1.0), and that this means the index will increase, suggesting a less diverse individual.
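For the numerically minded, a minimal Python sketch of this micro-level step follows. It assumes the index is the plain Herfindahl form (the sum of squared shares) applied to the normalised (mom_similarity, dad_similarity) pair; the function name and the sample values are only illustrative.

```python
def herfindahl(shares):
    """Sum of squared shares: 1/n for n equal shares, 1.0 for full concentration."""
    shares = list(shares)
    total = sum(shares)
    return sum((s / total) ** 2 for s in shares)


# (mom_similarity, dad_similarity) at three hypothetical ages
baby = (0.5, 0.5)
teenager = (0.8, 0.2)
adult = (1.0, 0.0)

for label, pair in [("baby", baby), ("teenager", teenager), ("adult", adult)]:
    print(label, round(herfindahl(pair), 2))
# baby 0.5
# teenager 0.68
# adult 1.0
```

The index climbs from 0.5 to 1.0 as the individual specializes, which is the "less diverse individual" reading.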

Now let's take a look at this on the macro level: multiply up the aforementioned individual so that it becomes a population (of roughly identically aged, random-gender people who are growing up)!

From a series similar to [(0.5, 0.5), ..., (0.5, 0.5)] we observe a transition towards (assuming equal probabilities for the genders) a series that is more like [(1.0, 0.0), (0.0, 1.0), ...] etc.

What we then find is that the diversity on the macro level did exactly the opposite - a population of random grown-ups is more diverse than one of babies. Actually that was quite trivial without the numbers already, but the joy and the words with the weird spelling ... :)
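The macro-level half can be sketched in the same way. One assumption is needed here that the text leaves open: how to measure the population. In this sketch every distinct (mom_similarity, dad_similarity) pair counts as a "type", and the Herfindahl index is taken over the shares of those types. The population size and the names are made up.

```python
from collections import Counter


def herfindahl(shares):
    """Sum of squared shares: 1/n for n equal shares, 1.0 for full concentration."""
    shares = list(shares)
    total = sum(shares)
    return sum((s / total) ** 2 for s in shares)


def population_index(population):
    """Herfindahl index over the shares of distinct individual types."""
    return herfindahl(Counter(population).values())


babies = [(0.5, 0.5)] * 4                 # everyone looks the same
grown_ups = [(1.0, 0.0), (0.0, 1.0)] * 2  # two equally likely types

print(population_index(babies))     # 1.0
print(population_index(grown_ups))  # 0.5
```

Here the index falls from 1.0 to 0.5 while each individual's own index rises from 0.5 to 1.0, so the two levels move in opposite directions.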

So yes, almost paradoxically, micro and macro level entropy may work against each other - the level of abstraction does matter a lot!



P.S.: Don't calm down. I look forward to distributing similarly useless thoughts in the future.