Python declares a war on declarations with each variable left undeclared.
I spoke. Good night.
Wednesday, 15 March 2017
Saturday, 4 March 2017
No easy way to Big Query from Europe for individuals
Is this real?
It seems Google BigQuery can only be used by businesses in the European Union.
e.g.
"Effective December 7, 2016, Google Cloud Platform, Firebase, and API services in European countries will be used only for business and commercial purposes only."
Well, very kind, but no thanks then, not just yet.
Saturday, 14 January 2017
About a 40 Character Compression
"SHA-1 is an algorithm and what it does is: it takes some data as input and generates a unique 40 character string from it." (source)
That is just so horribly incorrect I had to take a note - a hash function typically does not return unique results.
Wikipedia says "A hash function is any function that can be used to map data of arbitrary size to data of fixed size."
Under this definition, the function has no chance of returning unique results.
Just like you'd fail to assign a unique 8-bit ID (2^8 = 256 states) to everyone in a sizeable country, you cannot assign a unique 160-bit value to every input when inputs are typically well over 160 bytes (far more than 160 bits) long.
Obviously, if one could assign a unique 160-bit identifier to data of any size, that would amount to a universal 160-bit (40-character) compression. Decompression could take a huge number of steps, but an algorithm could simply step through all valid inputs, spiralling through the 1-bit, 2-bit, etc. candidates, calculating the hash of each and stopping when it matches the desired result.
Put differently, uniqueness would require a one-to-one mapping from an infinite set of inputs into a set of only 2^160 values, which cannot exist.
Anyway, the conclusion is that even Git doesn't have the powers to make SHA-1 do the job. OMG :)
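To make the counting argument tangible, here is a minimal sketch. It assumes a toy hash of my own making, `tiny_hash`, which keeps only the first 8 bits of a real SHA-1 digest so that collisions show up immediately; the helper names are hypothetical, and the search steps through byte strings rather than single bits. With 256 possible outputs, 257 distinct inputs are guaranteed to collide, and the brute-force "decompression" returns the shortest matching input rather than the original message.

```python
import hashlib
from itertools import count, product


def tiny_hash(data: bytes, bits: int = 8) -> int:
    """Toy hash: the first `bits` bits of SHA-1(data), as an integer."""
    digest = hashlib.sha1(data).digest()  # 20 bytes = 160 bits
    return int.from_bytes(digest, "big") >> (160 - bits)


def first_collision(bits: int = 8):
    """Hash b"0", b"1", b"2", ... until two inputs share a toy digest."""
    seen = {}
    for n in count():
        data = str(n).encode()
        h = tiny_hash(data, bits)
        if h in seen:
            return seen[h], data, h
        seen[h] = data


def brute_force_preimage(target: int, bits: int = 8) -> bytes:
    """'Decompress' by trying every byte string of length 0, 1, 2, ..."""
    for length in count():
        for candidate in product(range(256), repeat=length):
            data = bytes(candidate)
            if tiny_hash(data, bits) == target:
                return data


a, b, h = first_collision()
print(f"{a!r} and {b!r} both map to {h}")

original = b"a much longer message than one byte"
recovered = brute_force_preimage(tiny_hash(original))
print(f"original: {original!r}, 'decompressed': {recovered!r}")
```

The same reasoning holds for the full 160 bits; the only difference is that finding a collision takes far more work than a laptop loop.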
Monday, 2 January 2017
Which Programming Language for Analytics?
I am in the process of turning towards Python for analytics.
It's not a quick shift; it has been going on for a while. I'm busy with my day job as a developer, studying here and there, and have historically been more involved with R. And I should sometimes see people, too, to avoid getting shot by a Walking Dead fan.
Someone recently told me that my preference for Python is not generally justified when it comes to analytics. To my surprise, the alternative he mentioned was Java. Needless to say, I was looking at things from the perspective of an analyst who applies various methods to extract insight from data.
So... is Java Taking Over?
I wanted to double-check, but I had to realise it's non-trivial to find (or I was unlucky in looking for) a relevant and current ranking or, more interestingly, a useful visualisation of how languages perform in the analytics segment.
The other thing I realised was that it wasn't that difficult to get a grip on the problem, even if it's a less reliable grip than one stemming from a more complex methodology.
Digging on GitHub
A small script pulled a lot of data from GitHub, which was then cross-checked against Stack Overflow data.
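For a rough idea of what such a pull can look like, here is a minimal sketch, not the script behind the charts. It assumes GitHub's public repository search endpoint and its `total_count` response field; the function names (`build_url`, `fetch_json`, `repo_counts`, `shares`) and the keyword/language pairs are hypothetical choices for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_URL = "https://api.github.com/search/repositories"


def build_url(keyword: str, language: str) -> str:
    """Build a repository-search URL for one keyword/language pair."""
    query = {"q": f"{keyword} language:{language}", "per_page": 1}
    return SEARCH_URL + "?" + urlencode(query)


def fetch_json(url: str) -> dict:
    """HTTP GET returning parsed JSON (unauthenticated calls are rate-limited)."""
    request = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(request) as response:
        return json.load(response)


def repo_counts(keyword, languages, fetch=fetch_json):
    """Return {language: number of repositories matching the keyword}."""
    return {lang: fetch(build_url(keyword, lang))["total_count"] for lang in languages}


def shares(counts):
    """Turn raw counts into each language's share of the total."""
    total = sum(counts.values())
    return {lang: (n / total if total else 0.0) for lang, n in counts.items()}
```

Calling `shares(repo_counts("machine learning", ["python", "r", "java"]))` would query the live API once per language. The `fetch` parameter exists so the counting logic can be tried with a stub instead of the network.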
The (privately drawn) conclusion suggested by the charts is that Python is essentially the best all-purpose choice for analytics currently. Java is coming up fast, but it has a huge market share anyway, so it's possibly just getting soaked in analytics like everything else, while not really driving the change from an end-user's (analyst's) perspective. R appears to be falling from grace slowly but steadily, with handicaps in the machine learning area.
The most recent version of the full document/source code is available here, or by clicking the miniature. At the time of reading, I'm probably still in the process of making improvements to these.
So the conclusion may well change, but odds are it's accurate. These findings should primarily predict the future share of languages among today's aspiring analysts, and represent the present only to a smaller extent.
The Winner - in My Opinion
A summary of my considerations about Python's performance is presented below.
| Aspect | Quick Assessment |
| Analysis | Prime Language, Increasing Share |
| Spark | Significant, Reducing Share |
| Machine Learning | Prime Language |
| Deep Learning | Prime Language |
| Big Data | Strong Choice |
| AWS | Strong Choice, Increasing Share |
| Data Science | Prime Language / Strong Choice, Increasing Share |
| Mining | Strong Choice |
| Visualization | Prime Language / Strong Choice, Increasing Share |
| Chart | Good Enough Choice from a Diverse Competition |
| Kaggle | Prime Language, Increasing Share (possibly due to R's demise <sniff/>) |
For those who have attempted to follow the events in the analytics area, these segments (or the segments these keywords proxy towards) may be familiar and of high importance.
For others, machine learning and deep learning are expected to be the key players for processing data produced and archived at big data levels, and to be able to encompass models for such complex systems as human thinking.
Different Views
As a final thought, the situation probably changes abruptly once it is about architects choosing an implementation language for software they plan to provide to analysts. However, the versatility of Python can possibly counterbalance the efficiency trade-offs it brings into development.
Also mind that GitHub, the most used source code hosting option at the time of writing, is the vehicle for many courses on Coursera, which is (or was then) the most popular MOOC platform and, as such, heavily affects even a massive platform like GitHub. Gathering data about this is in progress, but it is already getting clear that these courses are close to the heart of R's popularity on GitHub.
For those interested, I also had a quick peek at server-side languages, with the results summarized in this (continuously updated) document.
Saturday, 24 December 2016
Tools are not (just) tools
Note to self
I asked someone today whether he still likes maths, and got the response that maths is just a tool.
It made me think whether tools are actually just tools, or slightly more than that.
Coincidentally, I was about to study innovation management, which reminded me of adoption curves.
The Rogers adoption curve starts with the innovators. These people are the first to get acquainted with an innovation, use a new gadget, try out a new service, etc. Their characteristics include trying out things l'art pour l'art, which means they don't care about any practical value of those attempts.
They just toy with them.
I had two things to note:
1. these people are those who exercise themselves with things without a particular purpose
2. this is very much analogous to how ADD people (in my imagination) get distracted by literally anything
About 1.
Actually, what purpose do we have in mind when we are kids playing with things? Some made-up one, I think, normally drives things, but it can be as minimally targeted as 'joy', 'fun', etc.
Sounds like a good idea to me.
And just because they have the "thing" around, they can play around with it. The presence of the item allows for another degree of freedom.
And people, like a gas, fill the space they are given.
So the new toy gets played with, and a new routine gets added to what was there before.
Tools, as toys, become doors, and so maths is a door; doors get entered, and so maths will likely become part of your future path once you've played with it.
About 2.
This is just a corollary - I would guess that being overloaded with options, just like being overloaded with topics on the internet, has created its own new group of addicts. Innovators are the ADD of the market.
Positively mad people. See 1. - allowing themselves options, they allow for creativity, by heading against discipline.
Now is a good time to ask - is discipline good or bad then?
Obviously, it depends.
Discipline and minimalism are just two sides of the same coin.
Capitalism and competition made us put things perhaps too much on the minimalist, the efficiency side.
You'll choose the offer for 999 coins over the one for 1000 unless the cheaper one is noticeably worse.
It's only the resolution of your value perception that needs to be tricked, and the quality rot begins: a worse product sold for almost as high a price as the better one.
Discipline is a tool, not an objective. And an option, a freedom to choose. A tool, but once not a must, once you played with it without a need, more than a tool, too.
You can design bottom up as well as top down, and so you can create something without knowing the final design. Just because you didn't slap yourself in the face to get back in line before you'd have found another path.
We shouldn't forget to embrace freedom. And ADD makes people less controllable. A good thing, at times.
Thursday, 15 December 2016
Is education going out of fashion?
Apparently people have finally found their way to the schools ... or what's going on? They don't seem to search for institutions that much anymore.
It could be interesting to find out what alternative terms people have started to search for.
MOOCs are a very natural first thought. And they have clearly gained interest over time...
Are they all that significant?
Indeed.
So perhaps the above does not mean education is going out of fashion. Maybe, as expected, it is actually getting to play a more and more significant role - but who knows.
The interest in academic degrees seems to be nearly stagnating, although a shift towards shorter degrees is likely.
Tuesday, 6 December 2016
I want to be the internet
I'm in love:
http://www.theuselessweb.com/
/fav hit so far:
http://www.staggeringbeauty.com/
do not miss out on:
http://www.electricboogiewoogie.com/
but others are brilliant, too!
http://endless.horse/
