QuickCheck - another testing method/philosophy
First of all, I wanted to delve into QuickCheck, which I had heard about years ago. To me this one is more on the integration-testing side, but still a decent one (actually - what isn't a bit of integration testing?).
I believe the stochastic behaviour of many ML routines almost mandates repeated tests: a single passing test case is not convincing on its own, just as no one really trusts an algorithm that forecasts well once (and perhaps never again). And that is only one side of the coin: the output usually cannot be known exactly, but approximate values within error bands are typically acceptable (in fact, I expect this could go as far as putting proper statistical testing in place). Combine that with outputs of several values, and 'narrow' rules of self-consistency, rather than an in-depth byte-for-byte specification of the output, become an appealing option.
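To make the error-band idea concrete, here is a minimal sketch of my own (not from any package): instead of asserting an exact output, the test checks that a sample mean lands inside a 3-sigma band, and repeats the check over many random draws rather than relying on a single case.

# accept a stochastic result if it falls within a tolerance band,
# repeated over many random inputs instead of one fixed case
set.seed(42)
n <- 1000
within_band <- replicate(100, {
  x <- rnorm(n, mean = 5, sd = 1)
  abs(mean(x) - 5) < 3 / sqrt(n)  # 3-sigma band for the sample mean
})
mean(within_band)  # expected to be close to 1 (roughly 99.7%)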
On the inbound train I got as far as installing it, following the steps at https://github.com/RevolutionAnalytics/quickcheck; poking around the package afterwards gave two more examples.
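For reference, the install went roughly like this (a sketch from memory of the README's devtools route; the subdir argument reflects the repo layout as I recall it):

install.packages("devtools")
# assumption: the package lives in the repo's pkg/ subdirectory
devtools::install_github("RevolutionAnalytics/quickcheck", subdir = "pkg")
library(quickcheck)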
The examples are like the following (the original snippet came from the quickcheck package vignette; I am reconstructing it from memory, so details may differ):
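# reconstructed from memory: the assertion is a plain function whose
# default argument doubles as a random data generator
test(function(x = rdouble()) identical(rev(rev(x)), x))

As far as I recall, rdouble() yields a numeric vector of random length and content, and test() evaluates the assertion on a batch of such random inputs, reporting pass or fail.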
Unary function, unary output. What would have been a little more interesting to me was defining a bivariate function, since it operates on a much larger input space (+1 dimension), where random sampling becomes one reasonable approach to testing somewhat evenly over the volumetrically exploding set of input states.
I had exactly 0 luck there:
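The snippet itself is lost, but it was roughly along these lines (a reconstruction; the y + x in the warnings below is the only part I can vouch for):

# roughly what I tried: a two-argument assertion,
# e.g. checking that addition commutes
test(function(x = rdouble(), y = rdouble()) all(x + y == y + x))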
Some recycling - at first incomprehensible - occurs and I'm flooded with warnings like:
9: In y + x :
longer object length is not a multiple of shorter object length
Sure, it could be implemented as two nested test() calls, but I assume there's a nicer, terser way - see the sketch below.
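My best guess at the cause (an assumption, not something I have verified): rdouble() draws vectors of independent random lengths, so y + x recycles the shorter one and warns whenever the lengths are not multiples of each other. Constraining the generators to scalars would sidestep that, and since default arguments are ordinary R expressions, even plain base R works as a generator:

# hedged sketch: force scalar inputs so no recycling can occur;
# runif(1) is plain base R pressed into service as a generator
test(function(x = runif(1), y = runif(1)) x + y == y + x)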
Anyhow, I'll have to postpone this again ... somehow (no surprise there) I was the only person interested in this thing :)
Power BI
So I found myself in a group of lads not so much interested in QuickCheck as in Microsoft's new BI platform - thanks to Marczin for bringing this up.
The "big thing" here was that it can work with R, so that streams of data processing get combined with R's strengths in statistical analysis and data visualization.
Disappointingly, this is for Windows users - at least the desktop version is. As I'm not much of a business user, I was sort of sent away when I wanted to run things online... all right, the more of me remains for everything else then :) (everything else applauding)
We did manage to put things together. After dragging the fields of the automagically linked tables into the inserted R visual's "Values" list and writing a tiny script that references them in dataset$field style, R charts appeared. Admittedly it wasn't nearly as intuitive as MS Access (2000/XP) was ages ago: putting the data flow together took some time to work out, and some redundancy in one of my CSVs cost us more time, but it finally worked.
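For the record, the script really was tiny - something like the sketch below. Inside a Power BI R visual, the fields dropped into "Values" arrive as columns of a data frame named dataset; the column names here are made up for illustration.

# fields dragged into the visual's "Values" list show up as columns
# of a data frame called `dataset`; month/revenue are hypothetical names
plot(dataset$month, dataset$revenue,
     type = "b", xlab = "Month", ylab = "Revenue")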
Microsoft R Open
What is more relevant is that it relies on Microsoft R Open, which in turn runs on various platforms. I'll surely have to take a closer look at that.
I somewhat tend to forget what I saw: xgboost, for instance, comes to mind as something written in a low-level way, with C or C++ at its heart. But where that isn't the case - where, say, matrix operations are still left to R's native routines - R Open's multi-threaded math libraries could speed things up. It appears I'll have to see that for myself ...
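When I get there, a quick and admittedly unscientific check should settle it: dense matrix multiplication goes through BLAS, so the multi-threaded MKL bundled with Microsoft R Open ought to finish a plain matrix product visibly faster than stock R's reference BLAS.

# unscientific benchmark sketch: compare the timing under stock R
# and under Microsoft R Open (bundled multi-threaded MKL BLAS)
n <- 2000
a <- matrix(rnorm(n * n), n, n)
system.time(b <- a %*% a)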