Tuesday, 16 February 2016

R: why using require() over library() shouldn't be thy default


This is a very simple thing, and really isn't worth wasting too many words on (but I will! :) ) The point I'm making is right there in the CRAN documentation on these functions:

"require is designed for use inside other functions; it returns FALSE and gives a warning (rather than an error as library() does by default) if the package does not exist."

The trend, however, seems to be that a growing number of people prefer require(). In one-off recipes, which make up a sizeable portion of what's being written in R, this doesn't make much sense.

For instance, at the beginning of one of XGBoost's demos, you'll find this:
require(xgboost)
require(Matrix)
require(data.table)
if (!require(vcd)) {
  install.packages('vcd') # Available on CRAN. [...]
  require(vcd)
}

The first 3 require() calls either succeed or merely warn. Then the rest of the demo runs, and eventually the output, the main point of the demonstration, is printed. Any warning messages thrown in the header will easily go unnoticed among the output lines, unless someone is keen to scroll back up to the top.

So let's collect some reasons against the overuse of require():

#1 Delayed feedback. Fatal problems should surface early and obviously when that comes at no extra cost. The warnings from require() may take extra effort to find.

#2 Risks. A system silently executing on with problems nobody expected is always a hazard. It gets even more interesting with function name collisions - once the intended package is missing, a same-named function from elsewhere on the search path may get used instead of the one the author hoped for.

#3 Copycats (education). Furthermore, the reader/user of tutorial code does not necessarily have that much routine yet and is easily misled for a while - and while that period lasts, this bogus practice gets copied all over the web, into Kaggle scripts and blog entries (like this one ;) ).

As a side note, the lucky thing about the code above is that the packages are used early on (the data.table creation and similar calls), so a missing package would likely surface quickly anyway. But that is far from making the code correct.
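To see the difference concretely, here is a minimal base-R sketch (the package name is deliberately bogus):

```r
## require() merely warns and returns FALSE when the package is missing,
## so the script carries on as if nothing happened:
ok <- suppressWarnings(require("no.such.package.here", character.only = TRUE))
print(ok)  # FALSE

## library() raises an error instead, stopping the script right there:
res <- tryCatch(
  library("no.such.package.here", character.only = TRUE),
  error = function(e) "library() stopped with an error"
)
print(res)
```

If installing on the fly is really desired, checking requireNamespace() first and then calling library() keeps the loud failure of library() while still allowing a one-time install.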

Sunday, 7 February 2016

GPU, Lubuntu, ML - bloody roots

I presented myself with a CUDA-compatible video card for Christmas (an Asus GT730) so that I could check out Theano and GPU computing in general. Appetite notwithstanding, having plenty of other business to deal with, I kept conveniently procrastinating on the installation. That may have been good intuition :)

I tried reinstalling it a few times, until I found that the initial RAM disk (initramfs) for whatever reason needs to be regenerated:

sudo update-initramfs -u

Running this at the beginning and also at the end of the previously attempted installation steps, I believe the driver is "there" now, although a couple of things were still missing.

When running Octave, it was still complaining about not having a working video driver in place:

Xlib:  extension "GLX" missing on display ":0".
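For diagnosing the GLX side, a quick check (assuming the mesa-utils package, which provides glxinfo, is installed) would be something like:

```shell
# Does the X server expose the GLX extension / direct rendering at all?
# (glxinfo ships with the mesa-utils package on Ubuntu-like systems)
glxinfo | grep -i "direct rendering"

# Which OpenGL renderer/driver is actually in use:
glxinfo | grep -i "opengl renderer"
```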

A page said that adding a couple of lines to the code (it was the Coursera Machine Learning course's ex3.m) could remedy the situation ... and in fact I only needed to insert the line

graphics_toolkit gnuplot

right after the initial clean-up, so the header becomes:

%% Initialization
clear ; close all; clc
graphics_toolkit gnuplot

and that solved this one of my problems. Or I could at least see that initial plot.


Some of the CUDA samples are broken now, though. Not sure why (perhaps just a missing library path), but when running

MonteCarloMultiGPU

I'm getting this:

error while loading shared libraries: libcurand.so.7.5: cannot open shared object file: No such file or directory
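For the record, this error usually means the dynamic loader cannot find the CUDA runtime libraries. A likely remedy is sketched below; the /usr/local/cuda-7.5 path is an assumption about where the toolkit landed on this particular install:

```shell
# Is libcurand known to the dynamic loader at all?
ldconfig -p | grep libcurand || echo "libcurand not in the loader cache"

# Session-local fix: point the loader at the CUDA 7.5 libraries
# (assuming the toolkit lives under /usr/local/cuda-7.5):
export LD_LIBRARY_PATH=/usr/local/cuda-7.5/lib64:$LD_LIBRARY_PATH

# Permanent fix: register the path and refresh the cache (needs root):
# echo "/usr/local/cuda-7.5/lib64" | sudo tee /etc/ld.so.conf.d/cuda-7-5.conf
# sudo ldconfig
```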

... thank you so much, laters, then ...