Evolving Code: 2018

Tuesday, 30 October 2018

The late night git show

(... note to self ...)

git status

Happens to be a useful command.

A very useful one, especially when you're in the middle of an interactive rebase that you were keen to forget about roughly 5 seconds after initiating it.

So when the stuff goes incomprehensibly missing, especially committed files ... don't try to make it a late night adventure recreating things from memory :)

Just git status.

GIT STATUS

gIt StAtUs

...

Well, not sure whether git cola displays that state somewhere; if it does, it didn't catch my eye ... could be a valuable feature to request if not.
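
For scripting the same sanity check, here is a minimal Python sketch. It assumes a regular (non-worktree) checkout where .git is a directory, and relies on git keeping its rebase state in .git/rebase-merge or .git/rebase-apply (the latter is also used by git am) until the rebase is finished or aborted. The function name is made up for illustration.

```python
from pathlib import Path


def rebase_in_progress(repo_root):
    """Return True if the checkout at repo_root looks like it is mid-rebase."""
    git_dir = Path(repo_root) / ".git"
    # git keeps its rebase bookkeeping in one of these directories
    # until the rebase is completed or aborted.
    state_dirs = ("rebase-merge", "rebase-apply")
    return any((git_dir / name).is_dir() for name in state_dirs)
```

Calling rebase_in_progress(".") from the repository root answers the same question as the first lines of git status, just without the rest of the output.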

Wednesday, 26 September 2018

6 page memos

I have just been reading an article about Jeff Bezos's approach to keeping a company as dynamic as a startup is or should be.

I'll take the liberty (disagree and commit, lol) of listing just the keywords, almost as bullet points ... just because this is how much time I have :) (the below only really makes sense together with the above Forbes link)

First, "senior executives start meetings at Amazon in silence, with everyone reading six-page narrative memos about the topic they are gathered to discuss, for up to 30 minutes"
Note that this brings quality preparation into the game - when appropriate, at least.

Then, it is important to make a distinction between decisions: "Type 1 decisions can't be reversed and as such require great care. Type 2 decisions can be easily reversed."

"Make Decisions With 70% Of The Info You Wish You Had"

"Disagree and commit"

"You have to somehow make high-quality, high-velocity decisions"

Well, that's it, build your own Amazon!

Saturday, 4 August 2018

Easter eggs for experts in MongoDB: beat the 8

> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:38Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:38Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:39Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")
> ObjectId().getTimestamp()
ISODate("2018-08-05T00:16:40Z")

Yes, that's eight of a kind...!
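
Why the repeats are possible at all: the first 4 bytes of an ObjectId are a big-endian count of seconds since the Unix epoch, so every ObjectId generated within the same second reports the same timestamp. A minimal Python sketch of that decoding, without any MongoDB driver; the function name and the zero-padded example id are made up for illustration:

```python
from datetime import datetime, timezone


def objectid_timestamp(hex_id):
    """Decode the creation time embedded in a 24-character hex ObjectId."""
    if len(hex_id) != 24:
        raise ValueError("an ObjectId is 12 bytes, i.e. 24 hex characters")
    # The leading 4 bytes (8 hex characters) hold seconds since the epoch.
    seconds = int(hex_id[:8], 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Hypothetical id: a real timestamp prefix followed by zero padding.
example_id = "5b6641e6" + "0" * 16
print(objectid_timestamp(example_id).isoformat())
```

The remaining 8 bytes are what keeps ids from the same second distinct, which is why only the timestamp part collides.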

Monday, 25 June 2018

EBS init hell

I should remind myself that there is some initial penalty for creating an EBS volume on AWS - right after creation it is really slow to access for a good while.

It's a gp2 volume and iotop reports a 5 MB/s access speed, in agreement with the CloudWatch metrics.

I am wondering for how long - the stuff on this link suggests it starts feeling fine straight after the first access.

However, I can see it being slow on the second run of the recommended

dd if=/dev/xvda of=/dev/null bs=1M

Okay, I only partially ran it the first time, but I'd expect walking the blocks to be a very deterministic process for dd. So maybe it's the first complete access that counts? Or does initialization take a few hours, as is the case for large enough chunks when growing a volume, or e.g. with a complete drive type change (i.e. gp2 to io1)?

Well, I give up on that for today/night but best remember this caveat ...

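Update: a little googling really confirms this ... it needs a complete read-through (the page below describes this for volumes created from snapshots, where blocks are fetched on first access):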
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
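
For the record, the read-through that dd performs can be sketched in a few lines of Python too. This is only an illustration of "touch every block once", not a replacement for the tools the AWS page recommends; the function name is made up, and reading a block device such as /dev/xvda needs root.

```python
def read_through(path, block_size=1024 * 1024):
    """Read a file or block device sequentially, returning the byte count."""
    total = 0
    # buffering=0 gives raw, unbuffered reads of up to block_size bytes.
    with open(path, "rb", buffering=0) as device:
        while True:
            chunk = device.read(block_size)
            if not chunk:  # an empty read means end of device/file
                break
            total += len(chunk)
    return total
```

The data itself is thrown away, just as with of=/dev/null; the only point is that every block gets read once.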

Monday, 4 June 2018

Processing close: filter early

After all, this is just a small extension of the previous post, and a generally applicable, intuitive concept in IT.

+1) Consider filtering/reducing the information or data in transport early

If performance optimization is needed and some filtering happens at a late stage of data processing (e.g. with databases - a where clause), it's worth considering whether doing that filtering at an early stage improves the overall process.

Of course, the most radical filtering steps, i.e. those with the best selectivity, are often the best to evaluate first, etc.

This can again be useful when designing map-reduce algorithms, or - more generally - data processing flows. Obvious factors to consider are the network transport costs and the temporary persistence in a file system; reducing these can easily improve the overall performance of the system.
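
A toy Python sketch of the idea: the transport function below is a made-up stand-in for any expensive hop (network, temp file) and simply counts how many records cross it. Both variants return the same result; only the amount of data moved differs.

```python
def transport(records, counter):
    """Stand-in for an expensive hop; counts every record that crosses it."""
    for record in records:
        counter["moved"] += 1
        yield record


def filter_late(records, keep):
    """Move everything first, then filter on the receiving side."""
    counter = {"moved": 0}
    result = [r for r in transport(records, counter) if keep(r)]
    return result, counter["moved"]


def filter_early(records, keep):
    """Filter at the source, then move only what survives."""
    counter = {"moved": 0}
    result = list(transport((r for r in records if keep(r)), counter))
    return result, counter["moved"]


def is_multiple_of_100(n):
    return n % 100 == 0


late_result, late_moved = filter_late(range(1000), is_multiple_of_100)
early_result, early_moved = filter_early(range(1000), is_multiple_of_100)
print(late_moved, early_moved)
```

With a highly selective predicate like this one, the early variant moves only a small fraction of the records across the hop.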

One explicit case: for symmetric graphs, where the symmetry is preserved in intermediate results, it can be a good idea to transfer only half of the graph (viewed as a matrix), e.g. the part on and above the diagonal, and only "double" the output in the very last step, if that is required at all.
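
The symmetric case, sketched in Python under the assumption that the graph is held as a square matrix given as a list of lists; both function names are made up for illustration.

```python
def upper_triangle(matrix):
    """Pack the entries on and above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]


def mirror(packed, n):
    """Rebuild the full symmetric n x n matrix from the packed half."""
    matrix = [[0] * n for _ in range(n)]
    values = iter(packed)
    for i in range(n):
        for j in range(i, n):
            # One stored value fills both (i, j) and its mirror (j, i).
            matrix[i][j] = matrix[j][i] = next(values)
    return matrix
```

For an n x n matrix this moves n(n+1)/2 values instead of n*n, and mirror is the "doubling" step that can wait until the very end.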

Tuesday, 22 May 2018

Processing close

People sometimes write code that is very assertive about this or that piece of a system. In a more suspicious case, many parts.
This typically isn't going to yield performant code - something will often try to slow your job down.

A specific piece of advice:

0) Keep (simple) processing close to the database.

What I mean by simple is processing that could easily be translated to a few machine-level instructions in each iteration (e.g. a sum), is done in a few iterations (e.g. 1), and doesn't operate on a large dataset in each iteration.
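
A small sketch of point 0) with Python's built-in sqlite3 module; the orders table and its contents are invented for the example. Both functions compute the same total, but the first lets the database do the summing and ships back one row, while the second pulls every row across the boundary first.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER)")
    conn.executemany(
        "INSERT INTO orders (amount) VALUES (?)",
        [(n,) for n in range(1, 101)],
    )
    return conn


def total_in_database(conn):
    """Simple processing kept close to the data: a single row comes back."""
    (total,) = conn.execute("SELECT SUM(amount) FROM orders").fetchone()
    return total


def total_in_client(conn):
    """Same result, but every row crosses the boundary before being summed."""
    return sum(amount for (amount,) in conn.execute("SELECT amount FROM orders"))
```

With an in-memory SQLite database the difference is tiny; the gap grows once the boundary is a network connection to a database server.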

Let's generalize it:

1) Avoid crossing boundaries between system parts unnecessarily.

Or: try to consider alternatives where fewer communication routes are involved.

The kinds of system boundary crossing to avoid could be anything really, depending on the scale of operation and on the magnification at which we consider the system:

Querying data from a server
Calling into a DLL/library
Exchanging information with a service/daemon (inter-process - but also, probably with a smaller overhead, synchronized inter-thread communication).
Looking up data from another table (one implication: look for unnecessary JOINs, maybe consider denormalization, esp. in a NoSQL setting)
Calling Java bytecode from a compiled binary
Calling another function (remember there's a cost of leaving return information on top of the stack)
Blocking on-screen confirmation with the user

A major exception: if the overall complexity is high.

In this case it may be worth turning to hardware or software dedicated to the processing: a GPU, an SSD, scaling up, a cluster, etc. are a few terms to think of, as opposed to the CPU/FPU, an HDD, your regular hardware, a single compute node. All of this involves moving data from one part of the system, less suited to the processing task, to another that is more suitable, possibly more dedicated, and potentially a temporarily allocated resource from a cloud.

However, it's also worth noting:

faulty and/or critical system components may cry for redundancy

Such as - people. And then come the desirable boundaries - but also peer insight. Whether the superior efficiency of one-man teams is a myth... well, while we'd like to believe in myths, we do know that it may not always be the best that can happen. You may need very proficient and disciplined people for that to work out - these days, with an expanding IT industry, the average number of years of experience is plummeting.
Stumbling upon them could be way more the exception than the norm.

Thursday, 17 May 2018

A quick reminder to self about error handling in R (baby steps #1)

I guess the code below pretty much tells it all:

tryCatch({
    x = function() {
        stop("hahaha now you blew the code!")
    }
    y = function() {
        x()
    }
    y()
}, error=function(e){
    print(e$call)
    print(e$message)
})

And then sourcing it gives:

x()
[1] "hahaha now you blew the code!"

I guess that's all there is to it in practice...
... except maybe that there's a finally parameter included, which I never noticed:

function (expr, ..., finally)

So... worth a second look :)

(To be continued...)

Monday, 7 May 2018

JavaScript versus integer sequence

Looks like the ever feared, popularly hated JS really has its drawbacks ... you need functional programming to create an integer sequence? you need a loop? and wouldn't anybody want to just do something about it? :)

E.g.
https://stackoverflow.com/questions/3895478/does-javascript-have-a-method-like-range-to-generate-a-range-within-the-supp

Jeez :) time to realize how good Python and R are... well, R as a language only in certain aspects, but still.
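
For contrast, the Python side of the story is a built-in (the variable names are just for the example):

```python
# range(stop) counts from 0 up to, but not including, stop.
first_five = list(range(5))

# range(start, stop, step) covers the general case.
stepped = list(range(2, 11, 3))

print(first_five, stepped)
```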

Thursday, 26 April 2018

Gesundheit MongoDB

MongoDB 3.6

Looking for null values? The answer: BSON type 10.
You are to look for "$type: 10" values...

Naturally the question is whether you guys are seriously using a magic number by design in a database engine built for web developers, in the 21st century. Exciting times.

(What am I overlooking ...?)
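
A partial answer to myself, as far as I can tell from the $type documentation: since MongoDB 3.2 the operator also accepts string aliases, so "null" can stand in for the 10. Below is a minimal sketch of the filter documents as plain Python dicts, the shape a driver such as pymongo would take in find(); the helper names are made up and nothing here talks to a server.

```python
def explicit_null_by_number(field):
    """Match documents where field is present and holds BSON null (type 10)."""
    return {field: {"$type": 10}}


def explicit_null_by_alias(field):
    """Same match, using the string alias instead of the magic number."""
    return {field: {"$type": "null"}}


def null_or_missing(field):
    """Plain equality with null: matches both null values and absent fields."""
    return {field: None}


print(explicit_null_by_alias("middle_name"))
```

The plain equality form is the reason $type is needed in the first place: it cannot tell a stored null from a field that is not there.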

Thursday, 29 March 2018

Beauties of R: isNumeric

So, in R apparently there is no function in the core packages to test if a string would work out as a number.

There are really unappealing workarounds, like:
https://stackoverflow.com/questions/16194212/how-to-suppress-warnings-globally-in-an-r-script
This uses global settings (options(warn=-1), bonus points for the magic number), which risks failing to restore them when something goes wrong, potentially affecting other threads unintentionally as part of a larger-scale quest, etc.

The slightly more lay-coder-minded probably need to roll their own homemade witchery for this, maybe with a regex. Or use something slightly external, e.g. limma from the Bioconductor repository. (And then ... there is a function called isNumeric. How ... why wouldn't you subconsciously confuse that with is.numeric? Cautious language design could come to the rescue ...)

Seriously? Is it the time when non-mathsy JS's isNaN() functionality clearly beats R?? :D No way. What am I missing?

Anyway, just for the huge fun here's my quick regex solution...

> test.values = c("5", "56", "5.67", ".67e+30")
> grepl("^([+-]?)((\\d+)|(\\d*[.]\\d+))([eE]([+-])?\\d+)?$", test.values)
[1] TRUE TRUE TRUE TRUE

Update: suppressWarnings(), for instance. I did overlook that one. However, what you get there is suppression of all warnings, not only the one you wanted to suppress. Typically, the range of warnings raised by the guarded function is not defined in its documentation... so you never know what you got rid of.

Seems like a built-in function would be a beneficial addition.
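
For comparison, the same two approaches sketched in Python: the regex above ported as-is, and the "just try to parse it" route, which is what the warning-suppressing workarounds amount to. The function names are made up; note that the two disagree on edge cases, e.g. float() accepts "5." and "nan" while the regex does not.

```python
import re

# Same shape as the R pattern: optional sign, digits or a decimal
# fraction, optional exponent.
NUMERIC = re.compile(r"[+-]?(\d+|\d*\.\d+)([eE][+-]?\d+)?")


def looks_numeric(text):
    """Regex check; fullmatch anchors the pattern at both ends."""
    return NUMERIC.fullmatch(text) is not None


def parses_as_float(text):
    """Ask the parser itself; only the expected ValueError is swallowed."""
    try:
        float(text)
    except ValueError:
        return False
    return True


test_values = ["5", "56", "5.67", ".67e+30"]
print([looks_numeric(v) for v in test_values])
```

The second function shows what the R workaround is missing: it catches exactly one failure mode rather than silencing everything.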

Thursday, 22 March 2018

Python console progressbar "???"

1%^M142/6456

Hence, 142 / 6456 < 0.02.

Good old Python (???) progressbar .. broke the mold, eh? Don't want to know ;)
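
For the record, the arithmetic the bar should have done, as a tiny Python sketch (the function name is made up): a displayed 1% would need the ratio to be below 0.02, while 142 out of 6456 is already past that.

```python
def percent_done(done, total):
    """Whole-number percentage, rounded down, using integer arithmetic."""
    return done * 100 // total


print(percent_done(142, 6456), 142 / 6456 < 0.02)
```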

Wednesday, 14 February 2018

No swap on AWS

Using Linux on AWS? As of now, the images come without a swap file by default.

See the two links below for advice:

How can I check if swap is active from the command line?
Why don't EC2 ubuntu images have swap?

A brief highlight from those is this helpful command, to confirm your suspicion:

cat /proc/meminfo | grep -i swap

This will likely provide you with big fat zeroes as below:

SwapCached: 0 kB
SwapTotal: 0 kB
SwapFree: 0 kB

How cool is that (without a warning at the very least)...

Watch out when doing memory intensive processing (which is a less than good sign :) but happens), for any reason.
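
If the check needs to live inside a script, e.g. before kicking off a memory-hungry job, the same grep can be sketched in Python. The function names are made up, and the sample text is just the output shown above; on a real box the text would come from reading /proc/meminfo.

```python
def swap_summary(meminfo_text):
    """Pick the Swap* entries out of /proc/meminfo-style text (values in kB)."""
    summary = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if key.lower().startswith("swap") and fields:
            summary[key] = int(fields[0])
    return summary


def swap_is_active(meminfo_text):
    """Swap counts as active when SwapTotal is greater than zero."""
    return swap_summary(meminfo_text).get("SwapTotal", 0) > 0


sample = "SwapCached: 0 kB\nSwapTotal: 0 kB\nSwapFree: 0 kB"
print(swap_summary(sample), swap_is_active(sample))
```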

Friday, 9 February 2018

Love hate / Ubuntu & Android

When having to transfer files like

mkdir FromPhone/DCIM; sudo mv /media/motog/Internal\ shared\ storage/DCIM/Camera FromPhone/DCIM/

After apt installing this'n'that and mounting with
sudo jmtpfs /media/motog/

Hm... not feeling like a plug & play Windows user that much anymore :)

Too bad I'll never remember or work out in retrospect whether it was just my faulty ("universal") laptop charger playing tricks on me - it transmits noise that normally makes my touchscreen go crazy, and yes, it may have failed file transfers too.

Because anyhow, it just works now.