Monday, 25 June 2018

EBS init hell

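Note to self: there is an initial penalty on a freshly created EBS volume on AWS - right after creation it is really slow to access for a good while. (This applies to volumes restored from a snapshot, which includes root volumes built from an AMI; a new empty volume does not need it.)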

It's a gp2 volume and iotop reports a 5 MB/s access speed, in agreement with the CloudWatch metrics.

I am wondering how long this lasts - the material behind this link suggests that it starts to feel fine straight after the first access.

However, it is still slow on the second run of the recommended command:
dd if=/dev/xvda of=/dev/null bs=1M

Okay, I only partially ran it the first time, but I'd expect dd to walk the blocks in a very deterministic order, so the second run should have re-read the same blocks first. So maybe it's the first complete access that counts? Or a few hours of initialization, as is the case when growing a volume by a large enough chunk, or with a complete volume type change (e.g. gp2 to io1)?

Well, I give up on that for today/night but best remember this caveat ...

Update: a little googling confirms this - the volume needs one complete read-through:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-initialize.html
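
For reference, the same complete read-through can be sketched in Python as a rough stand-in for the dd command above. This is only a minimal sketch and not the method from the AWS guide: the function name read_through and the 1 MiB block size are my own choices, and reading a device such as /dev/xvda needs root privileges.

```python
def read_through(path, block_size=1024 * 1024):
    """Read every block of `path` once, discard the data, return the bytes read."""
    total = 0
    # buffering=0 skips Python's own buffer; each read goes straight to the OS.
    with open(path, "rb", buffering=0) as dev:
        while True:
            chunk = dev.read(block_size)
            if not chunk:  # empty result means end of device/file
                break
            total += len(chunk)
    return total

# Hypothetical usage, run as root:
# read_through("/dev/xvda")
```

Like dd, this walks the device sequentially from the first block to the last, so interrupting it means the remaining blocks are still untouched.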


Monday, 4 June 2018

Processing close: filter early

This is just a small extension of the previous post, and a generally applicable, intuitive idea in IT.

+1) Consider filtering/reducing the information or data in transport early


If performance needs optimizing and some filtering happens at a late stage of the data processing (e.g. a where clause in a database query), it's worth considering whether the overall process improves when that filtering is done at an early stage instead.
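
A minimal sketch of the idea, with made-up data: enrich is a hypothetical stand-in for any expensive step (a join, a network hop, a write to temporary storage), and the region field is invented for illustration. Both variants return the same result, but the early filter sends far fewer records through the expensive step.

```python
def enrich(record):
    """Hypothetical expensive step, e.g. a join or a network transfer."""
    return {**record, "score": record["value"] * 2}

def filter_late(records, wanted):
    enriched = [enrich(r) for r in records]  # every record is processed
    kept = [r for r in enriched if r["region"] == wanted]
    return kept, len(enriched)               # result, records processed

def filter_early(records, wanted):
    selected = [r for r in records if r["region"] == wanted]
    enriched = [enrich(r) for r in selected]  # only matching records
    return enriched, len(enriched)

# One record in ten belongs to the region we are interested in.
records = [{"region": "eu" if i % 10 == 0 else "us", "value": i}
           for i in range(1000)]

late, late_cost = filter_late(records, "eu")
early, early_cost = filter_early(records, "eu")
```

This only works when the filter depends on fields that already exist before the expensive step; a condition on the computed score could not be moved forward this way.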

Of course, the most radical filtering steps, i.e. those with the best selectivity, are often the best ones to evaluate first.
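
The effect of the evaluation order can be sketched by counting predicate calls. The helper count_evaluations and the two predicates are invented for illustration, and the sketch assumes both predicates cost about the same to evaluate.

```python
def count_evaluations(rows, predicates):
    """Apply predicates in order, stopping at the first failure per row.

    Returns (matching rows, total number of predicate calls).
    """
    evaluations = 0
    matches = []
    for row in rows:
        for pred in predicates:
            evaluations += 1
            if not pred(row):
                break
        else:  # no predicate rejected the row
            matches.append(row)
    return matches, evaluations

def is_even(n):
    return n % 2 == 0      # keeps half of the rows

def is_multiple_of_100(n):
    return n % 100 == 0    # keeps one row in a hundred

rows = range(1000)
weak_first = count_evaluations(rows, [is_even, is_multiple_of_100])
strong_first = count_evaluations(rows, [is_multiple_of_100, is_even])
```

Both orders select the same rows; putting the more selective predicate first simply means the second predicate is called on far fewer of them. If the selective predicate were much more expensive than the other one, the better order could be the opposite, hence the "often".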

This can again be useful when designing map-reduce algorithms or, more generally, data processing flows. Obvious factors to consider are the network transport costs and the temporary persistence in a file system; cutting down on these can easily improve the overall performance of the system.

One explicit case: for symmetric graphs, where the symmetry is preserved in the intermediate results, it can be a good idea to transfer only half of the data, e.g. the entries on and above the diagonal, and to "double" the output only in the very last step, if that is required at all.
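
A minimal sketch of that case, assuming the graph is held as a symmetric adjacency matrix in nested lists. The helper names pack_upper and unpack_upper are my own.

```python
def pack_upper(matrix):
    """Keep only the entries on and above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i, n)]

def unpack_upper(packed, n):
    """Rebuild the full symmetric matrix by mirroring the upper triangle."""
    full = [[0] * n for _ in range(n)]
    k = 0
    for i in range(n):
        for j in range(i, n):
            full[i][j] = full[j][i] = packed[k]
            k += 1
    return full

adjacency = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
packed = pack_upper(adjacency)        # n*(n+1)/2 values instead of n*n
restored = unpack_upper(packed, len(adjacency))
```

For an n by n matrix this moves n*(n+1)/2 values instead of n*n, so the saving approaches one half as n grows.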