Monday, 4 June 2018

Processing close: filter early

This one is just a small extension of the previous post, and it is a generally applicable, fairly intuitive idea in IT.

+1) Consider filtering/reducing the information or data in transport early


If performance needs optimizing and some filtering happens at a late stage of data processing (e.g. a where clause in a database query), it's worth considering whether doing that filtering at an earlier stage improves the overall process.

Of course, the most radical filtering steps, i.e. those with the best selectivity, are often the best ones to evaluate first, since they leave the least data for the later steps.
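To make the selectivity point concrete, here is a minimal sketch that counts how many predicate evaluations are needed when the same two filters run in different orders. The function `filter_ordered`, the two predicates and the sample data are all made up for illustration; a real database optimizer works from statistics rather than from hand-ordered lambdas.

```python
from typing import Callable, Iterable, List, Tuple

Predicate = Callable[[int], bool]


def filter_ordered(rows: Iterable[int],
                   predicates: List[Predicate]) -> Tuple[List[int], int]:
    """Apply the predicates in the given order, stopping at the first failure.

    Returns the rows that pass every predicate and the total number of
    predicate evaluations that were performed.
    """
    evaluations = 0
    kept = []
    for row in rows:
        for pred in predicates:
            evaluations += 1
            if not pred(row):
                break  # row rejected, skip the remaining predicates
        else:
            kept.append(row)  # no predicate rejected the row
    return kept, evaluations


rows = list(range(1000))


def is_even(x: int) -> bool:
    return x % 2 == 0  # passes about 50% of the rows


def is_multiple_of_100(x: int) -> bool:
    return x % 100 == 0  # passes about 1% of the rows


# Weak filter first, then the selective one.
kept_a, evals_a = filter_ordered(rows, [is_even, is_multiple_of_100])
# Selective filter first, then the weak one.
kept_b, evals_b = filter_ordered(rows, [is_multiple_of_100, is_even])

print(kept_a == kept_b)  # same result either way
print(evals_a, evals_b)  # 1500 vs 1010 evaluations
```

Both orders return the same rows, but putting the more selective filter first needs roughly a third fewer evaluations here. If the predicates had very different costs, cost would have to be weighed against selectivity as well.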

This can again be useful when designing map-reduce algorithms or, more generally, data processing flows. Obvious factors to consider are the network transport costs and the temporary persistence in a file system; reducing these can easily improve the overall performance of the system.
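As a rough illustration of the transport argument, the sketch below simulates a tiny map-reduce style flow in plain Python and counts how many key/value pairs cross the "shuffle" step when the filter is applied late versus early. The names `shuffle`, `late_filter`, `early_filter` and the record layout are assumptions invented for this example; they do not correspond to any particular framework's API.

```python
from collections import defaultdict


def shuffle(pairs):
    """Group values by key, standing in for the network/disk transfer step.

    Returns the grouped values and the number of pairs that were moved.
    """
    groups = defaultdict(list)
    transferred = 0
    for key, value in pairs:
        groups[key].append(value)
        transferred += 1
    return groups, transferred


def late_filter(records, wanted):
    """Move everything through the shuffle, filter only when reducing."""
    pairs = ((r["country"], r["amount"]) for r in records)
    groups, transferred = shuffle(pairs)
    totals = {k: sum(v) for k, v in groups.items() if k in wanted}
    return totals, transferred


def early_filter(records, wanted):
    """Filter in the map step, so only the relevant pairs are moved."""
    pairs = ((r["country"], r["amount"])
             for r in records if r["country"] in wanted)
    groups, transferred = shuffle(pairs)
    totals = {k: sum(v) for k, v in groups.items()}
    return totals, transferred


# Made-up sample data.
records = [
    {"country": "HU", "amount": 10},
    {"country": "DE", "amount": 20},
    {"country": "HU", "amount": 5},
    {"country": "FR", "amount": 7},
    {"country": "DE", "amount": 1},
]

late_totals, late_moved = late_filter(records, {"HU"})
early_totals, early_moved = early_filter(records, {"HU"})

print(late_totals == early_totals)  # identical result
print(late_moved, early_moved)      # 5 pairs moved vs 2
```

The totals are the same in both variants; only the amount of data passing through the expensive middle step differs.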

One explicit case: for symmetric graphs, where the symmetry is preserved in intermediate results, it can be a good idea to transfer only one half of the matrix, e.g. the entries on and above the diagonal, and to "double" the output only in the very last step, if that is required at all.
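The symmetric case can be sketched as follows, assuming the graph is held as a small dense adjacency matrix (a list of lists). The helpers `upper_triangle` and `restore_symmetric` are hypothetical names for this example only.

```python
def upper_triangle(matrix):
    """Return (row, col, value) triples on and above the diagonal."""
    n = len(matrix)
    return [(i, j, matrix[i][j]) for i in range(n) for j in range(i, n)]


def restore_symmetric(entries, n):
    """Rebuild the full n x n matrix by mirroring each transferred entry."""
    full = [[0] * n for _ in range(n)]
    for i, j, value in entries:
        full[i][j] = value
        full[j][i] = value  # the "doubling" step, done only at the very end
    return full


# Made-up symmetric adjacency matrix with edge weights.
adjacency = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]

half = upper_triangle(adjacency)      # only this part needs to be transferred
restored = restore_symmetric(half, len(adjacency))

print(len(half))              # 6 entries instead of 9
print(restored == adjacency)  # the full matrix is recovered
```

For an n x n matrix this moves n(n+1)/2 entries instead of n*n, so the saving approaches one half as n grows.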
