Tuesday, 28 June 2016

R: a quick permutation sightseeing

Listing permutations of a number of distinct elements is surely doable, as long as the number of elements is not too big. But with only a little pickiness, such an innocent task can raise many interesting (but small) challenges to deal with once everyday expectations (memory, heap, CPU efficiency) are respected.

I found myself facing this task which involved a list of permutations of 5 elements (related to the Draper Satellite Image Chronology competition on Kaggle), and while it's truly simple, I thought I would just have a peek at how far the common sense has progressed with it.

Answers to a StackOverflow question should tell something about it.

There were
  1. the brute force exhaustive search approach, probably the simplest of all (which is probably the most suited for my original motivation)
  2. the classic recursive implementation
  3. Museful's much more vectorized ("matrixized") solution, a variant of which had too crossed my mind (unfortunately the SO one is just as good if not better than mine would have been, so I think I shouldn't waste many words on it)
and some others which seemed conceptually equivalent to either of the above.

The problem was pretty much solved, but my brain kept ticking... so there is one more I would like to take a note of here.

Firstly, I did deal with this before, a very long time ago, and I had a solution which I annoyingly forgot about, but it was easy to recreate something similar.
That key concept was that (over the same elements) a permutation is only different from another one by (a number of) exchanges between pairs of elements.

Actually, that's almost too easy to see, as just by always exchanging a misplaced element with the one with the right number, and repeating it, in no more than n exchanges (actually n-1 is sharper, and probably the average number of minimum required steps is much lower) we should get there.

E.g. have 12345 want 54321

exchange 1 & 5, get:

52341

exchange 2 & 4, get:

54321

done.

In other words, just by exchanging elements, we can go from one permutation to the other.

So this is probably an option to list all the permutations just as well.

Going from 12345 to 12354 is a simple move: replace the last two. To get to the next meaningful step, a third element surely needs to be involved. Then we may get back to varying the first two. Then the elements in the most frequently varying pair need some fresh blood again.

To see what I mean, the pair variations over a given 'symbol set' may be:
  1. 12345 and 12354
  2. 12435 and 12453
  3. 12534 and 12543
Giving the first 6 permutations. The generalisable recipe of items for the above starters:
  1. don't do anything,
  2. replace the 3rd with one element from the pair,
  3. replace the 3rd with the other element from the pair.
The flow goes as:

This may still feel a bit unclear, but technically from the permutations of (n - 1) elements we get those of n elements by placing the new element into different positions within all n slots, which means in the first case it's not involved in the last (n - 1), then it substitutes a different one of those (n - 1) in each "big" step. The last (n - 1) elements are just shuffled around as usual.

Slightly modifying it (reverse order) for the ease of coding, we get the below implementation.

I was really happy to sketch this out, although it's not much of an improvement. If I really want to be nice to myself then it is, as it does not have to be recursive, surely has a light heap footprint (the momentarily state of the algo is stored in 2 * n integers), but otherwise isn't too interesting, at least in R, where vectorization over tons of short loops is much preferred. It could possibly be a good choice in C(++) if someone's bored enough.

Then I Googled permutations with pair switches and got to a really interesting PDF.
Their solution assigns a 'momentum' (direction) to each number at each step, and I don't fully understand it (yet). Powers beyond my control :)
But surely is a simple one and is more efficient in that this one doesn't put back the items from where they came from, i.e. only applies half of the switches. However, it does have to find the largest movable element.

So do I win or what? :) Maybe only as long as I resist the temptation to read beyond the first page of their stuff... I'd say, for the temporary illusion, not today! Sure they have something better.. :)

Monday, 13 June 2016

R: the "is ordered" exercise

Some programming treats look nicer to me, one of them is determining whether a series of numbers is ordered in one way or another.

It can come useful when trying to decipher some shuffled data set, which mainly strikes me on Kaggle. Albeit with a little flaw it could be done 'on the spot', i.e. trivial solutions are easy to implement, it seems like an interesting exercise if vectorization is a requirement.

And I'm trying to collect such things here and keep my brain moving, so I created a solution which I'll give below.

The function

Not much (just at a proof of concept level), but tested: link to the test code. Obviously if anyone has a better idea for the implementation, please give me a shout!