Friday, 21 June 2019
... and some dislikes in Python itself :) ... #1: file seek offset
"A from_what value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. from_what can be omitted and defaults to 0, using the beginning of the file as the reference point."
(source)
Hopefully one day this changes.
Magic numbers in a core API are not really best practice, I'd diplomatically say - although this one is apparently just a relic from the early days of Python, when the foundations were laid down. Still, it's a glitch that yields poorly readable code and invites the repetitive-googling syndrome - plus I'd guess a negative offset would be more idiomatic than a separate constant (update: I just realized the negative offset is supposed to be used in conjunction with the "2" constant - isn't that a bit odd? Which raises the question: does it also work with the "0" - from-the-beginning - case?), and the constant could then retire to be a more often ignored optional parameter.
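For reference, a minimal sketch of the three from_what values - using an in-memory io.BytesIO so it runs standalone - including the named constants that the os module does at least provide, and what happens if you try a negative offset with the "0" case:

import io
import os

f = io.BytesIO(b"hello world")   # stand-in for a real (binary-mode) file

f.seek(0, 0)                     # from the beginning
f.seek(2, 1)                     # two bytes forward from the current position
f.seek(-5, 2)                    # five bytes back from the end
print(f.read())                  # b'world'

f.seek(-5, os.SEEK_END)          # same thing, spelled with the named constant
print(f.read())                  # b'world'

try:
    f.seek(-1, os.SEEK_SET)      # negative offset from the beginning...
except (ValueError, OSError) as exc:
    print(exc)                   # ...is rejected: an absolute position can't be negative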
Friday, 14 June 2019
5 things I hate about Pandas #1 (..#3)
Not sure it'll end up a five-item list, but frankly, coming from an OOP background, some aspects of the Pandas multi-index support feel extremely ridiculous to me.
For instance this one.
Obviously, there may be consistency urges stemming from other areas (which?), but I would primarily expect a multi-index to allow picking a subset of a data frame by reference.
That could then be useful, at the very least for the sake of elegance ('data hiding'), to make it possible to hand that subset over to data processing that has no reason to look elsewhere.
And so I wouldn't get confused when debugging etc.
Such as:
process_stuff1(df.stuff1_columns)
or at least, if it's a copy-on-'read' reference,
df.stuff1_columns = process_stuff1(df.stuff1_columns)
(I prefer the . notation over the indexer [""] for code completion purposes.)
And yes, no. Neither works. No, memorize.
I do wonder, though, how many people find it intuitive that neither of these is "the approach"?
(Those finding the SO entry probably not.)
Well, I can live with it. (There's the can do = compromising attitude :) )
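For completeness - the way that does work, as far as I can tell, is to take the column group out as a copy and write it back under the full column keys. A minimal sketch, with made-up group/column names:

import numpy as np
import pandas as pd

cols = pd.MultiIndex.from_product([["stuff1", "stuff2"], ["a", "b"]])
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=cols)

sub = df["stuff1"]                     # selects the group, but as a copy, not a live view
processed = sub * 10                   # stand-in for process_stuff1

for col in processed.columns:          # write back by naming the full keys
    df[("stuff1", col)] = processed[col]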
UPDATE: #2
Drop rows based on condition #20944
It's mid-2019 now. No comment.
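(For the record, the workaround everyone uses in the meantime is the boolean-mask / index-drop idiom - a minimal sketch with a made-up column:)

import pandas as pd

df = pd.DataFrame({"value": [1, -2, 3, -4]})

kept = df[df["value"] > 0]                      # keep the rows that pass the condition
dropped = df.drop(df[df["value"] <= 0].index)   # or phrase it as dropping the offenders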
UPDATE: #3
When aggregating a Pandas data frame, you may specify a dictionary that assigns functions to columns. Say you aggregate one column of numbers by adding them up, while for another column you would rather see the maximum value.
The results may end up with odd column names - for instance, it can be obscure to keep calling the total column by the name of the values being tallied up.
So you'd change the names, intuitively (df.columns = [...]).
Now, if you think the order is guaranteed, you're wrong.
Python dictionaries were historically unordered (insertion order is only guaranteed from Python 3.7 on), so the order in which you defined their elements may get lost along the way.
So just generally don't do that, or specify an order, or whatever ... the intuitive road is, anyway, blocked :\
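(Later note: newer pandas - 0.25 and up - has "named aggregation", which as far as I can tell addresses both the naming and the ordering part, since the output columns follow the keyword order. A sketch with made-up columns:)

import pandas as pd

df = pd.DataFrame({"group": ["a", "a", "b"],
                   "value": [1, 2, 3],
                   "other": [10, 20, 30]})

agg = df.groupby("group").agg(value_total=("value", "sum"),
                              other_max=("other", "max"))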
Saturday, 1 June 2019
(So many of) my days with Python in a visualization
[chart]
P.S.: yes, it's charting.