Monday, 18 February 2019

Caching, Plotly Dash, idea: bang your head ...


First observation: Plotly Dash serves callbacks on multiple threads to generate content. At least the following suggests that this is the case:
  • threading.get_ident() reports different thread IDs
  • functools.lru_cache occasionally fails to help
  • well, the log shows multiple call starts before any of the function bodies end - pretty clear evidence that they're executing in parallel
The fun thing is that all of these calls start executing before any of them would have the chance to put the pre-calculated value into the cache and make the day of the rest of the callers a little better.
So, well, the same calculation is running on about four threads in parallel (in my specific case, according to the log below), which is nonsensical even if it doesn't cause a visible delay on the given system. A toy reproduction of the pattern is sketched below.
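To make the pattern reproducible outside my app, a toy version looks roughly like this (the layout, component ids and the sleep are made up for illustration, not my actual code): two callbacks fire on the same input change, each on its own thread, and both walk into the shared helper before either of them finishes.

import threading
import time

import dash
import dash_core_components as dcc
import dash_html_components as html
from dash.dependencies import Input, Output

app = dash.Dash(__name__)
app.layout = html.Div([
    dcc.Input(id='date-range', value='2018-07-31..2019-02-18'),
    html.Div(id='view-1'),
    html.Div(id='view-2'),
])

def get_filtered_data(date_range):
    # stand-in for the expensive, shared calculation
    print('@strt get_filtered_data', date_range, 'thread:', threading.get_ident())
    time.sleep(2)
    print('@ends get_filtered_data', date_range, 'thread:', threading.get_ident())
    return date_range

@app.callback(Output('view-1', 'children'), [Input('date-range', 'value')])
def render_view_1(date_range):
    return get_filtered_data(date_range)

@app.callback(Output('view-2', 'children'), [Input('date-range', 'value')])
def render_view_2(date_range):
    return get_filtered_data(date_range)

if __name__ == '__main__':
    # the Flask dev server behind Dash handles requests on multiple threads,
    # so both callbacks can be inside get_filtered_data at the same time
    app.run_server(debug=True)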

So how am I going to get around this?

Intriguingly, I seem to have run out of standard solutions already: I've even tried the Dash recipe with the filesystem-based Flask Cache, to no avail - or at least apparently not with 100% success.
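For reference, the setup roughly follows the Dash performance recipe with flask_caching - something like this sketch (the cache directory and the timeout value here are placeholders, not necessarily what I'm running with):

import dash
from flask_caching import Cache

app = dash.Dash(__name__)

CACHE_TIMEOUT = 60  # seconds - placeholder value

cache = Cache(app.server, config={
    'CACHE_TYPE': 'filesystem',    # entries are stored as files on disk
    'CACHE_DIR': './flask-cache',  # placeholder directory
})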

So the next thing is to work out why the cache misses, if there's still a chance of telling - because the decorated function keeps getting called relentlessly...

@cache.memoize(timeout=CACHE_TIMEOUT)
def get_filtered_data(date_range):
...


The situation, roughly, is that date_range is a tuple (a pair) of datetime values. Could there be some equality-checking problem in a dictionary lookup or similar? That being my last guess (a quick sanity check of it is below), I'm calling it a way too long day and will get back to this tomorrow.
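For what it's worth, a quick check with the values taken from the log: a pair of datetimes is hashable and compares equal like any other immutable value, so a plain dict lookup is not the problem - if anything is off, it is presumably in how the caching layer derives its key from the arguments.

import datetime

a = (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0))
b = (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0))

print(a == b)              # True
print(hash(a) == hash(b))  # True
print({a: 'cached'}[b])    # 'cached' - the lookup finds the entry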

To be continued ...




@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102504281856
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140103183415040
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102863554304
get_filtered_snapshot_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102479103744
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102855161600
get_filtered_snapshot_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140103292983040
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102504281856
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102487496448
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102495889152
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102863554304
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102470711040
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102855161600
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140103183415040
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102495889152
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102470711040
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 2, 18, 0, 0)) thread: 140102487496448







Update - some uninteresting (wow ...) developments: diskcache.FanoutCache.memoize() achieves about the same.
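(The setup for this attempt is roughly the following sketch - the directory, shard count and expire value are placeholders rather than the exact values the app runs with.)

from diskcache import FanoutCache

CACHE_TIMEOUT = 60  # seconds - placeholder value

cache = FanoutCache('./dash-cache', shards=4)

@cache.memoize(expire=CACHE_TIMEOUT)
def get_filtered_data(date_range):
    ...

And the resulting log: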

cache miss in cache # 140434996323104
get_filtered_snapshot_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434349221632
cache is setting at key ('__main__get_filtered_snapshot_data', (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0))) value [0, 20, 15, 0, 22, 0, 12, 2, 17, 14, 23, 15, 16, 12, 19, 15, 9, 13, 9, 21, 13, 21, 12, 1, 18, 1, 15, 23, 17, 13, 0, 19, 5, 21, 15, 22, 15, 23, 16, 22, 18, 12, 21, 13, 11, 10, 21, 14, 18, 15, 19, 21, 19, 23, 1, 1, 1, 14, 1, 16, 12, 20, 5, 22, 17, 16, 20, 18, 0, 21, 17, 12, 19, 11, 12, 18, 1, 19, 11, 21, 17, 20, 17, 2, 17, 6, 20, 17] with expire None
cache miss in cache # 140434996323104
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434340828928
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434340828928
cache is setting at key ('__main__get_filtered_data', (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)), <class 'tuple'>) value Empty DataFrame
Columns: [timestamp, status, key_count, backspc_count, hour]
Index: [] with expire 1
cache miss in cache # 140434996323104
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434357614336
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434357614336
cache is setting at key ('__main__get_filtered_data', (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)), <class 'tuple'>) value Empty DataFrame
Columns: [timestamp, status, key_count, backspc_count, hour]
Index: [] with expire 1
cache miss in cache # 140434996323104
@strt get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434324043520
@ends get_filtered_data (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)) thread: 140434324043520
cache is setting at key ('__main__get_filtered_data', (datetime.datetime(2018, 7, 31, 0, 0), datetime.datetime(2019, 1, 28, 0, 0)), <class 'tuple'>) value Empty DataFrame
Columns: [timestamp, status, key_count, backspc_count, hour]
Index: [] with expire 1

 
