Poorly implementing caching in python - eye opener
So after looking at some slides on caching function returns in javascript, I was keen on trying out so in Python. And LOL I came up with this logic -
fun(val):
if val in cache.keys():
return cache[val]
else:
do the right thing...
But it seems, though "if val in cache.keys()" sounds very human friendly, it definitely would suck for a very big cache, and so it does in the following test.
I guess I'm not using timeit in the classical way where I would pass some statements in string and ask it to do them for N number of times, but it seems passing some variables from existing code to statement string is a pain, tried global etc, didn't work. Hence a simple time diff test.
# -*- coding: utf-8 -*-
import random
from timeit import Timer
class PlainColorParser:
def parse(self,value):
rgb = int(value[1:], 16)
r = rgb >> 16 & 0xff
g = rgb >> 8 & 0xff
b = rgb & 0xff
return (r,g,b)
def __call__(self,value):
return self.parse(value)
class NoBitColorParser:
def __call__(self,value):
return (int(value[1:3],16), int(value[3:5],16), int(value[5:7],16))
class PoorCachedColorParser(PlainColorParser):
def __init__(self):
self.cache = {}
def __call__(self,value):
if value in self.cache.keys():
return self.cache[value]
self.cache[value] = self.parse(value)
return self.cache[value]
class CachedColorParser(PlainColorParser):
def __init__(self):
self.cache = {}
def __call__(self,value):
try:
return self.cache[value]
except KeyError:
self.cache[value] = self.parse(value)
return self.cache[value]
if __name__ == "__main__":
t = Timer()
pccParse = PoorCachedColorParser()
ccParse = CachedColorParser()
pcParse = PlainColorParser()
nbParse = NoBitColorParser()
# setup some random data to test
colors = []
for i in xrange(50000):
colors.append("#" + hex(random.randint(0xff0000, 0xff0aff))[2:])
def timeDiff(method):
start = t.timer()
for c in colors:
method(c)
stop = t.timer()
return ((1000000*stop - 1000000*start)/1000000)
# ---- test poorly cached
print "Poorly Cached - %.2fs" % timeDiff(pccParse)
# ---- test cached
print "Cached - %.2fs" % timeDiff(ccParse)
# ---- test uncached
print "Non Cached - %.2fs" % timeDiff(pcParse)
# ---- test no bitwise, uncached
print "Not Bitwise, Non Cached - %.2fs" % timeDiff(nbParse)
So we've got 4 classes to represent different ways of parsing an html hex code for color e.g. "#f00f00" into a tuple of (r,g,b) integers. PlainColorParser and NoBitColorParser could easily be functions with no need of classes over them as they do not cache, but to bring them a little equal to other two cached ones, I've bound them in classes.
NoBitColorParser does string manipulations and parses 3 times before returning a tuple. PlainColorParser does better than that, it uses bit shifts and AND masks to filter out content after 1 round of parsing integer from string. PoorCachedColorParser does caching in an obvious way "if its there in cache keys... else ...", and CachedColorParser complies to the philosophy of "Fail early, fail often", which is quite interesting :D
What the test does is - generate 100000 random color codes, pick them from a range of 2816 colors (0xff0aff - 0xff0000 + 1). Obviously many colors are bound to get repeated says pigeon hole.
Here's what goes on in an average run on my netbook -
[ideamonk@pebble:~/code] $ python pycaching.py
Poorly Cached - 37.60s
Cached - 0.44s
Non Cached - 0.89s
Not Bitwise, Non Cached - 1.19s
[ideamonk@pebble:~/code] $
"if value in self.cache.keys():" in PoorCachedColorParser gives you a thumbs down with a sucky performance, obviously not the right thing to do!!! (I was mistaken)
CachedColorParser gives a sweet 0.44s, "Fail early, Fail often" works :)
NoBitColorParser is suckier than the Non Cached PlainColorParser, which points out that string ops, parsing integers is one costly affair.
So much for food to my sleeplessness. Oh I remember doing something similar in PyMos too :D.
Lol this is the performance on my new machine -
[abhishekmishra[:~/code] $ python pycaching.py
Poorly Cached - 3.98s
Cached - 0.04s
Non Cached - 0.08s
Not Bitwise, Non Cached - 0.08s
[abhishekmishra[:~/code] $
Labels: caching, experiments, python




