IdeaMonk

thoughts, ideas, code and other things...

Saturday, July 17, 2010

Poorly implementing caching in python - eye opener

So after looking at some slides on caching function returns in javascript, I was keen on trying out so in Python. And LOL I came up with this logic -
fun(val):
if val in cache.keys():
return cache[val]
else:
do the right thing...

But it seems, though "if val in cache.keys()" sounds very human friendly, it definitely would suck for a very big cache, and so it does in the following test.

I guess I'm not using timeit in the classical way where I would pass some statements in string and ask it to do them for N number of times, but it seems passing some variables from existing code to statement string is a pain, tried global etc, didn't work. Hence a simple time diff test.
# -*- coding: utf-8 -*-
import random
from timeit import Timer

class PlainColorParser:
def parse(self,value):
rgb = int(value[1:], 16)
r = rgb >> 16 & 0xff
g = rgb >> 8 & 0xff
b = rgb & 0xff
return (r,g,b)

def __call__(self,value):
return self.parse(value)

class NoBitColorParser:
def __call__(self,value):
return (int(value[1:3],16), int(value[3:5],16), int(value[5:7],16))

class PoorCachedColorParser(PlainColorParser):
def __init__(self):
self.cache = {}

def __call__(self,value):
if value in self.cache.keys():
return self.cache[value]
self.cache[value] = self.parse(value)
return self.cache[value]

class CachedColorParser(PlainColorParser):
def __init__(self):
self.cache = {}

def __call__(self,value):
try:
return self.cache[value]
except KeyError:
self.cache[value] = self.parse(value)
return self.cache[value]

if __name__ == "__main__":
t = Timer()
pccParse = PoorCachedColorParser()
ccParse = CachedColorParser()
pcParse = PlainColorParser()
nbParse = NoBitColorParser()

# setup some random data to test
colors = []
for i in xrange(50000):
colors.append("#" + hex(random.randint(0xff0000, 0xff0aff))[2:])


def timeDiff(method):
start = t.timer()
for c in colors:
method(c)
stop = t.timer()
return ((1000000*stop - 1000000*start)/1000000)

# ---- test poorly cached
print "Poorly Cached - %.2fs" % timeDiff(pccParse)

# ---- test cached
print "Cached - %.2fs" % timeDiff(ccParse)

# ---- test uncached
print "Non Cached - %.2fs" % timeDiff(pcParse)

# ---- test no bitwise, uncached
print "Not Bitwise, Non Cached - %.2fs" % timeDiff(nbParse)

So we've got 4 classes to represent different ways of parsing an html hex code for color e.g. "#f00f00" into a tuple of (r,g,b) integers. PlainColorParser and NoBitColorParser could easily be functions with no need of classes over them as they do not cache, but to bring them a little equal to other two cached ones, I've bound them in classes.

NoBitColorParser does string manipulations and parses 3 times before returning a tuple. PlainColorParser does better than that, it uses bit shifts and AND masks to filter out content after 1 round of parsing integer from string. PoorCachedColorParser does caching in an obvious way "if its there in cache keys... else ...", and CachedColorParser complies to the philosophy of "Fail early, fail often", which is quite interesting :D

What the test does is - generate 100000 random color codes, pick them from a range of 2816 colors (0xff0aff - 0xff0000 + 1). Obviously many colors are bound to get repeated says pigeon hole.

Here's what goes on in an average run on my netbook -
[ideamonk@pebble:~/code] $ python pycaching.py
Poorly Cached - 37.60s
Cached - 0.44s
Non Cached - 0.89s
Not Bitwise, Non Cached - 1.19s
[ideamonk@pebble:~/code] $

"if value in self.cache.keys():" in PoorCachedColorParser gives you a thumbs down with a sucky performance, obviously not the right thing to do!!! (I was mistaken)

CachedColorParser gives a sweet 0.44s, "Fail early, Fail often" works :)

NoBitColorParser is suckier than the Non Cached PlainColorParser, which points out that string ops, parsing integers is one costly affair.

So much for food to my sleeplessness. Oh I remember doing something similar in PyMos too :D.

Lol this is the performance on my new machine -
[abhishekmishra[:~/code] $ python pycaching.py
Poorly Cached - 3.98s
Cached - 0.04s
Non Cached - 0.08s
Not Bitwise, Non Cached - 0.08s
[abhishekmishra[:~/code] $

Labels: , ,

Thursday, July 15, 2010

अपने घर के चोर

You find it to be a sorry ass pussy system 
that fails to respond to situations in time,
you read up, watch stuff and blame on them for "not doing anything",
but hardly would it strike to your conditioned imagination
that they might just be so pre-occupied in doing something else
that you could have never imagined.

And yet you passive cunt, you take it all for granted
and sleep for the next day to come by,
expecting everything to be normal.

Who knows for everything,
that you quoted the system to be pussy about,
over your cup of tea and a morning newspaper,
there were these "अपने घर के चोर",
thiefs, filling their pockets.


On a side note,
Now that things have become so well connected,
the systems are so well informed, fed by numerous channels,
and fingers just like the ones that typed out this piece of brainf*ck,
clicking out zillions of preferences,
we're not far from the point when
it would be possible to crunch these into
a whole new understanding, meaning, life.

Just that in this spiderweb,
analyzing nodes, reaching from one to another,
predicting their minds and way of thinking has become easier.

Labels:

Wednesday, July 14, 2010

random junk

" x86 means that it is 86 times faster than the previous chip.

Now the most kick butt processors are the 486sx es
(the sx stands for 'sexy' because the floating
points of fat have been trimmed off) "

Labels:

Wednesday, June 30, 2010

So Distribute!



Sitting with my new (in working-on-linux sense) pen tablet, with nothing much to do, I just scribbled down my newly developed understanding of why should we distribute processing and there isn't much point in increasing CPU speeds these days.

Zoom it to get a clear picture of the problems with just one processor.

Labels: , ,

Monday, June 28, 2010

Yet another Google Calculator shell in Python

For the sake of weekend hack and getting rid of some headache of post bangpypers meetup chicken overload of today's evening -
import re
import urllib
import mechanize
import unittest

class GCalc:
def __init__(self):
self.browser = mechanize.Browser()
self.browser.set_handle_robots(False)
self.browser.addheaders = \
[('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

def cleanup(self,text):
strip_tags = re.compile("<.*?>")
return strip_tags.sub('', text.replace("<sup>","^").replace("</sup>","").replace("×","X"))

def calc(self, query):
try:
raw_result = self.browser.open("http://www.google.com/search?hl=en&q="+urllib.quote_plus(query)).read()
return self.parse(raw_result)
except:
return "Failed to reach Google"

def parse(self, raw_result):
try:
result = raw_result.split('''<td style="vertical-align:top" >''')[1].split('</h2>')[0] + '</h2>'
return self.cleanup(result)
except:
return "Not a calculator query"

class TestGCalc(unittest.TestCase):
def setUp(self):
self.g = GCalc()

def test_numeric(self):
self.assertEqual(self.g.calc("4+5"), "4 + 5 = 9")

def test_bignumbers(self):
self.assertEqual(self.g.calc("234623476 * 59999999999"), "234 623 476 * 59 999 999 999 = 1.40774086 X 10^19")

def test_units(self):
self.assertEqual(self.g.calc("14 inch in mm"), "14 inch = 355.6 millimeters")
self.assertEqual(self.g.calc("one acre"), "one acre = 4\xc2\xa0046.85642 m^2")

def test_cacl(self):
self.assertEqual(self.g.calc("3 * PI / sin(3)"), "(3 * PI) / sin(3) = 66.7855543")

def test_facts(self):
self.assertEqual(self.g.calc("speed of light"), "the speed of light = 299\xc2\xa0792\xc2\xa0458 m / s")


if __name__=='__main__':
unittest.main()

Fork it here, have fun, curse me for crappy splits :)
Just learnt unittest a bit, ah so easy and useful :D

I guess you can notice the grand theme here - Its just ~ 50 lines of code packed up with a unit test and does the job!

I suggest you better not miss PyCon India 2010 in case you too are eager to do some productivity magic with you/your employees/students/etc.

Adios

Labels: ,

Wednesday, June 23, 2010

Might as well seize it with my own hands

Now, I can't let this fail system get me the feeling of "You could have had", when it answers to me with words like "maybe", "I don't know", "not sure", when I have to make decisions in one of the two words "Yes", "No", while they never give enough concrete information to help me make a good decision, while some of its members become fan of the anti-share policy.

Screw you! Whats the pride in having scribbled on a lame white piece of paper and being lucky, screw your for your fake smiles of the non-achievement you brag of, for its just another matter of demand and supply.

Adios, watch me fly by with my own wings sometime.

Labels:

Monday, June 21, 2010

DWM-ified

Arch + Dwm + Conky + Dzen2 + links + vim + irssi (plus ram eaten by chrome, firefox, thunderbird, and eden) otherwise awesome at ~100MB, ideal light & fast setup for my netbook :)

Labels:

Saturday, June 12, 2010

No more self everywhere

A lot of people complain about how they have to write "self" repetitively in their OO code in Python. Here is a neat trick that doesn't get you rid of "self" but lets you put another name for it -

>>> class Foo:
... def __init__(me):
... me.var = "Hola"
...
... def say(me):
... print me.var
...
>>> f = Foo()
>>> f.say()
Hola
>>>

Labels: ,