Yeah.
I was curious about which words I say the most when talking on MSN, so I decided to write a quick Python script to find out:
import re,os,string
logdir = "/home/mat/.purple/logs/msn/[my email]"
regex = re.compile("\(\d\d:\d\d:\d\d\) Mat: (.*)")
info = {}
def compareVal(a,b):
    """Compare a list of tuples using their 2nd value"""
    return b[1]-a[1]
def removePunc(word):
    """Couldn't find an existing function for this, strip only removes characters from the ends
    """
    return "".join([i for i in word if i not in string.punctuation])
#go through all files in the subdirectories
for root, dirs, files in os.walk(logdir, topdown=True):
    for f in files:
        fullPath = os.path.join(root,f)
        text = open(fullPath)
        for line in text:
            m = regex.match(line)
            if m: #only look at stuff I've said
                words = m.group(1).split()
                words = [removePunc(i) for i in words] #remove punctuation
                words = [i for i in words if len(i)>3] #filter out short words
                for word in words: #increment the frequency for this word
                    if word in info: info[word] += 1
                    else: info[word] = 1
li = info.items()
li.sort(compareVal) #sort by frequency (highest first)
for i in li:
    print "%s: %d" % i
(It ignores any word with three letters or fewer.)
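For anyone on Python 3, here's a rough sketch of the same counting logic using collections.Counter and str.translate from the standard library. The function name count_words, its speaker and min_len parameters, and the sample chat lines are all made up for illustration; it assumes log lines look like "(HH:MM:SS) Name: message", same as the regex above, and it reads from any iterable of lines rather than walking the log directory.

```python
import re
import string
from collections import Counter

# Assumed log-line shape, same as the original regex: "(HH:MM:SS) Name: message"
LINE_RE = re.compile(r"\(\d\d:\d\d:\d\d\) ([^:]+): (.*)")

# Translation table that deletes every ASCII punctuation character
STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def count_words(lines, speaker, min_len=4):
    """Count words of at least min_len characters said by speaker (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group(1) != speaker:
            continue  # only look at lines from the chosen speaker
        # Strip punctuation from each word, then drop the short ones
        words = (w.translate(STRIP_PUNCT) for w in m.group(2).split())
        counts.update(w for w in words if len(w) >= min_len)
    return counts


# Invented sample lines, not taken from real logs
sample = [
    "(12:00:01) Mat: yeah, I think that's fine",
    "(12:00:05) Bob: yeah whatever",
    "(12:00:09) Mat: Yeah... don't think so",
]

# most_common() returns (word, count) pairs sorted by count, highest first
for word, n in count_words(sample, "Mat").most_common(3):
    print("%s: %d" % (word, n))
```

Like the original, this doesn't lowercase anything, so "Yeah" and "yeah" are counted separately. Counter.most_common replaces the hand-written comparison function and sort.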
The results were kind of boring. Here’s the top 10:
yeah: 1984
that: 1956
have: 1565
what: 1321
like: 1148
just: 993
dont: 937
think: 870
well: 832
with: 722
October 10th, 2008