Thursday, April 2, 2009

CTags 2: TDD

This is part two of the CTags series. See part 1.

Test Driven Development

Adherents religiously write tests first for *every* function they write.

Personally, I think a tonne of unit tests while prototyping is a waste of time for `simple` stuff with only one programmer working. It tends to feel like walking in mud. I prefer higher-level tests that, while they may not pinpoint the exact cause of a failure, won't double the refactoring effort, especially when prototyping. Spend ages testing that `The Wrong Way` works? No thanks... Prototype, then rewrite with tests.

Sometimes, though, `TDD` really is indispensable while exploring, especially when working on stuff that pushes the limits of your understanding. If I start encountering bugs, I usually take it as a sign that I need to start writing tests. Rather than debugging with transient print statements, I'll write some unit tests.
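To make the print-versus-test trade concrete, here is a minimal sketch. The helper `tag_column` and the sample line are hypothetical, made up for illustration, and the sketch assumes Python 3. The point is only that the check you would have eyeballed once with a print statement becomes an assertion that stays around.

```python
import unittest


def tag_column(line, column):
    """Return one tab-separated column of a tags-file line (hypothetical helper)."""
    return line.rstrip("\n").split("\t")[column]


class TagColumnTest(unittest.TestCase):
    def test_symbol_and_filename(self):
        # A made-up tags line: symbol, file name, ex command, extension field
        line = 'main\tsrc/app.py\t/^def main():$/;"\tf\n'

        # What a transient print would have shown once, kept as assertions
        self.assertEqual(tag_column(line, 0), "main")
        self.assertEqual(tag_column(line, 1), "src/app.py")
```

Saved in a module, this can be run with `python -m unittest <module_name>`, and it keeps guarding the behaviour after the debugging session is over.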

The TagFile class below (now commented) is an example of when I found testing while prototyping invaluable.

    #################################### IMPORTS ###################################

    from __future__ import with_statement

    import os
    import bisect
    import mmap
    import unittest

    ################################### CONSTANTS ##################################

    """
    The tags in a `tags` file are listed one per line formatted as so:

    tag_name<TAB>file_name<TAB>ex_cmd;"<TAB>extension_fields

    """

    # symbolic constants for column indexes
    SYMBOL = 0
    FILENAME = 1

    ################################################################################

    class TagFile(object):
        def __init__(self, p, column):
            """

            Instantiate a new TagFile

            @p path to `tags` file
            @column which column to read

            """

            self.p = p
            self.column = column

        def __getitem__(self, index):
            "self.fh is the mmap opened by get"
            # Seek to a certain byte index
            self.fh.seek(index)

            # The position is likely to be halfway through a line so read up to
            # the first new line and throw away the `junk`
            self.fh.readline()

            # Note that it's actually returning the column from the line *after* the
            # line region containing the index.
            return self.fh.readline().split('\t')[self.column]

        def __len__(self):
            # bisect.bisect_left search must know how large the file is
            return os.stat(self.p).st_size

        def get(self, *tags):
            """

            Get all lines for one or more tags

            """

            with open(self.p, 'r+') as fh:
                # mmap is not needed but delivers performance increase
                self.fh = mmap.mmap(fh.fileno(), 0)

                for tag in tags:
                    # As __getitem__ returns the column from the line region *after*
                    # that containing the index pt then bisect( alias of
                    # bisect_right ) will give the wrong index.
                    b4 = bisect.bisect_left(self, tag)

                    # Move the file to the position found
                    fh.seek(b4)

                    # Iterate over file. There may be more than one tag line to get
                    # per symbol/filename
                    for l in fh:
                        # Compare search vs line at column
                        comp = cmp(l.split('\t')[self.column], tag)

                        # This handles the case of being `left of left` due to
                        # __getitem__ index being left of symbol it returns
                        # ie wait until catch up
                        if comp == -1: continue
                        # If line is greater then have moved on to next symbol
                        elif comp: break

                        # Found tag!
                        yield l

                # close mmap
                self.fh.close()

    ##################################### TESTS ####################################

    class CTagsTest(unittest.TestCase):
        def test_tags_files(self):
            """

            This test basically iterates over each line in the tags file creating
            a list of lines for each unique symbol it finds. It then compares this
            list to that returned by the TagFile binary search.

            """

            # Successfully passed test on 10MB+ tags file
            # tags = r"C:\Python25\Lib\tags"

            tags = 'tags'
            tag_file = TagFile(tags, SYMBOL)

            with open(tags, 'r') as fh:
                latest = ''
                lines = []

                for l in fh:
                    symbol = l.split('\t')[SYMBOL]

                    if symbol != latest:

                        if latest:
                            tags = list(tag_file.get(latest))
                            self.assertEqual(lines, tags)

                        lines = []

                        latest = symbol

                    lines += [l]

    if __name__ == '__main__':
        unittest.main()

    ################################################################################
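The core trick in `TagFile` is that `bisect.bisect_left` only needs `__len__` and `__getitem__`, so a byte offset can stand in for a list index. The sketch below shows the same idea on an in-memory bytes buffer. It assumes Python 3, and the names `LineAfter` and `find_lines` are invented for this sketch. The `b"\xff"` end-of-data sentinel is also an addition of the sketch, not something the listing does, and it assumes the keys are plain ASCII.

```python
import bisect


class LineAfter(object):
    """Sequence view over sorted, newline-separated bytes.

    Item i is the first tab-separated field of the line that starts
    after byte offset i, mirroring the idea of TagFile.__getitem__.
    """

    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        # Skip the remainder of the line that contains `index`
        start = self.data.find(b"\n", index) + 1
        if start == 0 or start >= len(self.data):
            # No following line: return a value that sorts after any ASCII key
            return b"\xff"
        end = self.data.find(b"\n", start)
        if end == -1:
            end = len(self.data)
        return self.data[start:end].split(b"\t")[0]


def find_lines(data, key):
    """Return every line of `data` whose first field equals `key`."""
    pos = bisect.bisect_left(LineAfter(data), key)
    found = []
    for line in data[pos:].splitlines():
        field = line.split(b"\t")[0]
        if field < key:
            continue  # still on the line just before the first match
        if field > key:
            break     # moved on to the next symbol
        found.append(line)
    return found


data = (b"alpha\ta.py\t1\n"
        b"beta\tb.py\t2\n"
        b"beta\tc.py\t3\n"
        b"gamma\td.py\t4\n")
matches = find_lines(data, b"beta")
```

Because every offset inside a line maps to the line after it, `bisect_left` lands on the start of the line before the first match, which is why the loop has to skip lines that compare lower before it starts collecting.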

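The test strategy in the listing, comparing the fast search against a plain linear pass over the same file, can be sketched in a few lines with `itertools.groupby`. This assumes Python 3. `lookup` is a hypothetical stand-in for the fast search and `LINES` is made-up data, so treat this as an illustration of the pattern rather than a replacement for the test above.

```python
import bisect
import itertools
import unittest


def symbol_of(line):
    """First tab-separated field of a tags line."""
    return line.split("\t")[0]


def lookup(lines, symbol):
    """Hypothetical stand-in for the fast search: bisect over sorted lines."""
    keys = [symbol_of(l) for l in lines]
    lo = bisect.bisect_left(keys, symbol)
    hi = bisect.bisect_right(keys, symbol)
    return lines[lo:hi]


class LookupAgainstScanTest(unittest.TestCase):
    # Made-up, already sorted sample data
    LINES = [
        "alpha\ta.py\t1\n",
        "beta\tb.py\t2\n",
        "beta\tc.py\t3\n",
        "gamma\td.py\t4\n",
    ]

    def test_every_symbol(self):
        # groupby yields each run of equal symbols, including the final one,
        # so the linear scan acts as the reference answer for every symbol
        for symbol, group in itertools.groupby(self.LINES, key=symbol_of):
            self.assertEqual(lookup(self.LINES, symbol), list(group))
```

One design point worth noting: grouping with `groupby` checks the last symbol in the data as well, which a loop that only compares when the symbol changes would need an extra check after the loop to cover.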