Tuesday, July 28, 2009

Consolation: It's not you, it's the other guy.

I have been doing some freelancing work but it hasn't really been buttering the bread on a consistent basis. I avoid PHP, which doesn't really leave much else job-wise. Normally I value my time and flexible working hours far more than dollars, so a full-time job isn't something I'd really look for. Once you have been poor long enough, however, you start resorting to desperate measures and thinking of all sorts of crazy ideas. Boredom sets in and you even do things like offer your services practically free. Fail with consolation:
it is also clear to me that you possess attributes that would make an excellent employee - talent, enthusiasm and dedication. I would have no issues with offering you work if I were able to get XXXX to that stage... I am more than happy for you to put me down as a reference for any job opportunities, as I would highly recommend you to any employer.
Now whether the guy was just politely fobbing me off or he actually meant that I'm not entirely sure half the time. I've never particularly thought of myself as `talented` when it comes to code. `Persistent` like a retarded dog who doesn't know when to stop, sure.

When you can't practically give yourself away and you again decide you really want stable cash, what do you do then? 'I know! I'll look for a *well paying* job. What a shrewd and cunning plan!'

As I'm starting to get reasonably fluent with python, not needing references every 10 seconds, I naturally looked for jobs using that. There were however none available that didn't require `3 years experience` and `knowledge of finance` etc. The `python` keyword did however turn up another interesting advertisement.

We're a Richmond-based software development studio specialising in web technologies such as Flex and Ruby on Rails. We are looking for a passionate and energetic developer to join our team of three.

We're more interested in what you can do than what you say you can do. So prove yourself with your attitude and with your code.

We're looking for someone who's spent their evenings and weekends working on their own projects, and has something to show for it. We have a particular love of open source collaboration, and you should too. Tell us about the projects you watch, and the projects you contribute to: do you have accounts with Github, SourceForge, or RubyForge?

We want to be floored by your enthusiasm to work with new products, programs, and languages. Dynamic or functional languages like Python, Ruby, Actionscript, Erlang, and Haskell are looked upon favorably. We're not interested in your academic transcript, your 2-day ScrumMaster black-belt, or your Microsoft certificates, but we are interested in real talent, sharp skills, and unmatched motivation.

You must send through a cover letter describing your pet projects or open source projects, example source code from the project, and your resume to recruitment@silverpond.com.au. We won't consider any applications which don't have all three.

Normally I wouldn't have even bothered applying for something like that but I thought, if that guy will give me a reference then I may as well try it. I didn't really have much time to create an application or research the position all that much as I was away on a trip helping my father with a building project. Being an unfit person who would normally spend way too much time on a computer, I was quite tired after a day's work. I mostly reused an old application for another job at a game development firm.

I was somewhat surprised when I got a positive sounding response from the recruiter/director of the company: "I'm free to chat late this week or early next week, so I'm looking forward to hearing from you"

We eventually had a chat a few days later, which I wasn't exactly well prepared for. There was some initial awkwardness, with nerves causing me to blurt out "good thanks and yourself" completely out of phase. It then smoothed out somewhat and I was happy to find the team was a group of 20-somethings and that it wasn't an issue that I didn't have any experience with the technologies they used. I didn't really have many decent questions prepared (it was actually my first job interview!) but regardless, at the end I was left with the impression it went well enough. The recruiter wanted me to come in a few days later for some facetime with the rest of the team.

As it turns out I never got that opportunity. Someone else had done their homework and impressed them so much in their phone interview that they got the job solely on that. I found this out after I asked the recruiter for any particular reasons the other candidate was chosen (over me). I was very appreciative of the email he replied with:

1) The candidate we chose asked questions about the projects we're doing and suggested improvements to them

2) The candidate also outlined what he could contribute to the team, and offered a timeframe to do them in.

Of the 80 or so applicants it was between the candidate, one other person and yourself. Your energy and enthusiasm is highly commendable. I'll touch base with you later in the year, say November, and see where you're at.
Overall, I have mixed feelings. I'm somewhat disappointed that I didn't have more time to research the company and also that I didn't make better use of the time I did have. The biggest disappointment is the fact that I missed out on a job where they didn't care that I lacked a degree or even any real experience with their tools of choice. "I don't really know ruby.." I had said. "It won't be an issue" was the reply. Similar conversations re: git / rails / flex. However, it's somewhat encouraging to have made the short list from 80 candidates. That I feel pretty good about.

Will I continue investigating ruby / rails? My only concern is that it seems somewhat wasteful to walk away from my python experience (it's the libraries, stupid) for a language so similar. My `next language` to learn has always been planned to be C, then possibly assembly.

I wonder if I would learn much from ruby about programming in general, coming from python. Perhaps the similarities would actually be a boon to recognizing core concepts and distilling an understanding, compared to something radically different where you have no reference? I'm not really sure and I doubt I could really know for myself unless I tried. However, at the end of the day, there *does* seem to be a lot more work available for ruby/rails as opposed to python.

I can't shake the nagging suspicion I have just been "It's not you it's XXXX"d twice.

I posted on facebook, "didn't get the job and wonders why he marginalizes himself learning obscure tool-sets". A friend commented:
And don't focus on tool-sets - focus on your general problem solving abilities, There are a million different ways to solve any programming problem these days. Key thing is to be adaptable.
I agree with the notion that general problem solving ability is more important than any particular toolset. However, if you had said ability *and* a proficiency with a particular toolset it would make sense to use them where possible. Using the `right tool for the job` is part of solving problems.

Python isn't exactly an `obscure` toolset, however if you are unwilling to work with PHP you definitely are marginalized. I have a very low pain threshold in general. PHP is painful. I don't like pain. Nuff said.

Of interest to me was that the recruiter, during his `wish you all the best` phonecall, mentioned that the ruby community met once a month (in my home city). Perhaps learning some ruby/rails and getting involved would at the least yield some freelancing contracts. Never really been to any geek meetups. Surely beer is involved?

Monday, July 27, 2009

Ruby Tuesday

IPython is a huge part of my workflow. How to replace it?

(1:24:57 PM) akalias: Does ruby have an equivalent to python's IPython ? ie a hugely enhanced shell with better completion / syntax coloring etc?
(1:25:32 PM) dominikh: irb with auto_indent and the "wirble" gem

I have sourced a few ruby/git ebooks

Windows Ruby Solutions


(1:26:08 PM) akalias: what is the best option for ruby on windows? a vm with linux ;) ?
(1:26:25 PM) dominikh: in my opinion actually yes...
(1:26:39 PM) dominikh: don't know if you can even get a decent terminal in window

The solution is simple: remove windows from the ruby mix

I have VMWare installed and preconfigured 256MB `headless` Ubuntu webserver appliances are readily available. If I set up one of these with SAMBA file-sharing I should be able to use my existing toolset quite easily and use SSH to log in.

Now if only the internet wasn't so damn slow where I live...

UPDATE:

Net is super slow - however I discovered that I had an old Gutsy Gibbon image lying about on one of my portables. It already had the colemak keymap installed. The problem was that Gutsy Gibbon had reached EOL, so apt-get was no longer working.
(5:00:13 PM) ubottu: wants you to know: For upgrading, see the instructions at https://help.ubuntu.com/community/UpgradeNotes - see also http://www.ubuntu.com/getubuntu/upgrading



No problem.

UPDATE:

Compiling Ruby 1.9.1 from source

Windows Ruby Options

I'm looking at learning ruby/rails/git for a full-time job I'm a candidate for. Windows seems to be very much a second class citizen in all of these worlds. Git at least has MSYS git readily available and well maintained. Ruby is another story.

The 1.9 version of ruby, which I'm assuming is the one used by the company, was released in late 07 and now, midway through 09, there is still no `one click installer` available.

You can download a 1.9 binary zip at patch level 0 ( the latest is in the high two hundreds ) but it is missing crucial dlls such as zlib and readline.

What's a windoze fool to do? I'm familiar with linux, mostly ubuntu/debian, but a lot more productive in a patched up windows. I use replacement window managers, independent virtual desktop managers for each monitor, a replacement for explorer etc.

I have tried setting up a similar environment using linux before but my crappy video card wouldn't allow virtual desktop sets per monitor. Changing environments *and* languages at the same time wouldn't be that smart.

Probably the best would be to try ( in order ):

I have played with ruby under cygwin in the past and it was retarded slow so that is not really an option.

Delving Into Ruby

A lot of python users seem to hold a fair amount of contempt for ruby. They don't like the use of `perlish` punctuation and operators like `@`, `=~` and `:` and its `more than one way` expressiveness.

I have always looked at ruby code and thought the syntax seemed quite readable. Having to repeat an explicit `self.` for every instance variable in python just seems noisy and I prefer the more compact @ruby way.

Rails seems to be architected with a controller class `instance` per request, the view context sourced from the controller's instance variables.
def method
  @a = 'hello'
  @b = @a * 4
  @c = @b * 3
end

Compare this with the typical python web paradigm. Instance variables (self.xxx) are unused as controller instances are shared between requests. Locals are built up first and then redeclared, returned as a context dict.
def method(self):
    a = 'hello'
    b = a * 4
    c = b * 3

    return dict( a=a, b=b, c=c )

The example is very basic but note in python you couldn't even validly do the following
def method(self):
    return dict (
        a = 'hello',
        b = a * 4,
        c = b * 3,
    )

`a` is never bound as a variable here: it is only the name of a keyword argument, and the argument values are all evaluated before `dict` is even called, so you can't reference it to build `b`. At best you can do
def method(self):
    a = 'hello'
    b = a * 4

    return dict (
        a = a,
        b = b,
        c = b * 3,
    )

I have always hated that pattern (as do some other pythonistas) and would generally return locals()
def method(self):
    a = 'hello'
    b = a * 4
    c = b * 3

    return locals()

But what if you want to convert your context to JSON? At the very least you have the `self` controller instance in your context. This doesn't convert very well to JSON literals. You then have to filter keys from your context dict. It's supposed to be the Python Way to be explicit ( ie declaring your context vars rather than using locals() or horrible _getframe hacks to try and remove some repetition )

Beautiful is better than ugly.
Explicit is better than implicit.
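
The key filtering mentioned above might look something like this. It is only a sketch in present-day Python: `public_context` and `Controller` are names made up for the example, and the rule of also dropping underscore-prefixed names is my own assumption rather than anything a framework does for you.

```python
import json


def public_context(local_vars, exclude=("self",)):
    """Copy a locals() dict, dropping `self` and underscore-prefixed names."""
    return dict(
        (key, value) for key, value in local_vars.items()
        if key not in exclude and not key.startswith("_")
    )


class Controller(object):
    def method(self):
        a = 'hello'
        b = a * 4
        c = b * 3
        _scratch = object()  # would not survive json.dumps, so it is filtered out

        return public_context(locals())


# The filtered context now contains only plain strings and serializes cleanly.
payload = json.dumps(Controller().method(), sort_keys=True)
```

It works, but it is one more helper to carry around just to avoid repeating three names.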

To me, being explicit in Python (in this case) breaks the first commandment; it is fugly. Contrastingly, the compactness of the @ruby way, a trifling detail, allows a quite sweet pattern. Just prefix any variables you want in the context with a @. I appreciate it not just from a purely visual sense of aesthetics; practically it would be easier to work with.

How on earth is using @ here less readable or uglier on the whole? Could it be better? Maybe.

Anyway, `It's all subjective`, `Beauty is in the eye of the beholder` and `Ugliness is to the bone` and other cliches.

Sunday, July 19, 2009

NodeSelect (or Hi SilverPond!)

I haven't blogged in a month or so as generally I prefer actually coding to blogging about the coding I am doing. So many ideas! So little time! I have noticed quick screen-casts illustrate concepts much more effectively and efficiently. Therefore when communicating about projects with fellow programmers I'll generally just whip up a quick cast.

So what's been up? Amongst other fun projects, lately I have been working on creating some xml/xhtml helpers for my editor. It's still in prototype stage but the basic concept is proving to be quite promising. As I'm away on a trip I'll just list the commands and link to a few screencasts recorded earlier for the moment.

This quick blog post is expressly for the purpose of showing a potential employer some of my `pet projects`.




Commands

  • pathSelect CSS
  • pathSelect XPath
  • collapseNode
  • prettyPrintNode
  • commentNode
  • tidySelection
  • selectInsideTag
  • moveToNodeEnd
  • selectElementName

ScreenCasts

Links

Friday, May 29, 2009

Sprox

An interesting CRUD library based on sqlalchemy / formencode / toscawidgets. It actually generates the html. Looks quite handy if you wanted fully generated forms.

Sprox

I'm not sure if it can do nested model forms or, if it can, whether that functionality is decoupled from the html generation and available for use. It seems like the author has put a lot of thought into it. I will definitely investigate it and ToscaWidgets further. I haven't really had much experience with auto-generated forms. I get the feeling that they are hard to customize but it's likely worth auto-generating when possible.

There don't seem to be many projects focused on (semi or fully) automatically creating list/pagination filter forms. This is quite commonly required in admin sections of a site.

I can't even remember if Django does.

sqaencode 0.1

I have been messing around a bit and now pretty much have a working lib for encoding nested models into forms and back and automatically generating formencode Schema from sqlalchemy models.

E:\python_repos\sqaencode>nosetests --with-coverage --cover-package=sqaencode
................
Name                   Stmts   Exec  Cover   Missing
----------------------------------------------------
sqaencode                  6      6   100%
sqaencode.constants       10     10   100%
sqaencode.decode          47     45    95%   126, 161
sqaencode.encode          28     26    92%   90-91
sqaencode.util            12     12   100%
sqaencode.validators      46     42    91%   30, 64, 66, 129
----------------------------------------------------
TOTAL                    149    141    94%
----------------------------------------------------------------------
Ran 16 tests in 0.962s

OK

I created a ModelSchema class, subclassing formencode.Schema and giving it an inline __metaclass__ inheriting from formencode.declarative.DeclarativeMeta. I over-rode the __repr__ to output an (almost) eval()able representation.

class ModelSchema(Schema):
    class __metaclass__(DeclarativeMeta):
        def __repr__(cls):
            model = cls.__model__.__name__
            base = [ "class %(model)sSchema(model_schema(%(model)s)):" % dict (
                         model = model) ]

            for arg in SCHEMA_ARGS:
                base.append('    %-20s = %r' % (arg, getattr(cls, arg)))

            base.append('')
            for key, validator in sorted(cls.fields.items()):
                if key.startswith('_'): continue

                args = non_default_validator_args(validator)
                base.append('    %-20s = %s(%s)' % (
                    key,
                    type(validator).__name__, args) )

            return '\n'.join(base)

Using the model_schema factory to create a ModelSchema and then getting a repr:
In [1]: from sqaencode import model_schema

In [2]: model_schema(Product)
Out[2]:
class ProductSchema(model_schema(Product)):
    ignore_key_missing   = True
    allow_extra_fields   = True
    pre_validators       = []

    active               = Bool()
    amount               = UnicodeString()
    colour               = UnicodeString()
    description          = UnicodeString()
    featured             = Bool()
    id                   = Int()
    image                = UnicodeString()
    image_thumb          = UnicodeString()
    image_zoom           = UnicodeString()
    keywords             = UnicodeString()
    material             = UnicodeString()
    name                 = UnicodeString()
    ordernum             = Int()
    sku                  = UnicodeString()
    views                = Int()

The __metaclass__ __repr__ hack serves a dual purpose: a) debugging and b) templating. By printing the repr you can use that as a template for customizing a model schema. You actually inherit from a dynamically generated class (a function call taking a model and optional arguments). I wasn't even too sure you could do that in python. It's nice that I don't have to create a new metaclass mechanism and can stick with existing formencode semantics.
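
For anyone who hasn't seen either trick, here is a stripped-down sketch with no formencode or sqlalchemy involved. It uses present-day Python 3 syntax, and `ReprMeta`, `make_schema` and the field values are all invented for the illustration; it is not how sqaencode itself is written.

```python
class ReprMeta(type):
    """Metaclass whose repr looks roughly like the source of the class."""

    def __repr__(cls):
        lines = ["class %s(%s):" % (cls.__name__, cls.__bases__[0].__name__)]
        for key in sorted(vars(cls)):
            if key.startswith("_"):
                continue  # skip __module__, __dict__ and friends
            lines.append("    %-12s = %r" % (key, vars(cls)[key]))
        return "\n".join(lines)


def make_schema(name, **fields):
    """Hypothetical factory: build a brand new class on every call."""
    return ReprMeta(name + "Schema", (object,), dict(fields))


# Inheriting straight from a function call is legal: the base class expression
# is evaluated first, and the metaclass is inherited along with the fields.
class CustomProductSchema(make_schema("Product", sku="text", views=0)):
    views = 1


print(repr(make_schema("Product", sku="text", views=0)))
```

The subclass only needs to declare what differs from the generated base, which is the whole point of printing the repr as a starting template.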

Note the `ignore_key_missing` flag that is by default set to True. I think when using this I will just define which fields to validate purely by virtue of what is included in the html. eg If there is no `sku` field in the form then it will not be validated.

What if I *didn't* want to globally ignore missing keys and wanted to manually declare which to ignore? Formencode has an in-built sub-classing mechanism whereby if you declare `some_key = None` then some_key will not be validated at all.

To declare a Product model_schema with plural Colors inline:
class ProductSchema(model_schema(Product, nested=True)):
    colors = model_schema(Color, plural=True)

With Python 2.6 you can do inline customisable declarations of relations:
class ProductSchema(model_schema(Product, nested=True)):
    @sqaencode.plural
    class colors(model_schema(Color)):
        some_field = NonDefault()


I'm thinking about creating a mechanism whereby I subclass sqlalchemy.types.* for metadata purposes to further drive automatic schema generation. A cool thing about Django's tight integration is the high level data types: Url, Email etc. You declare higher level properties on what are essentially stored as VARCHAR types in the database. It's not *just* a string.

SqlAlchemy, while really great, (rightly) doesn't try and abstract beyond basic SQL. There is nothing stopping an end user however doing something like this:
from sqlalchemy import Column, Integer, Table, Unicode

class Url(Unicode):   pass
class Email(Unicode): pass

# `metadata` is the usual sqlalchemy MetaData instance, defined elsewhere
higher_level_table = Table ( 'higher_level', metadata,
    Column('url',   Url(32)),
    Column('email', Email(32)),
    Column('id',    Integer(), primary_key=True, autoincrement=True, nullable=False),
)

If Url and Email were imported column types from sqaencode.types then you could add them to the sqlalchemy => formencode type mapping and they would be picked up by model_schema()
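
The lookup side of such a mapping could be as simple as walking the column type's class hierarchy. This is a sketch only: the `Unicode` class below is a plain stand-in rather than the sqlalchemy type, the validator names are just strings, and `TYPE_MAP` and `validator_name` are hypothetical names, not part of sqaencode.

```python
class Unicode(object):
    """Stand-in for a column type that takes a length."""

    def __init__(self, length=None):
        self.length = length


class Url(Unicode):
    pass


class Email(Unicode):
    pass


class Slug(Unicode):
    """A higher level type nobody has registered a validator for."""


# Most specific types and their base type side by side in one mapping.
TYPE_MAP = {
    Unicode: "UnicodeString",
    Url: "URL",
    Email: "Email",
}


def validator_name(column_type):
    """Return the validator for the closest registered ancestor class."""
    for klass in type(column_type).__mro__:
        if klass in TYPE_MAP:
            return TYPE_MAP[klass]
    raise LookupError("no validator mapped for %r" % type(column_type))
```

Walking the MRO means an unregistered subclass like `Slug` quietly falls back to whatever its parent maps to, so the high level types only add information and never break the basic mapping.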

What's on the todo?
  • Options for automatically generating child Schemas.
  • Setting the length on String/UnicodeString validators automatically to the max length of the corresponding column.
  • Automatically create unique column validators

Wednesday, May 27, 2009

Reading

A few days ago I scribbled an entry into my TODO.txt:
Learn about Python packaging, namespaces etc, best practices. setup.py etc etc
I started reading a lot and watching whatever screen casts I could. One useful book was Expert Python Programming. It is light on actual python and heavy on the soft skills in using open source python tools to refine your work flow. That is exactly what I was looking for. It is only about 300 pages so I skimmed through most of it in a day.

The Agile Development Tools in Python series by Christopher Perkins, while pretty light, gave some nice quick overviews of some very useful tools.

Between these two sources I got a good overview of:
  • setuptools
  • distutils
  • virtualenv
  • paster
  • nosetest
  • coverage
  • docutils
  • reST
  • sphinx

virtualenv


virtualenv will set up an isolated version of python, where applications can remain blissfully ignorant of the rapid change in the outside world. If you have an application that needs a particular version of a lib and another requiring a different one then this is the tool for the job.

It will set up a folder structure with (std) Lib and site-packages folders and a scripts directory. One of the scripts is of course the virtual python interpreter. It also has a tremendously useful `activate` script. This will put the Scripts (bin on *nix) folder at the front of PATH, so the environment's own interpreter, which knows where its lib folders are, is the one that gets found. It also modifies your console prompt to display the name of the current virtualenv.
(myenv) C:\myenv

I haven't used it much as of yet but I can imagine that it could get pretty confusing once you had a tonne of them so that is a nice and thoughtful addition.

The net result is that if you `activate` an environment and then run `python some_python.py` from an arbitrary directory it will run it using the virtualenv. Likewise for any of the scripts in your (*nix: bin, win: Scripts) folder. There is no need for explicitly referencing them. When you are done you just `deactivate` the environment with the corresponding script.
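
As far as I can tell, the environment side of `activate` boils down to something like the following. This is a rough sketch, not the real script: `activated_environ` is a made-up function, it ignores the prompt change, and the handling of PYTHONHOME is based on my reading of the *nix version.

```python
import os


def activated_environ(env_dir, environ, scripts_dir="bin"):
    """Return a copy of `environ` adjusted roughly the way `activate` does."""
    new_environ = dict(environ)
    new_environ["VIRTUAL_ENV"] = env_dir

    # The environment's scripts folder goes first, so its `python` wins.
    scripts = os.path.join(env_dir, scripts_dir)
    old_path = environ.get("PATH", "")
    new_environ["PATH"] = scripts + os.pathsep + old_path if old_path else scripts

    # A stray PYTHONHOME would point the interpreter back at the global install.
    new_environ.pop("PYTHONHOME", None)
    return new_environ


before = {"PATH": os.pathsep.join(["/usr/local/bin", "/usr/bin"]),
          "PYTHONHOME": "/opt/python"}
after = activated_environ("/envs/myenv", before)
```

Since the original dict is left untouched, `deactivate` is then just a matter of going back to it.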

setuptools is installed by default into each new virtualenv and easy_install is included in the scripts folder. You use easy_install to, funnily enough, easily install the packages you require.

virtualenv seems of great utility, however I imagine it could get pretty wasteful in terms of network usage. You don't really want to be downloading 10 packages EVERY time you start some new project. It would be handy if all of your local virtualenv environments used some sort of global local cache when using easy_install.

If you needed, for example, cherrypy and you already had the latest version on your hard drive somewhere, it would just pull the egg from there. Otherwise it would first pull the package into the cache from pypi.

It would be good if this all happened transparently and each virtualenv's easy_install respected some global distutils.cfg setting for cache location. Where do eggs lay about usually? In a Nest. I'll probably happily find that this has been sorted out.
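
The behaviour I have in mind is nothing more than a look-before-you-download wrapper. The sketch below is hypothetical from top to bottom: `cached_package` is not an easy_install feature, the `download` callable stands in for whatever actually talks to pypi, and the egg filename is made up.

```python
import os
import tempfile


def cached_package(filename, cache_dir, download):
    """Return the path to `filename` in the cache, downloading it only once."""
    if not os.path.isdir(cache_dir):
        os.makedirs(cache_dir)

    cached = os.path.join(cache_dir, filename)
    if not os.path.exists(cached):
        download(filename, cached)  # cache miss: go to the network
    return cached


# A fake downloader so the sketch runs without any network access.
download_log = []

def fake_download(filename, destination):
    download_log.append(filename)
    with open(destination, "w") as handle:
        handle.write("pretend egg contents")


nest = os.path.join(tempfile.mkdtemp(), "nest")
first = cached_package("CherryPy-3.1.2-py2.5.egg", nest, fake_download)
second = cached_package("CherryPy-3.1.2-py2.5.egg", nest, fake_download)
```

The second call never touches the downloader, which is exactly the bandwidth saving I'm after on a slow connection.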

Even if you had a cache, saving yourself some bandwidth, if you just copied an egg into each environment that would still be disk wastage. This seems like a good use for symbolic linking. It would be silly to reinvent that in python. Although, I'm not too sure how well Windows supports something like that. *nix is definitely superior in that respect.

paver


Another interesting util is one called paver that sort of ties together virtualenv and distutils/setuptools to allow a more `pythonic` zc.buildout. Buildout uses declarative ini files for everything and is apparently harder to hack if you want to do anything out of the ordinary. (I wouldn't know for sure if these are valid criticisms.)

From what I have read of buildout vs paver/virtualenv I think I will invest in the latter. It just seems to appeal more to my personal sense of aesthetics.

paster


Paster is a hodgepodge macgyver swiss knife created by Ian Bicking. It can launch WSGI applications, launch missiles, sink ships and god knows what else. This utility, `eclectic` in the words of Ian, is basically the kitchen sink.

Of interest to me are the project template creation commands. They will create a folder with setup.py, README.txt etc and run you through a command line based wizard to set it all up, asking version number, author name and the like. These project templates can give a bit of an insight into how other people structure their projects.

Where do the tests go? Inside the distribution proper? Housed inside the same folder as the setup.py? I'm somewhat leaning towards having the tests included in the public package, ie package.tests.fixtures. I think that makes it easy to import fixtures in doctested example documentation.

distutils / setuptools


With regard to structuring projects, if there is one thing I can hope to take away from all this, then it is the `setup.py develop` command.

This setuptools command will put your under-development package on sys.path so you don't have to run `setup.py install` every time you make changes. Quite handy. Of course you could have a virtualenv where you have a package in `develop` mode and another where you have run a full `setup.py install` at a certain version.
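
The effect can be imitated by hand, which helped me see why edits show up immediately. In this sketch a temporary directory stands in for a project checkout and `devdemo` is a made-up module name; the real command arranges the path entry for you (via a .pth file in site-packages, as I understand it) rather than through a sys.path.insert call.

```python
import os
import sys
import tempfile

# Hypothetical project checkout containing a single module.
source_dir = tempfile.mkdtemp()
with open(os.path.join(source_dir, "devdemo.py"), "w") as handle:
    handle.write("VERSION = '0.1'\n")

# Roughly what develop mode arranges: the checkout itself is on sys.path,
# instead of a copy of the package being placed in site-packages.
sys.path.insert(0, source_dir)

import devdemo

# The imported module *is* the file in the checkout, so the next interpreter
# run sees whatever is saved there, with no reinstall step in between.
imported_from = os.path.dirname(os.path.realpath(devdemo.__file__))
```

With a full `setup.py install`, `imported_from` would point at site-packages and the checkout could drift away from what actually runs.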

nosetests


Nosetests has a really great coverage plugin. It will run your tests and, for whichever package/module you specify, dump a list of line ranges that haven't been `covered` by your test code. This makes it super easy to tell which code paths still need testing (or culling!). I'm assuming this is only really of use for pure python code. If so, that is another tick next to dynamic `interpreted` languages.
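
The reason this is cheap to do for pure python is that the interpreter will tell you about every line it executes. Here is a toy version of the idea using only the standard library. It is a sketch of the principle for a single function, not how the coverage plugin is implemented, and `uncovered_lines` and `classify` are names invented for the example.

```python
import dis
import sys


def uncovered_lines(func, *args, **kwargs):
    """Run func and return the line offsets (from its def line) that never ran."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Record only 'line' events that belong to the function under test.
        if frame.f_code is code and event == "line":
            executed.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)

    # Every line the compiled function knows about, minus the def line itself.
    known = set(line for _, line in dis.findlinestarts(code) if line is not None)
    known.discard(code.co_firstlineno)
    return sorted(line - code.co_firstlineno for line in known - executed)


def classify(n):
    if n < 0:
        return "negative"
    return "non-negative"
```

Calling `uncovered_lines(classify, 5)` reports the `return "negative"` line as never reached, which is the same kind of hint the `Missing` column gives you for a whole package.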

sphinx


I also experimented with docutils/ reST / Sphinx for documentation. Sphinx is the spiffy new reST based documentation system developed originally for Python proper. It has a built-in indexer and javascript search client and seemingly many other great features. For this reason it seems a LOT of python projects are using it now. I imagine the fact that it's quite easy on the eye out of the box doesn't hurt take-up much either.

A new project


I started on creating a setup.py enabled, nose/doc tested, sphinx documented project. It *really* slows down the whole process. I suppose it's a lot to try and learn all at once.

I'm still not really sold on test first prototyping. TDD of something you have a little bit of experience with, sure. TDD/documenting something when you are in new terrain and you KNOW that you are going to mess up and have to redux anyway, just seems wasteful.

I probably just don't `get it` yet. `Slow is fast` has generally held true in a lot of other areas of my experience.

What else to read up on? I need to learn more about the standard library logging package.

Friday, May 22, 2009

Souvenir


prison color blue
it's a uniform of choice
count yourself lucky
that you don't write the software


Some lines from the great Neil Finn's song `Souvenir`. Count yourself lucky indeed.

FormEncode + SqlAlchemy = SQAEncode

The way I do admin model CRUD at the moment is have a models.py file containing all my sqlalchemy tables/classes and their respective formencode.Schema/s in a forms.py file. The problem is that I end up having to repeat all the fields and to me that just stinks.

It's not a big deal as far as typing them out goes, as I will typically declare a table / model in sqlalchemy and then use a homebrewed scaffolding function to generate derived code for the Schema and html for the actual form. When actually editing a model form I use genshi's HTMLFormFiller filter to set the blank form's values.

This approach really is keeping in the spirit of the formencode library. The author's philosophy is that after a while configuring an `autogenerated` form from a model by declaring options in code is just as much work as declaring in html.

Configuration includes black/white listing certain fields, whether they are allowed to be empty (null/None) and also logical/presentational ordering and grouping of individual fields.

formalchemy on the other hand will take a model and automatically generate a form with the values already set. It cannot, though, do nested models, ie a parent model with an arbitrary amount of inline children to CRUD in the same form. The author considers this `almost always bad design`. It only allows what I term `relating` the parent/root model to existing models.

While it *may* sometimes save you from doing any `configuration` whatsoever in the case where the form validation schema reflects perfectly the model, in my (admittedly meagre) experience this would be fairly rare.

The authors of formalchemy recommend the use of CSS to customize the appearance of the auto generated forms. For more extensive customization you can override each field's renderer (<input type = 'text'> vs <textarea> etc) or the global form generation function.

Having not really used formalchemy that much I can't say with certainty just how unwieldy it gets trying to customize a form's options / looks. I'd guess that you would end up having to do just as much `configuration`, ie repeating of fields / `work`.

Looking at a lot of my formencode Schemas in relation to their underlying model, it seems they are declared in a `white list` manner. The schemas (and html) simply don't declare fields that aren't `public`. The underlying type map is reasonably consistent. A sqlalchemy Boolean becomes a formencode Bool, a Unicode maps to a UnicodeString etc. It's typically only the options that are changed.

I have an embryonic idea to auto generate formencode validators from an sqlalchemy model, mapping validators to column/relation types. I imagine `configuring` this would look something like the following.

class ProductSchema(ModelSchema):
    _model = Product

    long_description = options(not_empty = True)
    date_created = None
    name = UniqueName()


The ModelSchema would use some type of __metaclass__. Any `private` field declared as None would be blacklisted from the underlying derived schema. Above, `options` would just be an alias for dict. A dict would be used to configure the auto mapped validator. `not_empty = True` would override the not_empty argument mapped from the respective column's nullable property. UniqueName above would completely replace the mapped UnicodeString. (In fact any column with unique = True could probably have an auto generated `unique field` validator attached.)
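
Leaving the metaclass plumbing aside, the three override rules could be sketched with plain dicts. Everything here is hypothetical: `derive_schema` is an invented helper, validators are represented by their names as strings, and the `mapped` dict stands in for whatever the column type map would produce.

```python
options = dict  # as described: just an alias for dict


def derive_schema(mapped, declared):
    """Merge auto mapped (validator, arguments) pairs with declared overrides."""
    schema = {}
    for field, (validator, arguments) in mapped.items():
        if field not in declared:
            schema[field] = (validator, dict(arguments))  # untouched default
            continue

        override = declared[field]
        if override is None:
            continue  # blacklisted: leave the field out entirely
        if isinstance(override, dict):
            merged = dict(arguments)
            merged.update(override)  # tweak options, keep the mapped validator
            schema[field] = (validator, merged)
        else:
            schema[field] = (override, {})  # replace the validator outright
    return schema


mapped = {
    "long_description": ("UnicodeString", {"not_empty": False}),
    "date_created": ("DateConverter", {"not_empty": True}),
    "name": ("UnicodeString", {"not_empty": True}),
    "sku": ("UnicodeString", {"not_empty": True}),
}
declared = {
    "long_description": options(not_empty=True),
    "date_created": None,
    "name": "UniqueName",
}
schema = derive_schema(mapped, declared)
```

Only the fields that deviate from the model need to be mentioned, which is the repetition I'm trying to get rid of.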

I'm also currently working on some sqlalchemy formencode integration functions that will take basic one table models and their relations and encode them into nested dictionaries / primary key lists. Also, the other way round, taking a nested dict (as from a formencode.NestedVariables pre validated Schema) and creating/updating/deleting an object tree/graph.

SqlAlchemy mappers make it quite easy to introspect classes for relations so the object_graph func signature just looks like:

object_graph(nd_dict, root_model)


This all works fine in basic unit test land AND for one model forms. The problem I'm having is ONETOMANY child objects in a form. Imagine an Invoice form with an inline InvoiceItems table with each row representing an individual item. On a new invoice there would be 3 `blank` rows. How would you know which ones to ignore? Which ones to delete?

I want to set a `_keep` flag for each item. The _keep flag will be reflected as a checkbox in the form and if unchecked will mean that the child item should be ignored in the validation process.

I could leave the child tables ( invoice items in the concrete sense) empty and use javascript to add a [+] button to add new children. That to me seems like a `bent over` concession.

The `object_graph` function which CRUDS model[s] from a nestable dict is currently decoupled from the validation process and knows nothing of form errors. This might make it hard. Before that I was successfully using `child_crud` hooks in my CRUD controllers for this purpose.

I'll have to sit down and work out which cases I'll need to accommodate. I want to be able to update/delete existing models, create new models and ignore cases where the child CRUD form partials have not been edited.
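
A first stab at enumerating those cases, before worrying about validation or error indexes at all. This is only a sketch under assumptions of my own: each child row is a plain dict, an `id` key marks a row that already exists in the database, and an unchecked `_keep` on an existing row means delete. `plan_child_rows` is a made-up name.

```python
def plan_child_rows(rows):
    """Sort submitted child rows into create / update / delete / ignore."""
    plan = {"create": [], "update": [], "delete": [], "ignore": []}
    for row in rows:
        keep = bool(row.get("_keep"))
        existing = row.get("id") is not None

        if keep and existing:
            plan["update"].append(row)
        elif keep:
            plan["create"].append(row)
        elif existing:
            plan["delete"].append(row)
        else:
            plan["ignore"].append(row)  # a blank row nobody touched
    return plan


submitted = [
    {"id": 7, "_keep": "on", "description": "widget"},
    {"id": 8, "description": "gadget"},
    {"_keep": "on", "description": "new item"},
    {"description": ""},
]
plan = plan_child_rows(submitted)
```

Only the rows under `create` and `update` would then need to go through validation, which might be one way around the ForEach problem.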

How to get the formencode.ForEach validator to ignore items without messing up the errors index? These problems were all solved using a child_crud hook in the controllers but trying to abstract and decouple things is making it a lot harder.

It will be a nut worth cracking anyway.

Monday, May 11, 2009

PyParsing + SqlAlchemy = Basic Search Engine

Today I learned about writing recursive descent parsers using the PyParsing library. I managed to cobble together an sqlalchemy expression builder for a basic search engine.

    #################################### IMPORTS ###################################

    # PyParsing
    from pyparsing import ( CaselessLiteral, Literal, Word, alphas, quotedString,
                            removeQuotes, operatorPrecedence, ParseException,
                            stringEnd, opAssoc )

    # SqlAlchemy
    from sqlalchemy import and_, not_, or_

    ################################## LIKE ESCAPE #################################

    LIKE_ESCAPE = r'\\'

    def like_escape(s):
        # Escape the escape character itself first, then the LIKE wildcards
        return '%' + ( s.replace('\\', '\\\\')
                        .replace('%', '\\%')
                        .replace('_', '\\_') ) + '%'

    ############################### REUSABLE ACTIONS ###############################

    # `operator` is wrapped in a list on the subclasses so that the function
    # stored on the class is not turned into a method on lookup

    class UnaryOperation(object):
        def __init__(self, t):
            self.op, self.a = t[0]

        def __repr__(self):
            return "%s:(%s)" % (self.name, str(self.a))

        def express(self):
            return self.operator[0](self.a.express())

    class BinaryOperation(object):
        def __init__(self, t):
            # t[0] looks like [operand, op, operand, op, operand, ...]
            self.op = t[0][1]
            self.operands = t[0][0::2]

        def __repr__(self):
            return "%s:(%s)" % ( self.name,
                                 ",".join(str(oper) for oper in self.operands) )

        def express(self):
            return self.operator[0](*( oper.express() for oper in self.operands ))

    class SearchAnd(BinaryOperation):
        name = 'AND'
        operator = [and_]

    class SearchOr(BinaryOperation):
        name = 'OR'
        operator = [or_]

    class SearchNot(UnaryOperation):
        name = 'NOT'
        operator = [not_]

    ############################### REUSABLE GRAMMARS ##############################

    AND = CaselessLiteral("and") | Literal('+')
    OR = CaselessLiteral("or") | Literal('|')
    NOT = CaselessLiteral("not") | Literal('!')

    searchTermMaster = (
        Word(alphas) | quotedString.copy().setParseAction( removeQuotes ) )

    ########################## THREAD SAFE PARSER FACTORY ##########################

    def like_parser(model, fields=()):
        # A fresh SearchTerm class and grammar are built per call, closing over
        # `model` and `fields`
        class SearchTerm(object):
            def __init__(self, tokens):
                self.term = tokens[0]

            def express(self):
                return or_ (
                    *( getattr(model, field).like( like_escape(self.term),
                                                   escape = LIKE_ESCAPE)
                       for field in fields )
                )

            def __repr__(self):
                return self.term

        searchTerm = searchTermMaster.copy().setParseAction(SearchTerm)

        searchExpr = operatorPrecedence( searchTerm,
            [ (NOT, 1, opAssoc.RIGHT, SearchNot),
              (AND, 2, opAssoc.LEFT, SearchAnd),
              (OR, 2, opAssoc.LEFT, SearchOr) ] )

        return searchExpr + stringEnd

    ########################### SEARCH FIELDS LIKE HELPER ##########################

    def search_fields_like(s, model, fields):
        if isinstance(fields, basestring): fields = [fields]
        parser = like_parser(model, fields)
        return parser.parseString(s)[0].express()

    ################################################################################
Friday, April 3, 2009

ListFilter Prototype



Multiple space terminated regular expressions for list filtering.

The examples presented are somewhat hare-brained. a) Cause I'm hare-brained b) I'm really tired at the moment. Note the dramatic pauses :) Can.....you.....see....what....I'm.........do....ing You can imagine the usefulness though.

The filter is a prototype implementation of a filtering syntax for a QuickOpen Panel specifically designed for `quick` (try not to laugh) *multiple item* selection. The idea is to allow you to just keep typing in to refine your search. NOT this OR that. NOT that. `Open All In New Window` etc.

It is implemented using the editor's actual text buffer API.

My editor's current QuickOpen panel (while it does support multiple selection using ctrl-enter) works a lot like the Firefox URL history search: space terminated tokens that each list item must match, or else it is filtered from the list. This works great for selecting one item but what if you want to open up multiple items at a time?

Plain regex searches are too unwieldy and lack the speed of entry desired.

The filtering is a regex extension of the current `all space terminated word chunks in any order`. It maintains most of the benefits and generally works as before for non regex characters. If you just need to type in some alphanumerics then it should be just as quick. It's essentially multiple regexes instead of multiple substring matches.

OPERATORS:

' ' AND operator
! NOT operator
| OR operator
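The prototype itself lives in the editor and isn't shown here, so the following is only a rough sketch of how the three operators could behave. It assumes that `!` is a prefix on a token and that `|` is simply left to the regex engine's own alternation; `list_filter` and the file names are hypothetical.

```python
import re

def list_filter(query, items):
    """Keep the items that satisfy every space terminated token of the query."""
    tests = []
    for token in query.split():
        negate = token.startswith('!')
        body = token[1:] if negate else token
        if not body:
            continue   # a lone "!" is still being typed, ignore it for now
        tests.append((negate, re.compile(body, re.IGNORECASE)))
    # Space is AND: every token must agree. "!" flips what agreeing means.
    return [item for item in items
            if all(bool(pattern.search(item)) != negate
                   for negate, pattern in tests)]

files = ['models/invoice.py', 'models/item.py', 'tests/test_invoice.py',
         'views/invoice.html', 'README.txt']

print(list_filter('invoice !test', files))
# ['models/invoice.py', 'views/invoice.html']
print(list_filter(r'\.py$ models|tests', files))
# ['models/invoice.py', 'models/item.py', 'tests/test_invoice.py']
```

A half typed regex (an unclosed bracket, say) would raise `re.error` here, so a real panel would want to catch that and keep showing the previous list.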

Thursday, April 2, 2009

jQuery = $;



I have been making a foray into JavaScript recently for work and having had good experiences with `jQuery Lightbox` decided to use it for the basis of a job I was doing.

It is essentially a carpet gallery website with collections of colors (1:M). The designer wanted it to be `AJAX` ( a term that seems to have been hijacked to mean any page updated without slow browser refresh )

In the middle of the page is a large image. Above it are next/prev collection links and tiny `swatches` (thumbnails) containing links to each color for the current selection.

To either side of the image are next/prev color in collection links and upon mouseover a window will appear showing a magnified area following the cursor which is changed to a crosshair. The cursor will change via css styling to `cursor:wait` whilst waiting for the zoom image to load.

Upon changing color (via swatch, next/prev collection/color ) an animated gif will show while the chosen color's image is loading.

At first I was keeping a counter of the current color's position in the collection's array of colors (clicking on the little thumbnails would take the title attribute, slug it and use that for the color, updating the current position index with $.inArray(color, colors)).

The filenames to load were, and are, a function of the current collection and color.

The problem was having the state in an internal counter didn't really work for `open in new tab` or for sending links. "Hey check out this carpet... no the red one... Did you type in that url properly? Just paste it in."

I did some googling for `ajax urls` and stumbled upon a technique that sounded useful. That being polling the hash location for changes and then setting page state as a function of the hash. They said the ideal `polling` rate was 100ms. Sounded pretty hacky but at least the urls worked.

I searched for and found a plugin for jQuery that allows you to set event handlers for when the window hash changes. It uses polling but is responsive (42 ms) and works on IE 6.

I therefore just set a callback to update all the links and the image upon the hash changing. I split the hash on '--' to find the current collection and color.

The great thing about it is that the event is fired on page load so it goes through the `change color` routine. Updates all the links, shows the loading animated gif etc.

You can send links to people, open in a new tab from any of the links, etc.

jQuery and its plugins made everything really straightforward. The only head scratching was getting the magnifying glass to work. None of the plugins worked out of the box for images that changed src. They worked mainly for a `static` gallery.

CTags 2: TDD

This is part two of CTags. See part 1

Test Driven Development

Adherents religiously write tests first for *every* function they write.

Personally, I think a tonne of unit tests while prototyping is a waste of time for `simple` stuff with only one programmer working. It tends to feel like walking in mud. I prefer higher level tests that while they may not pinpoint the exact cause of failure won't double refactoring efforts. Especially when prototyping. Spend ages testing that `The Wrong Way` works? No thanks... Prototype, then rewrite with tests.

Sometimes though, `TDD` really is indispensable while exploring. Especially when working on stuff that is pushing the limits of your understanding. If I start encountering bugs I usually see it as a sign I need to start writing tests. Rather than using transient print statements to debug I'll write some unit tests.

The TagFile class below (now commented) is an example of when I found testing while prototyping invaluable.

    #################################### IMPORTS ###################################

    from __future__ import with_statement

    import os
    import bisect
    import mmap
    import unittest

    ################################### CONSTANTS ##################################

    """
    The tags in a `tags` file are listed one per line formatted as so:

        tag_name<TAB>file_name<TAB>ex_cmd;"<TAB>extension_fields

    """

    # symbolic constants for column indexes
    SYMBOL = 0
    FILENAME = 1

    ################################################################################

    class TagFile(object):
        def __init__(self, p, column):
            """

            Instantiate a new TagFile

            @p path to `tags` file
            @column which column to read

            """

            self.p = p
            self.column = column

        def __getitem__(self, index):
            "self.fh is the mmap opened by get"
            # Seek to a certain byte index
            self.fh.seek(index)

            # The position is likely to be halfway through a line so read up to
            # the first new line and throw away the `junk`
            self.fh.readline()

            # Note that it's actually returning the column from the line *after* the
            # line region containing the index.

            return self.fh.readline().split('\t')[self.column]

        def __len__(self):
            # bisect.bisect_left search must know how large the file is
            return os.stat(self.p).st_size

        def get(self, *tags):
            """

            Get all lines for one or more tags

            """

            with open(self.p, 'r+') as fh:
                # mmap is not needed but delivers performance increase
                self.fh = mmap.mmap(fh.fileno(), 0)

                for tag in tags:
                    # As __getitem__ returns the column from the line region *after*
                    # that containing the index pt then bisect( alias of
                    # bisect_right ) will give the wrong index.

                    b4 = bisect.bisect_left(self, tag)

                    # Move the file to the position found
                    fh.seek(b4)

                    # Iterate over file. There may be more than one tag line to get
                    # per symbol/filename
                    for l in fh:
                        # Compare search vs line at column
                        comp = cmp(l.split('\t')[self.column], tag)

                        # This handles the case of being `left of left` due to
                        # __getitem__ index being left of symbol it returns
                        # ie wait until catch up
                        if comp == -1: continue
                        # If line is greater then have moved on to next symbol
                        elif comp: break

                        # Found tag!
                        yield l

                # close mmap
                self.fh.close()

    ##################################### TESTS ####################################

    class CTagsTest(unittest.TestCase):
        def test_tags_files(self):
            """

            This test basically iterates over each line in the tags file creating
            a list of lines for each unique symbol it finds. It then compares this
            list to that returned by the TagFile binary search.

            """

            # Successfully passed test on 10MB+ tags file
            # tags = r"C:\Python25\Lib\tags"

            tags = 'tags'
            tag_file = TagFile(tags, SYMBOL)

            with open(tags, 'r') as fh:
                latest = ''
                lines = []

                for l in fh:
                    symbol = l.split('\t')[SYMBOL]

                    if symbol != latest:

                        if latest:
                            tags = list(tag_file.get(latest))
                            self.assertEqual(lines, tags)

                            lines = []

                        latest = symbol

                    lines += [l]

    if __name__ == '__main__':
        unittest.main()

    ################################################################################
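The listing above is Python 2 (`cmp`, text mode files). For anyone wanting to play with the same idea today, here is a rough Python 3 sketch of it, not a drop-in port. `TagColumn`, `find_tag` and the `END` sentinel are my own hypothetical names, everything is compared as bytes, and the sentinel assumes no real tag starts with the byte 0xff. The sentinel is there so that indexes falling inside the last line (which has no line after it) still sort after every real tag.

```python
import bisect
import mmap
import os
import tempfile

class TagColumn:
    """One column of a sorted tags file, seen as a sequence indexed by byte offset."""

    END = b'\xff'   # assumption: sorts after every real tag

    def __init__(self, mm, column=0):
        self.mm = mm
        self.column = column

    def __len__(self):
        return len(self.mm)

    def __getitem__(self, offset):
        # Throw away the (probably partial) line containing offset,
        # then return the column of the line *after* it.
        self.mm.seek(offset)
        self.mm.readline()
        line = self.mm.readline()
        if not line:
            return self.END
        return line.split(b'\t')[self.column]


def find_tag(path, tag, column=0):
    """Return every line of the tags file whose column equals tag."""
    tag = tag.encode()
    found = []
    with open(path, 'rb') as fh:
        mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            # bisect_left lands on the start of the line just before the
            # first match (or on offset 0 if the first line is the match).
            mm.seek(bisect.bisect_left(TagColumn(mm, column), tag))
            for line in iter(mm.readline, b''):
                key = line.split(b'\t')[column]
                if key < tag:
                    continue    # still catching up
                if key > tag:
                    break       # moved on to the next symbol
                found.append(line.decode())
        finally:
            mm.close()
    return found


# A tiny made up tags file, sorted by symbol
lines = ['alpha\ta.py\t1\n', 'beta\tb.py\t2\n',
         'beta\tc.py\t3\n', 'gamma\tg.py\t4\n']
with tempfile.NamedTemporaryFile('w', suffix='.tags', delete=False) as tmp:
    tmp.writelines(lines)

print(find_tag(tmp.name, 'beta'))
os.remove(tmp.name)
```

It returns a list rather than yielding, purely so the mmap is closed before the caller sees any results.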

Sunday, March 29, 2009

CTags 1: Captain Bisect And His Rag (c)Tag Crew

CTags

Ctags generates an index (or tag) file of language objects found in source files
that allows these items to be quickly and easily located by a text editor or
other utility. A tag signifies a language object for which an index entry is
available (or, alternatively, the index entry created for that object).

Python has a philosophy of `Batteries Included` (and `Designer Straight Jacket`)
and its standard library has many useful modules to allow you to zoom out and
fly. Say what? What pixie dust have you been snorting!?

The higher level you are the higher level you can go. If you're stuck in the
trenches of detail it's hard to see emergent patterns to exploit which can help
you simplify code. Refactoring and simplifying is generally an iterative process
of simplifying, getting a higher level view and simplifying again.

Taking a bottom up approach in this helps as you know what building blocks you
need to create along the way. Don't want to be *too* simplistic. How can you do
things `top down` if you aren't `on top`?
Maybe you have flown over many times already and are just implementing an old
innovation from a time honoured map. That's exactly the spirit of prototyping.
Sending in the scouts to survey the terrain and report back.

However, those troops get mighty mutinous really quick if you don't take the
time to at least make a spot before you send em out like a swarm of jellyfish
(especially when they are being fired at)

Strategy? Fly over, send in the scouts and maintain communication.

What the hell has all that got to do with CTags? Not much I admit. But we does
love to babble.

Python std libraries are your `A Team` of special operatives that can do the
work you tell em without constant supervision and no need to worry about the
messy details. Go nuke PHP! "Yes Sir!"

One of these crackshot ninjas is Captain `bisect` with his special move
`bisect_left`.

He's a great leader because he's absolutely fastidious in keeping his company of
troops in perfect sorted order at all times.

One of his duties involves training new recruits and he'll do a trick where
upon meeting them all for the first time he will get them to silently line up in
alphabetical order while he has his back turned. (we'll say alphabetical by
name, but well, these guys *are* macho army men so..)

>>> scum = "omewEDjyFapAdxslfhbgBcnCtkqzuvri"

>>> len(scum) == 32
True

>>> men = ''.join(sorted(scum))
>>> men
'ABCDEFabcdefghijklmnopqrstuvwxyz'
( Isn't it funny how the ones that try and make themselves seem big are in fact the smallest? )

" I bet I can find the position in the line of any one of you maggots after at most 5 guesses "

A, Yeah right, he thinks. "Find q sir"

bisect, "man 16 are you a lesser man than q?" k, "Yes"
bisect, "man 24 are you a lesser man than q?" s, "No"
bisect, "man 20 are you a lesser man than q?" o, "Yes"
bisect, "man 22 are you a lesser man than q?" q, "No" (smiles)
bisect, "man 21 are you a lesser man than q?" p, "Yes"

bisect, "man 22 you are q!"

troops, "Bravo Sir!"

( bisect is a bit of a kookoo and uses 0 based indexing. Lucky the scum never seem to get confused )

How does he do it!?

He starts in the middle (32 / 2 = 16) and compares what he was searching for with what he finds there. What he finds, k, is less than q so he knows he can rule out all other men before k as they too would be less than q.

This only worked for him because all the men are in order.

He then subdivides again. The man at position 16, k, is less than q so his area will start at 17 and extend in the opposite direction ending (as before) at 32 (the greatest man). He then looks for his next midpoint with an eye to ruling out another half of the remaining men.

( He repeats this simple process until his start point is no longer less than the
end point )

((17 + 32) // 2 = 24)
men[24] (s) is not less than q so his end point becomes 24

((17 + 24) // 2 = 20)
men[20] (o) is less than q so his start point becomes 21

((21 + 24) // 2 = 22)
men[22] (q) is not less than q :) so his end point becomes 22

((21 + 22) // 2 = 21)
men[21] (p) is less than q so his start point becomes 22
Start is 22 and end is 22 so he has a match!
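The Captain's guesses can be replayed with a few lines of Python. This is a sketch of the same halving that `bisect.bisect_left` does, written out by hand so the inspected positions can be recorded; `bisect_left_trace` is a hypothetical name, not part of the bisect module.

```python
import bisect

def bisect_left_trace(seq, target):
    """Same halving as bisect.bisect_left, but record every index inspected."""
    lo, hi = 0, len(seq)
    guesses = []
    while lo < hi:
        mid = (lo + hi) // 2
        guesses.append(mid)
        if seq[mid] < target:
            lo = mid + 1    # the man at mid and everyone before him are lesser men
        else:
            hi = mid        # target is at mid or somewhere before it
    return lo, guesses

men = ''.join(sorted("omewEDjyFapAdxslfhbgBcnCtkqzuvri"))
position, guesses = bisect_left_trace(men, 'q')

print(position, guesses)              # 22 [16, 24, 20, 22, 21]
print(bisect.bisect_left(men, 'q'))   # 22
```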

With 64 men (double 32) he would need only 1 more guess, as each guess rules
out half of those left and halving is the inverse of doubling.

What about 128 (or some arbitrary number)? How many times can you halve 128
before you get one (the right `one`)?

Or in other words, two to the power of what makes 128?

log2(32) = 5
log2(64) = 6
log2(128) = 7
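The halving can also be counted the long way round and compared against the logarithm. A tiny sketch (`halvings` is just a throwaway helper name):

```python
import math

def halvings(n):
    """Count how many times n can be halved before reaching 1."""
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

for n in (32, 64, 128):
    print(n, halvings(n), math.log2(n))
# 32 5 5.0
# 64 6 6.0
# 128 7 7.0
```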

    import math
    math.log(128, 2)

Captain bisect's talent scales exceedingly well and in fact he's ready to put
forth his talent to whatever use you come up with.

    import bisect
    bisect.bisect_left(some_sequence, search)

WOAH What a verbose explanation! Too much detail! Couldn't you just have said
use `bisect.bisect_left` to find the index of the left most occurrence of any item in a sorted sequence?

Yeah, that's kind of the point. Python is chock full of handy high level
abstractions you can trust to do the job without worrying about the details.
You are already zooming.

"Ok you admit you are babbling but what the heck has this got to do with
CTags?"

To paraphrase, Ctags generates an index (in sorted order) of symbols to
be quickly and easily located. Sounds like a job for Captain bisect. He is
actually made of this stern stuff called `C` that makes him faster than anything
you could genetically engineer in your laboratories.

Python also has this useful thing going called duck typing. bisect will work
on any class that exposes a __getitem__ method (plus a __len__ so it knows where the upper bound is).
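To see the duck typing in action before any files get involved, here is a small sketch with a made up class. `Squares` is hypothetical: it stores nothing, it just answers `len()` and indexing as if it were a sorted list of square numbers, and that is all Captain bisect asks for.

```python
import bisect

class Squares:
    """Quacks like the sorted sequence 0, 1, 4, 9, ... without storing it."""

    def __init__(self, size):
        self.size = size

    def __len__(self):
        # bisect_left calls len() to find its upper bound when hi is not given
        return self.size

    def __getitem__(self, index):
        return index * index

# Smallest n below a million whose square is at least 2,000,000
print(bisect.bisect_left(Squares(10**6), 2000000))   # 1415
```

A million `men` and nothing was ever lined up in memory; each guess is computed only when the Captain asks for it.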

But what if you wanted Captain bisect to search a 50 MB ctags file? You can't
sub class a file object... can you? In any case each index would just return
a character wouldn't it?

A sneak preview of how to use bisect and mmap to binary search CTags `tags` files. Explanation to follow.

    #################################### IMPORTS ###################################

    from __future__ import with_statement

    import os
    import bisect
    import mmap
    import unittest

    ################################### CONSTANTS ##################################

    # CSV Column in tag file
    SYMBOL = 0
    FILENAME = 1

    ################################################################################

    class TagFile(object):
        def __init__(self, p, column):
            self.p = p
            self.column = column

        def __getitem__(self, index):
            self.fh.seek(index)
            self.fh.readline()
            return self.fh.readline().split('\t')[self.column]

        def __len__(self):
            return os.stat(self.p).st_size

        def get(self, *tags):
            with open(self.p, 'r+') as fh:
                self.fh = mmap.mmap(fh.fileno(), 0)

                for tag in tags:
                    b4 = bisect.bisect_left(self, tag)
                    fh.seek(b4)

                    for l in fh:
                        comp = cmp(l.split('\t')[self.column], tag)

                        if comp == -1: continue
                        elif comp: break

                        yield l

                self.fh.close()

    ##################################### TESTS ####################################

    class CTagsTest(unittest.TestCase):
        def test_tags_files(self):
            tags = r"tags"
            tag_file = TagFile(tags, SYMBOL)

            with open(tags, 'r') as fh:
                latest = ''
                lines = []

                for l in fh:
                    symbol = l.split('\t')[SYMBOL]

                    if symbol != latest:

                        if latest:
                            tags = list(tag_file.get(latest))
                            self.assertEqual(lines, tags)

                            lines = []

                        latest = symbol

                    lines += [l]

    if __name__ == '__main__':
        unittest.main()

    ################################################################################