akalias: May 2009

Friday, May 29, 2009

Sprox

An interesting CRUD library based on sqlalchemy / formencode / toscawidgets. It actually generates the html. Looks quite handy if you wanted fully generated forms.


I'm not sure whether it can do nested model forms or, if it can, whether that functionality is decoupled from the html generation and available for use on its own. It seems like the author has put a lot of thought into it. I will definitely investigate it and ToscaWidgets further. I haven't really had much experience with auto-generated forms. I get the feeling that they are hard to customize but it's likely worth auto-generating when possible.

There don't seem to be many projects focused on (semi or fully) automatically creating list/pagination filter forms. This is quite commonly required in admin sections of a site.

I can't even remember if Django does.

sqaencode 0.1

I have been messing around a bit and now pretty much have a working lib for encoding nested models into forms and back and automatically generating formencode Schema from sqlalchemy models.

E:\python_repos\sqaencode>nosetests --with-coverage --cover-package=sqaencode
................
Name                   Stmts   Exec  Cover   Missing
----------------------------------------------------
sqaencode                  6      6   100%
sqaencode.constants       10     10   100%
sqaencode.decode          47     45    95%   126, 161
sqaencode.encode          28     26    92%   90-91
sqaencode.util            12     12   100%
sqaencode.validators      46     42    91%   30, 64, 66, 129
----------------------------------------------------
TOTAL                    149    141    94%
----------------------------------------------------------------------
Ran 16 tests in 0.962s

OK

I created a ModelSchema class, subclassing formencode.Schema and giving it an inline __metaclass__ inheriting from formencode.declarative.DeclarativeMeta. I overrode the __repr__ to output an (almost) eval()able representation.

class ModelSchema(Schema):
    class __metaclass__(DeclarativeMeta):
        def __repr__(cls):
            model = cls.__model__.__name__
            base = [ "class %(model)sSchema(model_schema(%(model)s)):" % dict (
                              model = model)  ]

            for arg in SCHEMA_ARGS:
                base.append('    %-20s = %r' % (arg, getattr(cls, arg)))

            base.append('')
            for key, validator in sorted(cls.fields.items()):
                if key.startswith('_'): continue

                args = non_default_validator_args(validator)
                base.append('    %-20s = %s(%s)' % (
                    key, type(validator).__name__, args) )

            return '\n'.join(base)

Using the model_schema factory to create a ModelSchema and then getting a repr:

In [1]: from sqaencode import model_schema

In [2]: model_schema(Product)
Out[2]:
class ProductSchema(model_schema(Product)):
    ignore_key_missing   = True
    allow_extra_fields   = True
    pre_validators       = []

    active               = Bool()
    amount               = UnicodeString()
    colour               = UnicodeString()
    description          = UnicodeString()
    featured             = Bool()
    id                   = Int()
    image                = UnicodeString()
    image_thumb          = UnicodeString()
    image_zoom           = UnicodeString()
    keywords             = UnicodeString()
    material             = UnicodeString()
    name                 = UnicodeString()
    ordernum             = Int()
    sku                  = UnicodeString()
    views                = Int()

The __metaclass__ __repr__ hack serves a dual purpose: a) debugging and b) templating. By printing the repr you can use that as a template for customizing a model schema. You actually inherit from a dynamically generated class (a function call taking a model and optional arguments). I wasn't even too sure you could do that in python. It's nice that I don't have to create a new metaclass mechanism and can stick with existing formencode semantics.

Note the `ignore_key_missing` flag that is by default set to True. I think when using this I will just define which fields to validate purely by virtue of what is included in the html. eg If there is no `sku` field in the form then it will not be validated.

What if I *didn't* want to globally ignore missing keys and wanted to manually declare which to ignore? Formencode has an in-built sub-classing mechanism whereby if you declare `some_key = None` then some_key will not be validated at all.

To declare a Product model_schema with plural Colors inline:

class ProductSchema(model_schema(Product, nested=True)):
    colors = model_schema(Color, plural=True)

With Python 2.6 you can do inline customisable declarations of relations:

class ProductSchema(model_schema(Product, nested=True)):
    @sqaencode.plural
    class colors(model_schema(Color)):
        some_field = NonDefault()
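Class decorators arrived in Python 2.6, which is what makes the inline form above possible. A rough sketch of what a decorator like `sqaencode.plural` might do, assuming it simply tags the class; the `plural` function, the `is_plural` attribute and the stand-in `ColorSchema` here are all hypothetical:

```python
def plural(cls):
    """Hypothetical stand-in for sqaencode.plural: mark a schema as 'many of these'."""
    cls.is_plural = True
    return cls


class ColorSchema(object):
    """Stand-in for model_schema(Color)."""
    is_plural = False


class ProductSchema(object):
    # Before 2.6 you would have to write `colors = plural(colors)` after the class body.
    @plural
    class colors(ColorSchema):
        some_field = 'customised'
```

Because the decorator returns the class itself, the nested class can still be customised field by field in its body.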

I'm thinking about creating a mechanism whereby I subclass sqlalchemy.types.* for metadata purposes to further drive automatic schema generation. A cool thing about Django's tight integration is the high level data types: Url, Email etc. You declare higher level properties on what are essentially stored as VARCHAR types in the database. It's not *just* a string.

SqlAlchemy, while really great, (rightly) doesn't try and abstract beyond basic SQL. There is nothing stopping an end user however doing something like this:

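from sqlalchemy import Column, Integer, MetaData, Table, Unicode

metadata = MetaData()

# Subclasses carry extra meaning but are still stored as plain Unicode columns
class Url(Unicode):   pass
class Email(Unicode): pass

higher_level_table = Table ( 'higher_level', metadata,
    Column('url', Url(32)),
    Column('email', Email(32)),
    Column('id', Integer(), primary_key=True, autoincrement=True, nullable=False),
)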

If Url and Email were column types imported from sqaencode.types then you could add them to the sqlalchemy => formencode type mapping and they would be picked up by model_schema().
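A sketch of how such a type mapping could resolve the most specific entry, assuming the lookup walks the column type's MRO. The classes below are stand-ins so the example runs without sqlalchemy, `TYPE_MAP` and `validator_name_for` are hypothetical names, and validators are represented by their names only:

```python
class Unicode(object):
    """Stand-in for sqlalchemy.types.Unicode."""
    def __init__(self, length=None):
        self.length = length

class Integer(object):
    """Stand-in for sqlalchemy.types.Integer."""

class Url(Unicode): pass
class Email(Unicode): pass
class Slug(Unicode): pass      # deliberately left out of the mapping

# column type => name of the validator to generate (hypothetical mapping)
TYPE_MAP = {
    Unicode: 'UnicodeString',
    Integer: 'Int',
    Url:     'URL',
    Email:   'Email',
}

def validator_name_for(column_type):
    """Return the validator name for the most specific mapped type."""
    # The MRO lists the type itself first, then ever more general base classes.
    for klass in type(column_type).__mro__:
        if klass in TYPE_MAP:
            return TYPE_MAP[klass]
    raise KeyError('no validator mapped for %s' % type(column_type).__name__)
```

An unmapped subclass such as `Slug` falls back to whatever its nearest mapped ancestor gets, so a higher level type never does worse than a plain string.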

What's on the todo?

Options for automatically generating child Schemas.
Setting the length on String/UnicodeString validators automatically to the max length of the corresponding column.
Automatically create unique column validators

Wednesday, May 27, 2009

Reading

A few days ago I scribbled an entry into my TODO.txt:

Learn about Python packaging, namespaces etc, best practices. setup.py etc etc

I started reading a lot and watching whatever screen casts I could. One useful book was Expert Python Programming. It is light on actual python and heavy on the soft skills in using open source python tools to refine your work flow. That is exactly what I was looking for. It is only about 300 pages so I skimmed through most of it in a day.

The Agile Development Tools in Python series by Christopher Perkins, while pretty light, gave some nice quick overviews of some very useful tools.

Between these two sources I got a good overview of:

setuptools
distutils

virtualenv
paster

nosetest
coverage

docutils
reST
sphinx

virtualenv

virtualenv will set up an isolated version of python, where applications can remain blissfully ignorant of the rapid change in the outside world. If you have an application that needs a particular version of a lib and another requiring a different one then this is the tool for the job.

It will set up a folder structure with (std) Lib and site-packages folders and a scripts directory. One of the scripts is of course the virtual python interpreter. It also has a tremendously useful `activate` script. This will put the Scripts (bin on *nix) folder at the front of PATH, so the virtualenv's interpreter, and with it the virtualenv's own site-packages, is the one that gets picked up. It also modifies your console prompt to display the name of the current virtualenv.

(myenv) C:\myenv

I haven't used it much as of yet but I can imagine that it could get pretty confusing once you had a tonne of them so that is a nice and thoughtful addition.

The net result is that if you `activate` an environment and then run `python some_python.py` from an arbitrary directory it will run it using the virtualenv. Likewise for any of the scripts in your (*nix: bin, win: Scripts) folder. There is no need to reference them explicitly. When you are done you just `deactivate` the environment with the corresponding script.
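If you ever want to check from inside a script which interpreter you ended up with, something along these lines should do. It is only a sketch: `in_virtualenv` is a made-up helper name, and it relies on older virtualenv releases setting `sys.real_prefix` while newer ones (and the standard library's venv) make `sys.base_prefix` differ from `sys.prefix`.

```python
import sys


def in_virtualenv():
    """Best-effort guess at whether this interpreter lives in a virtual environment."""
    # Older virtualenv releases stash the real installation here.
    if hasattr(sys, 'real_prefix'):
        return True
    # Newer tools leave sys.prefix pointing at the environment and
    # sys.base_prefix at the installation it was created from.
    return getattr(sys, 'base_prefix', sys.prefix) != sys.prefix


print(sys.executable)     # the interpreter that PATH resolved to
print(in_virtualenv())
```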

setuptools is installed by default into each new virtualenv and easy_install is included in the scripts folder. You use easy_install to, funnily enough, easily install the packages you require.

virtualenv seems of great utility, however I imagine it could get pretty wasteful in terms of network usage. You don't really want to be downloading 10 packages EVERY time you start some new project. It would be handy if all of your local virtualenv environments used some sort of global local cache when using easy_install.

If you needed, for example, cherrypy and you already had the latest version on your hard drive somewhere, it would just pull the egg from there. Otherwise it would first pull the package into the cache from pypi.

It would be good if this all happened transparently and each virtualenv's easy_install respected some global distutils.cfg setting for cache location. Where do eggs lay about usually? In a Nest. I'll probably happily find that this has been sorted out.

Even if you had a cache, saving yourself some bandwidth, if you just copied an egg into each environment that would still be disk wastage. This seems like a good use for symbolic linking. It would be silly to reinvent that in python. Although, I'm not too sure how well Windows supports something like that. *nix is definitely superior in that respect.

paver

Another interesting util is one called paver that sort of ties together virtualenv and distutils/setuptools to allow a more `pythonic` zc.buildout. Buildout uses declarative ini files for everything and is apparently harder to hack if you want to do anything out of the ordinary. (I wouldn't know for sure if these are valid criticisms )

From what I have read of buildout vs paver/virtualenv I think I will invest in the latter. It just seems to appeal more to my personal sense of aesthetics.

paster

Paster is a hodgepodge macgyver swiss knife created by Ian Bicking. It can launch WSGI applications, launch missiles, sink ships and god knows what else. This utility, `eclectic` in the words of Ian, is basically the kitchen sink.

Of interest to me is the project template creation commands. It will create a folder with setup.py, README.txt etc and run you through a command line based wizard to set it all up, asking version number, author name and the like. These project templates can give a bit of an insight into how other people structure their projects.

Where do the tests go? Inside the distribution proper? Housed inside the same folder as the setup.py? I'm somewhat leaning towards having the tests included in the public package, ie package.tests.fixtures. I think that makes it easy to import fixtures in doctested example documentation.

distutils / setuptools

With regard to structuring projects, if there is one thing I can hope to take away from all this, then it is the `setup.py develop` command.

This setuptools command (it is not part of plain distutils) will put your under-development package on sys.path so you don't have to run `setup.py install` every time you make changes. Quite handy. Of course you could have a virtualenv where you have a package in `develop` mode and another where you have run a full `setup.py install` at a certain version.

nosetests

Nosetests has a really great coverage plugin. It will run your tests and, for whichever package/module you specify, dump a list of line ranges that haven't been `covered` by your test code. This makes it super easy to tell which code paths still need testing (or culling!). I'm assuming this is only really of use for pure python code. If so, that is another tick next to dynamic `interpreted` languages.

sphinx

I also experimented with docutils/ reST / Sphinx for documentation. Sphinx is the spiffy new reST based documentation system developed originally for Python proper. It has a built-in indexer and javascript search client and seemingly many other great features. For this reason it seems a LOT of python projects are using it now. I imagine the fact that it's quite easy on the eye out of the box doesn't hurt take-up much either.

A new project

I started on creating a setup.py enabled, nose/doc tested, sphinx documented project. It *really* slows down the whole process. I suppose it's a lot to try and learn all at once.

I'm still not really sold on test first prototyping. TDD of something you have a little bit of experience with, sure. TDD/documenting something when you are in new terrain and you KNOW that you are going to mess up and have to redux anyway, just seems wasteful.

I probably just don't `get it` yet. `Slow is fast` has generally held true in a lot of other areas of my experience.

What else to read up on? I need to learn more about the standard library logging package.

Friday, May 22, 2009

Souvenir

prison color blue
it's a uniform of choice
count yourself lucky
that you don't write the software

Some lines from the great Neil Finn's song `Souvenir`. Count yourself lucky indeed.

FormEncode + SqlAlchemy = SQAEncode

The way I do admin model CRUD at the moment is have a models.py file containing all my sqlalchemy tables/classes and their respective formencode.Schema/s in a forms.py file. The problem is that I end up having to repeat all the fields and to me that just stinks.

It's not a big deal as far as typing them out goes as I will typically declare a table / model in sqlalchemy and then use a homebrewed scaffolding function to generate derived code for the Schema and html for the actual form. When actually editing a model form I use genshi's HTMLFormFiller filter to set the blank form's values.

This approach really is keeping in the spirit of the formencode library. The author's philosophy is that after a while configuring an `autogenerated` form from a model by declaring options in code is just as much work as declaring in html.

Configuration includes black/white listing certain fields, whether they are allowed to be empty (null/None) and also logical/presentational ordering and grouping of individual fields.

formalchemy on the other hand will take a model and automatically generate a form with the values already set. It can not, though, do nested models, ie a parent model with an arbitrary number of inline children to CRUD in the same form. The author considers this `almost always bad design`. It only allows what I term `relating` the parent/root model to existing models.

While it *may* sometimes save you from doing any `configuration` whatsoever in the case where the form validation schema reflects perfectly the model, in my (admittedly meagre) experience this would be fairly rare.

The authors of formalchemy recommend the use of CSS to customize the appearance of the auto generated forms. For more extensive customization you can override each field's renderer (<input type = 'text'> vs <textarea> etc) or the global form generation function.

Having not really used formalchemy that much I can't say with certainty just how unwieldy it gets trying to customize a form's options / looks. I'd guess that you would end up having to do just as much `configuration`, ie repeating of fields / `work`.

Looking at a lot of my formencode Schemas in relation to their underlying model, it seems they are declared in a `white list` manner. The schemas (and html) simply don't declare fields that aren't `public`. The underlying type map is reasonably consistent. A sqlalchemy Bool becomes a formencode Bool, a Unicode maps to a UnicodeString etc. It's typically only the options that are changed.

I have an embryonic idea to auto generate formencode validators from an sqlalchemy model, mapping validators to column/relation types. I imagine `configuring` this would look something like the following.

class ProductSchema(ModelSchema):
    _model = Product

    long_description = options(not_empty = True)
    date_created = None
    name = UniqueName()

The ModelSchema would use some type of __metaclass__. Any `private` field declared as None would be blacklisted from the underlying derived schema. Above, `options` would just be an alias for dict. A dict would be used to configure the auto mapped validator. `not_empty = True` would override the not_empty argument mapped from the respective column's nullable property. UniqueName above would completely override the auto mapped UnicodeString. (In fact any column with unique = True could probably have an auto generated `unique field` validator attached.)
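The three override rules can be sketched without any metaclass machinery. This is only an illustration of the merging logic, not how ModelSchema is actually built: `apply_overrides` is a hypothetical helper, and validators are represented as plain `(name, kwargs)` tuples so the example runs without formencode.

```python
options = dict   # as described: just an alias for dict


def apply_overrides(derived, overrides):
    """Merge hand-written declarations over the auto mapped fields.

    derived:   field name -> (validator_name, kwargs) generated from the columns
    overrides: field name -> None | dict of options | replacement validator
    """
    result = {}
    for name, (validator, kwargs) in derived.items():
        if name not in overrides:
            result[name] = (validator, dict(kwargs))    # untouched
            continue
        override = overrides[name]
        if override is None:
            continue                                    # blacklisted
        if isinstance(override, dict):
            merged = dict(kwargs)
            merged.update(override)                     # tweak options only
            result[name] = (validator, merged)
        else:
            result[name] = override                     # replaced outright
    return result


derived = {
    'long_description': ('UnicodeString', {'not_empty': False}),
    'date_created':     ('DateConverter', {'not_empty': False}),
    'name':             ('UnicodeString', {'not_empty': True}),
    'sku':              ('UnicodeString', {'not_empty': True}),
}
schema = apply_overrides(derived, {
    'long_description': options(not_empty=True),
    'date_created':     None,
    'name':             ('UniqueName', {}),
})
```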

I'm also currently working on some sqlalchemy formencode integration functions that will take basic one table models and their relations and encode them into nested dictionaries / primary key lists. Also, the other way round, taking a nested dict (as from a formencode.NestedVariables pre validated Schema) and creating/updating/deleting an object tree/graph.
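For reference, the nested dict that arrives from the form side looks roughly like the output of the sketch below. It is a simplified stand-in for the decoding step, assuming the dotted-and-dashed key convention of formencode.variabledecode (`items-0.qty` and so on); `nest` and `to_lists` are hypothetical names and field names containing a dash are not handled.

```python
def to_lists(node):
    """Turn {0: ..., 1: ...} index dicts into lists, recursively."""
    if not isinstance(node, dict):
        return node
    if node and all(isinstance(key, int) for key in node):
        return [to_lists(node[key]) for key in sorted(node)]
    return dict((key, to_lists(value)) for key, value in node.items())


def nest(flat):
    """Build a nested dict from flat form keys like 'items-0.description'."""
    root = {}
    for key, value in flat.items():
        node = root
        parts = key.split('.')
        for position, part in enumerate(parts):
            last = position == len(parts) - 1
            if '-' in part:
                # 'items-0' means element 0 of the list called 'items'
                name, index = part.rsplit('-', 1)
                rows = node.setdefault(name, {})
                if last:
                    rows[int(index)] = value
                else:
                    node = rows.setdefault(int(index), {})
            elif last:
                node[part] = value
            else:
                node = node.setdefault(part, {})
    return to_lists(root)


invoice = nest({
    'number': '42',
    'items-0.description': 'Widget',
    'items-0.qty': '2',
    'items-1.description': 'Gadget',
    'items-1.qty': '1',
})
```

Each child row ends up as its own dict inside a list, which is the shape a function like object_graph would then walk.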

SqlAlchemy mappers make it quite easy to introspect classes for relations so the object_graph func signature just looks like:

object_graph(nd_dict, root_model)

This all works fine in basic unit test land AND for one model forms. The problem I'm having is ONETOMANY child objects in a form. Imagine an Invoice form with an inline InvoiceItems table with each row representing an individual item. On a new invoice there would be 3 `blank` rows. How would you know which ones to ignore? Which ones to delete?

I want to set a `_keep` flag for each item. The _keep flag will be reflected as a checkbox in the form and if unchecked will mean that the child item should be ignored in the validation process.

I could leave the child tables ( invoice items in the concrete sense) empty and use javascript to add a [+] button to add new children. That to me seems like a `bent over` concession.

The `object_graph` function which CRUDS model[s] from a nestable dict is currently decoupled from the validation process and knows nothing of form errors. This might make it hard. Before that I was successfully using `child_crud` hooks in my CRUD controllers for this purpose.

I'll have to sit down and work out which cases I'll need to accommodate. I want to be able to update/delete existing models, create new models and ignore cases where the child CRUD form partials have not been edited.

How to get the formencode.ForEach validator to ignore items without messing up the errors index? These problems were all solved using a child_crud hook in the controllers but trying to abstract and decouple things is making it a lot harder.
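One possible shape for an answer, sketched without formencode: validate only the rows whose `_keep` flag is set, but report errors in a list that stays aligned with the original rows. `kept_rows` and `validate_children` are hypothetical helpers, and `validate` is assumed to be any callable returning an error message or None.

```python
def kept_rows(rows):
    """Yield (original_index, row) for rows whose _keep checkbox was ticked."""
    # Unticked checkboxes are not submitted at all, hence .get()
    return [(index, row) for index, row in enumerate(rows) if row.get('_keep')]


def validate_children(rows, validate):
    """Return (valid_rows, errors) with errors indexed like the original rows."""
    errors = [None] * len(rows)
    valid = []
    for index, row in kept_rows(rows):
        error = validate(row)
        if error is None:
            valid.append(row)
        else:
            errors[index] = error     # lands on the row the user actually sees
    return valid, errors


rows = [
    {'_keep': 'on', 'qty': '2'},
    {'qty': ''},                      # blank row, ignored entirely
    {'_keep': 'on', 'qty': 'x'},
]
check_qty = lambda row: None if row['qty'].isdigit() else 'Please enter a number'
valid, errors = validate_children(rows, check_qty)
```

Ignored rows simply keep a None in the error list, so the template can still line errors up with rows by position.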

It will be a nut worth cracking anyway.

Monday, May 11, 2009

PyParsing + SqlAlchemy = Basic Search Engine

Today I learned about writing recursive descent parsers using the PyParsing library. I managed to cobble together an sqlalchemy expression builder for a basic search engine.

#################################### IMPORTS ###################################

# PyParsing
from pyparsing import ( CaselessLiteral, Literal, Word, alphas, quotedString,
                        removeQuotes, operatorPrecedence, ParseException,
                        stringEnd, opAssoc )

# SqlAlchemy
from sqlalchemy import and_, not_, or_

################################## LIKE ESCAPE #################################

LIKE_ESCAPE = r'\\'

def like_escape(s):
    return '%' + ( s.replace('\\', '\\\\')
                    .replace('%', '\\%')
                    .replace('_', '\\_') ) + '%'

############################### REUSABLE ACTIONS ###############################

class UnaryOperation(object):
    def __init__(self, t):
        self.op, self.a = t[0]

    def __repr__(self):
        return "%s:(%s)" % (self.name, str(self.a))

    def express(self):
        return self.operator[0](self.a.express())

class BinaryOperation(object):
    def __init__(self, t):
        self.op = t[0][1]
        self.operands = t[0][0::2]

    def __repr__(self):
        return "%s:(%s)" % ( self.name,
                             ",".join(str(oper) for oper in self.operands) )

    def express(self):
        return self.operator[0](*( oper.express() for oper in self.operands ))

class SearchAnd(BinaryOperation):
    name = 'AND'
    operator = [and_]

class SearchOr(BinaryOperation):
    name = 'OR'
    operator = [or_]

class SearchNot(UnaryOperation):
    name = 'NOT'
    operator = [not_]

############################### REUSABLE GRAMMARS ##############################

AND = CaselessLiteral("and") | Literal('+')
OR  = CaselessLiteral("or")  | Literal('|')
NOT = CaselessLiteral("not") | Literal('!')

searchTermMaster =  (
    Word(alphas) | quotedString.copy().setParseAction( removeQuotes ) )

########################## THREAD SAFE PARSER FACTORY ##########################

def like_parser(model, fields=[]):
    class SearchTerm(object):
        def __init__(self, tokens):
            self.term = tokens[0]

        def express(self):
            return or_ (
                *( getattr(model, field).like( like_escape(self.term),
                                               escape = LIKE_ESCAPE)
                   for field in fields )
            )

        def __repr__(self):
            return self.term

    searchTerm = searchTermMaster.copy().setParseAction(SearchTerm)

    searchExpr = operatorPrecedence( searchTerm,
           [ (NOT, 1, opAssoc.RIGHT, SearchNot),
             (AND, 2, opAssoc.LEFT,  SearchAnd),
             (OR,  2, opAssoc.LEFT,  SearchOr) ] )

    return searchExpr + stringEnd

########################### SEARCH FIELDS LIKE HELPER ##########################

def search_fields_like(s, model, fields):
    if isinstance(fields, basestring): fields = [fields]
    parser = like_parser(model, fields)
    return parser.parseString(s)[0].express()

################################################################################
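The LIKE escaping is the one piece that can be tried on its own without pyparsing or sqlalchemy. A small sketch, with the escape character pulled out as a parameter (an addition for illustration; the listing above hard-codes the backslash):

```python
def like_escape(s, escape='\\'):
    """Wrap s in % wildcards, neutralising any LIKE wildcards it already contains."""
    # The escape character itself has to go first, otherwise the backslashes
    # added for % and _ would be doubled up again by a later replace.
    return '%' + (s.replace(escape, escape * 2)
                   .replace('%', escape + '%')
                   .replace('_', escape + '_')) + '%'


print(like_escape('50%_off'))
print(like_escape('a\\b'))
```

A user searching for a literal `50%_off` then matches only that text rather than anything starting with 50 and ending in off.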
