Using PostGIS with SQLAlchemy

The development version of the Trip Planner uses a Postgres/PostGIS backend (instead of MySQL), SQLAlchemy as the ORM (instead of raw SQL), and PCL for Python geometry types (instead of our own ugly hacked versions).

Question: How do you move geometries out of Postgres/PostGIS into PCL types via SQLAlchemy and vice versa?

Answer: Create a custom geometry column type.

Here’s our SQLAlchemy geometry type definition and subtypes (points, linestrings, and multilinestrings; adding other types should be trivial):

sqltypes.py

Look here for an example of using it:

tables.py

The type is pretty simple. The only tricky part was figuring out how the database stores the geometry. It’s in ASCII hex, so we use binascii.a2b_hex to get a binary representation and feed that to the PCL fromWKB factory. In the other direction we use binascii.b2a_hex on the WKB representation of the PCL geometry.
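
To make the hex round trip concrete, here's a minimal standalone sketch. It is not the code from sqltypes.py: the function names are made up for illustration, and it packs a plain 2D point by hand with struct where the real type would hand the bytes to the geometry library. It also assumes a point with no SRID, in which case what PostGIS hands back is plain WKB in hex.

```python
import binascii
import struct


def point_to_wkb(x, y):
    """Pack a 2D point as little-endian WKB (hypothetical helper).

    Layout: 1 byte for byte order (1 = little-endian), a 4-byte geometry
    type (1 = Point), then x and y as 8-byte doubles.
    """
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb):
    """Unpack a 2D WKB point into an (x, y) tuple (hypothetical helper)."""
    byte_order = "<" if wkb[0:1] == b"\x01" else ">"
    geom_type, x, y = struct.unpack(byte_order + "Idd", wkb[1:21])
    if geom_type != 1:
        raise ValueError("Not a plain 2D WKB point (type %d)" % geom_type)
    return (x, y)


def to_db(x, y):
    """Python to database: WKB bytes, then ASCII hex."""
    return binascii.b2a_hex(point_to_wkb(x, y)).decode("ascii").upper()


def from_db(hex_value):
    """Database to Python: ASCII hex, then WKB bytes, then a point."""
    return wkb_to_point(binascii.a2b_hex(hex_value))


if __name__ == "__main__":
    hex_value = to_db(1.0, 2.0)
    print(hex_value)
    print(from_db(hex_value))
```

In a custom column type, to_db and from_db would be the bodies of the two conversion hooks (bind parameter and result value), with the geometry library's WKB reader and writer taking the place of the struct calls.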

[8/7/09: Updated links to source code. Note: latest version uses Shapely, not PCL.]

Python Cartographic Library (PCL) — Installing Just the Spatial Package

So let’s say you need a spatial geometry library for Python. You could write your own; you could also use the PCL. The PCL includes some packages we don’t need, like one for MapServer rendering. I only installed the minimum needed to get the spatial package working (which I’ll talk about below).

[Note: "geometry" refers to points, lines, polygons and other geometric forms used to represent real-world objects. Examples: intersection (point), street (line), zip code boundary (polygon).]

I wrote a rudimentary geometry library for the trip planner that’s been working fine, but now I need to do some more “advanced” stuff related to using PostGIS and SQLAlchemy. In particular, I want to convert database values to Python objects and vice versa.

The first part (database to Python) is fairly easy and our current library already does that, but it’s convoluted in that it gets the database value as well-known text (WKT), parses that, and creates a Python object. From what I can tell, the PCL can go straight from well-known binary (WKB) geometry to Python objects.
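
For a sense of what the WKT route involves, here's a rough sketch of that kind of parsing. It is not our actual library code; the function name is made up and it only handles points and linestrings.

```python
import re

# Matches e.g. "POINT(1 2)" or "LINESTRING(0 0, 1 1.5)".
WKT_PATTERN = re.compile(r"^\s*(POINT|LINESTRING)\s*\((.*)\)\s*$", re.IGNORECASE)


def parse_wkt(wkt):
    """Parse a WKT point or linestring (hypothetical helper).

    Returns (type_name, coords): coords is one (x, y) tuple for a point
    and a list of (x, y) tuples for a linestring.
    """
    match = WKT_PATTERN.match(wkt)
    if match is None:
        raise ValueError("Unsupported or malformed WKT: %r" % wkt)
    geom_type = match.group(1).upper()
    coords = [
        tuple(float(number) for number in pair.split())
        for pair in match.group(2).split(",")
    ]
    if geom_type == "POINT":
        if len(coords) != 1:
            raise ValueError("A point has exactly one coordinate pair")
        return geom_type, coords[0]
    return geom_type, coords


if __name__ == "__main__":
    print(parse_wkt("POINT(1 2)"))
    print(parse_wkt("LINESTRING(0 0, 1 1.5)"))
```

Every geometry type needs its own parsing rules this way, which is why going straight from WKB looks attractive.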

The second part (Python to database) is harder because it involves converting a Python object to a binary geometry value. I don’t know anything about the binary geometry format and I don’t want to know, and it looks like with the PCL I don’t need to know.

I’m assuming PostGIS and PCL will get along together because they both rely on the same libraries, proj4 and geos. We’ll see.

The installation was fairly straightforward. The PCL includes five sub-packages. We had to install two of them, PCL-Referencing and PCL-Spatial. PCL-Referencing requires proj4. PCL-Spatial requires PCL-Referencing and geos >= 2.2.2. Something in there also requires the OGR library, which is included with GDAL.

The basic steps are: install proj, geos, and gdal; then install PCL-Referencing; and lastly install PCL-Spatial. On Ubuntu 6.06 (Dapper), here’s what I actually did:

  • Installed proj4 using apt-get
  • Installed libgdal using apt-get
  • Installed geos 2.2.3 from source into /usr/lib. I installed this over a slightly older version of geos installed using apt-get; hopefully that won’t cause any issues.
  • Checked out the PCL trunk:
    svn co http://svn.gispython.org/gispy/PCL/trunk PCL
  • Installed PCL-Referencing with the usual python setup.py install
  • Installed PCL-Spatial with the usual…
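
A quick sanity check that the native libraries are visible to Python, sketched with the standard library's ctypes.util.find_library. The library names ("proj", "geos_c", "gdal") are my assumption about how the shared objects are named on a typical Linux install and may differ on yours; the function name is made up.

```python
from ctypes.util import find_library


def check_native_libs(names=("proj", "geos_c", "gdal")):
    """Map each library name to the file the linker finds, or None.

    The default names are assumptions about how the shared libraries
    are named on the system, not something PCL defines.
    """
    return {name: find_library(name) for name in names}


if __name__ == "__main__":
    for name, found in sorted(check_native_libs().items()):
        print("%-8s %s" % (name, found or "NOT FOUND"))
```

A "NOT FOUND" here doesn't prove the install is broken (the library may live somewhere the linker cache doesn't know about), but it's a cheap first thing to look at if one of the setup.py steps fails.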

Test Driven Development, PostgreSQL, SQLAlchemy

I just tried out some o’ that new-fangled “Test Driven Development” (TDD) I’ve been hearing about. Yeah, it’s good stuff.

At the moment, I’m in the process of migrating GIS data from MySQL to PostgreSQL so we can take advantage of the PostGIS spatial extensions. I’ve also been making a bunch of related changes (AKA refactoring) in the “model”, separating things that never belonged together, and so forth.

I started out by rewriting the MySQL data import script. [Note: basically, the script pulls data out of a flat ESRI shapefile and normalizes it.] This wasn’t a complete, from-scratch rewrite–a lot of stuff I just copied over and tweaked a little bit. The biggest changes here were due to using SQLAlchemy instead of typing out SQL queries. I’ll just note that SQLAlchemy is “da bomb” and makes many things easier (once you get the hang of it).

That part was pretty straightforward, and low-level–I didn’t get into the ORM aspects of SQLAlchemy at all.

The next step was to modify the routine that creates adjacency matrices for routing. In the end, this was straightforward too. I ended up reusing some stuff from the new import script, which was cool. I refactored a lot during this process, adding some new modules and classes.

[Here's where we get to the TDD aspect.]

So, I was sitting there (here really) thinking, “Hmmm… what now?” I drew some diagrams with the new classes and associations…. OK, that’s fun…. “Wait, I know. Run the unit tests!” Doy!

I started with the address normalization service, since the other services both depend on it. The test suite for this service isn’t as comprehensive as it probably should be (there are 19 tests), but it proved to be very useful for shaking out a bunch of bugs in all that refactoring. The tests also helped keep me focused, and that aspect might be more important than the bug-squashing aspect (maybe).

Today, address normalization. Tomorrow, geocoding.