
Parsing Tax Descriptions

Tax Descriptions are extremely difficult to parse, let alone meaningfully understand, if you're not a surveyor or parcel specialist.

To that end, I've spent quite a while thinking of ways to make Tax Descriptions useful to the public.

Recently I discovered that the County's Public Works department had a publicly accessible FTP server with hundreds of scanned copies of documents referenced in Kitsap County Tax Descriptions.

Pattern Matching

These scanned documents were broken up into five folders (Plats, Short Plats, Large Lots, Condos, and Surveys) and their filenames followed a relatively regular pattern.

The first character in the document's filename was shorthand for its type. For example, the Plat documents started with a 'p'.

The next six characters were split into two three-digit groups: the first group was a Volume number and the second group was a Page number.

Finally, the rest of the filename was '.tif', the file extension.
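As a rough illustration of that filename pattern, here is a minimal Python sketch. The function name, the `ScannedDoc` type, and the sample filename are my own inventions for this example, and only the 'p' prefix for Plats is actually known from the folders described above; any other prefix is passed through unchanged rather than guessed.

```python
import re
from typing import NamedTuple, Optional

# Only the 'p' -> Plat prefix is stated in this post. The prefixes for the
# other four folders are not given, so they are deliberately left out.
KNOWN_PREFIXES = {"p": "Plat"}

# One type character, a three-digit Volume, a three-digit Page, then '.tif'.
FILENAME_RE = re.compile(r"([a-z])(\d{3})(\d{3})\.tif", re.IGNORECASE)


class ScannedDoc(NamedTuple):
    doc_type: str
    volume: int
    page: int


def parse_filename(name: str) -> Optional[ScannedDoc]:
    """Split a scanned-document filename into type, Volume, and Page."""
    match = FILENAME_RE.fullmatch(name)
    if match is None:
        return None  # the filename does not follow the pattern
    prefix, volume, page = match.groups()
    # Unknown prefixes fall back to the raw lowercase character.
    doc_type = KNOWN_PREFIXES.get(prefix.lower(), prefix.lower())
    return ScannedDoc(doc_type, int(volume), int(page))
```

With this sketch, a made-up filename such as `p012034.tif` would parse as a Plat in Volume 12 on Page 34, and anything that doesn't fit the pattern returns `None`.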

Once I knew where these files were kept and I'd worked out how to parse their filenames, it was relatively simple to write an extension for an existing app. The extension indexed the contents of the FTP server and let you search by Volume and Page number for a specific document type.

Because of the way I structured the FTP index pages, you can provide all three search parameters (Volume, Page, and Document Type) as a query string in a URL that links the user straight to a search for a specific scanned document.
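To make the idea concrete, a link like that could be assembled along these lines. This is only a sketch: the base URL and the parameter names (`type`, `volume`, `page`) are placeholders I made up for the example, not the real ones used by the index pages.

```python
from urllib.parse import urlencode

# Placeholder address; the real index page URL is not shown in this post.
BASE_URL = "https://example.com/documents/search"


def build_search_url(doc_type: str, volume: int, page: int) -> str:
    """Build a search link from the three parameters."""
    # urlencode handles escaping, e.g. the space in "Short Plat".
    query = urlencode({"type": doc_type, "volume": volume, "page": page})
    return f"{BASE_URL}?{query}"
```

Calling `build_search_url("Plat", 12, 34)` with these placeholder names gives `https://example.com/documents/search?type=Plat&volume=12&page=34`.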

Of course, we still need to figure out what those search parameters are based on the text of a Tax Description.

This is a job for regular expressions, or RegEx.

The workflow here was to execute a RegEx pattern against the Tax Description and check whether it found any matches. If it did, I used those matches to generate a link to the correct scanned document; if not, I moved on to the next RegEx pattern.
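Here is a small Python sketch of that try-each-pattern loop. The two patterns and the sample descriptions are invented for illustration and are not the actual expressions I used; real Tax Descriptions are far less tidy.

```python
import re
from typing import Optional, Tuple

# Illustrative patterns only, tried in order. Each one captures a Volume
# and a Page number and is paired with the document type it implies.
PATTERNS = [
    ("Plat", re.compile(
        r"VOLUME\s+(\d+)\s+OF\s+PLATS,?\s+PAGES?\s+(\d+)", re.IGNORECASE)),
    ("Short Plat", re.compile(
        r"SHORT\s+PLAT\s+VOL\.?\s+(\d+)\s+PG\.?\s+(\d+)", re.IGNORECASE)),
]


def find_reference(description: str) -> Optional[Tuple[str, int, int]]:
    """Return (document type, Volume, Page) from the first matching pattern."""
    for doc_type, pattern in PATTERNS:
        match = pattern.search(description)
        if match:
            return doc_type, int(match.group(1)), int(match.group(2))
        # No match: fall through and try the next pattern.
    return None  # no pattern recognised a document reference
```

Keeping the patterns in an ordered list means supporting a newly discovered wording is just a matter of appending another entry, and a `None` result signals that no link should be generated.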

To make sure this process worked correctly, I walked through a list of weird parcels in Kitsap County that I keep around for testing scenarios like this.

Road Blocks

Since the formal requirements for a Tax Description are pretty lax, I found seven different ways these three parameters were recorded in Tax Descriptions. Here's a parcel on Bainbridge Island where this method works.

Even then, some of the Tax Descriptions don't contain a reference to related documents of any kind.

The biggest culprit seems to be the parcels that were defined using a kind of vowel-less shorthand, like this parcel.

To some degree, parsing these Tax Descriptions will always be a work in progress. But I'm pleased with this first draft.