Auto-Discovering Open Geo Datasets

For the Analyze Boston Open Data Challenge last Saturday at District Hall, I built a little script that auto-discovers datasets on Boston’s Open Data Hub that are likely to contain geographic coordinates or semi-spatial information such as addresses, neighborhoods, or zip codes.

I used the CKAN developer API to retrieve metadata about the Hub’s resources, then ran some simple heuristics on each dataset’s field names. The technique is not specific to the Boston data portal, so it should work just as well on any CKAN-powered site!
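To give a flavor of the approach, here is a minimal Python sketch. It is not the actual script: the keyword list `GEO_HINTS`, the helper names `geo_fields`, `ckan_action`, and `discover`, and the portal URL in the usage example are all assumptions made for illustration. It relies on the standard CKAN Action API endpoints `package_search` and `datastore_search`, and only looks at resources loaded into the DataStore, since those expose their field names.

```python
import json
import re
import urllib.parse
import urllib.request

# Hypothetical keyword list; the real script's heuristics may differ.
GEO_HINTS = {
    "lat", "latitude", "lon", "lng", "long", "longitude",
    "geom", "geometry", "location", "address", "street",
    "neighborhood", "zip", "zipcode", "ward", "district",
}


def geo_fields(field_names):
    """Return the field names that contain a geographic-looking token."""
    hits = []
    for name in field_names:
        # Split on anything that is not a letter or digit, so
        # "zip_code" yields ["zip", "code"] but "platform" stays whole.
        tokens = re.split(r"[^a-z0-9]+", name.lower())
        if any(token in GEO_HINTS for token in tokens):
            hits.append(name)
    return hits


def ckan_action(base_url, action, **params):
    """Call a CKAN Action API endpoint and return its 'result' payload."""
    query = urllib.parse.urlencode(params)
    url = f"{base_url}/api/3/action/{action}?{query}"
    with urllib.request.urlopen(url, timeout=30) as response:
        payload = json.load(response)
    if not payload.get("success"):
        raise RuntimeError(f"CKAN action {action} failed")
    return payload["result"]


def discover(base_url, rows=50):
    """Map (dataset title, resource name) to its geographic-looking fields."""
    found = {}
    packages = ckan_action(base_url, "package_search", rows=rows)["results"]
    for package in packages:
        for resource in package.get("resources", []):
            # Only DataStore-backed resources expose field metadata.
            if not resource.get("datastore_active"):
                continue
            # limit=0 asks for the field list without any data rows.
            info = ckan_action(
                base_url, "datastore_search",
                resource_id=resource["id"], limit=0,
            )
            hits = geo_fields(field["id"] for field in info["fields"])
            if hits:
                title = package.get("title") or package["name"]
                found[(title, resource.get("name"))] = hits
    return found


if __name__ == "__main__":
    # Assumed portal URL; swap in any CKAN-powered site.
    for key, fields in discover("https://data.boston.gov").items():
        print(key, fields)
```

Matching on whole tokens rather than substrings keeps a column like `platform` from being mistaken for a latitude field, at the cost of missing run-together names such as `LocationName`.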

Here’s the web page.

Here’s the code.