Creating an open web of points of interest

Location data is everywhere. From huge government databases of geographic features to your pictures in Facebook, it seems like almost every piece of information around nowadays is tagged with its location. However, it still seems that no one is effectively sharing information, or building the smart, next-generation systems that will surely rely on data from multiple, linked information sources.

We in the geospatial profession believe that location is the great common denominator. It has the best potential to be the bridge between systems of related data sets. But how do we devise a simple way to describe places and relationships between them that will appeal to 85% of the developer community?

The W3C Points of Interest Working Group has been tackling this problem throughout 2011, and is nearing completion of a near-final draft of a specification. You can see the work on the POI Wiki, and join the public mailing list by sending an email here. In this group, we’ve created a relatively simple data model, and expect that people will write POI data in JSON, RDF and/or XML format. The jury is still out on which format will win.
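To make the idea concrete, here is a minimal sketch of what a shared POI record might look like when written as JSON. The field names (id, name, latitude, longitude, categories), the coordinate convention and the example URL are my own illustrative assumptions, not taken from the Working Group's draft.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class PointOfInterest:
    """A hypothetical, deliberately small POI record (not the W3C model)."""
    id: str                 # a stable identifier, ideally a URL
    name: str
    latitude: float         # decimal degrees, WGS 84 assumed
    longitude: float
    categories: List[str] = field(default_factory=list)


def to_json(poi: PointOfInterest) -> str:
    """Serialize a POI to a JSON string with a predictable key order."""
    return json.dumps(asdict(poi), sort_keys=True)


def from_json(text: str) -> PointOfInterest:
    """Rebuild a POI from JSON produced by to_json."""
    return PointOfInterest(**json.loads(text))


cafe = PointOfInterest(
    id="http://example.org/poi/1234",   # made-up URL for illustration
    name="Corner Cafe",
    latitude=42.36,
    longitude=-71.06,
    categories=["cafe"],
)
print(to_json(cafe))
```

The same record could just as easily be written as XML or RDF. The point is only that the core of a POI is small enough for any developer to read and produce.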

What I hope a common format will do is allow everyone, from Yelp to Facebook to humanitarian organizations and even defense departments all over the world, to share basic location information about common places. I think this will strengthen the core business propositions of these groups, freeing up time from the mundane, repetitive task of maintaining accurate locations and creating more time for real application enhancements.

I believe this effort is the most important activity in the geospatial field at the moment, and I will be writing and coding heavily around POIs. Join me and make 2012 the year of the POI!

REST and GIS – references

In an email Sean Gillies called me out on the lack of good citations in my earlier post. I pleaded lack of sleep (notice how the number of URLs tails off quickly after the first 2 paragraphs!), but I’ll try to make up for it here. Instead of trying to cite all the individual posts, I’ll just say take a look at the relevant material in these blogs. You wouldn’t learn much from just reading a post or two anyway, so if you care, it’s worth the time to go through the lot.
Sean Gillies: import cartography
Charlie Savage: cfis
Chris Holmes: Into The Pudding

and then you’ll be properly prepared to enter the fray

REST and GIS

REST has been a hot topic this year in the geo world. There’s a discussion group, a geographic data server, many blog posts, and email discussions. I’ve been mulling over what this means to OGC over the last couple of months, reading RESTful Web Services, and discussing it with the various advocates around the community. After all this, I think I know what’s going on, but I don’t think there’s any one clear explanation available (despite some nice pieces of the puzzle here and here). There has certainly been little effort to analyze the REST architecture in relation to geographic information systems theory, so that’s what I’ll try to do now.

At its very core, a REST architecture is centered on gaining access to a generally indivisible piece of information, which is usually called a resource. A REST system expects resources to exist at standard, unchanging URLs (everybody knows cool URLs don’t change). In the case of Atom, the most cited example, this piece of information is the <entry>. Sure, there are other resource types, but their primary purpose is to help you find and understand the information resources.

Right away we have a culture clash when it comes to GIS (geographic information systems). This crowd has been trained in geography departments to think of the data set as the primary piece of information. A data set is often called a layer or coverage, and represents some real-world phenomenon like vegetation or rainfall or census tracts. The points, lines, polygons or pixels that comprise the data set are of secondary importance to the concept of the data set itself. In fact, it’s considered un-cool to think of the shapes (or pixels in the case of imagery) that make up the data set as concrete objects at all. Rather, they are useful proxies for the real objects of concern. For example, you can’t really capture rainfall in a database because it’s not practical to measure every drop of rain. We estimate and approximate and end up storing some model of rainfall in the database, but we don’t ever have the temerity to say we are capturing the one and only representation of rainfall.

Now a programmer might respond, “screw your ivory tower theoretical nonsense. You’ve got information in a computer, and you read and write to it, so keep your conceptual information models to yourself.” And in large part I agree with this viewpoint, but there are some places where we’ll run into trouble. For example, most of the REST material I’ve read treats images as indivisible, binary resources. This won’t work in the geo field, where images are in the gigabyte and terabyte range and you’ve got to design services that provide access to portions of those images. However, aside from imagery I don’t see much of a problem with using a resource-oriented approach to information access. In fact, in the sensor web arena I think a REST approach makes a ton of sense.
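One way to picture access to a portion of a huge image in resource terms is to cut the image into fixed-size tiles and give each tile its own URL. The sketch below works out which tile URLs cover a requested pixel window. The URL layout (/tiles/row/col.png), the 256-pixel tile size and the example.org address are assumptions made up for illustration, not any particular server's API.

```python
from typing import List, Tuple


def tile_urls(base_url: str,
              window: Tuple[int, int, int, int],
              tile_size: int = 256) -> List[str]:
    """Return the URLs of the tiles covering a pixel window.

    window is (x0, y0, x1, y1) in image pixels, with x1 and y1 exclusive.
    """
    x0, y0, x1, y1 = window
    if x1 <= x0 or y1 <= y0:
        raise ValueError("window must have positive width and height")

    # Integer division maps a pixel coordinate to its tile index.
    first_col, last_col = x0 // tile_size, (x1 - 1) // tile_size
    first_row, last_row = y0 // tile_size, (y1 - 1) // tile_size

    return [
        f"{base_url}/tiles/{row}/{col}.png"
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A 400 x 256 pixel window starting at x=200 touches three tiles in row 0.
urls = tile_urls("http://example.org/images/scene-1", (200, 0, 600, 256))
print(urls)
```

Each tile is then an ordinary, indivisible binary resource that can be cached and linked to like any other, even though the full image never travels over the wire in one piece.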

UPDATE: Sean is convincing me that imagery isn’t a tough problem for RESTies, even if I don’t know what “empirical orthogonal function decomposition” is. Is this related to what I know as imagery formats using wavelet compression?

Whew. I’ve spent a lot of time talking about resources, but if you don’t get the basics right, everything else is meaningless. Now I’ll move on to the other basic, which is accessing resources through “standard” URLs, and additionally, taking full advantage of the HTTP headers available in URL requests. I won’t get too deep into the technology here, because others do that better, but I will say that to me it seems the big point being made is that URLs work in a lot of places–Web browsers, email, all programming environments, etc. Don’t use anything more than URLs unless you really have to. And if you think you really have to, think again and again.

The other piece of the URL issue is using HTTP headers to do things like set expiration dates on information, say some resource is unavailable at the moment, or unavailable forever because it’s been deleted, and lots of other things that HTTP header geeks know about. Now some might say that the arcane wizardry of HTTP header usage is just as complex as the things REST advocates complain about, like SOAP headers and XML query languages. I agree, but the big difference is that the core infrastructure of the Internet–Web servers and routers for example–comes with built-in software to do useful things with HTTP headers, and if you don’t take advantage of them, you are losing the opportunity to get an amazing amount of useful functionality at a much lower cost than paying programmers to build it into your application. This includes all kinds of content description (MIME types), error handling, caching and load balancing to name a few. I don’t see any problem with this principle. It aligns perfectly with the OGC principle to leverage as much of the common Web infrastructure as possible before inventing new, geo-specific technologies. Even advocates of heavier-weight service-oriented architectures should be able to get behind this idea. So read the HTTP spec and use it. And once again, if you find yourself in a situation where you can’t use it, think again, and then once more…
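As a rough illustration of the cases mentioned above, the sketch below maps the state of a resource to a standard HTTP status code and headers. The status codes (200, 503, 410) and the Cache-Control, Retry-After and Content-Type headers are standard HTTP. The function name, the state labels and the default values are hypothetical choices for this example.

```python
from http import HTTPStatus
from typing import Dict, Tuple


def response_meta(state: str,
                  max_age_seconds: int = 3600,
                  retry_after_seconds: int = 120) -> Tuple[int, Dict[str, str]]:
    """Pick an HTTP status code and headers for a resource in a given state."""
    if state == "available":
        # Describe the content and tell caches how long it stays fresh.
        return HTTPStatus.OK.value, {
            "Content-Type": "application/json",
            "Cache-Control": f"max-age={max_age_seconds}",
        }
    if state == "temporarily_unavailable":
        # Unavailable at the moment: ask the client to try again later.
        return HTTPStatus.SERVICE_UNAVAILABLE.value, {
            "Retry-After": str(retry_after_seconds),
        }
    if state == "deleted":
        # Unavailable forever: the resource existed but has been removed.
        return HTTPStatus.GONE.value, {}
    raise ValueError(f"unknown resource state: {state!r}")


for state in ("available", "temporarily_unavailable", "deleted"):
    print(state, response_meta(state))
```

None of this needs application-specific conventions. Browsers, proxies and caches already know how to act on these codes and headers.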

To summarize, I talked at length about information resources, URLs, and HTTP. This is far from a comprehensive introduction to REST, but it is hopefully a nice companion to the more general, programming-oriented content available elsewhere. Next time I’ll talk about search and retrieval. And if I get enough positive feedback, I might even try to describe how this would all play out in building a collaborative spatial data infrastructure. I’ll be accepting votes in the form of drinks and/or cool ideas at the upcoming URISA and FOSS4G conferences.