Sunday, February 22, 2015

What Is Open Data? (part 1)

I'm fascinated by the movement known as Open Data and its cousin, Open Government. In the spirit of International Open Data Day, I thought I might contribute to the conversation by explaining some topics (as I understand them) for newcomers.

This post is an informal view from 10,000 feet, painted in broad strokes. A future post will address Open Data on Prince Edward Island, and my hope for a virtuous circle regarding software mentorship and civic engagement.

Disclaimer: I've monitored the Open Data space for some time, and I'm a software professional, but I'm not an expert on this. (I am passionate, however: my upcoming vacation is a trip to the 3rd International Open Data Conference (IODC) in Ottawa in May.) I'm also non-partisan. Though I hope to participate in a dialogue at the municipal and provincial level, I'm not endorsing any party or platform.

That said, let's go!

What is Open Data?

First, let's tackle data. It's so prevalent, it's tricky to define: data is just information (usually in files or a database). Data can be as mundane as noise-nuisance complaints in the UK, or as modern as disease reports by Ugandan volunteers (via text-messages) on banana crops.

The Open Data movement views public data as a resource that should be, well, open. That means it has these characteristics:
  • freely available to all
  • machine-readable format (e.g. simple text files)
  • explicitly licensed for any use (e.g. Creative Commons)
Some counterexamples:
  • a bunch of PDF files isn't open, because nerds can't easily write programs to read the data
  • similarly, map data published in a format that requires expensive software isn't open
  • a set of simple text files is much better, but if it isn't licensed, then the data might be copyrighted (by default), which is a problem
The data for a given domain (e.g. crime reports for the last 7 days in Halifax) is called a data set. Data sets are published in a data catalogue as part of a larger website (often called a data portal). 
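
To make "machine-readable" concrete, here is a minimal sketch in Python of what a nerd might do with a data set published as a simple text file. The column names and rows are invented for illustration (they are not real Halifax data), and the function name count_by_column is my own; a real data set would be downloaded from a data portal rather than typed into the program.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows, invented for illustration only.
# A real data set would come from a data portal as a CSV file.
SAMPLE_CSV = """date,neighbourhood,category
2015-02-16,North End,noise
2015-02-17,Downtown,theft
2015-02-17,North End,noise
2015-02-19,Downtown,noise
"""


def count_by_column(csv_text, column):
    """Count how many rows share each value in the given column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    counts = count_by_column(SAMPLE_CSV, "category")
    # Print the most frequent values first.
    for value, total in counts.most_common():
        print(value, total)
```

A dozen lines are enough to summarize the reports by category, because the file is plain text with a predictable layout. Pulling the same rows out of a PDF usually takes far more effort.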

What's the big deal? What can anyone do with noise-nuisance reports?

A data set may seem boring to you and me, but in the right hands, it might prevent us from getting sick.

It's a big deal for software developers, because we're always looking for new ways to improve our technical skills. But trivial exercises (e.g. calculating prime numbers) are boring; and writing a game is a massive undertaking (good luck trying to come up with something "new"). By contrast, the possibilities for data-oriented apps seem endless. Even better, nerds have discovered that people find these apps useful. That's rewarding.

Open Data is a big deal for app users, because we can effect change (or offer fresh perspective) in our real-life community. Consider Ben Wellington's story, as he helped fix a confusing parking space in NYC.

This is a big deal for citizens, for many reasons, including accountability & transparency of public officials. We can have a debate about the role of government (less vs more and all that) in society, but no matter where you reside on the political spectrum, who argues against shining sunlight on data? (You can already see the close relationship between Open Data and Open Government.)

All of these reasons culminate in a symbiotic relationship among the actors, and explain the push to "liberate the data".

Aren't there privacy concerns? What about my health records?

Good questions, but don't worry. The Open Data movement isn't about creating an anarchist society where there is no privacy. Not all data should be open, and thoughtful practitioners take measures to ensure that user privacy isn't compromised.

So, that's good news on the movement at large. As for any given project, it's an important question. Those responsible for a recent data release by the Governor of Florida are guilty of "data malpractice" (my term).

Is this cyber-communism?

Nope. Though the data should be freely available, there's a wide array of players selling software services in this space. Granted, there is a certain esprit de corps for writing open-source software, but it doesn't appear to be a requirement per se. In my view, this seems fair: if someone adds value by writing an app, they can reasonably charge for it, so long as there are no impediments to the data itself.

(Edit: Sameer Vasta makes excellent points in this episode of the Open Government Podcast. He and Richard Pietro offer weather and GPS data as examples: when the government opens the data for use, it enables competition and innovation in the marketplace. Far from communist, that's pure capitalism, if not libertarianism.)

Open Government, Open Source, etc

For now, I've decided to cover these topics in a future post, if there is interest.

If you're interested in chatting more about Open Data, please contact me at codetojoy @t gmail dot com


  1. ps. Note:

    (1) Calculating (large) prime numbers isn't trivial! But software books often use examples such as "let's find the first 1000 primes" and many of us get bored.

    (2) There are several Creative Commons licenses, and some fit Open Data better than others.

  2. Another note: On Twitter, Edafe Onerhime (@ekoner) commented that "just a minor point on machine readability - #opendata in PDF isn't desirable but it is still open". I'm a bit surprised by this, and will have to pursue it.