Wednesday, February 22, 2012

Open Data

In his TED presentation, Tim Berners-Lee first talks about how he invented the World Wide Web 20-some years ago. He was frustrated with all his colleagues' different computer systems, software and data formats. There was so much unlocked potential in linking the documents on all these computers together. He asked us to put all our documents on the Web, and we did. That went pretty far, no?

Now Tim wants us to put data on the Web. There is still so much unlocked potential. He refers to another TED presentation by Hans Rosling, who combines boring data into something more interesting and presents it in engaging infographics. The basis, however, is a large amount of data.

He proposes 3 rules for linking data: 
  1. HTTP names are used not just for documents but also for the things that documents are about: people, places, numbers, etc. 
  2. When such an HTTP name is looked up, important data about that thing are fetched in a standard format.
  3. Data have relationships; the other things that a thing is related to are also given HTTP names.
The more they are linked the more powerful they become.
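
The three rules above can be sketched in a few lines of Python. This is only an illustration, not how any real linked data service is built: the example.org names, the term names, and the functions look_up and follow are all made up for this sketch, and the "lookup" reads from a list in memory instead of fetching anything over HTTP. Real linked data would return a standard format such as RDF.

```python
# Rule 1: things (not just documents) get HTTP names.
# All names below are hypothetical; example.org is a reserved placeholder domain.
ALICE = "http://example.org/people/alice"
BERLIN = "http://example.org/places/berlin"
GERMANY = "http://example.org/places/germany"

NAME = "http://example.org/terms/name"
LIVES_IN = "http://example.org/terms/livesIn"
LOCATED_IN = "http://example.org/terms/locatedIn"

# Rule 3: relationships are stored as (subject, predicate, object) triples,
# and the related things have HTTP names of their own.
TRIPLES = [
    (ALICE, NAME, "Alice"),
    (ALICE, LIVES_IN, BERLIN),
    (BERLIN, NAME, "Berlin"),
    (BERLIN, LOCATED_IN, GERMANY),
    (GERMANY, NAME, "Germany"),
]


def look_up(name):
    """Rule 2: looking up an HTTP name returns the data known about that thing.

    Assumption: a real lookup would be an HTTP request; here it is simulated
    by filtering the in-memory list of triples.
    """
    return {predicate: obj for subject, predicate, obj in TRIPLES if subject == name}


def follow(name, predicate):
    """Follow one link from a thing to a related thing (None if there is no link)."""
    return look_up(name).get(predicate)


# Start at a person and follow links until we reach a country.
city = follow(ALICE, LIVES_IN)
country = follow(city, LOCATED_IN)
country_name = look_up(country)[NAME]
```

Because every related thing has its own name, one lookup leads to the next. That is a small version of "the more they are linked the more powerful they become".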

Christian Bizer, of Freie Universität Berlin, discovered that there are lots of interesting data in Wikipedia. He developed some software to extract information from Wikipedia and put it in a database, DBpedia. And the data are linked to other sets of data, "and so it starts to grow" ...

Diversity of data - Some examples
  • Government data. Barack Obama said he would make government data available on the Web. This is important for transparency, and it also shows a lot about how the USA ticks. 
  • Raw data now!
  • People hug their data, even if it was paid for by taxpayers. 
  • Scientific data. Curing cancer and Alzheimer's, and solving the world's economic problems, for example. The scientists who are going to solve these problems have their data hidden on their computers. Scientists are starting to discover the power of sharing data. 
  • Social networking sites. Every time you do something, such as adding a friend or liking something, the network uses it. And the different social web sites do not link their data (JB: at the time of the presentation).
  • OpenStreetMap. Everyone does their bit. 

And that is how open data are going to work according to Tim: everyone does their bit, and if everyone does their bit, the power will be huge.
Data
This is a (US-based) community of policy makers, technologists, and data owners who want to get information from governments around the world to the people who need to make decisions every day. 
JB: And like any community of practice, it needs people with drive to keep it going, which appears to be missing at the moment .... 
Open Data Commons
As with other 'Open' topics, some thinking has gone into a legal framework for the sharing of data, and this has been pulled together under a set of commons licenses. There are three licenses: 

  1. Public domain for data
  2. Attribution for data
  3. Attribution and share alike for data
Sarah Perez, in the popular technology blog 'Read Write Web', summarises a long list of places on the Web where we can already find open data. For example: 
  • CKAN - A registry of open knowledge packages and projects. 
  • Infochimps - Assembles and interconnects raw data. 
  • Freebase - A user-built and user-maintained shared database of the world's knowledge. 
The New York Times
This newspaper boasts that, over the past 150 years, it has maintained one of the most authoritative news vocabularies ever. Since 2009 it has been publishing this vocabulary as linked data. The Times uses approximately 30,000 tags to power its topic pages, and all these tags will be published. 
JB: And I have yet to find an example that shows what this means ... anyone?
Linked data
This is about using the Web to connect related data, or to lower the barriers to linking data that currently use different methods. Basically, it is the result of Tim Berners-Lee's vision for linking data. 
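
As a rough sketch of what "connecting related data" could look like in practice: two data sets from different publishers can be joined as soon as they use the same HTTP name for the same thing, even if their own layouts differ. Everything below is invented for illustration: the example.org names, the counts, the field names "id" and "about", and the function link_datasets.

```python
from collections import defaultdict

# Two hypothetical data sets. The values are made up, and each publisher
# uses a different field name for the HTTP name of the place it describes.
LIBRARY_DATA = [
    {"id": "http://example.org/places/town-a", "libraries": 3},
    {"id": "http://example.org/places/town-b", "libraries": 1},
]
SCHOOL_DATA = [
    {"about": "http://example.org/places/town-a", "schools": 5},
    {"about": "http://example.org/places/town-c", "schools": 2},
]


def link_datasets(datasets):
    """Merge records from several data sets by the HTTP name they share.

    `datasets` is a list of (records, key) pairs, where `key` is the field
    that holds the HTTP name in that particular data set.
    """
    merged = defaultdict(dict)
    for records, key in datasets:
        for record in records:
            name = record[key]
            # Copy every field except the name itself into the merged entry.
            merged[name].update({k: v for k, v in record.items() if k != key})
    return dict(merged)


linked = link_datasets([(LIBRARY_DATA, "id"), (SCHOOL_DATA, "about")])
town_a = linked["http://example.org/places/town-a"]
```

Here town_a ends up with both the library count and the school count, because both publishers used the same name for the town. Without a shared name, someone would first have to work out which rows in each data set refer to the same place.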

JB: It is obvious to me that I have entered realms that are further and further away from my bed (as we say in the Netherlands). The concepts in this topic are very far away from my experience and knowledge base. So on the one hand there is a lot to learn; on the other hand it is very hard to grasp. Let's see what I can pick up in this course :) 

1 comment:

  1. Hello friends,

    Open data is the idea that certain data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. Thanks a lot......