data hub – ScraperWiki
https://blog.scraperwiki.com
Extract tables from PDFs and scrape the web
Tue, 09 Aug 2016 06:10:13 +0000

Data Business Models
https://blog.scraperwiki.com/2013/02/data-business-models/
Wed, 27 Feb 2013 09:46:39 +0000

If it sometimes feels like the data business is full of buzzwords and hipster technical jargon, then that’s probably because it is. But don’t panic! I’ve been at loads of hip and non-hip data talks here and there and, buzzwords aside, I’ve come across four actual categories of data business model in this hip data ecosystem. Here they are:

  1. Big storage for big people
  2. Money in, insight out: Vertically integrated data analysis
  3. Internal data analysis on an organization’s own data
  4. Quantitative finance

1) Big storage for big people

This is mostly Hadoop. For example,

  • Teradata
  • Hortonworks
  • MapR
  • Cloudera

Some people are using NoHadoop. (I just invented this word.)

  • Datastax (Cassandra)
  • Couchbase (Couch but not the original Couch)
  • 10gen (Mongo)

Either way, these companies sell consulting, training, hosting, proprietary special features &c. to big businesses with shit tons of data.

2) Money in, insight out: Vertically integrated data analysis

Several companies package data collection, analysis and presentation into one integrated service. I think this is pretty close to “research”. One example is AIMIA, which manages the Nectar card scheme; as a small part of this, they analyze the data that they collect and present ideas to clients. Many producers of hip data tools also provide hip data consulting, so they too fall into this category.

Data hubs

Some companies produce suites of tools that approach this vertical integration; when you use these tools, you still have to look at the data yourself, but it is made much easier. This approaches the ‘data hubs’ that Francis likes talking about.

Lots of advertising, web and social media analytics tools fall into this category. You just configure your accounts, let data accumulate, and look at the flashy dashboard. You still have to put some thought into it, but the collection, analysis and presentation are all streamlined and integrated and thus easier for people who wouldn’t otherwise do this themselves.

Tools like Tableau, ScraperWiki and RStudio (combined with its tangential R services) also fall into this category. You still have to do your analysis, but they let you do all of your analysis in one place, and connections between that place, your data sources and your presentation media are easy. Well, that’s the idea at least.

3) Internal data analysis

Places with lots of data have internal people do something with them. Any company that’s making money must have something like this. The mainstream companies might call these people “business analysts”, and they might do all their work in Excel. The hip companies are doing “data science” with open source software before it gets cool. And the New York City government has a team that just analyzes New York data to make the various government services more efficient. For the current discussion, I see these as similar sorts of people.

I was pondering whether to distinguish analysis that affects businessy decisions from models that get written into software. Since I’m just categorising business models and these things could both be produced by the same guy working inside a company with lots of data, I chose not to distinguish between them.

4) Quantitative finance

Quantitative finance is special in that the data analysis is very close to a product in itself. The conclusion of an analysis or algorithm is “Make these trades when that happens” rather than “If you market to these people, you might sell more products.”

This has some interesting implications. For one thing, you could have a whole company doing quantitative finance. On a similar note, I suspect that analyses can be more complicated because the analyses might only need to be conveyed to people with quantitative literacy; in the other categories, it might be more important to convey insights to non-technical managers.

The end

Pretend that I made some insightful, conclusionary conclusion in this sentence. And then get back to your number crunching.

Constructing the Open Data Landscape
https://blog.scraperwiki.com/2011/09/constructing-the-open-data-landscape/
Wed, 07 Sep 2011 11:01:38 +0000

In an article in today’s Telegraph regarding Francis Maude’s Public Data Corporation, Michael Cross asks: “What makes the state think it can be at the cutting edge of the knowledge economy?” He writes in terms of market and business share, giving the example of the satnav market, which is worth over $100bn a year yet is based on free data from the US Government’s GPS system.

He credits the internet revolution for transforming public sector data into a ‘cashable proposition’. We, along with many other start-ups, foundations and civic coding groups, are part of this ‘geeky world’ of Open Data. So we’d like to add our piece concerning the Open Data movement.

Michael has the right to ask this question because there is a constant custodial battle for the rights to data, fought every day with every scrape and every script on the web. So let me tell you about the geeks’ take on Open Data.

[vimeo http://www.vimeo.com/21711338]

The idea(l) behind Open Data is to create sustainable Open Data projects with purpose. This has been championed in the last couple of years by civic data projects such as MySociety, Open Knowledge Foundation, Code for America, Open Australia and Open Development Cambodia. Older, more established organizations are also being converted to the Open Data ethos. For instance, The World Bank is one major organization turning to Open Data in a big way.

However, much of the public sector data published so far has been pretty much useless. Governments, finally, are beginning to realize that data has little value unless people understand its context and provenance. They are beginning to see that opening up their data can reduce the cost and responsibility of getting it to the end point user, as the Open Declaration on European Public Services clearly says: “The needs of today’s society are too complex to be met by government alone”.

The key to a sustainable Open Data landscape lies not in the organisational heads of government bodies but in the provenance of the data they release and the ways in which it is released. The goal should be to gain the 5 stars of open linked data. For this to be achieved the data needs to be pared down to its raw ingredients. In a research paper entitled “Open Data, Open Society” (see end of post) Marco Fioretti explains:

“Public data are really useful only when they are raw, really open and linked … only when data are published online in that way every citizen or organization will be able to automatically analyze and present them in easy to understand forms”

This is where ScraperWiki really excels in terms of opening up data. Not only is our data open and accessible through various processes (csv, database, API), even the extraction process is open in the form of a code wiki. In terms of data, we are rawer than raw. If government ordered an open data steak they would order rare, data hubs would order raw, ours would be mooing!

We’re providing some of the heavy machinery needed to construct the Open Data landscape. What it will look like very much depends on the civic cyber-community getting involved. A leader in this community is Chris Taggart, creator of OpenlyLocal and OpenCorporates, and a prolific ScraperWiki user. So I Skyped him to see what he makes of the state thinking it can be at the cutting edge of the knowledge economy:

Speaking of the linked economy, do check out all the links in this post. All the media included here is under a Creative Commons license.

If you are interested in getting more involved in the Open Data scene check out the Open Knowledge Foundation.

Open Data, Open Society
