Hi! We've renamed ScraperWiki.
The product is now QuickCode and the company is The Sensible Code Company.


Sharing in 6 dimensions

Hands up everyone who’s ever used Google Docs. Okay, hands down. Have you ever noticed how many different ways there are to ‘share’ a document with someone else?

We have. We use Google Docs at lot internally to store and edit company documents. And we’ve always been baffled by how many steps there are to sharing a document with another person.

There’s a big blue “Share” button which leads to the modal windows above. You can pick individual collaborators (inside your organisation, or out, with Google accounts or without), send email invites, or share a link directly to the document. Each user can be given a different level of editing capability (read-only, read-and-comment, read-and-edit, read-edit-delete). But wait, there’s more!! Clicking the effectively invisible “Change…” link takes you to a second screen, where you can choose between five different privacy/visibility levels, and three different levels of editing capability, for users other than the ones you just picked.

Google Docs publish to web

Or you can side-step the blue button entirely and use File > Publish to the web… which creates a sort of non-editable, non-standard UI for the document, which can either be live or frozen in the current document state, and can either require visitors to have a Google account or not.

Google Docs sidebar

Or, you can save the document into a shared folder, whereby it’ll seemingly randomly appear in the specified collaborators’ Google Drive interfaces, usually hidden under some unexpected sub-directory in the main sidebar. And, of course, shared folders themselves have five different privacy levels, although it’s not entirely clear whether these privacy settings override the settings of any documents within, or whether Google does some magic additive/subtractive/multiplicative divination in cases of conflict.

Finally, if everything’s just getting too much—and frankly I can’t blame you—you can copy and paste the horrifically long URL from your browser’s URL bar into an email and just hope for the best.


What’s this got to do with ScraperWiki?

ScraperWiki is a data hub. It’s a place to write code that does stuff with data: to store that data, keep it up to date, document its provenance, and share it with the world. ScraperWiki Classic took that last part very literally. It was a ScraperWiki – everything was out in the open, and most of it was editable by everyone.

ScraperWiki Classic privacy settings

Over time, a few more sharing options were added to ScraperWiki Classic – first protected scrapers in 2011 and then private vaults in 2012. We’d started on the slippery road to Google-level complexity, but it was thankfully still only one set of checkboxes.

When we started rethinking ScraperWiki from the ground up in the second half of 2012, we approached it from the opposite angle. What if everything was private by default, and sharing was opt-in?

Truth is, beyond the world of civic hacktivism and open government, a lot of data is sensitive. People expect it to be private, until they actively share it with others. Even journalists and open data freaks usually want to keep their stories and datasets under wraps until it’s time for the big reveal.

So sharing on the new ScraperWiki was managed not at the dataset level, but the data hub level. Each of our beta testers got one personal data hub. All their data was saved into it, and it was all entirely private. But, if they liked, they could invite other users into their data hub – just like you might give your neighbour a key to your house when you go on holiday. These trusted people could then switch into your data hub much like Github users can switch into an “organisation” context, or Facebook users can use the site under the guise of a “page” they administer.

But once they were in your data hub, they could see and edit everything. It was a brutal concept of “sharing”. Literally sharing everything or nothing. For months we tested ScraperWiki with just those two options. It worked remarkably well. But we’ve noticed it doesn’t cover how data’s used in the real world. People expect to share data in a number of ways. And that number, in fact, is 6.

Sharing in 6 dimensions

Thanks to detailed feedback from our alpha and beta users, about what data they currently share, and how they expect to do it on ScraperWiki, it turns out your average data hub user expects to share…

  1. Their datasets and/or related visualisations…
  2. Individually, or as one single package…
  3. Live, or frozen at a specific point in time…
  4. Read-only, or read-write…
  5. With specific other users, or with anybody who knows the URL, or with all registered users, or with the general public…
  6. By sharing a URL, or by sharing via a user interface, or by publishing it in an iframe on their blog, or by publicly listing it on the data hub website.

Just take a minute and read those again. Which ones do you recognise? And do you even notice you’re making the decision?

Either way, it suddenly becomes clear how Google Docs got so damn complicated. Everyone expects a different sharing workflow, and Google Docs has to cater to them all.

Google isn’t alone. If you ask me, the sorry state of data sharing is up there with identity management and trust as one of the biggest things holding back the development of the web. I’ve seen Facebook users paralysed with confusion over whether their most recent status update was “public” or not (whatever “public” means these days). And I’ve written bank statement scrapers that I simply don’t trust to run anywhere other than my own computer, because my bank, in its wisdom, gives me one username and password for everything, because, hey, that’s sooo much more secure than letting me share or even just get hold of my data via an API.

Sharing in 2 dimensions

As far as ScraperWiki’s concerned, we’ve struck what we think is a sensible balance: You can either share everything in a single datahub, read-write, with another registered ScraperWiki user; or you can use individual tools (like the “Open your data” tool and, soon, the “Plot a graph” tool) to share live or static visualisations of your data, read-only, with the whole world. They’re the two most common use-cases for sharing data: share everything with someone I trust so we can work on it together, or share this one thing, read-only, on my blog so anonymous visitors can view it.

The next few months will tell us how sensible a division that is – especially as more and more tools are developed, each with a specific concept of what it means to “share” their output. In the meantime, if you’ve got any thoughts about how you’d like to share your content on ScraperWiki, comment below, or email zarino@scraperwiki.com. The future of sharing is in your hands.

Tags: , ,

4 Responses to “Sharing in 6 dimensions”

  1. vandeput July 24, 2013 at 4:44 am #

    Hi Zarino, so if I scheduled the scraper at 9am daily, when the data I shared using “Open your data” updated? ASAP or delayed?

    • Matthew Hughes July 24, 2013 at 11:23 am #

      Hi Irvan!
      I’m not Zarino, but I work at ScraperWiki and I might be able to help. As I understand it, it doesn’t do this yet! I’ve been informed that it generates it once, and never again. I’ve created a ticket for you on Github. https://github.com/scraperwiki/open-your-data/issues/2 Please comment on the ticket and let us know what you’d expect to happen.


      • vandeput July 24, 2013 at 1:03 pm #

        Thanks Matt for responding and opening the ticket, I have commented there 🙂

        Irvan Putra.

  2. Julian Todd August 14, 2013 at 12:14 pm #

    Here! Here! to this. Ever since the unix file permission system, access authority has always been mangled. Don’t offer combinations of dimensions based on orthogonality where no use case is justified. The issue has always been getting people to use their write access when they should, not preventing people from having write access. I’d share everything (except the bank details) with anyone I would trust with my house keys — which is quite a few people.

    Though one thing about “sharing everything with someone you trust”: you could have a time-out on this which you set in advance. Nothing is permanent, and this would be a good way of cleaning up the cruft of what should often be temporary access for specific things.

    In the longer term, if you are recording access times, you might spot patterns that I tend to give access to an acquaintance to get some data in relation to something we were talking about down the pub, and after they have got it during the next week they never visit again — although they are now marked for permanent access, which I have to consciously decrement later, with the concern that I might cause offense if the status is conflated with friending. So I should just give them access for a week, which times out.

    Maybe even the access is for a certain amount of clicks or files, so it’s like: “Sure, I got a copy of that, here’s the key to my office, it’s on top of the filing cabinet.” I can get on with having my coffee, so I don’t need to lead him there up the elevator and down the hallway and physically show it to him, which takes up my precious time. Though I might not be impressed if I get back and I find he’s been rifling through all the desk drawers, the coat pockets, the cabinets, and so forth.

    What we have here is total access for polite people. And politeness does follow a particular pattern that a computer can recognize. Things don’t get moved around that shouldn’t. Maybe a note left on my desk when I get back about something he happened to spot.

    We do commit messages on code check-ins. Maybe if they change something or move stuff around, they are forced to write me a little note. Can we have little notes prior to reading? You start surfing around through a set of documents that aren’t yours. After ten seconds of reading one, the modal dialog pops up with the question: “Hey, what are you doing here?” Fill in the answer that gets parsed for legibility, it gets saved for later and you can carry on looking.

We're hiring!