google – ScraperWiki https://blog.scraperwiki.com Extract tables from PDFs and scrape the web Tue, 09 Aug 2016 06:10:13 +0000 en-US hourly 1 https://wordpress.org/?v=4.6 58264007 Sharing in 6 dimensions https://blog.scraperwiki.com/2013/07/sharing/ https://blog.scraperwiki.com/2013/07/sharing/#comments Fri, 19 Jul 2013 16:36:10 +0000 http://blog.scraperwiki.com/?p=758218205 Hands up everyone who’s ever used Google Docs. Okay, hands down. Have you ever noticed how many different ways there are to ‘share’ a document with someone else?

We have. We use Google Docs at lot internally to store and edit company documents. And we’ve always been baffled by how many steps there are to sharing a document with another person.

There’s a big blue “Share” button which leads to the modal windows above. You can pick individual collaborators (inside your organisation, or out, with Google accounts or without), send email invites, or share a link directly to the document. Each user can be given a different level of editing capability (read-only, read-and-comment, read-and-edit, read-edit-delete). But wait, there’s more!! Clicking the effectively invisible “Change…” link takes you to a second screen, where you can choose between five different privacy/visibility levels, and three different levels of editing capability, for users other than the ones you just picked.

Google Docs publish to web

Or you can side-step the blue button entirely and use File > Publish to the web… which creates a sort of non-editable, non-standard UI for the document, which can either be live or frozen in the current document state, and can either require visitors to have a Google account or not.

Google Docs sidebar

Or, you can save the document into a shared folder, whereby it’ll seemingly randomly appear in the specified collaborators’ Google Drive interfaces, usually hidden under some unexpected sub-directory in the main sidebar. And, of course, shared folders themselves have five different privacy levels, although it’s not entirely clear whether these privacy settings override the settings of any documents within, or whether Google does some magic additive/subtractive/multiplicative divination in cases of conflict.

Finally, if everything’s just getting too much—and frankly I can’t blame you—you can copy and paste the horrifically long URL from your browser’s URL bar into an email and just hope for the best.

AAAAAAAARGHHHHHHH!!!

What’s this got to do with ScraperWiki?

ScraperWiki is a data hub. It’s a place to write code that does stuff with data: to store that data, keep it up to date, document its provenance, and share it with the world. ScraperWiki Classic took that last part very literally. It was a ScraperWiki – everything was out in the open, and most of it was editable by everyone.

ScraperWiki Classic privacy settings

Over time, a few more sharing options were added to ScraperWiki Classic – first protected scrapers in 2011 and then private vaults in 2012. We’d started on the slippery road to Google-level complexity, but it was thankfully still only one set of checkboxes.

When we started rethinking ScraperWiki from the ground up in the second half of 2012, we approached it from the opposite angle. What if everything was private by default, and sharing was opt-in?

Truth is, beyond the world of civic hacktivism and open government, a lot of data is sensitive. People expect it to be private, until they actively share it with others. Even journalists and open data freaks usually want to keep their stories and datasets under wraps until it’s time for the big reveal.

So sharing on the new ScraperWiki was managed not at the dataset level, but the data hub level. Each of our beta testers got one personal data hub. All their data was saved into it, and it was all entirely private. But, if they liked, they could invite other users into their data hub – just like you might give your neighbour a key to your house when you go on holiday. These trusted people could then switch into your data hub much like Github users can switch into an “organisation” context, or Facebook users can use the site under the guise of a “page” they administer.

But once they were in your data hub, they could see and edit everything. It was a brutal concept of “sharing”. Literally sharing everything or nothing. For months we tested ScraperWiki with just those two options. It worked remarkably well. But we’ve noticed it doesn’t cover how data’s used in the real world. People expect to share data in a number of ways. And that number, in fact, is 6.

Sharing in 6 dimensions

Thanks to detailed feedback from our alpha and beta users, about what data they currently share, and how they expect to do it on ScraperWiki, it turns out your average data hub user expects to share…

  1. Their datasets and/or related visualisations…
  2. Individually, or as one single package…
  3. Live, or frozen at a specific point in time…
  4. Read-only, or read-write…
  5. With specific other users, or with anybody who knows the URL, or with all registered users, or with the general public…
  6. By sharing a URL, or by sharing via a user interface, or by publishing it in an iframe on their blog, or by publicly listing it on the data hub website.

Just take a minute and read those again. Which ones do you recognise? And do you even notice you’re making the decision?

Either way, it suddenly becomes clear how Google Docs got so damn complicated. Everyone expects a different sharing workflow, and Google Docs has to cater to them all.

Google isn’t alone. If you ask me, the sorry state of data sharing is up there with identity management and trust as one of the biggest things holding back the development of the web. I’ve seen Facebook users paralysed with confusion over whether their most recent status update was “public” or not (whatever “public” means these days). And I’ve written bank statement scrapers that I simply don’t trust to run anywhere other than my own computer, because my bank, in its wisdom, gives me one username and password for everything, because, hey, that’s sooo much more secure than letting me share or even just get hold of my data via an API.

Sharing in 2 dimensions

As far as ScraperWiki’s concerned, we’ve struck what we think is a sensible balance: You can either share everything in a single datahub, read-write, with another registered ScraperWiki user; or you can use individual tools (like the “Open your data” tool and, soon, the “Plot a graph” tool) to share live or static visualisations of your data, read-only, with the whole world. They’re the two most common use-cases for sharing data: share everything with someone I trust so we can work on it together, or share this one thing, read-only, on my blog so anonymous visitors can view it.

The next few months will tell us how sensible a division that is – especially as more and more tools are developed, each with a specific concept of what it means to “share” their output. In the meantime, if you’ve got any thoughts about how you’d like to share your content on ScraperWiki, comment below, or email zarino@scraperwiki.com. The future of sharing is in your hands.

]]>
https://blog.scraperwiki.com/2013/07/sharing/feed/ 4 758218205
One-click ‘Open in Google Docs’ https://blog.scraperwiki.com/2010/10/one-click-google-docs-export/ Tue, 05 Oct 2010 09:28:34 +0000 http://blog.scraperwiki.com/?p=758213883 Data for data’s sake is no good, it’s what you do with it that matters. To make it easier for everyone, not just programmers, to reuse the data on ScraperWiki, you can now use the new experimental ‘Open in Google Docs’ feature.

Clicking the link at the top right of a scraper page will load the data for that scraper straight into a new public Google Spreadsheet that you can start using straight away!

Because of the way Google Docs works, there are a few limitations: Google will only allow datasets smaller than 1MB or a maximum of 400,000 ‘cells’ of data to be uploaded.

When the ScraperWiki dataset is over these limits, a subset of the data gets uploaded and a message is displayed at the top of the spreadsheet explaining what has happened, and including a link to download the full CSV file.

We thought this was a fair compromise – better than not having the ‘Open in Google Docs’ option on large datasets at all – since it is sometimes useful to see a subset to get a feel for the data.

Please let us know what you think, and if we can improve this feature in anyway.

]]>
758213883