The story of getting Twitter data and its “missing middle”

We’ve tried hard, but sadly we are not able to bring back our Twitter data tools.

Simply put, this is because Twitter have no route to market to sell low-volume data for spreadsheet-style individual use.

It has happened to similar services in the past, and even to instructions published in blog posts.

There’s lots of confusion in the market about the exact rules, and why they exist. This blog post tries to explain them clearly!

How can you get Twitter data?

There are four broad ways.

1. Developers can use the API to get data for their own use. The rate limits are actually quite generous, much better than, say, LinkedIn’s. It’s an easy and powerful API to use.

There are two problems with this route. First, it sets the expectation for developers that you can do whatever the API allows. In practice you can’t: you have to follow one of the routes below, or Twitter will shut down your application.

Second, it is unfair to non-programmers, who can’t get access to data that programmers easily can. More on that in the “why” section below.
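
To illustrate route 1, here is a minimal sketch of fetching Tweets through the developer API. It is our illustration rather than anything specific to ScraperWiki; it assumes the tweepy client library as it was around the time of this post (the search call has been renamed in later versions) and placeholder credentials from an application you have registered yourself.

```python
import tweepy

# Placeholder credentials: register your own application to obtain these.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# The REST endpoints are rate-limited per 15-minute window, but the limits
# are generous enough for personal use.
api = tweepy.API(auth)

# Fetch one page of recent Tweets matching a query.
for tweet in api.search(q="open data", count=100):
    print(tweet.created_at, tweet.user.screen_name, tweet.text)
```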

2. Software companies can make an application that uses the developer API.

As soon as the application gets serious, the company should join the Twitter Certified Program to make sure Twitter approve of the app. Ultimately, only Twitter can say whether or not their T&Cs are being met.

These applications can’t allow general data analysis and coding by their users; they have to offer specific, canned dashboards and queries. This doesn’t meet ScraperWiki’s original vision of empowering people to understand and work with data however they like.

3. Bulk Tweet data is available from Datasift and Gnip. This is called ‘the firehose’, and only includes Tweets (for other data you have to use the methods above).

Datasift is a fantastic product, which indexes the data for you and provides lots of other social media data. Gnip is now owned by Twitter, and is still in the process of blending in; they’re based in Colorado rather than San Francisco.

Both companies have to get the main part of Twitter to vet your exact use case. Your business has to be worth at least $3000 / month to them to make this worthwhile.

The actual cost, roughly 10 cents per 1000 Tweets, is not too bad; lots of our customers could pay that. But at the $3000 / month minimum that works out to 30 million Tweets a month, and few have any need for that volume! In lots of ways, this option is too powerful for most people.
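
Spelling out that arithmetic, with a hypothetical 50,000-Tweet job as a contrast to show how far a typical spreadsheet-sized task falls below the minimum:

```python
# Back-of-the-envelope arithmetic for the firehose route (illustrative figures only).
# Working in whole cents avoids floating-point noise.
cents_per_thousand_tweets = 10        # roughly 10 cents per 1000 Tweets
monthly_minimum_cents = 3000 * 100    # the ~$3000 / month your business needs to be worth

tweets_to_hit_minimum = monthly_minimum_cents * 1000 // cents_per_thousand_tweets
print(tweets_to_hit_minimum)          # 30000000 -- 30 million Tweets a month

# A hypothetical spreadsheet-sized job is tiny by comparison:
small_job_cost_cents = 50000 * cents_per_thousand_tweets // 1000
print(small_job_cost_cents / 100)     # 5.0 -- about $5 for 50,000 Tweets, nowhere near the minimum
```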

4. Special programs. There are a few of these: for example, the Library of Congress archive all Tweets, and Twitter are running a pilot Twitter Data Grants program for academics.

These show that it is worth talking to and lobbying Twitter for new ways to get data.

Why do Twitter restrict data use?

The obvious, and I think incorrect, answer is “for commercial reasons”. These are the real reasons.

1. Protect privacy and stop malicious uses.

If you use the firehose via, say, Datasift, you have to delete your copy of a Tweet as soon as a user deletes it. Similar rules apply if, for example, somebody makes their account private, or deletes their account. This is genuinely impressive, and fantastic for user privacy. Part of the reason Twitter are so careful about vetting uses is to make sure this is followed.
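
As a rough illustration of what that obligation looks like for a firehose consumer, here is a minimal sketch of honouring delete notices. The dict-backed store is a stand-in for whatever database you actually keep Tweets in, and the "delete" envelope follows the shape of the delete notices the streaming API sends.

```python
# Minimal sketch of firehose compliance handling (illustrative only).
# Assumes each stream message has already been parsed into a Python dict.

local_store = {}  # Tweet id -> Tweet dict, standing in for a real database


def handle_message(message):
    if "delete" in message:
        # The author deleted this Tweet: remove our copy immediately.
        deleted_id = message["delete"]["status"]["id"]
        local_store.pop(deleted_id, None)
    elif "id" in message and "text" in message:
        # An ordinary Tweet: keep a copy for analysis.
        local_store[message["id"]] = message
    # A real consumer also has to honour account deletions, accounts going
    # private and other compliance events in the same way.
```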

Twitter also prevent uses that might harm their users in other ways. I don’t know any details, but I understand that they stop governments from gaining large volumes of Twitter data which could be used to do things like identify anonymous accounts by looking at patterns. I’m guessing this has come from Twitter’s rise to prominence during various ‘Twitter Revolutions’, such as in Iran in 2009.

2. They’re a media company now.

Twitter has changed from its early days: it is now a media company rather than a messaging bus. For example, the front page of their developer site is about Twitter Cards and embedding Tweets, with no mention of the data features. This means their focus is on a good consumer experience and advertising, not on finding new routes to market for data.

3. They’re missing bits of the market.

No company can cover everything its ecosystem might want. In this case, we think Twitter are simply missing a chunk of the market, and could get more revenue from it.

While there are plenty of products that let you analyse Twitter data in specific ways, there is nothing for people who want to use Excel or other desktop tools like Tableau or Gephi.

For example, Tableau are partnered with Datasift, which from the outside might make it look like Tableau users are covered. Unfortunately, customers still have to have their use case vetted, and be prepared to spend at least $3000 / month. Also, the Tweets are metered rather than limited, making it awkward for junior staff to freely make use of the capability. It’s just too powerful and expensive for many use cases.

The users in this “missing middle” don’t want to learn a new, limited data analysis interface. They want to use the simple desktop data analysis products they’re already familiar with. They also just want a file; they know how to keep track of files.
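
To make “just a file” concrete, here is a minimal sketch (ours, not a product feature) that takes Tweets already fetched as Python dicts and writes a handful of spreadsheet-friendly columns to a CSV file that opens straight into Excel or Tableau.

```python
import csv


def tweets_to_csv(tweets, path="tweets.csv"):
    """Write a few spreadsheet-friendly columns from a list of Tweet dicts."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["created_at", "screen_name", "text", "retweets", "favourites"])
        for t in tweets:
            writer.writerow([
                t.get("created_at", ""),
                t.get("user", {}).get("screen_name", ""),
                t.get("text", ""),
                t.get("retweet_count", 0),
                t.get("favorite_count", 0),
            ])
```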

Conclusion

The ScraperWiki platform continues without Twitter data. You can accurately extract tables from PDFs and much more.

We know a lot about Twitter data, and have contacts across many parts of the ecosystem. If you have a high-value use for the data, our professional services division are happy to help.

8 Responses to “The story of getting Twitter data and its “missing middle””

  1. Andrew Fielding August 21, 2014 at 12:54 pm #

    Thanks for trying guys.

    You’ve hit the nail on the head here. I fall squarely in that middle box. I want to “muck about with spreadsheets” and use macros and formulae to wrangle information.

    I used ScraperWiki to download all the followers of an account to make a PhotoMosaic. I could never justify the time to build code to download that dataset, but ScraperWiki made it easy and I could get on with USING the data.

    I’m slowly learning coding, but the barrier to entry for the Twitter API is now too high, which means that I am losing interest in coding (call me old-fashioned, but I like to learn with something that is useful to me).

    I will continue to use TweetReach.com to collect relevant tweets (but for how long?) but a list of an account’s followers? Now in the bucket marked “too hard”. Sad.

    I just wish Twitter would stop trying to be like Facebook…

  2. Andy Cotgreave August 21, 2014 at 2:44 pm #

    Hello. I’m really sorry to hear you’ve not fixed the problem. This is a real shame. Sure, at Tableau I can and do use Datasift and it’s fantastically powerful. But you’re dead right in your blog – Tableau/Datasift is not a solution all our customers want to use. This is because Datasift has two costs: the financial one and the infrastructure one.

    This is where ScraperWiki has always been a great alternative service for lighter-weight needs. I’m sorry I won’t be able to point people your way for the time being.

    It does seem odd as all you are doing is doing the job of the Developer using the Free API.

    🙁

  3. Jamie Riddell August 22, 2014 at 2:18 pm #

    I’m sorry to see ScraperWiki go, this game is a tough one.

    Should you still need to download Twitter data to Excel then please consider BirdSong. We can provide full exports of all Twitter followers for any public account and the most recent 3,000 tweets from that same account, to .csv.

    We can provide this data for any Twitter account. We also have downloads available for Instagram and Facebook.

    We aren’t free but charge per report with no monthly fees or lasting subscription.

    http://www.birdsongdtt.com/2013/03/export-twitter-followers-with-birdsong-on-demand-reports/

  4. Delfin J Paris August 29, 2014 at 5:55 am #

    Bummer about the twitter data pull! I’m still trying to fix my famousfollowers.me site based on a hint you gave me about pulling the data. Sorry about getting suspended!

  5. Francis Irving September 8, 2014 at 11:24 am #

    Crimson Hexagon have a Twitter data programme for academics: http://www.crimsonhexagon.com/about-us/newsroom/press-releases/pr-social-research-grant_102513 (Thanks @EBBreese for pointing this out!)

Trackbacks/Pingbacks

  1. Data Viz News [66] | Visual Loop - August 23, 2014

    […] The story of getting Twitter data and its “missing middle” | ScraperWiki […]

  2. infographic. Data Viz News [66] - » Titel - August 25, 2014

    […] The story of getting Twitter data and its “missing middle” | ScraperWiki […]

  3. API Måndag – öppna valdata från SVT Pejl, Twitter vs utvecklare & Öppna Helsingborg » Mashup.se - September 8, 2014

    […] so big that they could afford to buy data from Twitter. I really recommend that you read ScraperWiki’s blog post about this; it gives good insight into how Twitter think about small […]
