Twitter Scraper Python Library

I wanted to save the tweets from Transparency Camp. This prompted me to turn Anna’s basic Twitter scraper into a library. Here’s how you use it.

Import it. (It only works on ScraperWiki, unfortunately.)

from scraperwiki import swimport
search = swimport('twitter_search').search

Then search for terms.

search(['picnic #tcamp12', 'from:TCampDC', '@TCampDC', '#tcamp12', '#viphack'])

A separate search will be run on each of these phrases. That’s it.
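
To make "a separate search per phrase" concrete, here is a minimal sketch that turns each phrase into its own URL-encoded query string. The helper name `build_queries` and the `q` parameter name are assumptions for illustration only and are not part of the twitter_search library. The sketch is written for Python 3 and runs outside ScraperWiki.

```python
from urllib.parse import urlencode

def build_queries(phrases):
    """Return one URL-encoded query string per phrase (hypothetical helper)."""
    # Each phrase is encoded on its own, so each one becomes a separate search.
    return [urlencode({'q': phrase}) for phrase in phrases]

queries = build_queries(['picnic #tcamp12', 'from:TCampDC', '@TCampDC'])
for query in queries:
    print(query)
```

Characters such as `#`, `@` and `:` are percent-encoded, which is why the phrases can be passed as plain strings.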

A more complete search

Searching for #tcamp12 and #viphack didn’t get me all of the tweets because I waited about a week to do this. To get a more complete list, I took the users who had posted the tweets returned by that first search and searched for tweets referencing each of them.

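from scraperwiki.sqlite import save, select, get_var, save_var
from time import sleep

# Search by user to get some more.
# Start after the last user handled by a previous run, if there was one;
# ordering by from_user keeps that checkpoint meaningful.
users = [row['from_user'] + ' tcamp12' for row in
    select('distinct from_user from swdata where from_user > "%s" order by from_user'
    % get_var('previous_from_user', ''))]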

for user in users:
    search([user], num_pages = 2)
    save_var('previous_from_user', user)
    sleep(2)

By default, the search function retrieves 15 pages of results, which is the maximum. To save some time, I limited this second phase of searching to two pages, or 200 results; I doubted that there would be more than 200 relevant results mentioning a particular user.
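
The loop above also records the last user it handled, so an interrupted run can pick up where it left off. Here is a minimal standalone sketch of that checkpoint pattern. It uses the standard library's sqlite3 module, and its `get_var` and `save_var` are simplified stand-ins written for this example, not ScraperWiki's own functions; the `vars` table and the sample rows are likewise made up.

```python
import sqlite3

def get_var(conn, name, default=None):
    """Read a saved variable, or return the default (stand-in, not ScraperWiki's)."""
    row = conn.execute('SELECT value FROM vars WHERE name = ?', (name,)).fetchone()
    return row[0] if row else default

def save_var(conn, name, value):
    """Store a variable, overwriting any earlier value (stand-in, not ScraperWiki's)."""
    conn.execute('INSERT OR REPLACE INTO vars (name, value) VALUES (?, ?)',
                 (name, value))
    conn.commit()

def remaining_users(conn):
    """List distinct users that sort after the checkpoint, in alphabetical order."""
    previous = get_var(conn, 'previous_from_user', '')
    rows = conn.execute(
        'SELECT DISTINCT from_user FROM swdata '
        'WHERE from_user > ? ORDER BY from_user', (previous,))
    return [row[0] for row in rows]

# Made-up sample data in an in-memory database.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE vars (name TEXT PRIMARY KEY, value TEXT)')
conn.execute('CREATE TABLE swdata (from_user TEXT, text TEXT)')
conn.executemany('INSERT INTO swdata VALUES (?, ?)',
                 [('carol', 'a'), ('alice', 'b'), ('bob', 'c'), ('alice', 'd')])

first_run = remaining_users(conn)
# Pretend the run was interrupted after handling the first two users.
for user in first_run[:2]:
    save_var(conn, 'previous_from_user', user)

second_run = remaining_users(conn)
print(first_run, second_run)
```

The checkpoint only works if users are always visited in the same order, which is why the query sorts by `from_user`.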

The full script also counts how many tweets were made by each user.
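
The full script isn't reproduced here, so the following is only a guess at how such a count could be done, not the script's actual code. It assumes each saved tweet is a dictionary with a `from_user` key, as in the query above; the function name `count_tweets_by_user` and the sample rows are made up.

```python
from collections import Counter

def count_tweets_by_user(rows):
    """Count how many rows (tweets) each from_user value has."""
    return Counter(row['from_user'] for row in rows)

# Made-up sample rows standing in for the saved tweets.
rows = [
    {'from_user': 'alice', 'text': 'first'},
    {'from_user': 'bob', 'text': 'second'},
    {'from_user': 'alice', 'text': 'third'},
]

counts = count_tweets_by_user(rows)
for user, n in counts.most_common():
    print(user, n)
```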

Library

Remember, this is a library, so you can easily reuse it in your own scripts, like Max Richman did.

3 Responses to “Twitter Scraper Python Library”

paulbradshaw November 9, 2012 at 8:50 pm #

Is there any way to grab tweets by a particular user?
- Zarino Zappia December 3, 2012 at 10:32 am #
  
  Hi Paul! Yes, you should be able to supply “from:…” as a search parameter, as shown in the example on this page.

