

Getting all the hash tags, user mentions…

We’ve rolled out a change so you get more data when you use the Twitter search tool!

Multiple media entities

We’ve changed four columns. Each used to return just one item, chosen arbitrarily. Now each returns all of the items, separated by spaces. The columns are:

  • hashtags now returns all of them with the hashes, e.g. #opendata #opendevelopment
  • user_mention has been renamed user_mentions, e.g. @tableau @tibco
  • media can now return multiple images and other things
  • url has been renamed urls and can return multiple links

We renamed two of the columns partly to reflect their new status, and partly because they now match the names in the Twitter API exactly.
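If you work with the data outside Tableau, the space-separated columns are easy to turn back into lists. The Python below is a minimal sketch of that idea, not part of the tool itself: the sample rows and the helper name split_entities are made up for illustration, and only the column names come from the list above.

```python
from collections import Counter

# Hypothetical rows, shaped like an export from the Twitter search tool.
rows = [
    {"hashtags": "#opendata #opendevelopment", "user_mentions": "@tableau @tibco"},
    {"hashtags": "#opendata", "user_mentions": ""},
]


def split_entities(field):
    """Split a space-separated column value into a list of items."""
    # str.split() with no argument drops empty strings, so an empty
    # (or missing) field gives an empty list rather than [''].
    return (field or "").split()


# Tally each hashtag across all rows, ignoring case.
hashtag_counts = Counter(
    tag.lower() for row in rows for tag in split_entities(row["hashtags"])
)

print(hashtag_counts.most_common(1))
```

The same helper works for user_mentions, media and urls, since all four columns use a space as the separator.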

What can you do with this new functionality?

We had a look at the numbers of media, hashtags, mentions and URLs for a collection of tweets on a popular hashtag (#kittens), using our favourite tool for this sort of work: Tableau. It requires a modicum of cunning to calculate the number of entries in a delimited list using Tableau functions. To count the numbers of entries in each field, we need to make a calculated field like this:

LEN([hashtags]) - LEN(REPLACE([hashtags],'#',''))

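This is the calculation for hashtags, where we use # as a marker: removing every # shortens the string by one character per hashtag. You can do the same for mentions (using @ as the marker). For URLs and media, use ‘http’ as the marker and divide by 4, because each ‘http’ removed shortens the string by four characters: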

(float(LEN([urls]) - LEN(REPLACE([urls],'http','')))/4.0)

Hat-tip to Mark Jackson for that one.
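For comparison, the same marker-counting trick can be sketched in Python. This is only an illustration of the arithmetic behind the Tableau calculations; count_by_marker is a hypothetical helper name and the sample strings are invented.

```python
def count_by_marker(field, marker):
    """Count entries by measuring how much shorter the field gets when
    every occurrence of the marker is removed."""
    removed = len(field) - len(field.replace(marker, ""))
    # Each occurrence removes len(marker) characters, hence the division
    # (by 1 for '#' or '@', by 4 for 'http').
    return removed // len(marker)


print(count_by_marker("#opendata #opendevelopment", "#"))
print(count_by_marker("http://example.com/a https://example.com/b", "http"))
```

In Python, field.count(marker) gives the same number directly; the long form is shown only because it mirrors the Tableau calculated fields step by step.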

For URLs and media we see that most tweets only contain one item, although for URLs there are posts with up to six identical URLs, presumably in an attempt to get search engine benefits. The behaviour for mentions and hashtags is more interesting. Hashtags top out at a maximum of 19 in a single tweet, in which every word has a hashtag.

The distribution is shown in the chart below. Each tweet is represented by a thin horizontal bar whose length depends on the number of hashtags. The bars are sorted by size, so the longest bar, at the top, represents the maximum number of hashtags.


For mentions we see that most tweets mention at most one or two other users:


Thanks to Mauro Migliarini for suggesting this change.


2 Responses to “Getting all the hash tags, user mentions…”

  1. Micah June 9, 2014 at 6:41 pm #

    This is great stuff guys, and thanks for showing the Tableau calcs. Is there a straightforward way (Tableau calc?) to break each hashtag/mention/URL out as a separate data-point? So rather than do counts by tweet, do counts by hashtag/mention/URL?

    This would be helpful to surface the top hashtags in a collection, and the top @mention’ed users. Thanks!

  2. Ian Hopkinson June 13, 2014 at 3:40 pm #

    There isn’t a straightforward way to split out each hashtag etc from the combined field in Tableau. It’s pretty straightforward to do this using something like Python, and there may be a way of doing it in Excel.
