An important part of scraping is turning string data into structured data. Dates and times are two of the most common cases.
For more details, read the Python dateutil (parser) docs, and the datetime docs.
Parsing dates/times
The easiest way is to use a general-purpose function that detects many common date formats. dateutil's parser returns a Python datetime object; call .date() on it to get a plain date object.
import dateutil.parser
print(dateutil.parser.parse('21 June 2010').date()) # 2010-06-21
print(dateutil.parser.parse('10-Jul-1899').date()) # 1899-07-10
print(dateutil.parser.parse('01/01/01').date()) # 2001-01-01
print(type(dateutil.parser.parse('21 June 2010').date())) # <class 'datetime.date'>
You can also keep the time of day, which gives you a Python datetime object.
print(dateutil.parser.parse('Tue 27 Sep 2011 00:25:48')) # 2011-09-27 00:25:48
print(type(dateutil.parser.parse('21 June 2010 6am'))) # <class 'datetime.datetime'>
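Scraped pages often contain strings that are not dates at all. The sketch below is one way to cope with that, assuming you are happy to record None for anything dateutil cannot parse. The helper name parse_date_or_none is made up for this example and is not part of dateutil.

```python
import dateutil.parser

def parse_date_or_none(text):
    """Return a datetime.date, or None if dateutil cannot parse the text.

    Hypothetical helper for illustration; not part of dateutil.
    """
    try:
        return dateutil.parser.parse(text).date()
    except (ValueError, OverflowError):
        # dateutil raises ValueError (or a subclass) for unrecognised strings.
        return None

scraped = ['21 June 2010', 'unknown', '10-Jul-1899']
dates = [parse_date_or_none(s) for s in scraped]
print(dates)
```

Whether to skip, log, or stop on a bad value depends on the scraper; returning None just keeps the bad rows visible in the saved data.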
Ambiguous cases
Automatic detection sometimes guesses wrong. For example, is 3/2/1999 the 2nd of March (US order) or the 3rd of February (UK order)? By default, dateutil assumes the month comes first.
print(dateutil.parser.parse('3/2/1999').date()) # 1999-03-02
You can fix this by giving dateutil a hint such as dayfirst=True. Or, if you really want control, use a completely explicit format string with datetime.strptime.
print(dateutil.parser.parse('3/2/1999', dayfirst=True).date()) # 1999-02-03
import datetime
print(datetime.datetime.strptime('3/2/1999', '%d/%m/%Y').date()) # 1999-02-03
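If a site mixes a handful of known formats, one option is to try several explicit format strings in turn. This is a minimal sketch using only the standard library. The function name parse_strict and the KNOWN_FORMATS list are assumptions for illustration, and month names with %B depend on the current locale.

```python
import datetime

# Hypothetical list; replace it with the formats your source actually uses.
KNOWN_FORMATS = ['%d/%m/%Y', '%d %B %Y', '%Y-%m-%d']

def parse_strict(text, formats=KNOWN_FORMATS):
    """Return a date using the first format that matches, else raise ValueError."""
    for fmt in formats:
        try:
            return datetime.datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError('no known format matches %r' % text)

print(parse_strict('3/2/1999'))    # 1999-02-03
print(parse_strict('2010-06-21'))  # 2010-06-21
```

Unlike automatic detection, this fails loudly on anything unexpected, which is often what you want when the day/month order matters.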
Saving to the datastore
This is easy as pie. You just save either the Python date or datetime object, and ScraperWiki will convert it into the format SQLite needs.
import dateutil.parser
import scraperwiki
birth_datetime = dateutil.parser.parse('1/2/1997 9pm')
data = {
    'name': 'stilton',
    'birth_datetime': birth_datetime,
    'birth_date': birth_datetime.date(),
}
scraperwiki.sqlite.save(unique_keys=['name'], data=data)
Times are saved as UTC, as SQLite doesn't parse explicit timezones.
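If your scraped times carry a timezone offset, you may prefer to convert them to UTC yourself before saving, so that the stored value is unambiguous. The following is a sketch using only the standard library. The helper name to_naive_utc is invented for this example, and it assumes that datetimes without a timezone are already in UTC.

```python
import datetime

def to_naive_utc(dt):
    """Convert an aware datetime to UTC and drop the tzinfo.

    Hypothetical helper; naive datetimes are assumed to be UTC already.
    """
    if dt.tzinfo is None:
        return dt
    return dt.astimezone(datetime.timezone.utc).replace(tzinfo=None)

# A fixed +01:00 offset, used here purely as an example.
plus_one = datetime.timezone(datetime.timedelta(hours=1))
local = datetime.datetime(1997, 2, 1, 21, 0, tzinfo=plus_one)
print(to_naive_utc(local))  # 1997-02-01 20:00:00
```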
Querying dates
From the Web API for a scraper, you can do queries based on dates. See SQLite's date/time functions for more.
select * from swdata where birth_date < '2000-01-01'
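The query above works because dates stored as ISO 8601 text (YYYY-MM-DD) sort in the same order as the dates themselves. To try this locally, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The sample rows are invented for illustration, and only the table name swdata and the birth_date column come from the example above.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table swdata (name text, birth_date text)')
# Invented sample rows, stored as ISO 8601 date strings.
conn.executemany('insert into swdata values (?, ?)',
                 [('stilton', '1997-01-02'), ('cheddar', '2003-05-17')])

# Plain string comparison behaves like date comparison for ISO dates.
older = conn.execute(
    "select name from swdata where birth_date < '2000-01-01'").fetchall()
print(older)  # [('stilton',)]

# SQLite's strftime() extracts parts of a date, here the year.
years = conn.execute(
    "select name, strftime('%Y', birth_date) from swdata order by name"
).fetchall()
print(years)  # [('cheddar', '2003'), ('stilton', '1997')]
```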