PDFs were invented at around the same time as the web. As “digital paper”, they’re trustworthy and don’t change behind your back.
This has a downside – often the definitive source of published data is a PDF. It’s hard to get tens of thousands of numbers out and into a spreadsheet or database. Copying and pasting is too slow, and popular conversion tools munge columns together.
At ScraperWiki, we’ve been helping people get the data back out of PDFs for nearly 5 years.
In that time we’ve developed an Artificial Intelligence algorithm. Just like your eyes, it can see the spacing between columns, picking out the structure of a table from its shape.
It’s called PDFTables.com.
This is the first self-service, web-based product designed for getting large volumes of data out of PDFs. Converting an individual PDF is super fast, and there’s a web API for automating bigger jobs.
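For the curious, here is a rough sketch of the general idea of finding columns from the spacing between them. It is not the PDFTables algorithm, just a minimal illustration. It assumes each word has already been extracted with its left and right x-coordinates; the names `find_column_gaps`, `split_row` and `min_gap`, and the sample coordinates, are made up for this example.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical input format: (text, left x, right x) for each word on a row.
Word = Tuple[str, float, float]


def find_column_gaps(rows: List[List[Word]], min_gap: float = 5.0) -> List[float]:
    """Return x positions of column separators.

    Projects every word onto the x-axis, merges overlapping spans, and
    treats any empty stretch at least `min_gap` wide as a column boundary.
    """
    spans = sorted((x0, x1) for row in rows for _, x0, x1 in row)
    if not spans:
        return []
    merged = [list(spans[0])]
    for x0, x1 in spans[1:]:
        if x0 <= merged[-1][1]:
            # Overlaps the current span, so extend it.
            merged[-1][1] = max(merged[-1][1], x1)
        else:
            merged.append([x0, x1])
    separators = []
    for (_, right), (left, _) in zip(merged, merged[1:]):
        if left - right >= min_gap:
            # Put the separator in the middle of the empty stretch.
            separators.append((right + left) / 2)
    return separators


def split_row(row: List[Word], separators: List[float]) -> List[str]:
    """Assign each word to a column by the x position of its centre."""
    cells: List[List[str]] = [[] for _ in range(len(separators) + 1)]
    for text, x0, x1 in row:
        cells[bisect_right(separators, (x0 + x1) / 2)].append(text)
    return [" ".join(cell) for cell in cells]


# Made-up sample data: three rows of a three-column table.
rows = [
    [("Region", 10, 50), ("Sales", 100, 130), ("Year", 180, 205)],
    [("North", 10, 45), ("West", 48, 70), ("1,200", 100, 130), ("2014", 180, 205)],
    [("South", 10, 45), ("950", 110, 130), ("2014", 180, 205)],
]
separators = find_column_gaps(rows)
table = [split_row(row, separators) for row in rows]
```

Because the gaps are measured across all rows at once, “North West” stays together in one cell even though there is a small space between the two words. Real PDFs are much messier than this, with wrapped cells, spanning headers and ragged alignment.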
We’d love feedback – please contact us to let us know what you think.
Try our easy web interface over at PDFTables.com!