Announcing PDFTables.com

PDFs were invented at around the same time as the web. As “digital paper”, they’re trustworthy and don’t change behind your back.

This has a downside – often the definitive source of published data is a PDF. It’s hard to get tens of thousands of numbers out and into a spreadsheet or database. Copying and pasting is too slow, and popular conversion tools munge columns together.

At ScraperWiki, we’ve been helping people get the data back out of PDFs for nearly 5 years.

In that time we’ve developed an Artificial Intelligence algorithm. Just like your eyes, it can see the spacing between columns, picking out the structure of a table from its shape.

It’s called PDFTables.com.
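
To make the idea of spotting columns by their spacing concrete, here is a toy sketch in Python. It is not the PDFTables.com algorithm. The function names `find_column_gaps` and `split_rows`, the `min_gap` parameter and the sample table are all made up for illustration, and it assumes the text already sits on a fixed-width character grid, which a real PDF does not give you.

```python
def find_column_gaps(lines, min_gap=2):
    """Return (start, end) spans of character positions that are blank in every line.

    Hypothetical helper: a run of at least `min_gap` blank positions shared by
    all lines is treated as the gutter between two columns.
    """
    width = max(len(line) for line in lines)
    padded = [line.ljust(width) for line in lines]
    # True where every row has a space at that character position.
    blank = [all(row[i] == " " for row in padded) for i in range(width)]

    gaps = []
    start = None
    # The trailing False closes any run that reaches the right edge.
    for i, is_blank in enumerate(blank + [False]):
        if is_blank and start is None:
            start = i
        elif not is_blank and start is not None:
            if i - start >= min_gap:
                gaps.append((start, i))
            start = None
    return gaps


def split_rows(lines, min_gap=2):
    """Cut each line at the shared gaps and return a list of rows of cells."""
    width = max(len(line) for line in lines)
    gaps = find_column_gaps(lines, min_gap)
    # Column spans are the stretches of text between consecutive gaps.
    edges = [0] + [pos for gap in gaps for pos in gap] + [width]
    spans = [(a, b) for a, b in zip(edges[0::2], edges[1::2]) if a < b]
    return [[line.ljust(width)[a:b].strip() for a, b in spans] for line in lines]


# Made-up sample table, laid out on a fixed-width grid.
sample = [
    f"{item:<12}{qty:<6}{price}"
    for item, qty, price in [
        ("Item", "Qty", "Price"),
        ("Red apples", "12", "3.50"),
        ("Pears", "7", "2.25"),
    ]
]

for row in split_rows(sample):
    print(row)
```

Note that the single space inside “Red apples” does not split the cell, because the other rows have text at that position. Real PDFs are far messier than this grid, so treat it only as a picture of the idea behind PDFTables.com.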

This is the first self-service, web-based product designed for getting large volumes of data out of PDFs. It’s super fast to convert individual PDFs, and there’s a web API to automate more.

You can use it a couple of times without signing up, and then get 50 more pages for free. We charge per page, so you only pay for what you need.

We’d love feedback – please contact us to let us know what you think.

Got a PDF you want to get data from?
Try our easy web interface over at PDFTables.com!

Tags: pdf
