Extract Line Chart Data
A repo that shows how to automatically extract the data points of a line chart from an image. It is mainly a wrapper around LineFormer and ChartDete.
Installation
You need a modal.com account to run this repo out of the box; sign up at modal.com. Deploy the relevant functions by running:

    chmod +x deploy.sh && ./deploy.sh
If you'd like to see a "modal-free" version of this, ping me.
Usage
All images in the folder input will be processed.
1. Add your images to the input folder.
2. Run the data extraction using:

    modal run plextract/main.py

3. Download the processed files using:

    modal volume get plextract-vol <run_id> .

The run ID is a UUID and can be found in the console log.
How It Works
The pipeline works as follows:
1. Use ChartDete to detect chart elements, most importantly the axis labels and the plot area.
2. OCR the numbers from the axis labels.
3. Extract the pixel coordinates of the lines in the chart using LineFormer.
4. Correct the line coordinates so that they are relative to the plot origin.
5. Calculate the conversion from pixels to axis values.
6. Convert the line coordinates to axis values using the conversion parameters from the previous step.
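Steps 4 to 6 can be sketched roughly as follows. This is a minimal illustration, not the code used in this repo: the function names (`to_plot_origin`, `fit_axis`, `to_axis_values`) are hypothetical, the axes are assumed to be linear (no log scale), and the tick positions and values are assumed to come from the ChartDete and OCR steps.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_plot_origin(points: List[Point], plot_left: float, plot_bottom: float) -> List[Point]:
    """Shift image pixel coordinates so the plot's bottom-left corner is (0, 0).

    Image y grows downwards, so y is flipped to grow upwards like a chart axis.
    """
    return [(x - plot_left, plot_bottom - y) for x, y in points]


def fit_axis(pixels: List[float], values: List[float]) -> Tuple[float, float]:
    """Least-squares fit of value = scale * pixel + offset from tick labels.

    Assumes a linear axis and at least two ticks at distinct pixel positions.
    """
    n = len(pixels)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (pixel, value) tick pairs")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("tick pixel positions must not all be identical")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    scale = cov / var_p
    offset = mean_v - scale * mean_p
    return scale, offset


def to_axis_values(points: List[Point],
                   x_fit: Tuple[float, float],
                   y_fit: Tuple[float, float]) -> List[Point]:
    """Apply the fitted (scale, offset) pairs to origin-relative pixel points."""
    (sx, ox), (sy, oy) = x_fit, y_fit
    return [(sx * x + ox, sy * y + oy) for x, y in points]


# Made-up example: the plot area's bottom-left corner sits at image pixel (50, 250).
line_px = [(100.0, 200.0), (250.0, 50.0)]          # line points in image pixels
rel = to_plot_origin(line_px, plot_left=50.0, plot_bottom=250.0)

# Tick positions (already relative to the plot origin) and their OCR'd values.
x_fit = fit_axis([0.0, 100.0, 200.0], [0.0, 10.0, 20.0])
y_fit = fit_axis([0.0, 100.0, 200.0], [0.0, 5.0, 10.0])

converted = to_axis_values(rel, x_fit, y_fit)
print(converted)
```

Fitting over all detected ticks rather than using only two of them makes the conversion less sensitive to a single OCR or detection error, at the cost of assuming the axis really is linear.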
Example
Input
Output
This chart was generated with matplotlib from the extracted data (example/data.json).
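Re-plotting extracted data could look roughly like the sketch below. The schema of example/data.json is not described here, so the layout used in this example (a list of objects with a `name` and a list of `[x, y]` `points`) is purely an assumption for illustration, as are the function names `load_series` and `plot_series`. Adapt the parsing to the actual file.

```python
import json
from typing import List, Tuple

Series = Tuple[str, List[float], List[float]]


def load_series(json_text: str) -> List[Series]:
    """Parse extracted line data into (name, xs, ys) tuples.

    ASSUMED schema (hypothetical, not confirmed by the repo):
    [{"name": "line 1", "points": [[x0, y0], [x1, y1], ...]}, ...]
    """
    series = []
    for entry in json.loads(json_text):
        xs = [float(p[0]) for p in entry["points"]]
        ys = [float(p[1]) for p in entry["points"]]
        series.append((entry.get("name", ""), xs, ys))
    return series


def plot_series(series: List[Series], out_path: str = "replot.png") -> None:
    """Draw every series as a line and save the figure to out_path."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, xs, ys in series:
        ax.plot(xs, ys, label=name or None)
    if any(name for name, _, _ in series):
        ax.legend()
    fig.savefig(out_path)
    plt.close(fig)


# Made-up sample data in the assumed layout.
sample = '[{"name": "a", "points": [[0, 1], [2, 3]]}]'
parsed = load_series(sample)
```

Calling `plot_series(parsed)` would then write the chart to `replot.png` in the current directory.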
Resources
Contact
If you need help setting this up or would just like to use it, shoot me an email: mail@timonschneider.de