Posts by tag
We track all bugs and issues on GitHub. Please do not report them in this blog’s comments; open an issue on GitHub instead. Before commenting about an issue, please use GitHub search to look for similar existing issues (both open and closed).…
How to Create a Workflow in Apache Airflow to Track Disease Outbreaks in India
What is the first thing that comes to your mind upon hearing the word ‘Airflow’? Data engineering, right? For good reason, I suppose. You are likely to find Airflow mentioned in every other blog post that talks about data engineering. Apache Airflow is a workflow management platform. To oversimplify, you…
A Day in the Life of a SocialCops Data Analyst
I did not have a traditional college experience. While studying at Grinnell College, I spent a year living in Pune, Delhi and Mumbai, where I completed two internships, learned Hindi, traveled the country, and studied Economics at St. Stephen’s College. This year inspired me to leave my home in Kentucky…
PDF Is Evil: Extracting Tabular Data From PDFs
Update: As this blog post explains, getting data out of PDFs is a nightmare, even with tools like PDFTables and Tabula. To solve this problem, we created and released Camelot, an open-source Python library and command-line tool that makes it easy for anyone to extract data tables trapped inside PDF files. Read…