Suggested projects and useful advice collected from people at /r/dataengineering.
Suggestion by artfully_rearranged
<aside> <img src="/icons/info-alternate_gray.svg" alt="/icons/info-alternate_gray.svg" width="40px" /> Description:
This project builds a live-updating microservice that extracts raw data from a public API, then ingests, stores, transforms, and displays it, and finally exports it for analysis.
You could show this project to potential employers as proof of your ability to work with APIs, handle data with Python and Pandas, perform ETL (Extract, Transform, Load) or ELT (Extract, Load, Transform), work with cloud services, and analyze data.
The ongoing cost should be minimal, as you can use free tiers offered by cloud providers for most of the services involved. Along the way, you can also practice following programming best practices and writing comprehensive documentation.
</aside>
1. Choose a Public API
2. Set up a Cloud Environment
Hosting is likely to cost around $10/month or less, and the whole project can be run on free tiers.
3. Write a Python Microservice with Flask
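
As a rough starting point for step 3, here is a minimal sketch of what such a Flask microservice might look like. Everything specific in it is an assumption made for illustration: the `API_URL` placeholder, the `/ingest` and `/records` routes, the `records` table, and the use of SQLite as a stand-in for whatever cloud storage you pick in step 2. It also assumes the API returns a JSON array of objects.

```python
import json
import sqlite3
from contextlib import closing
from urllib.request import urlopen

from flask import Flask, jsonify

# Hypothetical placeholder: substitute the public API chosen in step 1.
API_URL = "https://api.example.com/readings"


def fetch_records(url=API_URL):
    """Extract: download a JSON array of records from the API."""
    with urlopen(url, timeout=10) as response:
        return json.load(response)


def create_app(fetch=fetch_records, db_path="records.db"):
    """Build the Flask app; `fetch` is injectable so it can be stubbed in tests."""
    app = Flask(__name__)

    def connect():
        # Open a fresh connection per request and make sure the table exists.
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS records ("
            "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
        )
        return conn

    @app.route("/ingest", methods=["POST"])
    def ingest():
        # Extract from the API, then load the raw records as JSON text.
        records = fetch()
        with closing(connect()) as conn, conn:  # inner `conn` commits on success
            conn.executemany(
                "INSERT INTO records (payload) VALUES (?)",
                [(json.dumps(record),) for record in records],
            )
        return jsonify({"ingested": len(records)}), 201

    @app.route("/records")
    def list_records():
        # Serve the stored records back in insertion order.
        with closing(connect()) as conn:
            rows = conn.execute(
                "SELECT payload FROM records ORDER BY id"
            ).fetchall()
        return jsonify([json.loads(payload) for (payload,) in rows])

    return app
```

To try it locally, you could call `create_app().run(debug=True)` and send a `POST` to `/ingest` followed by a `GET` to `/records`. Storing the raw payload first and transforming it later is closer to ELT than ETL; a transform step with Pandas could be added between fetching and inserting if you prefer the ETL ordering.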