Learning data engineering? Build a pipeline locally.
1. Use Python to pull data from an API (e.g. CoinCap)
2. Load data into a local Postgres container
3. Automate it with cron/task scheduler
Start small, build, improve, & repeat.
#data #dataengineering #pythonlearning #Python
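A minimal sketch of steps 1 and 2, not a definitive implementation. The response shape (`data` list with `id`, `symbol`, `priceUsd`), the table name `asset_prices`, and the function names are all assumptions for illustration; check the API docs for the real endpoint and fields. `load_rows` accepts any DB-API connection, e.g. one from `psycopg2.connect(...)` pointed at your local Postgres container.

```python
import json
import urllib.request


def fetch_assets(url):
    """Pull the raw JSON payload from the API (URL supplied by the caller)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def to_rows(payload):
    """Flatten the payload into (id, symbol, price_usd) tuples.

    Assumes a shape like {"data": [{"id": ..., "symbol": ..., "priceUsd": ...}]}.
    """
    return [(a["id"], a["symbol"], float(a["priceUsd"])) for a in payload["data"]]


def load_rows(conn, rows, placeholder="%s"):
    """Insert rows through a DB-API connection and return the row count.

    psycopg2 uses %s as its parameter placeholder; sqlite3 uses ?.
    """
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE IF NOT EXISTS asset_prices "
        "(id TEXT, symbol TEXT, price_usd NUMERIC)"
    )
    marks = ", ".join([placeholder] * 3)
    cur.executemany(f"INSERT INTO asset_prices VALUES ({marks})", rows)
    conn.commit()
    return len(rows)
```

For step 3, a crontab entry such as `0 * * * * python3 /path/to/pipeline.py` (hypothetical path) would run the script every hour.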
When the data to process is larger than memory, try streaming it with Python generators before jumping to distributed systems!
#data #dataengineering #Python #pythonlearning #Generator
E.g. stream a file with generator expressions (note the () and not []) and get the difference between two date columns.
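A small sketch of that idea. The column names `start_date` and `end_date`, the date format, and the function name are assumptions for illustration. Every step is a generator expression, so only one row is held in memory at a time.

```python
import csv
from datetime import datetime
from typing import Iterable, Iterator

DATE_FMT = "%Y-%m-%d"  # assumed date format


def date_diffs(lines: Iterable[str]) -> Iterator[int]:
    """Lazily yield the number of days between two date columns."""
    reader = csv.DictReader(lines)  # reads one row at a time
    # () builds a generator; [] would build the full list in memory
    parsed = (
        (
            datetime.strptime(row["start_date"], DATE_FMT),
            datetime.strptime(row["end_date"], DATE_FMT),
        )
        for row in reader
    )
    return ((end - start).days for start, end in parsed)
```

Usage: pass an open file handle, e.g. `with open("orders.csv") as f: total = sum(date_diffs(f))` (hypothetical file name). The file is read line by line as the generator is consumed.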
It can be overwhelming to start learning data engineering. I'd recommend starting with the basics of Python, SQL, and UNIX commands, then building a simple data project and updating your GitHub and LinkedIn. Landing a DE job is 60% learning and 40% marketing. See reply 👇🏽 for helpful links.
Are you looking for an end-to-end streaming tutorial or a project to understand the foundational skills required to build streaming pipelines? Then this post is for you.
We will use Apache Flink and Apache Kafka for stream processing and queuing.
startdataengineering.com/post/data-engi… #data
Data engineers often work with APIs, but most APIs do not have clear documentation.
Knowing standard REST API design makes extracting data from them more straightforward.
Check out this article that goes over REST API design in detail: learn.microsoft.com/en-us/azure/ar… #data #API
Pulling data from an API is a common data engineering task. Here are a few tips to make your API data pipelines resilient. 🧵
1. Paginate: The dataset may be too large to return in one response, or the API server may only send a max of n rows per request.
#data #dataengineering #datapull #EL
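A rough sketch of offset/limit pagination, which is only one common scheme (other APIs use page numbers or cursor tokens). `fetch_page` is a hypothetical callable you would write around your HTTP client; it takes an offset and a limit and returns a list of rows.

```python
from typing import Callable, Iterator, List


def paginate(
    fetch_page: Callable[[int, int], List[dict]], limit: int = 100
) -> Iterator[dict]:
    """Yield rows page by page until the API returns a short (or empty) page."""
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        # Fewer rows than requested means there is nothing left to pull
        if len(page) < limit:
            break
        offset += limit
```

Because `paginate` is a generator, rows can be written out as each page arrives instead of holding the whole dataset in memory.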