1 Introduction
In my last articles I have dealt intensively with the topic of ETL (Extract, Transform, Load).
An introduction to this topic is worthwhile for the following reasons:
- Modularity - cleaner code
- Flexibility
- Code that is easier for other data scientists to read
- Easier error prevention
- Automation
- …
2 Roadmap for ETL
At the beginning of this series of posts I showed the basics of how to call .py files that are located in different directories from within a notebook.
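The core idea can be sketched as follows: add the directory containing the .py files to Python's module search path, then import the module as if it were local. The directory and module names below are illustrative, not the ones from the original post:

```python
import os
import sys
import tempfile

# For a self-contained demo, create a tiny .py file in a separate directory.
# In practice this would be an existing module, e.g. your ETL script.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "my_etl_module.py"), "w") as f:
    f.write("def greet():\n    return 'hello from the ETL module'\n")

# Make the external directory visible to the import system
sys.path.insert(1, module_dir)

# Now the module can be imported in the notebook as if it were local
import my_etl_module

print(my_etl_module.greet())
```

The same `sys.path.insert` trick works with relative paths such as `'../etl/scripts'`, which is convenient when the notebook and the .py files live in sibling folders of one project.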
You can find this post here:
Furthermore, I have designed the following pipeline variants. Each of them can be extended as desired. Here is the corresponding legend in advance:
2.1 “Simple Pipeline”
You can get the corresponding python script here: “GitHub-Michael_Fuchs_Simple Pipeline”
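A simple pipeline of this kind usually wraps the three ETL steps in one class. The following is a minimal sketch under my own assumptions (class, method, and column names are illustrative and not necessarily those used in the linked script):

```python
import pandas as pd

class SimplePipeline:
    """Minimal ETL sketch: one extract, one transform, one load step."""

    def extract(self):
        # In a real pipeline this would read e.g. a CSV file;
        # here we build a small DataFrame in memory instead.
        return pd.DataFrame({"name": ["anna", "bob"], "score": [80, 90]})

    def transform(self, df):
        # Example transformation: normalize names and add a derived column
        df = df.copy()
        df["name"] = df["name"].str.title()
        df["passed"] = df["score"] >= 85
        return df

    def load(self, df, path):
        # Persist the processed data to disk
        df.to_csv(path, index=False)

pipeline = SimplePipeline()
raw = pipeline.extract()
processed = pipeline.transform(raw)
```

Keeping each step in its own method is what makes the pipeline modular: a single step can be changed or tested without touching the others.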
2.2 “Pipeline with join”
You can get the corresponding python script here: “GitHub-Michael_Fuchs_Pipeline with join”
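The characteristic feature of this variant is that the extract step pulls from more than one source and the transform step joins them on a common key. A rough sketch, again with illustrative names that are my own assumptions:

```python
import pandas as pd

class JoinPipeline:
    """ETL sketch where the transform step joins two extracted sources."""

    def extract(self):
        # Two hypothetical sources, e.g. two CSV files in practice
        customers = pd.DataFrame({"id": [1, 2, 3],
                                  "name": ["Anna", "Bob", "Cara"]})
        orders = pd.DataFrame({"id": [1, 2, 2],
                               "amount": [10.0, 5.5, 7.25]})
        return customers, orders

    def transform(self, customers, orders):
        # The join step: combine both sources on the shared key column
        return customers.merge(orders, on="id", how="inner")

pipe = JoinPipeline()
customers, orders = pipe.extract()
result = pipe.transform(customers, orders)
```

With `how="inner"` only customers that actually have orders survive the join; switching to `how="left"` would keep all customers and fill missing order values with NaN.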
2.3 “Pipeline with join2”
You can get the corresponding python script here: “GitHub-Michael_Fuchs_Pipeline with join2”
2.4 “Pipeline with intermediate storage”
You can get the corresponding python script here: “GitHub-Michael_Fuchs_Pipeline with intermediate storage”
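The distinguishing idea of this variant is that results are written to disk between the steps, so later steps read from the intermediate store instead of keeping everything in memory. A minimal sketch, assuming hypothetical file names of my own choosing:

```python
import os
import tempfile
import pandas as pd

class IntermediatePipeline:
    """ETL sketch that persists intermediate results between steps."""

    def __init__(self, storage_dir):
        self.storage_dir = storage_dir

    def extract(self):
        df = pd.DataFrame({"value": [1, 2, 3, 4]})
        # Intermediate storage: persist the raw extract before transforming
        df.to_csv(os.path.join(self.storage_dir, "raw.csv"), index=False)

    def transform(self):
        # The transform step re-reads from the intermediate store, so the
        # pipeline can be resumed here without re-running the extraction
        df = pd.read_csv(os.path.join(self.storage_dir, "raw.csv"))
        df["value_doubled"] = df["value"] * 2
        df.to_csv(os.path.join(self.storage_dir, "processed.csv"), index=False)

storage = tempfile.mkdtemp()
pipe = IntermediatePipeline(storage)
pipe.extract()
pipe.transform()
final = pd.read_csv(os.path.join(storage, "processed.csv"))
```

This layout trades some disk I/O for robustness: if a later step fails, the already-stored intermediate files do not have to be recomputed.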
3 Conclusion
The object-oriented programming used when creating ETLs is extremely helpful in everyday coding. I therefore advise everyone to learn this style of programming!
Here once again are all the links about ETL, clearly listed: