1 Introduction
In my recent posts I have dealt intensively with the topic of ETL.
Getting familiar with this topic is worthwhile for the following reasons:
- Modularity - better coding
- Flexibility
- Easier for other data scientists to read the code
- Errors are easier to avoid
- Automation
- …
2 Roadmap for ETL
At the beginning of this series I showed the basics of how to call .py files that are located in different directories from within a notebook.
You can find this post here:
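As a short reminder of the idea, here is a minimal sketch of importing a .py file from another directory. The directory name `etl_modules`, the module `etl_cleaning_demo` and the function `strip_whitespace` are made up for this illustration and are not taken from the earlier post. The sketch creates the file in a temporary folder so that it runs on its own.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a helper module that lives in a separate directory.
module_dir = Path(tempfile.mkdtemp()) / "etl_modules"
module_dir.mkdir()
(module_dir / "etl_cleaning_demo.py").write_text(
    "def strip_whitespace(values):\n"
    "    return [v.strip() for v in values]\n"
)

# Make the directory importable, then import the module as usual.
sys.path.append(str(module_dir))
import etl_cleaning_demo

result = etl_cleaning_demo.strip_whitespace(["  a ", "b  "])
print(result)
```

In a real project the directory would already exist, so only the `sys.path.append(...)` line and the import would be needed in the notebook.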
Furthermore, I have designed the following pipeline variants. Each of them can be developed further as desired. Here is the corresponding legend in advance:
2.1 “Simple Pipeline”
You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Simple Pipeline”
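To give a rough feeling for what a simple pipeline can look like, here is a minimal sketch with one extract, one transform and one load step. It is not the linked script. The function names, the column names `id` and `amount` and the sample data are assumptions made for this example, and it uses only the standard library so that it runs anywhere.

```python
import csv
import io


def extract(source):
    """Read rows from a CSV text stream into a list of dicts."""
    return list(csv.DictReader(source))


def transform(rows):
    """Cast the columns to numbers and drop rows without an amount."""
    cleaned = []
    for row in rows:
        if row["amount"].strip():
            cleaned.append({"id": int(row["id"]), "amount": float(row["amount"])})
    return cleaned


def load(rows, target):
    """Write the transformed rows as CSV to the target stream."""
    writer = csv.DictWriter(target, fieldnames=["id", "amount"])
    writer.writeheader()
    writer.writerows(rows)


def run_pipeline(source, target):
    """Chain the three steps: extract -> transform -> load."""
    load(transform(extract(source)), target)


# Hypothetical input: the second row has no amount and gets dropped.
raw = io.StringIO("id,amount\n1,10.5\n2,\n3,4\n")
out = io.StringIO()
run_pipeline(raw, out)
print(out.getvalue())
```

Keeping each step in its own function is what makes the pipeline modular: every step can be tested and replaced separately.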
2.2 “Pipeline with join”
You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join”
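The idea of a pipeline with a join can be sketched as follows: two sources are combined on a shared key before the result is passed on. This is only an illustration under assumptions. The helper `join_on_key`, the key `customer_id` and the sample records are invented for this example, and the linked script may be organized differently.

```python
def join_on_key(left_rows, right_rows, key):
    """Inner join of two lists of dicts on a shared key.

    Assumes the key is unique in right_rows.
    """
    right_index = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_index.get(row[key])
        if match is not None:
            # Merge the columns of both sources into one record.
            joined.append({**row, **match})
    return joined


# Hypothetical sources: orders and the customers they belong to.
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 3, "amount": 7.5},  # no matching customer
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]

joined = join_on_key(orders, customers, "customer_id")
print(joined)
```

The order with `customer_id` 3 has no match and is dropped, which is the behaviour of an inner join.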
2.3 “Pipeline with join2”
You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join2”
2.4 “Pipeline with intermediate storage”
You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with intermediate storage”
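Intermediate storage means that one stage writes its result to disk and a later stage picks it up from there. Here is a minimal sketch of that idea, not the linked script. The names `stage_one` and `stage_two`, the file name `cleaned.json` and the sample data are assumptions made for this example.

```python
import json
import tempfile
from pathlib import Path


def stage_one(rows, storage_path):
    """Clean the raw rows and persist the result as an intermediate file."""
    cleaned = [r for r in rows if r["amount"] is not None]
    storage_path.write_text(json.dumps(cleaned))


def stage_two(storage_path):
    """Read the intermediate file and aggregate the amounts."""
    cleaned = json.loads(storage_path.read_text())
    return sum(r["amount"] for r in cleaned)


# Hypothetical raw data: the second row has no amount.
raw_rows = [
    {"id": 1, "amount": 10.5},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 4.0},
]

with tempfile.TemporaryDirectory() as tmp:
    intermediate = Path(tmp) / "cleaned.json"
    stage_one(raw_rows, intermediate)   # first part of the pipeline
    total = stage_two(intermediate)     # second part starts from the stored file

print(total)
```

Because the second stage depends only on the stored file, it can be rerun or debugged without repeating the first stage.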