1 Introduction

In my last few posts I have dealt intensively with the topic of ETL.

Getting familiar with this topic is worthwhile for the following reasons:

Modularity - better-structured code
Flexibility
Code that is easier for other data scientists to read
Errors that are easier to avoid
Automation
…

2 Roadmap for ETL

At the beginning of this series of posts I showed the basics of how to call .py files that are located in different directories from within a notebook.

You can find this post here:

“ETL - Read .py from different sources”
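As a quick reminder of the idea, here is a minimal sketch of importing a .py file from another directory. It is not the code from the linked post: the module name `my_etl_functions`, its `double` function and the temporary directory are hypothetical and only serve to keep the example self-contained. In a real project the directory would already exist and hold your own scripts.

```python
import sys
import tempfile
from pathlib import Path

# Hypothetical setup: create a helper module in a directory
# that is NOT the notebook's working directory.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "my_etl_functions.py").write_text(
    "def double(values):\n"
    "    return [2 * v for v in values]\n"
)

# Make that directory importable, then import the module as usual.
if str(module_dir) not in sys.path:
    sys.path.append(str(module_dir))

import my_etl_functions

result = my_etl_functions.double([1, 2, 3])
print(result)
```

Appending to `sys.path` only affects the running session, which is usually what you want in a notebook.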

Furthermore, I have designed the following pipeline variants, each of which can be developed further as desired. Here is the corresponding legend in advance:

2.1 “Simple Pipeline”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Simple Pipeline”
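To give a feel for the shape of such a pipeline, here is a rough sketch of an extract, transform and load sequence wrapped in a class. It is not the linked script: the class name `SimplePipeline`, the column names and the sample data are assumptions, and it uses only the standard library instead of pandas to stay dependency-free.

```python
import csv
import io


class SimplePipeline:
    """Hypothetical ETL class: extract -> transform -> load."""

    def __init__(self, source_text):
        self.source_text = source_text

    def extract(self):
        # Read the raw CSV text into a list of dictionaries.
        return list(csv.DictReader(io.StringIO(self.source_text)))

    def transform(self, rows):
        # Clean the names and convert the amounts to floats.
        return [
            {"name": r["name"].strip().title(), "amount": float(r["amount"])}
            for r in rows
        ]

    def load(self, rows):
        # Write the cleaned rows back out as CSV text.
        out = io.StringIO()
        writer = csv.DictWriter(
            out, fieldnames=["name", "amount"], lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()

    def run(self):
        return self.load(self.transform(self.extract()))


raw = "name,amount\n alice ,10\nbob,2.5\n"
output = SimplePipeline(raw).run()
print(output)
```

Keeping each step in its own method means a single step can be swapped or tested without touching the others.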

2.2 “Pipeline with join”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join”
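The join step itself can be illustrated with a small sketch. This is only an assumption about what such a step might look like, not the linked script: the function `left_join`, the key `customer_id` and the sample rows are made up, and with pandas you would typically use a merge instead.

```python
def left_join(left_rows, right_rows, key):
    """Attach matching right-hand fields to every left-hand row."""
    # Build a lookup on the join key; if a key occurs more than once
    # on the right-hand side, the last row wins.
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key], {})
        # Left-hand values take precedence over right-hand ones.
        joined.append({**match, **row})
    return joined


orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 2.5},
    {"customer_id": 3, "amount": 7.0},
]
customers = [
    {"customer_id": 1, "name": "Alice"},
    {"customer_id": 2, "name": "Bob"},
]

joined = left_join(orders, customers, "customer_id")
print(joined)
```

Rows without a match on the right-hand side are kept unchanged, which mirrors the behaviour of a left join.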

2.3 “Pipeline with join2”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join2”

2.4 “Pipeline with intermediate storage”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with intermediate storage”
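The idea of intermediate storage can be sketched as follows: the result of one stage is written to disk and read back before the next stage runs. Again, this is not the linked script. The stage functions, the file name `stage_one.json` and the JSON format are assumptions chosen to keep the example small.

```python
import json
import tempfile
from pathlib import Path


def stage_one(rows):
    # First stage: drop rows with a missing amount.
    return [r for r in rows if r["amount"] is not None]


def stage_two(rows):
    # Second stage: aggregate the cleaned rows.
    return sum(r["amount"] for r in rows)


raw_rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 2.5},
]

with tempfile.TemporaryDirectory() as tmp:
    # Persist the intermediate result so later stages can start from it.
    checkpoint = Path(tmp) / "stage_one.json"
    checkpoint.write_text(json.dumps(stage_one(raw_rows)))

    # The next stage reads from storage instead of from memory.
    restored = json.loads(checkpoint.read_text())
    total = stage_two(restored)

print(total)
```

Storing the intermediate result makes it possible to rerun or debug a later stage without repeating the earlier ones.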

3 Conclusion

The object-oriented style used to build these ETL pipelines is extremely helpful in everyday coding. I therefore recommend that everyone learn this way of programming!

Here again is a clear list of all the links about ETL: