
Roadmap for ETL

1 Introduction

In my recent articles I have dealt intensively with the topic of ETL.

An introduction to this topic is worthwhile for the following reasons:

  • Modularity, which leads to better-structured code
  • Flexibility
  • Code that is easier for other data scientists to read
  • Errors that are easier to avoid
  • Automation

2 Roadmap for ETL

At the beginning of this series of posts I showed the basics of how to call .py files located in different directories from a notebook.
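As a minimal sketch of that idea (not the code from the original post), the example below adds a directory to `sys.path` and then imports a module from it. The directory `etl_scripts`, the module `my_etl_helpers` and its function `double` are hypothetical names, created in a temporary folder only so that the example runs on its own.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout: a helper module living in a separate directory.
module_dir = Path(tempfile.mkdtemp()) / "etl_scripts"
module_dir.mkdir()
(module_dir / "my_etl_helpers.py").write_text(
    "def double(x):\n"
    "    return 2 * x\n"
)

# Make the directory importable from the notebook, then import the module.
sys.path.append(str(module_dir))
importlib.invalidate_caches()
my_etl_helpers = importlib.import_module("my_etl_helpers")

result = my_etl_helpers.double(21)
```

In a real project the directory would already exist, so only the `sys.path.append(...)` line and a normal `import` statement would be needed in the notebook.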

You can find this post here:

Furthermore, I have designed the following pipeline variants. Each can be developed further as desired. Here is the corresponding legend in advance:

2.1 “Simple Pipeline”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Simple Pipeline”
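To give a rough idea of what such a pipeline can look like, here is a minimal object-oriented sketch with separate extract, transform and load steps. It is an illustration under my own assumptions and not the linked script: the class `SimplePipeline`, the columns `item` and `price`, and the cleaning rule are all hypothetical, and only the standard library is used.

```python
import csv
import io


class SimplePipeline:
    """Hypothetical extract-transform-load pipeline (all names are illustrative)."""

    def __init__(self, source, target):
        self.source = source  # readable text stream containing CSV data
        self.target = target  # writable text stream for the result

    def extract(self):
        # Read every CSV row into a dictionary keyed by column name.
        return list(csv.DictReader(self.source))

    def transform(self, rows):
        # Example rule: drop rows without a price, normalise names, cast prices.
        cleaned = []
        for row in rows:
            if row["price"]:
                cleaned.append(
                    {"item": row["item"].strip().lower(), "price": float(row["price"])}
                )
        return cleaned

    def load(self, rows):
        # Write the cleaned rows back out as CSV.
        writer = csv.DictWriter(self.target, fieldnames=["item", "price"])
        writer.writeheader()
        writer.writerows(rows)

    def run(self):
        rows = self.transform(self.extract())
        self.load(rows)
        return rows


raw = io.StringIO("item,price\n Apple ,1.5\nPear,\nPlum,2.0\n")
out = io.StringIO()
rows = SimplePipeline(raw, out).run()
```

Because each step is its own method, a single step can be replaced or extended in a subclass without touching the others.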

2.2 “Pipeline with join”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join”
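A pipeline with a join could be sketched roughly as follows. This is again only an assumption about the general shape and not the linked script: the class `JoinPipeline`, the key `customer_id` and the sample records are hypothetical, and the join shown is a plain inner join built with a dictionary lookup.

```python
class JoinPipeline:
    """Hypothetical pipeline that joins two extracted sources on a shared key."""

    def __init__(self, orders, customers, key):
        self.orders = orders
        self.customers = customers
        self.key = key

    def extract(self):
        # In a real pipeline these would be read from files or databases.
        return self.orders, self.customers

    def transform(self, orders, customers):
        # Inner join: keep only orders whose key exists in the customer data.
        lookup = {customer[self.key]: customer for customer in customers}
        return [
            {**order, **lookup[order[self.key]]}
            for order in orders
            if order[self.key] in lookup
        ]

    def load(self, rows):
        # Placeholder load step: return the rows instead of writing them out.
        return rows

    def run(self):
        return self.load(self.transform(*self.extract()))


orders = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 2, "amount": 35.5},
    {"customer_id": 3, "amount": 12.0},
]
customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
]
joined = JoinPipeline(orders, customers, key="customer_id").run()
```

The order for customer 3 is dropped because no matching customer record exists, which is the expected behaviour of an inner join.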

2.3 “Pipeline with join2”

You can get the corresponding Python script here: “GitHub-Michael_Fuchs_Pipeline with join2”

3 Conclusion

The object-oriented programming used to build ETL pipelines is extremely helpful in everyday coding. I therefore advise everyone to learn this style of programming!

Here are all the links about ETL again, clearly listed: