Extract, transform, load ETL
three database functions that are combined into one tool to pull data out of one database and place it into another database.
Extract is the process of reading data from a database. In this stage, the data is collected, often from multiple and different types of sources, so many different databases or csv or excel files.
Data transformation is the process of converting data from one format or structure into another format or structure. Data transformation is critical to activities such as data integration and data management. You transform the data, such as aggregating sales or membership data or converting date formats, editing text strings or joining rows and columns.
Loading data into one database with all data some people call this the “target data warehouse database” and this is the last step of the ETL process.
Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouses environments, PDI can also be used for other purposes:
- Migrating data between applications or databases
- Exporting data from databases to flat files
- Loading data massively into databases
- Data cleansing
- Integrating applications
PDI is easy to use. Every process is created with a graphical tool where you specify what to do without writing code to indicate how to do it; because of this, you could say that PDI is metadata oriented – is very very important
PDI can be used as a standalone application, or it can be used as part of the larger Pentaho Suite. As an ETL tool, it is the most popular open source tool available. PDI supports a vast array of input and output formats, including text files, data sheets, and commercial and free database engines. Moreover, the transformation capabilities of PDI allow you to manipulate data with very few limitations.
Remember PDI can be free! Open Source! I have created over 18 applications using PDI like PX & BRAD.
So you can easily ETL Data from
Any data source
CSV
Excel
Emails
Memberships Systems
Salesforce / Netsuite CRM
Sage / Microsoft Dynamics 365 / Finance systems
ERP System – SAP Business One