Hello all! Hope you are doing well.
In the last couple of years, you might’ve heard a lot about Data Engineering. It surely gained a lot of buzz in recent times and… every company wanted data engineers. Needless to say, the demand for data engineers was at an all-time high.
But what is data engineering? And why do we need it? Let's understand all of that in this post with the following topics:
- What is Data Engineering? And why do we need it?
- Responsibilities of a Data Engineering Team
What is Data Engineering? And why do we need it?
Simply put, data engineering deals with collecting, storing, and processing data in a data warehouse, and serving that data to various stakeholders.
Data is generated in a company by different teams across a variety of systems such as databases, APIs, streaming events, file servers, etc. This is the data that different teams need to carry out various analyses.
Generally, the incoming data arrives in different formats and sizes from different sources, and it is stored in an archival/analytics system like a data warehouse or a data lake. Once the data is in the data warehouse, it is cleaned and transformed into a format mutually agreed upon by the stakeholders.
The data engineering team builds and maintains the pipelines and processes, such as ETL (extract, transform, load) and ELT (extract, load, transform), for data ingestion and data transformation across all the data received in the data warehouse.
The end goal of data engineering efforts is analytics-ready data or “clean data”.
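To make the extract, clean, and load flow described above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the two inline "sources" (`ORDERS_CSV`, `ORDERS_JSON`), the column names, and the agreed format `(order_id, amount, country)` are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw inputs: two sources delivering the same kind of data
# in different formats and with different field names.
ORDERS_CSV = "order_id,amount,country\n1, 19.99 ,us\n2,5.00,IN\n"
ORDERS_JSON = '[{"id": 3, "total": "42.50", "country_code": "de"}]'


def extract():
    """Read raw records from each source as plain dicts."""
    csv_rows = list(csv.DictReader(io.StringIO(ORDERS_CSV)))
    json_rows = json.loads(ORDERS_JSON)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Clean both sources into one agreed format: (order_id, amount, country)."""
    cleaned = []
    for row in csv_rows:
        cleaned.append(
            (int(row["order_id"]), float(row["amount"].strip()), row["country"].strip().upper())
        )
    for row in json_rows:
        cleaned.append(
            (int(row["id"]), float(row["total"]), row["country_code"].strip().upper())
        )
    return cleaned


def load(conn, rows):
    """Write the cleaned rows into the warehouse table (SQLite stands in here)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)"
    )
    # INSERT OR REPLACE keeps the load idempotent if the pipeline is re-run.
    conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract -> transform -> load and return the number of rows loaded."""
    csv_rows, json_rows = extract()
    rows = transform(csv_rows, json_rows)
    load(conn, rows)
    return len(rows)


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    print(run_pipeline(warehouse), "rows loaded")
```

In an ELT setup the order of the last two steps would be swapped: the raw rows are loaded first and the cleaning happens inside the warehouse, usually with SQL.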
Why a centralized system like a data warehouse?
To ensure that all the company’s data is in one single system and any team looking for particular data can easily access it. This means that no team has any overhead in obtaining the required data in a common format that is used across the entire organization.
This also means that there is no duplication of effort across teams for creating a dataset from multiple sources.
Responsibilities of a Data Engineering Team
- Identify data sources, analyze the data and ingest the data into the data warehouse
- Build and maintain the data pipelines for periodic ingestion and processing of the data
- Add resiliency to the pipelines so they can handle failures
- Build and maintain the data warehouse tables and specialized datasets
- Maintain data quality and integrity
- Last but not least, maintain and scale the data infrastructure
Challenges of a Data Engineering Team
- The whole data engineering effort is mostly internal to a company. There is no customer interaction, nor any direct revenue generated here, so there will be a number of questions about its viability, credibility and ROI
- There can be a lot of incoming data requests from various teams across the entire company
- Context switching. The data engineering team handles a good number of pipelines, and it also takes on further tasks in collaboration with different teams to fulfill their data requests. Handling all of these things at once requires context switching, and that might affect quality in the long run.
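Going back to the responsibilities listed earlier, two of them (adding resiliency to the pipelines and maintaining data quality) can be sketched in a few lines of Python. This is only an illustration under assumptions: `run_with_retries` and `check_quality` are hypothetical helper names, the retry policy is a simple exponential backoff, and the quality rules (required columns, no duplicate `order_id`) are made-up examples, not rules from any particular tool.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run task(); on failure wait base_delay * 2**(attempt - 1) seconds and retry.

    The last failure is re-raised so the pipeline does not fail silently.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def check_quality(rows, required=("order_id", "amount")):
    """Return a list of problems found in a batch; an empty list means it passed."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Rule 1 (assumed): required columns must not be missing or None.
        for col in required:
            if row.get(col) is None:
                problems.append(f"row {i}: missing {col}")
        # Rule 2 (assumed): order_id must be unique within the batch.
        key = row.get("order_id")
        if key is not None:
            if key in seen_ids:
                problems.append(f"row {i}: duplicate order_id {key}")
            seen_ids.add(key)
    return problems
```

A pipeline step could then be wrapped as `run_with_retries(lambda: ingest_batch())`, where `ingest_batch` is whatever function does the actual ingestion, and the batch would only be loaded when `check_quality(rows)` returns an empty list.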
You can find more details and examples on data engineering, along with a few important questions about it, in the video below.