Understanding Data Engineering

After transitioning to Product Management, I prepared myself to work more closely with software developers. But nothing prepared me to join a team made up entirely of Data professionals, which is exactly what happened. It was intimidating at first, but I feel fortunate to work alongside people with such a broad range of technical skills that were new to me.

I thought I already knew what it was like to work with a data-focused team. At OCUS, I worked closely with the Data Analysts, creating dashboards myself when they were too busy and following the same Looker training they received. In my other roles, where there were no Data teams, I had to look for the numbers myself. But as it turns out, a Data Analyst is just one piece of the puzzle. Joining a data-focused team means stepping into a whole new ecosystem.

Every day brings its share of lessons, and discovering the role of the Data Engineer was the first one.

The role of the Data Engineer is to build the infrastructure that supports the storage and movement of data within your organization. Data Engineers don’t focus on analyzing the data itself or building predictive models; those tasks are handled by Data Analysts or Data Scientists. Instead, Data Engineers ensure that the data is accessible, reliable, and ready for use by others on the data team.

This is why, if you don’t work with raw data, you are less likely to interact with Data Engineers directly, unless you are a Data Product Manager or are building data-driven products.

The main responsibility of Data Engineers is building data pipelines. These pipelines collect (extract), transform, and load data from various sources into your data warehouse, a process often called ETL. Sources can include public datasets, your marketing tools, your CRM, your product analytics tools, and more. The data stored in your warehouse is then either processed by your Data Scientists or sent to other destinations, such as your BI dashboards.
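To make the three pipeline steps more concrete, here is a minimal sketch in Python. It is an illustration under simplifying assumptions, not how any particular team builds pipelines: the "source" is a hypothetical in-memory list standing in for a marketing tool or CRM, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical raw records, standing in for data pulled from a marketing tool or CRM.
RAW_SOURCE = [
    {"date": "2024-05-01", "channel": " Facebook ", "spend": "120.50"},
    {"date": "2024-05-01", "channel": "google", "spend": "98.00"},
    {"date": "2024-05-02", "channel": "Facebook", "spend": None},  # missing value
]


def extract():
    """Extract: collect raw records from a source (here, an in-memory list)."""
    return list(RAW_SOURCE)


def transform(rows):
    """Transform: normalise channel names, cast spend to float, drop incomplete rows."""
    clean = []
    for row in rows:
        if row["spend"] is None:
            continue  # skip records that cannot be used
        clean.append((row["date"], row["channel"].strip().lower(), float(row["spend"])))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS ad_spend (date TEXT, channel TEXT, spend REAL)")
    conn.executemany("INSERT INTO ad_spend VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run the three steps in order: extract, then transform, then load."""
    load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    run_pipeline(conn)
    query = "SELECT channel, SUM(spend) FROM ad_spend GROUP BY channel ORDER BY channel"
    for row in conn.execute(query):
        print(row)
```

Keeping each step in its own function mirrors how the work is usually described: if a source changes its format, only the extract or transform step should need updating.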

Data Engineers also play a role in data governance, ensuring that data is handled according to regulations and internal policies. This includes implementing data quality checks, monitoring access, and ensuring that data is properly cataloged and documented.
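A data quality check can be as simple as a set of rules that every row must pass before it reaches the warehouse. The sketch below assumes three hypothetical rules for an ad-spend table (no missing fields, no negative spend, no duplicate date and channel pairs); real teams define their own rules and often use dedicated tools for this.

```python
from collections import Counter

# Hypothetical rules for an ad-spend table; real teams define their own.
REQUIRED_FIELDS = ("date", "channel", "spend")


def check_quality(rows):
    """Return a list of human-readable problems found in the rows."""
    issues = []
    for i, row in enumerate(rows):
        # Rule 1: every required field must be present.
        for field in REQUIRED_FIELDS:
            if row.get(field) is None:
                issues.append(f"row {i}: missing {field}")
        # Rule 2: spend can never be negative.
        spend = row.get("spend")
        if spend is not None and spend < 0:
            issues.append(f"row {i}: negative spend {spend}")
    # Rule 3: each (date, channel) pair should appear only once.
    keys = Counter((row.get("date"), row.get("channel")) for row in rows)
    for key, count in keys.items():
        if count > 1:
            issues.append(f"duplicate key {key} appears {count} times")
    return issues


if __name__ == "__main__":
    sample = [
        {"date": "2024-05-01", "channel": "facebook", "spend": 120.5},
        {"date": "2024-05-01", "channel": "facebook", "spend": 120.5},
        {"date": "2024-05-02", "channel": "google", "spend": -3.0},
        {"date": "2024-05-02", "channel": None, "spend": 10.0},
    ]
    for issue in check_quality(sample):
        print(issue)
```

In a pipeline, a non-empty list of issues would typically stop the load step or alert someone, so that bad data never reaches the dashboards.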

Let’s finish this article with a use case. As a Product Manager, you need to monitor the cost of user acquisition from marketing campaigns run on Facebook Ads and Google Ads. This data has never been tracked before. Before you can ask your Data Analyst to run an analysis and build a dashboard, you first need to ensure that this data is accessible.

This is where the Data Engineer comes in. The Data Engineer will connect to the Facebook and Google APIs to request and retrieve the necessary data. They will then clean and transform the data to remove discrepancies and errors, ensuring that the information is accurate and consistent. Once the data is processed, the Data Engineer will store it in the data warehouse, making it accessible for further analysis.
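Part of the transformation work in this use case is making two platforms comparable. The sketch below is hypothetical throughout: the field names (`spend_usd`, `signups`, `cost_cents`, `conversions`) and the idea of counting each platform's conversions as new users are assumptions for illustration, and the real Facebook and Google Ads APIs return different structures and require authentication. It shows the general idea: map both sources to one shared format, then compute acquisition cost per source.

```python
from collections import defaultdict

# Hypothetical, simplified payloads (NOT the real API formats).
FACEBOOK_ROWS = [
    {"day": "2024-05-01", "spend_usd": "150.00", "signups": 10},
    {"day": "2024-05-02", "spend_usd": "90.00", "signups": 6},
]
GOOGLE_ROWS = [
    {"date": "2024-05-01", "cost_cents": 8000, "conversions": 4},
    {"date": "2024-05-02", "cost_cents": 12000, "conversions": 0},
]


def normalize_facebook(rows):
    """Map the hypothetical Facebook fields to a shared format (spend in dollars)."""
    return [
        {"date": r["day"], "source": "facebook",
         "spend": float(r["spend_usd"]), "new_users": r["signups"]}
        for r in rows
    ]


def normalize_google(rows):
    """Map the hypothetical Google fields to the same format, converting cents to dollars."""
    return [
        {"date": r["date"], "source": "google",
         "spend": r["cost_cents"] / 100, "new_users": r["conversions"]}
        for r in rows
    ]


def acquisition_cost_by_source(rows):
    """Total spend divided by total new users per source; None if there were no new users."""
    spend = defaultdict(float)
    users = defaultdict(int)
    for r in rows:
        spend[r["source"]] += r["spend"]
        users[r["source"]] += r["new_users"]
    return {
        source: (spend[source] / users[source] if users[source] else None)
        for source in spend
    }


if __name__ == "__main__":
    unified = normalize_facebook(FACEBOOK_ROWS) + normalize_google(GOOGLE_ROWS)
    print(acquisition_cost_by_source(unified))
```

The unified rows are what would be stored in the warehouse; once both sources share the same columns, a Data Analyst can compare them in a single query or dashboard instead of reconciling two formats each time.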

Finally, the Data Engineer will catalog and document the data, noting where it comes from and how it has been processed. They will also build a data pipeline that automatically updates these metrics, ensuring that your Data Analyst can easily query the data or use it to build a dashboard.

Why is it important to understand what a Data Engineer does? First, if you work in a small company that can’t afford dedicated Data Engineers, your Data Analysts or Data Scientists may need to take on some of these responsibilities on top of their own. This can mean that the analyses you request take longer to complete. To support them, plan ahead: anticipate your future data needs and identify data sources early so they can be integrated into your roadmap.

Second, understanding the Data Engineer’s role and contributions will help you make more informed decisions about how to leverage data within your organization. It also helps you allocate resources effectively, so that data-related initiatives stay aligned with business goals and timelines.

In future articles, I plan to dive deeper into other data topics such as data ingestion, orchestration, governance, and the specific tools involved. But for now, I wanted to keep things simple and share what I’ve learned about this role to help other non-tech professionals who might be as curious as I was. I hope this introduction has been helpful for you too!
