Components
In Ascend, components are the essential building blocks of data pipelines. They make up flows, which define the sequence and logic of data transformations. Each component is defined within the flow by a YAML, Python, or SQL file, depending on its type and purpose. These components handle specific tasks, from data ingestion to processing and finally to output. Understanding how these components interact within a flow is crucial for building efficient and reliable data pipelines.
This document provides a high-level overview of the different types of components and their roles within a data pipeline.
Component Types
For complete details on the types of components available and their configuration options, see the Component Reference.
Components in Ascend are categorized into several types, each serving a distinct purpose in the data pipeline:
- Read Components are the starting points of a data pipeline. They connect Ascend to various data sources, such as databases, file storage systems, and cloud services, enabling data ingestion. Once the data is ingested, it is processed within the data plane.
- Transform Components are responsible for the data processing logic in Ascend. They allow you to define operations, such as data cleaning, aggregation, and enrichment, using SQL or Python. These components are critical for preparing data for analysis or further processing.
- Write Components are the endpoints of the data pipeline. They export the processed data to external systems, databases, or storage solutions, ensuring that the data is available for downstream applications or storage.
- Task Components provide the flexibility to execute custom SQL or Python code within a flow. They enable complex processing tasks by allowing you to run multiline SQL statements or Python scripts with minimal modifications.
- Test Components validate data quality and integrity throughout the data pipeline. These components ensure that data meets specified criteria at various stages, reducing errors and improving reliability.
- Compound Components are specialized components composed of one or more sub-components. They serve as a group of logically related components that can be managed together as a single unit. Compound components are typically generated by an Application, which passes in a configuration to create the sub-components.
Tests can also be added to individual Components; these tests run automatically as part of the Component's execution.
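The idea of tests that run as part of a component's execution can be illustrated with a minimal, plain-Python sketch. Nothing here is Ascend's API: the `Component` class, the `execute` method, and the `not_null` and `unique` helpers are hypothetical names chosen only to show the pattern of running the component first and then applying its attached checks.

```python
from dataclasses import dataclass, field
from typing import Callable

Row = dict
Check = Callable[[list[Row]], bool]


@dataclass
class Component:
    """Hypothetical stand-in for a component; not an Ascend class."""
    name: str
    run: Callable[[], list[Row]]
    tests: dict[str, Check] = field(default_factory=dict)

    def execute(self) -> list[Row]:
        rows = self.run()
        # Attached tests run automatically once the component has produced rows.
        failed = [label for label, check in self.tests.items() if not check(rows)]
        if failed:
            raise ValueError(f"{self.name}: failed tests: {', '.join(failed)}")
        return rows


def not_null(column: str) -> Check:
    """Build a check that passes when no row has a missing value in `column`."""
    return lambda rows: all(row.get(column) is not None for row in rows)


def unique(column: str) -> Check:
    """Build a check that passes when every value in `column` is distinct."""
    return lambda rows: len({row[column] for row in rows}) == len(rows)


orders = Component(
    name="orders",
    run=lambda: [{"id": 1, "total": 20.0}, {"id": 2, "total": 35.5}],
    tests={"id not null": not_null("id"), "id unique": unique("id")},
)
result = orders.execute()
```

In this sketch a failing check raises an error that names the component and the failed tests, so bad data stops at the component that produced it instead of flowing downstream.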
Interaction of Components in a Flow
Components in a flow work together: each plays a specific role, and their interaction is what carries data accurately and efficiently from ingestion to output.
Flow Example: From Ingestion to Output
- Data Ingestion with Read Components:
  - The flow begins with Read Components, which connect to external data sources like databases, file storage systems, or cloud services. These components are responsible for ingesting raw data into the Ascend platform.
- Data Processing with Transform and Task Components:
  - Once data is ingested, it typically passes through one or more Transform Components. These components apply data processing logic (such as cleaning, aggregating, or enriching the data) using SQL or Python.
  - In more complex scenarios, Task Components might be used alongside or instead of Transform Components to execute custom processing tasks. These tasks might involve running complex SQL queries, executing Python scripts, or integrating with external services.
  - Throughout the flow, Tests can be integrated within various components to validate the data's quality and integrity.
- Data Output with Write Components:
  - After the data has been processed, Write Components take over. These components export the transformed data to its final destination, which could be an external database, a cloud storage service, or another application that relies on the processed data.
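The stages above can be sketched end to end in plain Python. This is only a conceptual illustration using the standard library, not how Ascend components are actually defined: the function names (`read_source`, `transform`, `check_output`, `write_destination`) and the sample data are assumptions made for the example, and an in-memory CSV stands in for both the external source and the destination.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw input standing in for an external data source.
RAW_CSV = """region,amount
north,10
south,5
north,7
south,
"""


def read_source(text):
    """Read step: ingest raw rows from a source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform step: drop rows with no amount, then total amounts per region."""
    totals = defaultdict(int)
    for row in rows:
        if row["amount"]:
            totals[row["region"]] += int(row["amount"])
    return [{"region": r, "total": t} for r, t in sorted(totals.items())]


def check_output(rows):
    """Test step: stop the run if the output is empty or a total is negative."""
    assert rows, "transform produced no rows"
    assert all(row["total"] >= 0 for row in rows), "negative total found"
    return rows


def write_destination(rows):
    """Write step: export rows to a destination (a CSV string here)."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["region", "total"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


output = write_destination(check_output(transform(read_source(RAW_CSV))))
print(output)
```

Each function consumes only the output of the previous stage, which mirrors how a read, transform, test, and write sequence hands data along a flow.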
By combining these components effectively, you can build robust, scalable, and efficient data pipelines that handle everything from data ingestion to complex transformations, all the way through to final output. The flexibility to mix and match components based on your specific needs is what makes Ascend a powerful tool for data engineering.