Projects
Projects serve as the foundational structure for organizing and managing data engineering workflows on the Ascend platform. A Project in Ascend encapsulates the entire lifecycle of data processing activities, from ingestion and transformation to data quality checks and orchestration. It allows users to logically group related data pipelines, workflows, and resources, facilitating better organization, management, and scalability of data engineering tasks.
Benefits
- Organized Workflow Management: Projects allow for a structured approach to data pipeline development, helping teams organize workflows, Components, and configurations logically.
- Collaboration and Version Control: With projects, multiple users can collaborate on data engineering tasks, share resources, and manage versions of pipelines and configurations, ensuring consistency and reducing duplication of effort.
- Isolation and Security: Projects provide a way to isolate resources and manage access permissions, ensuring that sensitive data and critical workflows are protected and accessible only to authorized users.
- Scalability: By organizing data workflows into projects, users can more easily scale their data engineering efforts, managing multiple pipelines and workflows efficiently as their data needs grow.
- Reusability: Projects enable the reuse of common configurations and Components, speeding up the development of new pipelines and reducing the time to insights.
Sample Configuration
For complete details on the options available in the `ascend_project.yaml` file, see the Project Reference.

To illustrate how a Project is configured in Ascend, consider a minimal example. A Project is defined by the `ascend_project.yaml` file, located at the root of the project, which contains metadata about the project. By default, the file and directory structure looks like this:
```
ascend_project.yaml
connections/
data/
flows/
profiles/
tests/
vaults/
```
These defaults, however, can be overridden in the `ascend_project.yaml` file. The default contents of this file are as follows:
```yaml
project:
  automations: ["automations/"]
  connections: ["connections/"]
  data: ["data/"]
  flows: ["flows/"]
  profiles: ["profiles/"]
  tests: ["tests/"]
  vaults: ["vaults/"]
```
This configuration snippet demonstrates a basic Project setup in Ascend. It defines a project with Automations, Connections, Data, Flows, Profiles, Tests, and Vaults directories. Each of these plays a crucial role in the data processing lifecycle within Ascend:
- Automations: Define the logic for automations within the project. These can include actions like data flow scheduling, notification and alerting, and more.
- Connections: Link Ascend to external data sources, enabling data ingestion, transformation, and delivery. They include configuration details and credentials for interacting with systems like warehouses, databases, cloud storage, APIs, and more.
- Data: Local data files used by the project.
- Flows: Define sequences of operations required to read, transform, and/or write data, or execute arbitrary code as tasks. Flows are composed of Components which represent individual processing steps.
- Profiles: Define custom parameters to be used by resources within the project.
- Tests: Include data quality and integrity checks to ensure the reliability of the data pipelines.
- Vaults: Securely store sensitive information, like API keys and database credentials, used within the project.
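The default paths above can be remapped to suit a team's repository layout. As a hedged sketch (the directory names `pipelines/` and `quality_checks/` are hypothetical, not Ascend conventions), a project that keeps its flows and tests in custom directories might override only those two entries while the remaining resources fall back to their defaults:

```yaml
# ascend_project.yaml (illustrative override; directory names are hypothetical)
project:
  # Flows live under pipelines/ instead of the default flows/
  flows: ["pipelines/"]
  # Data quality checks live under quality_checks/ instead of tests/
  tests: ["quality_checks/"]
```

Note that each key accepts a list, so a single Project can source a resource type from more than one directory if needed.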
Conclusion
Ascend.io projects are an essential framework for organizing data engineering workflows, offering a comprehensive way to manage the entire lifecycle of data processing activities. By leveraging projects, teams can efficiently structure their data pipelines, foster collaboration, and maintain control over security and scalability. The ability to isolate resources, manage access permissions, and reuse common components significantly enhances the development and management of data workflows, ultimately reducing the time to production. With the flexibility of customizable configurations, Ascend.io projects provide a robust foundation for any data engineering endeavor, empowering teams to scale their efforts as data needs evolve.