
Lab 1: Ingestion & Transformation

Ingest real-world data from APIs and transform it to uncover insights about clean energy patterns.

In this lab, you'll build a pipeline that ingests weather and carbon intensity data daily to analyze how weather patterns impact the UK electricity grid. The goal? Help your company save money and reduce environmental impact by scheduling resource-intensive operations at optimal times based on carbon intensity forecasts.

The Business Case

Your company runs compute-intensive data operations that consume significant energy. By understanding when the electricity grid uses cleaner energy sources (wind, solar) versus dirtier sources (coal, gas), you can:

  • Save significant money by running operations during low-carbon periods
  • Reduce environmental impact and qualify for sustainability tax incentives
  • Optimize scheduling based on predictive weather forecasts

Preliminary research suggests that running operations during low-carbon periods could save your company up to 50% on its energy costs.


Step 1: Ingest weather and carbon data

Let's start by ingesting data from two public APIs:

  • UK Carbon Intensity API - Real-time and historical carbon intensity of the UK electricity grid
  • Open-Meteo Weather API - Historical and forecast weather data
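
Under the hood, each ingestion is a plain HTTP GET against a public endpoint. As a point of reference (not the exact code Otto generates), here's a minimal sketch of one raw call to each API using Python's requests library. The endpoints are the documented ones, but the date range, coordinates, and variable list are illustrative choices:

```python
import requests

# UK Carbon Intensity: half-hourly intensity for an ISO 8601 date range.
resp = requests.get(
    "https://api.carbonintensity.org.uk/intensity/2024-01-01T00:00Z/2024-01-02T00:00Z",
    timeout=30,
)
resp.raise_for_status()
carbon = resp.json()["data"]  # list of {from, to, intensity: {forecast, actual, index}}

# Open-Meteo: hourly weather variables; past_days pulls recent history.
# The London coordinates are illustrative.
resp = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 51.5,
        "longitude": -0.13,
        "hourly": "temperature_2m,wind_speed_10m,cloud_cover",
        "past_days": 1,
    },
    timeout=30,
)
resp.raise_for_status()
weather = resp.json()["hourly"]  # dict of parallel lists keyed by variable name
```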

Open Otto (Ctrl+I) and paste the following prompt:

Hi Otto! I want to build a new pipeline called carbon_weather_analysis that incrementally ingests weather and energy data to analyze how weather patterns impact carbon intensity on the UK electricity grid. Please create python read components for these two data sources:

- UK Carbon Intensity API (https://api.carbonintensity.org.uk)
- Open-Meteo Weather API (https://api.open-meteo.com)

These ingestion components should gracefully handle schema changes, rate limits, and empty data frames. The goal should be to ingest all the data from all the columns.

Please ensure data quality tests do not cause the entire flow to fail and are set to warn instead.

For the initial load, let's backfill the last 30 days of data from both sources. Each component should ingest data in daily intervals.

Please run the flow to ensure it succeeds. If the run encounters an error, please attempt to fix it. Enable a full refresh on every run.

Watch as Otto:

  1. Creates Python Read Components for each API
  2. Adds data quality checks and error handling
  3. Backfills 30 days of historical data
  4. Runs the Flow to verify everything works
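
To make the requirements in the prompt concrete, here's a minimal sketch of what a daily-interval backfill with rate-limit backoff and empty-dataframe handling can look like. This is illustrative, not Otto's actual output; fetch_day is a hypothetical helper:

```python
import time
from datetime import date, timedelta

import pandas as pd
import requests

def fetch_day(day: date) -> pd.DataFrame:
    """Fetch one day of carbon intensity data, backing off on HTTP 429."""
    nxt = day + timedelta(days=1)
    url = f"https://api.carbonintensity.org.uk/intensity/{day}T00:00Z/{nxt}T00:00Z"
    for attempt in range(5):
        resp = requests.get(url, timeout=30)
        if resp.status_code == 429:      # rate limited: exponential backoff
            time.sleep(2 ** attempt)
            continue
        resp.raise_for_status()
        return pd.json_normalize(resp.json().get("data", []))  # may be empty
    raise RuntimeError(f"still rate-limited after retries for {day}")

# Backfill the last 30 days, one daily interval at a time, skipping empty frames.
frames = [fetch_day(date.today() - timedelta(days=n)) for n in range(30, 0, -1)]
carbon = pd.concat([f for f in frames if not f.empty], ignore_index=True)
```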
Tip

This may take a few minutes as Otto builds and tests the ingestion components. Use this time to explore what Otto is creating in the Files panel.

Step 2: Transform and analyze the data

Now let's create transformations that help us understand clean energy patterns over time.

Paste this prompt into Otto:

Based on the ingested data, create transformations for this flow that help us understand clean energy patterns over time:

- Join weather and carbon intensity data by timestamp
- Categorize weather conditions (temperature, wind, cloudiness)
- Show carbon intensity by weather category
- Show carbon intensity by hour of the day
- Show carbon intensity by day of the week

Please run the new components to ensure they succeed.

Otto will create a series of Transform Components that:

  • Join the weather and carbon data on timestamp
  • Categorize weather conditions
  • Aggregate carbon intensity by various dimensions
  • Identify optimal time windows for operations
Note

This prompt creates several transformations, so Otto may take some time to build and iterate on all of them. If the agent seems to be getting lost, it can help to ask it to focus on one transformation at a time.
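
Conceptually, the heart of these transformations is a timestamp join followed by categorization and group-bys. Here's a minimal pandas sketch, assuming weather and carbon DataFrames with illustrative column names (your actual schema will differ):

```python
import pandas as pd

# Join the two sources on their shared hourly timestamp.
df = weather.merge(carbon, on="timestamp", how="inner")

# Categorize weather conditions; these bin edges are illustrative.
df["wind_category"] = pd.cut(
    df["wind_speed_10m"], bins=[0, 10, 25, 100], labels=["calm", "breezy", "windy"]
)

# Average carbon intensity by weather category, hour of day, and day of week.
by_wind = df.groupby("wind_category", observed=True)["intensity_actual"].mean()
by_hour = df.groupby(df["timestamp"].dt.hour)["intensity_actual"].mean()
by_dow = df.groupby(df["timestamp"].dt.day_name())["intensity_actual"].mean()
```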

Step 3: Add predictive analytics

Now that we have a clear picture of historical patterns, let's look forward to find optimal windows to run our operations next week.

Paste this prompt:

Let's create a different python read component that gets the weather forecast for the next week. 

Then let's do predictive analytics to determine carbon impact by hour for the next 7 days based on the historical carbon intensity data (carbon intensity by hour of the day, day of the week, and by weather category).

Finally, let's identify the low carbon impact time intervals when we should operate our resource intensive operations. We should find the best 5 hour intervals for each day of the week. Then rerun the entire flow to make sure it works end to end.
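
For reference, the forecast ingestion is the same kind of Open-Meteo call as in Step 1, with forecast_days (a real query parameter) in place of past_days; the coordinates remain illustrative:

```python
import pandas as pd
import requests

resp = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 51.5,
        "longitude": -0.13,
        "hourly": "temperature_2m,wind_speed_10m,cloud_cover",
        "forecast_days": 7,  # hourly values for the next 7 days
    },
    timeout=30,
)
resp.raise_for_status()
forecast = pd.DataFrame(resp.json()["hourly"])
forecast["timestamp"] = pd.to_datetime(forecast["time"])
```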

Otto will:

  1. Create a new Read Component for weather forecasts
  2. Build predictive models based on historical patterns
  3. Identify optimal operation windows for the coming week
  4. Run the complete pipeline end-to-end
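
The "predictive model" here can be as simple as a historical lookup: average intensity by (day of week, hour, weather category), applied to each forecast hour. Here's a sketch continuing the illustrative schema from the earlier snippets, assuming the forecast rows already carry a wind_category from the same pd.cut:

```python
# Historical lookup table keyed by (day of week, hour, weather category).
hist = df.assign(dow=df["timestamp"].dt.dayofweek, hour=df["timestamp"].dt.hour)
lookup = (
    hist.groupby(["dow", "hour", "wind_category"], observed=True)["intensity_actual"]
    .mean()
    .rename("predicted_intensity")
)

# Score each forecast hour by its historical analogue.
fc = forecast.assign(
    dow=forecast["timestamp"].dt.dayofweek,
    hour=forecast["timestamp"].dt.hour,
)
fc = fc.join(lookup, on=["dow", "hour", "wind_category"])

# Best 5-hour window per day: lowest rolling mean of predicted intensity.
# (This simplified version lets windows straddle midnight.)
fc["window_mean"] = fc["predicted_intensity"].rolling(5).mean()
best_window_ends = fc.groupby(fc["timestamp"].dt.date)["window_mean"].idxmin()
```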

Checkpoint

By the end of this lab, your pipeline should include:

  • Read Component: UK Carbon Intensity API (30 days historical)
  • Read Component: Open-Meteo Weather API (30 days historical)
  • Read Component: Weather forecast (next 7 days)
  • Transformations: Joined weather + carbon data, Weather categorization, Carbon intensity analysis (by weather category, hour of day, and day of week)
  • Transformations: Predictive recommendations for next week
Need help?

Ask a bootcamp instructor or reach out in the Ascend Community Slack.

Next steps

Continue to Lab 2: Orchestration & Automation to schedule your pipeline and set up intelligent alerts!