Python Transform input formats
Data for Python Transform Components can be provided in any of the following formats:
- Ibis table (default) - A portable Python dataframe library that provides a consistent API across different backends
- pandas DataFrame - The popular Python data analysis and manipulation library with DataFrame structures
- Python Dictionary (input_data_format="dict") - Standard Python dictionary format for structured data
- PyArrow (input_data_format="pyarrow") - Apache Arrow's Python library for columnar in-memory analytics
- DuckDB PyRelation - DuckDB's native Python relation object for efficient in-memory analytical processing
To specify a format other than the default Ibis, use the input_data_format parameter with the @transform decorator. For example: input_data_format="pandas".
These formats provide flexibility in how you work with data in your Python Components, allowing you to choose the library that best fits your use case and performance requirements.
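As a rough illustration of how the input_data_format parameter is passed, the sketch below uses a locally defined stand-in for the @transform decorator so that it runs on its own. The stand-in, the function names clean_orders and passthrough, and the argument name orders are all hypothetical; the real decorator's import path and full signature are not covered in this section, so check the platform reference before copying this pattern.

```python
from typing import Any, Callable, Optional


# Hypothetical stand-in for the platform's @transform decorator, defined here
# only so this sketch is runnable. It simply records the requested format;
# the real decorator does considerably more.
def transform(input_data_format: Optional[str] = None) -> Callable:
    def wrap(fn: Callable) -> Callable:
        # None is used here to mean "use the default", which the
        # documentation above says is an Ibis table.
        fn.input_data_format = input_data_format
        return fn
    return wrap


# Request pandas instead of the default Ibis table.
@transform(input_data_format="pandas")
def clean_orders(orders: Any) -> Any:
    # With "pandas", `orders` would be a pandas DataFrame, so DataFrame
    # methods such as dropna() are available.
    return orders.dropna()


# No format given: the function would receive the default Ibis table.
@transform()
def passthrough(orders: Any) -> Any:
    return orders
```

The only change between the two functions is the decorator argument; the body of each function is then written against whichever library's API was requested.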
Specialized Components
- Snowpark components require a Snowflake Data Plane and use the snowflake.snowpark.DataFrame input type
- PySpark components require a Databricks Data Plane and use the pyspark.sql.DataFrame input type
Next steps
Ready to build your Python Transform Components? Check out these how-to guides:
- Create a Simple Python Transform - Build a basic Python Transform without incremental or smart partitioning
- Create an Incremental Python Transform - Learn how to build incremental transforms for efficient data processing
- Create a Smart Python Transform - Implement smart partitioning strategies for optimized performance