Altair Compose is an open source Python library developed by Altair Engineering for building machine learning workflows and pipelines. It provides a drag-and-drop interface that simplifies constructing complex workflows for data transformation, modeling, evaluation, and deployment.

With Altair Compose, data scientists and developers can build ML pipelines quickly without writing large amounts of boilerplate code. It enables rapid iteration and eases the transition from notebook experiments to production workflows.

Some of the key features and capabilities of Altair Compose include:

  • Drag and drop UI for constructing ML pipelines
  • Integration with popular data science libraries such as pandas, NumPy, and scikit-learn
  • Built-in visualization and reporting functions
  • Native support for notebooks and Python scripts
  • High-level components that simplify model development
  • Open source library with an active community

Overall, Altair Compose streamlines machine learning workflows, makes models easier to build, and bridges the gap between experimentation and production. It’s well suited to use cases such as automated machine learning, data preprocessing, model evaluation, and deploying models to production.

Key Features and Capabilities of Altair Compose

As mentioned above, Altair Compose offers a wide range of features that improve the machine learning model development workflow. Here are some of the highlights:

  • Drag and drop interface – The drag and drop UI enables users to quickly construct ML pipelines by visually connecting components, without writing code. This makes iterating on pipelines fast and intuitive.

  • Support for notebooks and Python – Altair Compose pipelines can be built interactively in Jupyter notebooks or defined in Python scripts for reuse. Notebooks suit quick iteration, while scripts are easier to reuse in production.

  • Integration with data science libraries – Under the hood, Altair Compose builds on popular Python data science libraries such as pandas, NumPy, and scikit-learn, so you can apply your existing knowledge.

  • Built-in visualization – The library supports visualizations such as histograms, scatter plots, and learning curves out of the box, which simplifies model analysis.

  • Model reporting – You get auto-generated reports on model metrics, parameters, and performance to easily compare models and make decisions.

  • Simplified model development – Its high-level components abstract away boilerplate code, speeding up the development of machine learning workflows from start to finish.

These features make Altair Compose a powerful tool for any data scientist looking to improve productivity and streamline the path from ideation to production deployment.

Use Cases and Applications of Altair Compose

Altair Compose is flexible enough to support a wide range of machine learning use cases and applications, including:

  • Building machine learning workflows – The drag and drop interface shines for rapidly developing machine learning pipelines for tasks like data cleaning, feature engineering, modeling, evaluation, and more.

  • Automated machine learning – AutoML capabilities make it easy to automatically test different algorithms and hyperparameters to find the best model.

  • Data transformation – As part of a workflow, Altair Compose can handle typical data manipulation and preprocessing tasks like parsing, standardization, imputation, encoding, etc.

  • Model evaluation and analysis – The built-in visualization and reporting tools simplify evaluating model performance, analyzing results, and comparing runs.

  • Model deployment – Models and supporting artifacts can be exported from Altair Compose to integrate into model deployment and serving platforms.
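
The AutoML idea described above (trying several algorithms and hyperparameters, then keeping the best combination) can be sketched in plain scikit-learn. This is not Altair Compose's API: it assumes scikit-learn is installed, uses the bundled iris dataset, and the candidate models and grid values are chosen only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# The "clf" step is a placeholder that the grid search swaps out
pipeline = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Each dict pairs one candidate algorithm with its own hyperparameter values
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [KNeighborsClassifier()], "clf__n_neighbors": [3, 5, 7]},
    {"clf": [DecisionTreeClassifier(random_state=0)], "clf__max_depth": [2, 4, None]},
]

# Score every algorithm/hyperparameter combination with 5-fold cross-validation
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X, y)

print(type(search.best_params_["clf"]).__name__, search.best_params_)
print(f"Best mean CV accuracy: {search.best_score_:.3f}")
```

Passing a list of grids lets one search cover different algorithms, each with hyperparameters that only make sense for that algorithm.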

In summary, Altair Compose supports common machine learning tasks spanning the entire project lifecycle, from initial prototyping to deployment. The visual programming approach makes it accessible to beginners as well.

Core Components and Architecture

Under the hood, Altair Compose has a modular architecture consisting of several core components that enable you to build end-to-end ML workflows:

Sources load and ingest data into the pipeline from various locations and formats.

Transformers manipulate and preprocess data for modeling by cleaning, filtering, normalizing, encoding, etc.

Estimators consume the processed data, train machine learning models, and evaluate their performance.

Finally, Sinks output the trained models, metrics, plots, and other pipeline artifacts.

These high-level building blocks abstract away the complexities of workflow orchestration. By snapping them together in the UI, you can construct sophisticated pipelines to streamline model development.
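
To make the four roles concrete, here is a small, dependency-free Python sketch of how data might flow from a source through transformers to an estimator and on to sinks. All names here (`Pipeline`, `in_memory_source`, `drop_missing`, `fit_line`) are hypothetical and do not come from Altair Compose; the estimator is a simple least-squares line fit chosen for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, Optional[float]]


def in_memory_source() -> List[Row]:
    """Source (hypothetical): emit raw records instead of reading a file."""
    return [
        {"x": 1.0, "y": 2.0},
        {"x": 2.0, "y": None},  # record with a missing target value
        {"x": 3.0, "y": 6.0},
        {"x": 4.0, "y": 8.0},
    ]


def drop_missing(rows: List[Row]) -> List[Row]:
    """Transformer (hypothetical): remove records containing a missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def fit_line(rows: List[Row]) -> Dict[str, float]:
    """Estimator (hypothetical): least-squares fit of y = slope * x + intercept."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    n = len(rows)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Evaluate the fit with mean squared error on the training records
    mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n
    return {"slope": slope, "intercept": intercept, "train_mse": mse}


@dataclass
class Pipeline:
    source: Callable[[], List[Row]]
    estimator: Callable[[List[Row]], Dict[str, float]]
    transformers: List[Callable[[List[Row]], List[Row]]] = field(default_factory=list)
    sinks: List[Callable[[Dict[str, float]], None]] = field(default_factory=list)

    def run(self) -> Dict[str, float]:
        data = self.source()
        for transform in self.transformers:  # apply transformers in order
            data = transform(data)
        result = self.estimator(data)
        for sink in self.sinks:  # hand the results to every sink
            sink(result)
        return result


reports: List[Dict[str, float]] = []
pipeline = Pipeline(
    source=in_memory_source,
    estimator=fit_line,
    transformers=[drop_missing],
    sinks=[reports.append],  # sink: collect results in a list
)
result = pipeline.run()
print(result)
```

Because each stage is just a callable, adding another transformer or sink does not require changing `run`, which mirrors the snap-together idea described above.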

Key Modules in Altair Compose

Altair Compose comes with many pre-built modules in the sources, transformers, estimators, and sinks categories, which you can combine into machine learning pipelines tailored to your needs:

Sources

Sources load and ingest data from various locations and in different formats. Some commonly used sources include:

  • CSV source – Load CSV data from files or URLs
  • JSON source – Ingest JSON-serialized data
  • SQL source – Pull data from SQL databases
  • DataLoader – Load preprocessed data from Python
  • Random data – Generate a random synthetic dataset
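
For comparison, the JSON, SQL, and random-data sources listed above have direct counterparts in pandas, NumPy, and Python's built-in `sqlite3` module. The sketch below assumes pandas and NumPy are installed, uses an in-memory SQLite database in place of a real server, and invents small sample tables purely for illustration; it does not use Altair Compose's own source modules.

```python
import io
import sqlite3

import numpy as np
import pandas as pd

# JSON source: records-oriented JSON, wrapped in StringIO as newer pandas expects
json_text = '[{"id": 1, "city": "Paris"}, {"id": 2, "city": "Lyon"}]'
json_df = pd.read_json(io.StringIO(json_text), orient="records")

# SQL source: an in-memory SQLite database stands in for a real database server
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 9.5), (2, 12.0)])
sql_df = pd.read_sql_query("SELECT id, amount FROM sales", conn)
conn.close()

# Random data source: a reproducible synthetic dataset
rng = np.random.default_rng(seed=42)
random_df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["f1", "f2", "f3"])

print(json_df, sql_df, random_df.head(), sep="\n\n")
```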

Transformers

Transformers manipulate and preprocess data in preparation for modeling. Some examples:

  • Column filter – Filter columns based on conditions
  • Data cleaner – Fill missing values, smooth outliers
  • Categorical encoder – Encode categorical data as numbers
  • Text vectorizer – Convert text to numerical vectors
  • Feature selector – Select a subset of relevant features
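
Three of the transformers above (column filter, data cleaner, categorical encoder) map onto everyday pandas operations. This is a minimal sketch assuming pandas and NumPy are installed; the column names, the median fill, and the clipping threshold of 100 are illustrative choices, not Altair Compose defaults.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 300.0],  # one missing value, one outlier
    "income": [40_000.0, 52_000.0, np.nan, 61_000.0],
    "segment": ["a", "b", "a", "c"],
    "row_id": [1, 2, 3, 4],  # identifier, not a useful feature
})

# Column filter: drop the identifier column
df = df.drop(columns=["row_id"])

# Data cleaner: fill missing numbers with the column median, then clip the outlier
numeric = ["age", "income"]
df[numeric] = df[numeric].fillna(df[numeric].median())
df["age"] = df["age"].clip(upper=100)

# Categorical encoder: one-hot encode the "segment" column as 0/1 integers
df = pd.get_dummies(df, columns=["segment"], dtype=int)

print(df)
```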

Estimators

Estimators consume transformed data and train machine learning models. Altair Compose supports many popular ML algorithms:

  • Linear/Logistic regression
  • Decision tree and random forest
  • Boosted trees (XGBoost, LightGBM, CatBoost)
  • KMeans clustering
  • Neural network classifiers
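
Most of the algorithm families above are available in scikit-learn behind a shared `fit`/`predict`/`score` interface. This sketch assumes scikit-learn is installed and uses the bundled iris dataset; it substitutes scikit-learn's `GradientBoostingClassifier` for XGBoost, LightGBM, and CatBoost (which are separate packages) and `MLPClassifier` for a neural network classifier, so it illustrates the algorithm families rather than Altair Compose's own estimator modules.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# One supervised estimator per family in the list above
classifiers = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
    "boosted_trees": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost/LightGBM/CatBoost
    "neural_network": make_pipeline(StandardScaler(), MLPClassifier(max_iter=2000, random_state=0)),
}

# Shared interface: fit() returns the fitted estimator, score() reports test accuracy
test_accuracy = {
    name: clf.fit(X_train, y_train).score(X_test, y_test)
    for name, clf in classifiers.items()
}

# KMeans is unsupervised, so it is fit on the features alone
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

print({name: round(acc, 3) for name, acc in test_accuracy.items()})
```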

Sinks

Sinks turn the pipeline's outputs into usable artifacts. Common sinks include:

  • Model extractor – Extract the trained model in serialized form
  • File sink – Save dataframes/images to files
  • Visualization sink – Render plots and visualizations
  • Report sink – Output model metrics and parameters
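
A file sink and a report sink amount to writing a DataFrame and a metrics dictionary to disk. The sketch below assumes pandas is installed; the predictions and metric values are placeholders rather than real results, the file names are hypothetical, and the files go to a temporary directory.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd

# Placeholder pipeline outputs (illustrative values, not real results)
predictions = pd.DataFrame({"id": [1, 2, 3], "predicted_label": [0, 1, 1]})
report = {"model": "random_forest", "accuracy": 0.93, "params": {"n_estimators": 100}}

output_dir = Path(tempfile.mkdtemp())

# File sink: save a DataFrame to CSV
predictions.to_csv(output_dir / "predictions.csv", index=False)

# Report sink: save metrics and parameters as JSON
(output_dir / "report.json").write_text(json.dumps(report, indent=2))

print(sorted(p.name for p in output_dir.iterdir()))
```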

This covers the key components available in Altair Compose to help construct end-to-end ML workflows and pipelines with minimal coding.

Developing a Machine Learning Pipeline with Altair Compose

One of the biggest value propositions of Altair Compose is how quickly and easily it lets you develop a complete machine learning pipeline. Here is a walkthrough of a sample workflow for a classification problem:

Loading Data

First, we need to load data into our pipeline. We’ll use the CSV source to ingest our dataset from a CSV file on disk or at a URL. The source outputs a pandas DataFrame that the following steps can manipulate.
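
Outside of the visual interface, the same loading step looks roughly like this in pandas and scikit-learn. The inline CSV text, its columns, and the `churned` target are hypothetical stand-ins for a real file; `pd.read_csv` accepts a file path or URL in the same way it accepts this in-memory buffer.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for a CSV file on disk or at a URL
csv_text = io.StringIO(
    "age,plan,monthly_spend,churned\n"
    "34,basic,20.5,0\n"
    "51,premium,,1\n"
    "29,basic,18.0,0\n"
    "45,premium,75.2,1\n"
    "38,standard,40.0,0\n"
    "60,premium,80.1,1\n"
    "23,basic,15.5,0\n"
    "41,standard,,1\n"
)
df = pd.read_csv(csv_text)  # empty fields become NaN

# Separate the features from the target column
X = df.drop(columns=["churned"])
y = df["churned"]

# Hold out a stratified test set for evaluation later in the pipeline
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
print(df.dtypes)
```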

Transforming Data

Next, we’ll apply some transformers to clean and preprocess the data for our model. Steps might include:

  • Fixing missing values
  • Converting categorical text features to numbers using one-hot encoding
  • Normalizing numeric features
  • Selecting important predictive features for our problem
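
The four transformation steps above correspond closely to a scikit-learn `ColumnTransformer` pipeline. This is a sketch under stated assumptions: scikit-learn and pandas are installed, the small DataFrame and its column names are invented, median imputation and keeping `k=3` features are arbitrary choices, and none of it reflects how Altair Compose implements its transformers internally.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

X = pd.DataFrame({
    "age": [34, 51, 29, 45, 38, 60, 23, 41],
    "monthly_spend": [20.5, np.nan, 18.0, 75.2, 40.0, 80.1, 15.5, np.nan],
    "plan": ["basic", "premium", "basic", "premium", "standard", "premium", "basic", "standard"],
})
y = np.array([0, 1, 0, 1, 0, 1, 0, 1])

numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fix missing values
    ("scale", StandardScaler()),                    # normalize numeric features
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, ["age", "monthly_spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),  # one-hot encode categories
])

transform = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=3)),  # keep the 3 highest-scoring features
])
X_ready = transform.fit_transform(X, y)
print(X_ready.shape)
```

Fitting the imputer, scaler, and selector inside one pipeline means the same learned statistics are reused when `transform` is later applied to test data.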

Applying Estimators to Build Models

Now we’re ready to train some models. We’ll hook up an estimator like a random forest classifier and pass in our preprocessed dataset. The estimator will automatically train and tune the model using cross-validation.
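
Training a random forest with cross-validated tuning can be sketched with scikit-learn's `GridSearchCV`. The dataset (scikit-learn's bundled breast cancer data), the small parameter grid, and ROC AUC as the selection metric are assumptions for illustration; Altair Compose may tune models differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Try each parameter combination with 5-fold cross-validation on the training set
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=5,
    scoring="roc_auc",
)
search.fit(X_train, y_train)  # also refits the best configuration on all of X_train

best_model = search.best_estimator_
print(search.best_params_, round(search.best_score_, 3))
```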

Evaluating Model Performance

Once our candidate models are trained, we can evaluate them by adding a model evaluation component that scores each model on a held-out test set and reports key metrics such as accuracy and AUC.
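
Scoring candidate models on a held-out test set might look like the following in scikit-learn. It assumes scikit-learn is installed, uses the bundled breast cancer dataset, and compares two example candidates by accuracy and AUC; the choice of models is illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

results = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    results[name] = {
        # Both metrics use only the held-out test set
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        "auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),  # P(class 1)
    }

best = max(results, key=lambda name: results[name]["auc"])
print(best, results)
```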

Saving and Exporting Models

Finally, we can connect a sink to serialize the best model to disk for persistence, save model metrics and parameters to a report, and output any plots visualizing model performance.
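
Persisting a fitted scikit-learn model is commonly done with `joblib`, which is installed alongside scikit-learn. The sketch below fits a small placeholder model, writes it to a temporary directory under a hypothetical file name, and reloads it to confirm the predictions match; Altair Compose's own sinks may use a different format.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Placeholder standing in for "the best model" chosen earlier
X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Serialize the fitted model to disk (hypothetical file name)
model_path = Path(tempfile.mkdtemp()) / "best_model.joblib"
joblib.dump(model, model_path)

# Later, for example in a separate serving process, load it back
reloaded = joblib.load(model_path)
same_predictions = np.array_equal(reloaded.predict(X), model.predict(X))
print(f"Saved to {model_path}; predictions match after reload: {same_predictions}")
```

Like pickle files, joblib files can execute code when loaded, so only load them from trusted sources.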

Just like that, we've built an end-to-end machine learning pipeline with Altair Compose using its easy drag-and-drop interface and library of pre-made components.

Integrations and Supported Libraries

A key strength of Altair Compose is its level of integration with popular data science libraries:

  • pandas – Work directly with pandas DataFrames for data manipulation
  • NumPy – Integrates with NumPy for numerical processing
  • scikit-learn – Interoperate with scikit-learn tools such as pipelines and transformers
  • TensorFlow – Libraries like Keras and TensorFlow are supported for model building
  • ONNX – Models can be exported to the standard ONNX format for interoperability
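
The pandas, NumPy, and scikit-learn interplay described above can be illustrated with a short scikit-learn pipeline that wraps a NumPy function and accepts a pandas DataFrame. The data and column names are made up, and this shows the libraries working together on their own rather than through Altair Compose.

```python
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

df = pd.DataFrame({"revenue": [120.0, 340.0, 90.0, 560.0], "visits": [10.0, 25.0, 7.0, 40.0]})

# A NumPy function wrapped as a scikit-learn transformer, chained with a scaler
pipeline = Pipeline([
    ("log", FunctionTransformer(np.log1p)),
    ("scale", StandardScaler()),
])
scaled = pipeline.fit_transform(df)  # pandas DataFrame in, NumPy array out

# Wrap the result back into a DataFrame with the original column names
scaled_df = pd.DataFrame(scaled, columns=df.columns, index=df.index)
print(scaled_df.round(3))
```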

This makes it easy to incorporate Altair Compose into existing workflows built on these common Python libraries, without having to learn specialized tools.

Altair Compose vs. Alternative Options

How does Altair Compose stack up to other similar tools? Here's a quick comparison with some popular alternatives:

  • TensorFlow Extended (TFX) – TFX is more tailored to TensorFlow-based models, while Altair Compose is framework-agnostic.
  • Kedro – Kedro focuses more on general data workflows than on machine learning and requires more coding.
  • Amazon SageMaker – SageMaker runs only on AWS, whereas Altair Compose is cloud-agnostic.
  • PipelineAI – PipelineAI is closed source. Altair Compose is open source.

In summary, Altair Compose hits a nice balance between flexibility, ease-of-use, and machine learning focus that makes it appealing for many use cases.

Limitations and Considerations

While Altair Compose has many advantages, it’s important to keep its limitations in mind:

  • It is not a complete MLOps or model deployment solution; taking models fully into production requires additional tools.
  • The abstractions give you less flexibility than coding pipelines directly, so some custom logic is harder to integrate.
  • Debugging a visual workflow can be trickier than debugging code.
  • It currently supports only Python, unlike polyglot frameworks such as Apache Beam.

For many modeling use cases, Altair Compose provides the right level of abstraction. Teams with complex needs, however, may want more control, scalability, and production support.

