Overview

The Azure Machine Learning Component SDK and CLI provide the core functionality for working with Azure ML components:

  • Manage Components: Capture a user program that can serve as a step of a workflow (data preprocessing, model training, model scoring, a hyperparameter tuning run, etc.) so that it can be parameterized and then reused in different contexts.

  • Author Pipelines: Create and run reusable machine learning workflows that serve as templates for your machine learning scenarios.
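
To make the two ideas above concrete, here is a minimal conceptual sketch in plain Python. It does not use the Component SDK, and every name in it (`Component`, `run_pipeline`, `scale`, `keep_at_least`) is hypothetical, chosen only for illustration. It shows a user program wrapped together with default parameters, then reused with different parameter values in two small workflows.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Component:
    """Hypothetical stand-in for a component: a user program plus default parameters."""

    name: str
    program: Callable[..., Any]
    defaults: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, data: Any, **overrides: Any) -> Any:
        # Parameters supplied at call time override the component's defaults.
        params = {**self.defaults, **overrides}
        return self.program(data, **params)


def run_pipeline(steps: List[Tuple[Component, Dict[str, Any]]], data: Any) -> Any:
    """Run the steps in order, feeding each step's output into the next."""
    for component, overrides in steps:
        data = component(data, **overrides)
    return data


def scale(values, factor=1):
    # Example user program: multiply every value by a factor.
    return [v * factor for v in values]


def keep_at_least(values, cutoff=0):
    # Example user program: drop values below a cutoff.
    return [v for v in values if v >= cutoff]


scale_component = Component("scale", scale, {"factor": 1})
filter_component = Component("filter", keep_at_least, {"cutoff": 5})

# The same two components are reused with different parameters in two pipelines.
pipeline_a = [(scale_component, {"factor": 2}), (filter_component, {})]
pipeline_b = [(scale_component, {}), (filter_component, {"cutoff": 3})]

result_a = run_pipeline(pipeline_a, [1, 3, 6])
result_b = run_pipeline(pipeline_b, [1, 3, 6])
print(result_a, result_b)
```

The real SDK additionally handles registration, compute, and data access; the sketch only illustrates why parameterization makes one captured program usable in several contexts.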

Benefits of using the Component SDK

  • Rich Component Types Support:

  • Easy Pipeline Authoring:

    • Author machine learning workflows with a consistent experience across SDK code and the drag-n-drop UI

    • Graph construction and submission in the same Python script

    • Interactive authoring experience in Jupyter notebooks with pipeline graph visualization

For 1P customers who previously used Aether, additional benefits include:

  • Superior Experiment Tracking:

    • Easy to search, manage and organize all run history in one place

    • Convenient to log parameters, metrics, code and results in machine learning experiments and compare them using an interactive UI

    • Integration with OSS platforms like MLflow

    • Easily view the code snapshot of each component in every experiment run

  • Seamless Data Access:

    • Keep a single copy of data in your storage, referenced by Datasets

    • Run model training components without worrying about connection strings or data paths; data access credentials are managed by AML

    • Cosmos/ADLS Dataset Mount: Avoids the additional step of copying data to and from NFS before and after training

    • No ITP NFS space limitation, so larger datasets can be supported

Next Steps

  • Start with Getting Started to learn how to install the Component SDK/CLI and run the samples.

  • If you haven’t used Azure ML before, read Introduction to Azure ML to gain background knowledge.

Introduction to Azure ML

This section contains a curated collection of links that provide background knowledge on Azure ML. Each link includes a brief introduction, along with an estimate of the time it takes to read or work through the target page.

  • What is Azure ML (≈ 5 minutes): High-level introduction to Azure ML.

  • Architecture and concepts (≈ 14 minutes): Learn about the architecture, concepts, and workflow for Azure Machine Learning. This page contains a glossary of AML concepts. We suggest you start with:

    • Workspace: The top-level resource for AML. Workspaces contain your Experiments, Datasets, Compute Targets, Pipelines, Models (and more).

    • Datasets: By creating an Azure ML dataset in a workspace, you create a reference to the data source location, along with a copy of its metadata. Because the data remains in its existing location, you incur no extra storage cost and don’t risk the integrity of your data sources. Datasets are also lazily evaluated, which improves workflow performance. You can create datasets from datastores, public URLs, and Azure Open Datasets.

    • Datastores: Datastores are attached to workspaces and store connection information for Azure storage services, so you can refer to those services by name and don’t need to remember the connection information and secrets used to connect to them. Examples of supported Azure storage services that can be registered as datastores are Azure Blob Storage, Azure Files, Azure Data Lake Gen1, Azure Data Lake Gen2, SQL database, etc.

    • Compute Targets: A compute target is a designated compute resource where you run your training component or host your service deployment. This location might be your local machine or a cloud-based compute resource (e.g., a compute cluster). Using compute targets makes it easy to change your compute environment later without having to change your code.

    • Environments: An environment specifies the Python packages, environment variables, and software settings around your training and scoring scripts.

    • Experiments: An abstraction layer designed to organize your Runs. Each experiment belongs to a workspace and can contain many runs.

    • Runs: A run is a single execution of a training script or component.

  • Designer (drag-n-drop ML) (≈ 4 minutes): Azure Machine Learning designer is a tool in the Azure ML workspace for visually connecting datasets and components on an interactive canvas to create machine learning models. To learn how to get started with the designer, see Tutorial: Predict automobile price with the designer.
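
The Datasets and Datastores entries in the glossary above can be pictured with a small conceptual sketch in plain Python. It is not the Azure ML API: `Datastore`, `DatasetReference`, `fake_reader`, and the sample names and paths are all hypothetical. The sketch assumes only what the glossary states, namely that a dataset is a reference to where the data lives, that connection details are looked up by datastore name, and that nothing is read until the data is actually requested.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Datastore:
    """Hypothetical record: connection details stored under a name."""

    name: str
    connection_info: str


class DatasetReference:
    """Hypothetical lazy dataset: records where the data lives, reads only on demand."""

    def __init__(
        self,
        datastore_name: str,
        path: str,
        reader: Callable[[Datastore, str], List[str]],
        registry: Dict[str, Datastore],
    ) -> None:
        self.datastore_name = datastore_name
        self.path = path
        self._reader = reader
        self._registry = registry
        self.read_count = 0  # number of times the data was actually read

    def load(self) -> List[str]:
        # Connection details are resolved by name only when data is requested.
        datastore = self._registry[self.datastore_name]
        self.read_count += 1
        return self._reader(datastore, self.path)


def fake_reader(datastore: Datastore, path: str) -> List[str]:
    # Stand-in for a real storage read; returns made-up rows.
    return [f"{datastore.name}:{path}:row{i}" for i in range(3)]


# User code refers to the datastore by name and never handles the secret itself.
registry = {"training_store": Datastore("training_store", "<secret kept by the workspace>")}
dataset = DatasetReference("training_store", "prices/2020.csv", fake_reader, registry)

reads_before = dataset.read_count  # creating the reference reads nothing
rows = dataset.load()              # the read happens here
print(reads_before, dataset.read_count, rows[0])
```

Creating the reference is cheap because no data moves; the cost of reading is paid only by the step that calls `load()`.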