Component
What is an Azure Machine Learning component?
A component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data preprocessing, model training, or model scoring. A component is analogous to a function: it has a name and parameters, expects certain inputs, and returns some value.
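The function analogy can be made concrete with a minimal Python sketch. The function name, its parameters, and the toy "training" logic below are illustrative assumptions and are not part of the Component SDK; the sketch only shows the shape a component shares with a function: a name, parameters, inputs, and a returned value.

```python
def train(training_data, max_epochs, learning_rate=0.01):
    """Hypothetical training step: named, parameterized, returns a value."""
    # Toy "model": a single weight nudged toward the mean of the data.
    target = sum(training_data) / len(training_data)
    weight = 0.0
    for _ in range(max_epochs):
        weight += learning_rate * (target - weight)
    return {"weight": weight, "epochs": max_epochs}


# Calling the step with concrete inputs and parameters.
model = train([1.0, 2.0, 3.0], max_epochs=5)
```

A component plays the same role inside a pipeline, except that its inputs, outputs, and parameters are declared in a specification rather than in a function signature alone.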
Component types
The Component SDK supports multiple component types, for example:
CommandComponent: lets you specify a launching command line with a container so that you can invoke your script (Python, R, shell) or executable.
DistributedComponent: helps users launch distributed training jobs for popular deep learning frameworks like PyTorch.
ParallelComponent: creates a reusable component that asynchronously processes large amounts of data in parallel, built on top of AzureML’s ParallelRunStep capability.
SweepComponent: enables users to automate efficient hyperparameter tuning.
HDInsightComponent: executes a Spark job in an HDInsight cluster.
How to create an Azure ML component
Data scientists or developers can wrap arbitrary code as an Azure ML component by following the steps below.
Define components with component specification
A component specification in YAML format describes the component in the Azure Machine Learning system. A component definition captures the following information:
Metadata: name, display_name, version, type, etc.
Interface: input/output specifications (name, type, description, default value, etc.).
Command, Code & Environment: command, code and environment required to run the component.
Example:
$schema: https://componentsdk.azureedge.net/jsonschema/CommandComponent.json
# Metadata
name: microsoft.com.azureml.samples.train
version: 0.0.5
display_name: Train
type: CommandComponent
description: A dummy training module
tags: {category: Component Tutorial, contact: amldesigner@microsoft.com}
# Interface
inputs:
  training_data:
    type: path
    description: Training data organized in the torchvision format/structure
    optional: false
  max_epochs:
    type: integer
    description: Maximum number of epochs for the training
    optional: false
  learning_rate:
    type: float
    description: Learning rate, default is 0.01
    default: 0.01
    optional: false
outputs:
  model_output:
    type: path
    description: The output model
# Command & Environment
command: >-
  python train.py --training_data {inputs.training_data} --max_epochs {inputs.max_epochs}
  --learning_rate {inputs.learning_rate} --model_output {outputs.model_output}
environment:
  name: AzureML-Designer
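To illustrate how the {inputs.…} and {outputs.…} placeholders in the command relate to the declared interface, here is a minimal sketch of placeholder substitution. It is not the SDK's actual implementation: the helper render_command and the /tmp paths are hypothetical, chosen only to show how the command line could look once the interface values are filled in.

```python
import re


def render_command(template, inputs, outputs):
    """Fill {inputs.<name>} and {outputs.<name>} placeholders in a command template."""
    values = {"inputs": inputs, "outputs": outputs}

    def substitute(match):
        group, name = match.group(1), match.group(2)
        # A KeyError here means the command references an undeclared input or output.
        return str(values[group][name])

    return re.sub(r"\{(inputs|outputs)\.(\w+)\}", substitute, template)


command = (
    "python train.py --training_data {inputs.training_data} "
    "--max_epochs {inputs.max_epochs} "
    "--learning_rate {inputs.learning_rate} "
    "--model_output {outputs.model_output}"
)

# Hypothetical runtime values for the interface declared in the spec.
rendered = render_command(
    command,
    inputs={"training_data": "/tmp/data", "max_epochs": 5, "learning_rate": 0.01},
    outputs={"model_output": "/tmp/model"},
)
print(rendered)
```

Every placeholder in the command must match a name declared under inputs or outputs; in this sketch a mismatch surfaces as a KeyError.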
Define components with a Python function
Defining a component with a Python function makes it easier to iterate quickly: you write your component code as a Python function, and the component specification is generated for you.
Only some types of components can be defined this way. For more details, refer to define-components-with-python-function.
Example:
from pathlib import Path
from uuid import uuid4

# `dsl`, `Input`, and `Output` are provided by the Component SDK
# (their import is not shown in this example).


@dsl.command_component(
    name='train_component',
    description='A dummy train component defined by dsl component.',
    version='0.0.1',
)
def train_component_func(
    training_data: Input,
    max_epochs: int,
    model_output: Output,
    learning_rate=0.01,
):
    lines = [
        f'Training data path: {training_data}',
        f'Max epochs: {max_epochs}',
        f'Learning rate: {learning_rate}',
        f'Model output path: {model_output}',
    ]
    for line in lines:
        print(line)
    # Train and save the trained model as a file in the output folder.
    # Here we only write dummy data for the demo.
    model = str(uuid4())
    (Path(model_output) / 'model').write_text(model)
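The idea of generating a specification from a function can be sketched with the standard inspect module. This is an illustration under stated assumptions, not the SDK's algorithm: the Input and Output classes below are stand-in markers (not the SDK classes), and sketch_interface and TYPE_NAMES are hypothetical names introduced only for this example.

```python
import inspect


class Input:
    """Stand-in marker for an input port (not the SDK class)."""


class Output:
    """Stand-in marker for an output port (not the SDK class)."""


# Assumed mapping from Python types to spec type names.
TYPE_NAMES = {int: "integer", float: "float", str: "string", bool: "boolean"}


def sketch_interface(func):
    """Build a spec-like dict of inputs and outputs from a function signature."""
    spec = {"name": func.__name__, "inputs": {}, "outputs": {}}
    for name, param in inspect.signature(func).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        annotation = param.annotation
        if annotation is inspect.Parameter.empty and has_default:
            # No annotation: fall back to the type of the default value.
            annotation = type(param.default)
        if annotation is Output:
            spec["outputs"][name] = {"type": "path"}
            continue
        type_name = "path" if annotation is Input else TYPE_NAMES.get(annotation, "string")
        entry = {"type": type_name}
        if has_default:
            entry["default"] = param.default
        spec["inputs"][name] = entry
    return spec


def train_component_func(training_data: Input, max_epochs: int,
                         model_output: Output, learning_rate=0.01):
    pass


interface = sketch_interface(train_component_func)
```

In this sketch the resulting interface mirrors the Interface section of the YAML example: training_data and model_output become path ports, while max_epochs and learning_rate become typed parameters.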
Manage components with UI & CLI
After defining components with a component spec, users can manage them with the UI and CLI tools.
Azure ML CLI:
Users can use a command like az ml component create to create a component in the workspace. After creation, the component is shared with all users who have access to the workspace.
Refer to this quick walkthrough of the Azure CLI extension that manipulates components.
Azure ML UI:
Adding &flight=cm to the URL in your browser enables the Modules page, which can help you create, list, and update your components.
Note: Module is a legacy name of Component. The UI will be renamed in the near future.
How to consume a component using the SDK
Load a component
Component.from_yaml: Loads a component from a YAML spec as an anonymous component. You can then run the component locally to test it. If the workspace is not specified, a workspace-independent component is loaded. You can specify the workspace when you submit or validate the pipeline that contains the workspace-independent components.
Component.load: Loads a component by name and version, or by id, from a workspace or registry. The component must already have been created in the workspace using az ml component create.
# Load from YAML with a workspace: registers the component as an anonymous component in the workspace.
train_component_func = Component.from_yaml(workspace=ws, yaml_file='./components/get-started-train/train.yaml')
# Load a workspace-independent component. Anonymous component registration is postponed to pipeline.validate(workspace) or pipeline.submit(workspace).
train_component_func = Component.from_yaml(yaml_file='./components/get-started-train/train.yaml')
# Load a component from the workspace.
score_component_func = Component.load(workspace=ws, name='microsoft.com.azureml.samples.score', version='0.0.1')
# Load a component from a registry, by name and version.
basic_component_func = Component.load(workspace=ws, name='basic_module', version='0.0.1', registry='testFeed')
# Load the same registry component by id.
basic_component_func = Component.load(workspace=ws, id="azureml://registries/testFeed/components/basic_module/versions/0.0.1")
Parameterize a component
Users can control the runtime behavior of a component by specifying:
inputs and outputs: learn more about the available runtime settings in the inputs and outputs reference doc.
runsettings: learn more about runsettings in the runsettings reference doc.
component = train_component_func(
    # inputs can be an existing dataset or outputs of other components
    training_data=train_data,
    # parameters
    max_epochs=5,
    learning_rate=0.01,
)
# component runsettings
component.runsettings.target = 'aml-compute'
# output runtime behavior configuration: the relative path of component's output on datastore
component.outputs.model_output.configure(path_on_datastore="azureml/train_component/{run-id}/{output-name}")
Run a component locally
Component.run: Runs the component in your local Docker container, conda environment, or host environment. This is useful for quick evaluation, debugging, and integration testing of a component.
Refer to component.run for more details.
Build and submit a pipeline with the component
Refer to pipeline for more details.
What’s the benefit of Component?
Currently, Azure Machine Learning offers PipelineStep as the basic building block of a machine learning pipeline. A PipelineStep is a one-off wrapper around code that cannot be reused across different pipelines. Compared to PipelineStep, a component greatly simplifies the ML pipeline development lifecycle, enables reproducibility, and accelerates collaboration among data scientists of all skill levels in a team:
Composability: This is the inherent benefit of a component. It hides complicated logic and exposes only a simple interface, so component consumers don’t need to worry about the underlying implementation. They can easily use components built by themselves or by others. Meanwhile, the source of truth of a component (its component specification) remains visible, which makes secondary development easy.
Reusability: Components can be easily reused across different ML pipelines, different ML workspaces, and even different organizations.
Easy development, testing & debugging: Rich SDK and CLI features make component development, testing, and debugging much easier.
Easy pipeline authoring: Once a component is created in the workspace, it can be easily consumed in both the Python SDK and the drag-and-drop Designer UI.
Reproducibility: Because the component specification captures all the required information, an AML component can be easily reproduced in different environments. Components are versioned, so it’s easy to trace back if a data scientist wants to reproduce a specific experiment result.
Sharing & collaboration: Data scientists who prefer low-code/no-code can use the Designer UI to quickly prototype a pipeline and export it to Python code, which code-first data scientists can then tune further or check in. The exported Python notebook can also be submitted to different workspaces for sharing & collaboration.
Next steps
Getting Started
Follow the getting started tutorial to learn how to build a pipeline with existing components.
Consume existing components
The component gallery is an open community where data scientists can contribute, share, and find machine learning pipelines as well as custom-built components to use in Azure Machine Learning. It has more than 50 components for common machine learning tasks. The source code of all components can be found in the gallery.