Define components with Python functions

Overview

Defining a component with a Python function makes it easier to iterate quickly: you write the component code as a Python function, and the component specification is generated for you. This document demonstrates how to define components and consume them with the Python SDK.

| Supported type | Status | Decorator | Note |
| --- | --- | --- | --- |
| Command Component | Preview | dsl.command_component | |
| Distributed Component | Preview | dsl.command_component | Specify metadata with distribution. |

Define component

As in the component specification YAML, the required fields of a component defined with the Python SDK fall into two parts: metadata and interface.

from azure.ml.component import dsl

# Metadata: defined in the decorator
@dsl.command_component(description='a+b')
# Interface: defined by the function parameters
def sum_func(a: int, b: int):
    print(a + b)

Metadata

Metadata is declared on the dsl component decorator. All accessible fields are listed in the table below.

| Field name | Type | Description |
| --- | --- | --- |
| name | str | The name of the component. If None, the function name is used. |
| description | str | The description of the component. If None, the docstring is used. |
| version | str | The version of the component. The default is 0.0.1. |
| display_name | str | The display name of the component. |
| is_deterministic | bool | Specifies whether the component always generates the same result. The default is None, which follows the default behavior (the component is reused), the same as True. If False, the component is never reused. |
| tags | dict | Tags of the component. |
| environment | Union[str, pathlib.Path, dict, azure.ml.component.Environment] | Environment config of the component. It can be a YAML file path, a dict, or an Environment object. If None, a default conda environment with 'azureml-defaults' and 'azure-ml-component' is used. |
| code | str | The source directory of the dsl component. The default value is '.', i.e. the directory of the dsl component file. |
| distribution | dict | Only for distributed components, e.g. distribution={'type': 'mpi'}. Available types are mpi and Pytorch (alias: torch.distributed). |
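
The fallback rules in the table can be illustrated with a small stand-in decorator. This is a minimal sketch in plain Python, not the SDK's implementation: `sketch_command_component` and the `_metadata` attribute are hypothetical names, and only the `name`, `description`, and `version` defaults described above are modeled.

```python
import inspect


def sketch_command_component(name=None, description=None, version="0.0.1"):
    """Hypothetical stand-in for a component decorator (not the real SDK)."""

    def decorator(func):
        # Fall back to the function name and docstring when a field is None,
        # mirroring the defaults described in the metadata table.
        func._metadata = {
            "name": name if name is not None else func.__name__,
            "description": (
                description if description is not None else inspect.getdoc(func)
            ),
            "version": version,
        }
        return func

    return decorator


@sketch_command_component()
def sum_func(a: int, b: int):
    """a+b"""
    print(a + b)


metadata = sum_func._metadata
```

Here no field is passed to the decorator, so the name comes from the function, the description from the docstring, and the version from the default.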

Interface

The interface is defined by the decorated function. Inputs, outputs, and parameters are declared with the annotations listed below.

| Parameter Type | Annotation |
| --- | --- |
| Input | azure.ml.component.dsl.types.Input |
| Output | azure.ml.component.dsl.types.Output |
| Parameter | int, str, bool, float or any other types in azure.ml.component.dsl.types except Input and Output |
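
To make the mapping from annotations to interface entries concrete, the sketch below groups a function's parameters by annotation using the standard `inspect` module. It is illustrative only: the `Input` and `Output` classes here are hypothetical stand-ins for the SDK types, and `classify_interface` is an assumed helper name, not part of the SDK.

```python
import inspect


class Input:
    """Hypothetical stand-in for azure.ml.component.dsl.types.Input."""


class Output:
    """Hypothetical stand-in for azure.ml.component.dsl.types.Output."""


def classify_interface(func):
    """Group a function's parameters by annotation (illustrative only)."""
    interface = {"inputs": [], "outputs": [], "parameters": []}
    for name, param in inspect.signature(func).parameters.items():
        annotation = param.annotation
        if annotation is Input or isinstance(annotation, Input):
            interface["inputs"].append(name)
        elif annotation is Output or isinstance(annotation, Output):
            interface["outputs"].append(name)
        else:
            # int, str, bool, float, and other types are treated as parameters.
            interface["parameters"].append(name)
    return interface


def train(
    training_data: Input,
    max_epochs: int,
    model_output: Output,
    learning_rate: float = 0.1,
):
    pass


interface = classify_interface(train)
```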

Sample

The sample below demonstrates how to define a dummy train component.

from pathlib import Path
from uuid import uuid4

from azure.ml.component import dsl
from azure.ml.component.dsl.types import Input, Output, Float

@dsl.command_component(
    name='train_component', 
    description='A dummy train component defined by dsl component.',
    version='0.0.1',
    # specify distribution type if needed
    # distribution={'type': 'mpi'},
)
def train_component_func(
    training_data: Input,  # define an input port
    max_epochs: int,  # define a parameter with annotation
    model_output: Output,  # define an output port
    learning_rate: Float(min=0.01, max=0.5) = 0.1,  # define a parameter with default
):
    # Train and save the trained model as a file in the output folder.
    # Here we only write dummy data for demonstration.
    model = str(uuid4())
    (Path(model_output) / 'model').write_text(model)
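
To see what the function body produces, the same logic can be exercised locally without the SDK. The sketch below assumes the output port arrives as a directory path, as the body of `train_component_func` implies. `write_dummy_model` is a hypothetical helper name, and a temporary directory stands in for the output folder.

```python
import tempfile
from pathlib import Path
from uuid import UUID, uuid4


def write_dummy_model(model_output):
    """Write a random ID as the 'model' file, as in the sample's body."""
    model = str(uuid4())
    model_path = Path(model_output) / "model"
    model_path.write_text(model)
    return model_path


with tempfile.TemporaryDirectory() as output_dir:
    model_path = write_dummy_model(output_dir)
    saved_model = model_path.read_text()

# The saved content should round-trip as a valid UUID string.
is_valid_uuid = str(UUID(saved_model)) == saved_model
```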

Consume component

Once the component function is defined, it can be used directly to create components.

Sample

Let’s see how to consume the train_component_func defined above in dsl.pipeline.

# define a dsl pipeline function
@dsl.pipeline(description='train model', default_compute_target='aml-compute')
def training_pipeline_func(input_data, learning_rate):
    train = train_component_func(
        training_data=input_data,
        max_epochs=5,
        learning_rate=learning_rate)
    return train.outputs

# create a pipeline instance
pipeline = training_pipeline_func(input_data=your_dataset, learning_rate=0.01)

# validate and submit the pipeline
pipeline.validate(workspace=your_workspace)
pipeline.submit(workspace=your_workspace)

The component snapshot and the generated YAML spec can be found in the pipeline run detail page.

[Image: dsl component snapshot]

Note that you cannot pass a value to a parameter of type Output. This is the same behavior as the function returned by Component.from_yaml.
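
One way to picture this restriction is a wrapper that refuses caller-supplied values for parameters annotated as Output. This is a toy sketch, assuming a hypothetical stand-in `Output` class and a hypothetical `reject_output_arguments` helper; the SDK's actual checks and error types may differ.

```python
import functools
import inspect


class Output:
    """Hypothetical stand-in for the SDK's Output annotation."""


def reject_output_arguments(func):
    """Raise TypeError when a caller passes a value for an Output parameter."""
    signature = inspect.signature(func)
    output_names = {
        name
        for name, param in signature.parameters.items()
        if param.annotation is Output
    }

    @functools.wraps(func)
    def wrapper(**kwargs):
        passed = output_names.intersection(kwargs)
        if passed:
            raise TypeError(f"cannot pass values to outputs: {sorted(passed)}")
        # Return the accepted arguments instead of running the body,
        # standing in for the creation of a component instance.
        return dict(kwargs)

    return wrapper


@reject_output_arguments
def train(training_data, max_epochs: int, model_output: Output):
    pass


accepted = train(training_data="data", max_epochs=5)

try:
    train(training_data="data", max_epochs=5, model_output="./out")
    rejected = False
except TypeError:
    rejected = True
```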

Manage component

The dsl component function can be registered explicitly in your workspace.

Sample

The example below shows how to load or register a dsl component function.

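from azure.ml.component import Component

# my_workspace is assumed to be an existing workspace object
component_name = 'train_component'
component_version = '0.0.1'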
try:
    # load dsl component with name and version
    registered_train_component_func = Component.load(
        my_workspace, name=component_name, version=component_version
    )
except Exception:
    # register the dsl component
    registered_train_component_func = Component.create(
        train_component_func,
        version=component_version,
        set_as_default=True,
        workspace=my_workspace,
    )
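
The load-or-register pattern above can be tried out without a workspace by substituting an in-memory registry. The sketch below is illustrative only: `FakeRegistry` and its `load` and `create` methods are hypothetical stand-ins and do not reflect the behavior or signatures of the real `Component` class. It also catches a specific exception, which is narrower than the broad `except Exception` used in the sample.

```python
class FakeRegistry:
    """Hypothetical in-memory stand-in for a workspace's component registry."""

    def __init__(self):
        self._components = {}

    def load(self, name, version):
        # Raises KeyError when the (name, version) pair is not registered.
        return self._components[(name, version)]

    def create(self, func, name, version):
        self._components[(name, version)] = func
        return func


def load_or_register(registry, func, name, version):
    """Return the registered component, registering it first if needed."""
    try:
        return registry.load(name, version)
    except KeyError:
        return registry.create(func, name, version)


def train_component_func():
    return "trained"


registry = FakeRegistry()
# The first call registers the function; the second call loads it.
first = load_or_register(registry, train_component_func, "train_component", "0.0.1")
second = load_or_register(registry, train_component_func, "train_component", "0.0.1")
```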

Registered components can be found with the CLI list and show commands.

Next steps

Learn more about dsl components with our example Jupyter notebook how-to-use-dsl-component.