Pipeline Component

Overview

A PipelineComponent contains several components that are connected as a pipeline. It can be treated as a black-box component and is defined with the same interface as a Component.

How to define Pipeline Component using dsl.pipeline

In the Component SDK, a PipelineComponent is defined as a function decorated with @dsl.pipeline, which is imported from azure.ml.component. Inside the function you can create component (or pipeline component) objects and connect their inputs and outputs to form a workflow. See the reference documentation for more information.

Just like other types of components, a PipelineComponent must have an interface for its inputs and parameters. Since a PipelineComponent is defined as a function in the Component SDK, its interface is defined by the function’s parameters. The interface is determined in one of two ways: by inference or by user annotation.

Inferred interface

The easiest way to define a PipelineComponent’s interface is to not define it at all. If a parameter of a dsl.pipeline function has neither an annotation nor a default value, the SDK tries to infer its metadata (type, range, enum, etc.) from the component parameters it is assigned to.

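from azure.ml.component import dsl, Pipeline

# hello_world_component_func is assumed to be a component function loaded beforehand.
@dsl.pipeline(name='sample-pipeline',
              description='a sample pipeline',
              default_compute_target='aml-compute')
def sample_pipeline(pipeline_input, pipeline_enum, pipeline_int) -> Pipeline:
    # Each pipeline parameter is passed straight to one parameter of hello_world.
    hello_world = hello_world_component_func(
        input=pipeline_input,
        enum_param=pipeline_enum,
        int_param=pipeline_int
    )
    return hello_world.outputs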

The snippet above shows a pipeline that contains one component, hello_world. Each of the component’s parameters is linked to a parameter of the function (a pipeline parameter). Because the interface of hello_world’s parameters is already fixed when the component is created, we can use it to infer the interface of the pipeline component sample_pipeline.

Suppose component hello_world has the following interface:

inputs:
  input:
    type: path
  enum_param:
    type: Enum
    enum:
    - Option1
    - Option2
  int_param:
    type: Integer
    min: 0
    max: 10
    optional: true

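The pipeline component sample_pipeline will then have a similar interface, with each pipeline parameter taking on the metadata of the component parameter it is linked to: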

inputs:
  pipeline_input:
    type: path
  pipeline_enum:
    type: Enum
    enum:
    - Option1
    - Option2
  pipeline_int:
    type: Integer
    min: 0
    max: 10
    optional: true
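
The inference described above can be sketched in plain Python. This is only an illustration, not the SDK’s implementation: the dictionaries mirror the YAML shown above, and the names HELLO_WORLD_INPUTS, LINKS and infer_interface are hypothetical helpers introduced for this example.

```python
from copy import deepcopy

# Hypothetical stand-in for hello_world's interface, copied from the YAML above.
HELLO_WORLD_INPUTS = {
    "input": {"type": "path"},
    "enum_param": {"type": "Enum", "enum": ["Option1", "Option2"]},
    "int_param": {"type": "Integer", "min": 0, "max": 10, "optional": True},
}

# Which pipeline parameter feeds which component parameter in sample_pipeline.
LINKS = {
    "pipeline_input": "input",
    "pipeline_enum": "enum_param",
    "pipeline_int": "int_param",
}


def infer_interface(links, component_inputs):
    """Copy each linked component parameter's metadata onto the pipeline parameter."""
    return {
        pipeline_param: deepcopy(component_inputs[component_param])
        for pipeline_param, component_param in links.items()
    }


inferred = infer_interface(LINKS, HELLO_WORLD_INPUTS)
for name, meta in inferred.items():
    print(name, meta)
```

The metadata is deep-copied so that later changes to the pipeline interface in this sketch do not alter the component’s own interface.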

User annotated interface

A pipeline component’s interface can also be defined via annotation. The annotation can be:

  • A basic built-in Python type annotation. Supported types are int, float, bool, str and Enum, e.g.:

    from enum import Enum
    from azure.ml.component import dsl, Pipeline

    @dsl.pipeline()
    def sample_pipeline(int_param: int, float_param: float, bool_param: bool, str_param: str, enum_param: Enum) -> Pipeline:
        ...
    
  • A default value of a basic type. Supported types are int, float, bool, str and Enum, e.g.:

    from enum import Enum
    from azure.ml.component import dsl, Pipeline

    class EnumOps(Enum):
        Option1 = 'option1'
        Option2 = 'option2'

    @dsl.pipeline()
    def sample_pipeline(int_param=1, float_param=1.0, bool_param=False, str_param="str", enum_param=EnumOps.Option1) -> Pipeline:
        ...

  • A dsl.types annotation, which lets the user add extra fields (min, max, enum, optional, description, default) to a parameter. Supported types are Input, Integer, Float, Boolean, String and Enum, e.g.:

    from azure.ml.component import dsl, Pipeline
    from azure.ml.component.dsl.types import Input, Enum, Integer

    @dsl.pipeline()
    def sample_pipeline(
            pipeline_input: Input(type="path", description="input path"),
            # Parameters without a default must come before those with one.
            pipeline_int: Integer(min=0, max=10, optional=True),
            pipeline_enum: Enum(enum=["Option1", "Option2"]) = "Option1"
        ) -> Pipeline:
        ...
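
To illustrate how the first two annotation styles could be read from a function signature, here is a minimal sketch that uses only the standard library. It does not call the SDK. The mapping from Python types to the type names Integer, Float, Boolean, String and Enum is an assumption made for this example, and type_name_for and describe_parameters are hypothetical helpers.

```python
import inspect
from enum import Enum


class EnumOps(Enum):
    Option1 = "option1"
    Option2 = "option2"


# Assumed mapping from Python types to interface type names.
# bool is listed before int because bool is a subclass of int.
TYPE_NAMES = [
    (bool, "Boolean"),
    (int, "Integer"),
    (float, "Float"),
    (str, "String"),
    (Enum, "Enum"),
]


def type_name_for(py_type):
    """Return the interface type name for a Python type, or None if unsupported."""
    for base, name in TYPE_NAMES:
        if isinstance(py_type, type) and issubclass(py_type, base):
            return name
    return None


def describe_parameters(func):
    """Build per-parameter metadata from annotations and default values."""
    described = {}
    for name, param in inspect.signature(func).parameters.items():
        meta = {}
        if param.annotation is not inspect.Parameter.empty:
            meta["type"] = type_name_for(param.annotation)
        elif param.default is not inspect.Parameter.empty:
            meta["type"] = type_name_for(type(param.default))
        if param.default is not inspect.Parameter.empty:
            meta["default"] = param.default
        # Parameters with neither stay empty and are left to inference.
        described[name] = meta
    return described


def sample_pipeline(inferred_param, int_param: int, float_param=1.0,
                    bool_param=False, enum_param=EnumOps.Option1):
    ...


described = describe_parameters(sample_pipeline)
for name, meta in described.items():
    print(name, meta)
```

In this sketch, a parameter with neither an annotation nor a default value gets empty metadata, which corresponds to the case handled by the inferred interface.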

Validation logic

Pipeline component validation is based on two rules:

  1. If a pipeline parameter has an annotation, validate it against the interface of each component parameter it is linked to.

  2. If a pipeline parameter does not have an annotation, iterate over all the component parameters it is linked to and check whether any of them differ in type, range, or enum.

Note: if a pipeline parameter is annotated as optional but is linked to a required component parameter, validation fails with an error.
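
The two rules and the note above can be sketched as follows. This is a simplified illustration rather than the SDK’s validator: metadata is modelled as plain dictionaries, validate_pipeline_parameter is a hypothetical helper, and "validate against" is assumed to mean strict equality of type, min, max and enum, which may be stricter than the real check.

```python
def validate_pipeline_parameter(name, linked, annotation=None):
    """Return a list of error messages for one pipeline parameter.

    linked: metadata dicts of the component parameters this parameter feeds.
    annotation: metadata dict from the user's annotation, or None.
    """
    errors = []
    keys = ("type", "min", "max", "enum")
    if annotation is not None:
        # Rule 1: check the annotation against every linked component parameter.
        for meta in linked:
            for key in keys:
                if key in annotation and key in meta and annotation[key] != meta[key]:
                    errors.append(
                        f"{name}: {key} {annotation[key]!r} does not match {meta[key]!r}"
                    )
            # Note: an optional pipeline parameter cannot feed a required one.
            if annotation.get("optional", False) and not meta.get("optional", False):
                errors.append(f"{name}: optional parameter linked to a required parameter")
    else:
        # Rule 2: without an annotation, all linked parameters must agree.
        for key in keys:
            values = [meta.get(key) for meta in linked]
            if any(value != values[0] for value in values[1:]):
                errors.append(f"{name}: linked parameters disagree on {key}")
    return errors


required_int = {"type": "Integer", "min": 0, "max": 10}
optional_int = {"type": "Integer", "min": 0, "max": 10, "optional": True}
wide_int = {"type": "Integer", "min": 0, "max": 100}

print(validate_pipeline_parameter("pipeline_int", [optional_int], annotation=optional_int))
print(validate_pipeline_parameter("pipeline_int", [required_int], annotation=optional_int))
print(validate_pipeline_parameter("pipeline_int", [required_int, wide_int]))
```

The first call reports no errors, the second reports the optional/required mismatch, and the third reports that the two linked parameters disagree on max.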

Create pipeline component

A pipeline component can be created via Component.create.

See the reference documentation for more information. For example:

from azure.ml.component import Component, Pipeline, dsl

@dsl.pipeline(name='sample-pipeline',
              description='a sample pipeline',
              default_compute_target='aml-compute')
def sample_pipeline() -> Pipeline:
    ...

# Create a pipeline component from the dsl.pipeline function.
component_func = Component.create(sample_pipeline, version="0.0.1")

Naming rule

The name of a pipeline component must be between 1 and 255 characters long and must start with a letter or a number. Valid characters are letters, numbers, “.”, “-” and “_”.
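
The naming rule can be checked locally before creating a component. The following is a minimal sketch, assuming that "letters" means ASCII letters; NAME_PATTERN and is_valid_component_name are hypothetical helpers and not part of the SDK.

```python
import re

# 1 to 255 characters: the first is a letter or digit, the rest are
# letters, digits, ".", "-" or "_". ASCII letters are an assumption here.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]{0,254}")


def is_valid_component_name(name: str) -> bool:
    """Return True if name satisfies the naming rule described above."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["sample-pipeline", "_hidden", "my pipeline", ""]:
    print(repr(candidate), is_valid_component_name(candidate))
```

Using fullmatch instead of match ensures the whole name is checked, so a valid prefix followed by an invalid character is rejected.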

Create anonymous pipeline component

By default, when a dsl.pipeline is submitted, all sub-pipelines of the current pipeline are created as anonymous pipeline components, e.g.:

from azure.ml.component import Component, Pipeline, dsl

@dsl.pipeline(name='sub-pipeline',
              default_compute_target='aml-compute')
def sub_pipeline() -> Pipeline:
    ...

@dsl.pipeline(name='sample-pipeline',
              default_compute_target='aml-compute')
def sample_pipeline() -> Pipeline:
    node1 = sub_pipeline()
    ...

pipeline = sample_pipeline()
pipeline.submit()

In the code snippet above, when the pipeline is submitted, sub_pipeline is created as an anonymous pipeline component.