Component concepts & pipeline run reuse behavior

In this article, you learn how component concepts affect pipeline run reuse behavior, and how to diagnose cases where a component run was not reused as expected compared to your previous run. Component reuse improves the performance of your pipelines and saves compute cost.

Component run reuse

Azure ML can reuse the results of a previous component run when the same component is submitted again with the same inputs.

When reuse happens, the component is displayed with a small “Recycle” icon, as in the screenshot below.

image-20210604181729947

You can check the outputs, logs, and metrics of the current run; they show the same results as the previous run that was reused.
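One way to picture the reuse decision is as a cache lookup keyed on the component and its inputs. The sketch below is a toy model only, not how Azure ML implements reuse internally; `fingerprint`, `RunCache`, and the `run-N` ids are hypothetical names made up for illustration.

```python
import hashlib
import json


def fingerprint(component_name, component_version, inputs):
    """Build a stable key from the component identity and its inputs."""
    payload = json.dumps(
        {"name": component_name, "version": component_version, "inputs": inputs},
        sort_keys=True,  # key order must not change the fingerprint
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RunCache:
    """Hypothetical in-memory stand-in for the service's record of past runs."""

    def __init__(self):
        self._runs = {}
        self._counter = 0

    def submit(self, component_name, component_version, inputs):
        """Return (run_id, reused); reuse when the same fingerprint was seen before."""
        key = fingerprint(component_name, component_version, inputs)
        if key in self._runs:
            return self._runs[key], True
        self._counter += 1
        run_id = f"run-{self._counter}"
        self._runs[key] = run_id
        return run_id, False


cache = RunCache()
first = cache.submit("train", "0.0.1", {"learning_rate": 0.1})
second = cache.submit("train", "0.0.1", {"learning_rate": 0.1})  # same inputs
third = cache.submit("train", "0.0.1", {"learning_rate": 0.2})   # changed input
```

In this model the second submission returns the first run's id with `reused` set to `True`, while changing any input produces a new run.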

Pipeline run reuse

To allow a pipeline run to be reused, decorate the pipeline function with dsl.pipeline and set is_deterministic=True.

Example:

from azure.ml.component import dsl

@dsl.pipeline(is_deterministic=True)
def pipeline_func():
    # your pipeline function logic
    ...

Force component rerun

There are two ways to force a component to rerun instead of reusing previous outputs: set is_deterministic=False at the component interface level, or set regenerate_outputs=True at the component run level. Either setting on its own forces a rerun.

Component interface level

Set is_deterministic=False in the component YAML, and that component will always rerun. Reference here for more information.

Component run level

  1. Set regenerate_outputs=True on a specific component object, and that component will rerun when submitted. Reference here for more information.

  2. Set regenerate_outputs=True when submitting the pipeline, and all component runs inside the pipeline will rerun. Reference here for more information.
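The rules in this section combine as a logical OR: either switch is enough to force a rerun. The following is a minimal sketch of that decision, assuming a hypothetical helper `will_rerun` and a hypothetical flag `has_matching_previous_run`; it illustrates the logic described above and does not call the SDK.

```python
def will_rerun(is_deterministic, regenerate_outputs, has_matching_previous_run):
    """Return True if the component executes again, False if it is reused.

    is_deterministic: interface-level setting from the component definition.
    regenerate_outputs: run-level setting (on the component or the pipeline submit).
    has_matching_previous_run: hypothetical flag, True when an earlier run of the
        same component with the same inputs exists.
    """
    if not is_deterministic:
        return True  # interface level: never reuse
    if regenerate_outputs:
        return True  # run level: reuse explicitly disabled
    return not has_matching_previous_run  # nothing to reuse means a fresh run


reused_case = will_rerun(True, False, True)
forced_by_interface = will_rerun(False, False, True)
forced_by_run_setting = will_rerun(True, True, True)
```

Only the first case, where both settings allow reuse and a matching earlier run exists, results in reuse.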

Find reused run id in UX

You can find the reused run id in the properties JSON of the component.

image-20210621175539266
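If you copy the properties JSON out of the UX, a short script can pull the reused run id from it. This is a sketch under assumptions: the key name `azureml.reusedrunid` is an assumption about how the property is named and should be checked against your own JSON, and the sample values are made up.

```python
import json

# Assumption: name of the property that holds the reused run id.
# Check the properties JSON of your own component and adjust if it differs.
REUSED_RUN_ID_KEY = "azureml.reusedrunid"


def find_reused_run_id(properties_json, key=REUSED_RUN_ID_KEY):
    """Return the reused run id from a properties JSON string, or None if absent."""
    properties = json.loads(properties_json)
    return properties.get(key)


# Made-up sample data for illustration only.
sample_reused = '{"azureml.reusedrunid": "00000000-0000-0000-0000-000000000000"}'
sample_fresh = '{"some.other.property": "value"}'

reused_id = find_reused_run_id(sample_reused)
fresh_id = find_reused_run_id(sample_fresh)
```

A result of `None` means the key was not found, which in this sketch is treated as "the run was not reused".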