HDInsight Component
Overview
An HDInsightComponent is a Component that executes a Spark job on an HDInsight cluster.
Scenarios
Use Apache Spark to train your model or analyze your data. Several popular frameworks are supported:
PySpark
.NET for Spark
Java
Limitation
Currently, this component only works on compliant HDI clusters created by the Office team.
How to write an HDInsightComponent YAML spec
Please refer to the HDInsightComponent spec doc.
Please also see the HDInsightComponent schema.
Example YAML:
$schema: https://componentsdk.azureedge.net/jsonschema/HDInsightComponent.json
name: microsoft.com.azureml.samples.train-in-spark
version: 0.0.1
display_name: Train in Spark
type: HDInsightComponent
description: Train a Spark ML model using an HDInsight Spark cluster
inputs:
  input_path:
    type: path
    description: Iris csv file
    optional: false
  regularization_rate:
    type: float
    description: Regularization rate when training with logistic regression
    optional: true
    default: 0.01
outputs:
  output_path:
    type: path
    description: The output path to save the trained model to
hdinsight:
  file: "train-spark.py"
  args: >-
    --input_path {inputs.input_path} [--regularization_rate {inputs.regularization_rate}] --output_path {outputs.output_path}
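The spec above names train-spark.py as the entry file but does not show its contents. As a rough illustration only, the following is a minimal sketch of what such a script might look like. It assumes the three command-line flags from the args template, treats the bracketed --regularization_rate as optional (falling back to the spec's default of 0.01), and assumes the Iris CSV has a header row with the label in the last column. The function names parse_args, train, and main are hypothetical, and the actual sample script may differ.

```python
import argparse


def parse_args(argv=None):
    """Parse the flags that the spec's `args` template passes to the script."""
    parser = argparse.ArgumentParser(description="Train a Spark ML model")
    parser.add_argument("--input_path", required=True,
                        help="Path to the Iris csv file")
    # Optional in the spec; the default mirrors `default: 0.01`.
    parser.add_argument("--regularization_rate", type=float, default=0.01,
                        help="Regularization rate for logistic regression")
    parser.add_argument("--output_path", required=True,
                        help="Path to save the trained model to")
    return parser.parse_args(argv)


def train(args):
    """Fit a logistic regression pipeline and save it to args.output_path."""
    # pyspark is imported here so argument parsing can be checked
    # on a machine without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import StringIndexer, VectorAssembler
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("train-in-spark").getOrCreate()

    # Assumption: the CSV has a header row, numeric feature columns,
    # and the class label in the last column.
    df = spark.read.csv(args.input_path, header=True, inferSchema=True)
    label_col = df.columns[-1]
    feature_cols = df.columns[:-1]

    pipeline = Pipeline(stages=[
        StringIndexer(inputCol=label_col, outputCol="label"),
        VectorAssembler(inputCols=feature_cols, outputCol="features"),
        LogisticRegression(regParam=args.regularization_rate),
    ])
    model = pipeline.fit(df)

    # Persist the fitted pipeline to the component's output path.
    model.write().overwrite().save(args.output_path)
    spark.stop()


def main(argv=None):
    train(parse_args(argv))
```

To use this as a script, call main() under an `if __name__ == "__main__":` guard at the end of the file. Keeping the parsing separate from the training makes it easy to check that the flag names stay in sync with the args template in the spec.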
Samples
Follow the how-to-access instructions if you get a 404 error when accessing the samples.
How to use HDInsight component - Demonstrates how to use an HDI component to train Spark ML models.
Appendix
In the legacy module concept, this maps to a Module with the hdinsight jobType.
legacy module spec doc