Troubleshooting component code snapshot
When you create a component in a workspace, the component spec, the entry file, and any additional files the component requires are packed into a snapshot and uploaded to the workspace. Building and creating a component code snapshot is an I/O-bound operation that depends on disk and network speed, so it can take a long time to finish. In that case you may need more information about what the build and create process is doing.
This doc introduces a few techniques for debugging the process of building or creating a component code snapshot.
Troubleshooting snapshot building process
You can inspect the snapshot build process by adding the --verbose parameter to the az ml component build command.
For example, say you have the following folder structure:
src/
library1/
hello.py
library2/
greetings.py
component1/
conda.yaml
component_spec.yaml
run.py
tests/
...
.amlignore
In src/component1/component_spec.yaml, the setting code: .. places the snapshot folder at the src/ level.
In src/.amlignore, the entry tests/ excludes the folder src/tests/ from the built snapshot.
Running the following command prints detailed information about the snapshot build process.
>>>az ml component build --file component1\component_spec.yaml --verbose
Component project builder version: 0.1.0.0 Python executable: C:\Anaconda3\envs\cli_dev\python.exe
========== Build started: E:/demo/src/.build ==========
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Collected E:\demo\src\component1\conda.yaml
Collected E:\demo\src\component1\run.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Collected E:\demo\src\component1\component_spec.yaml
Ignored E:\demo\src\tests
Collected 6 files in 0.19 seconds, total size 2.11 KB
Successfully built snapshot:E:/demo/src/.build
...
The output lists every file that will be included in the snapshot.
Files matched by the .amlignore file are not included.
In this example, the demo snapshot includes 6 files with a total size of 2.11 KB.
If the build log lists unnecessary files, use ignore files to exclude them.
If you are using a code snapshot, see the sections below to learn how to use it together with ignore files.
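To check an ignore file before running the CLI, the collection step can be approximated locally. The sketch below is not the CLI's implementation: the matching rule (fnmatch on the relative path or the base name) is a simplifying assumption, the function names are made up for illustration, and the file under tests/ is a hypothetical placeholder because the example does not list that folder's contents.

```python
import fnmatch
import os
import tempfile
from pathlib import Path


def load_ignore_patterns(ignore_file: Path) -> list[str]:
    """Read non-empty, non-comment lines; drop the trailing slash of folder entries."""
    if not ignore_file.is_file():
        return []
    patterns = []
    for line in ignore_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))
    return patterns


def _matches(rel: str, name: str, patterns: list[str]) -> bool:
    # Simplified rule (an assumption): match on the relative path or the base name.
    return any(fnmatch.fnmatch(rel, p) or fnmatch.fnmatch(name, p) for p in patterns)


def preview_snapshot(root: Path, patterns: list[str]) -> tuple[list[str], list[str]]:
    """Return (collected, ignored) paths relative to root."""
    collected, ignored = [], []
    for dirpath, dirnames, filenames in os.walk(root):
        rel_dir = Path(dirpath).relative_to(root)
        for name in list(dirnames):
            rel = (rel_dir / name).as_posix()
            if _matches(rel, name, patterns):
                ignored.append(rel)
                dirnames.remove(name)  # do not descend into ignored folders
        for name in filenames:
            rel = (rel_dir / name).as_posix()
            (ignored if _matches(rel, name, patterns) else collected).append(rel)
    return sorted(collected), sorted(ignored)


def demo() -> tuple[list[str], list[str]]:
    """Rebuild the example layout in a temporary folder and preview it."""
    layout = {
        ".amlignore": "tests/\n",
        "library1/hello.py": "",
        "library2/greetings.py": "",
        "component1/conda.yaml": "",
        "component1/component_spec.yaml": "",
        "component1/run.py": "",
        "tests/test_placeholder.py": "",  # hypothetical file, tests/ contents are elided above
    }
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel, text in layout.items():
            target = root / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(text, encoding="utf-8")
        return preview_snapshot(root, load_ignore_patterns(root / ".amlignore"))


if __name__ == "__main__":
    collected, ignored = demo()
    print("Collected:", collected)
    print("Ignored:", ignored)
```

Pointing preview_snapshot at a real snapshot folder gives a rough idea of what would be collected. The --verbose log remains the authoritative answer.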
Troubleshooting snapshot creating process
Likewise, adding --verbose to az ml component create shows detailed information about the snapshot creation process.
Running the following command prints that information.
>>>az ml component create --file component1\component_spec.yaml --verbose
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Collected E:\demo\src\component1\conda.yaml
Collected E:\demo\src\component1\run.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Collected E:\demo\src\component1\component_spec.yaml
Ignored E:\demo\src\tests
Collected 6 files in 0.19 seconds, total size 2.11 KB
Collecting snapshot files to upload, only added or modified files will be uploaded.
Added library2\greetings.py
Added library1\hello.py
Added component1\run.py
Added component1\conda.yaml
Added component1\component_spec.yaml
Added .amlignore
Collected 6 files to upload in 0.01 seconds, total size 2.11 KB
Uploaded snapshot in 6.92 seconds.
...
The printed output can be divided into 3 parts:
1. Collecting files in snapshot. This part is the same as in the component build process.
2. Collecting files to upload. When an existing snapshot is updated, only added or modified files are uploaded, and this part shows which files those are. The example above creates a new component in the workspace, so all 6 files need to be uploaded. See the incremental update section below for an example.
3. Uploading the snapshot. This part shows the total time spent uploading the built snapshot. Uploading is usually the most time-consuming part. In this example, uploading all the files took 6.92 seconds.
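When a create call is slow, it helps to see which of the three parts dominates. The sketch below pulls the timings out of a saved verbose log. It assumes the log lines look exactly like the sample output above, which may differ between CLI versions. The function and key names are illustrative only.

```python
import re

# Patterns derived from the sample log lines shown above (an assumption about the format).
SUMMARY = re.compile(
    r"Collected (?P<count>\d+) files(?P<upload> to upload)? "
    r"in (?P<seconds>[\d.]+) seconds, total size (?P<size>[\d.]+) (?P<unit>\w+)"
)
UPLOADED = re.compile(r"Uploaded snapshot in (?P<seconds>[\d.]+) seconds")

SAMPLE_LOG = """\
Collected 6 files in 0.19 seconds, total size 2.11 KB
Collecting snapshot files to upload, only added or modified files will be uploaded.
Collected 6 files to upload in 0.01 seconds, total size 2.11 KB
Uploaded snapshot in 6.92 seconds.
"""


def summarize(log: str) -> dict[str, float]:
    """Map each phase found in the log to the seconds it took."""
    timings: dict[str, float] = {}
    for line in log.splitlines():
        match = SUMMARY.search(line)
        if match:
            phase = "collect_to_upload" if match.group("upload") else "collect"
            timings[phase] = float(match.group("seconds"))
            continue
        match = UPLOADED.search(line)
        if match:
            timings["upload"] = float(match.group("seconds"))
    return timings


def slowest_phase(log: str) -> str:
    """Return the name of the phase with the largest reported time."""
    timings = summarize(log)
    return max(timings, key=timings.get)


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
    print("Slowest phase:", slowest_phase(SAMPLE_LOG))
```

If the upload phase dominates, reducing the snapshot size with ignore files is usually the first thing to try.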
Component code snapshot advanced topics
Component code snapshot incremental update
Creating a component code snapshot supports incremental update. When you update an existing component and create it again, only the modified content is uploaded.
For example, if we modify src/component1/component_spec.yaml and src/component1/run.py, add a file src/component1/entry.py to the example above, and run az ml component create again, we get the following output.
>>>az ml component create --file component1\component_spec.yaml --verbose
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Collected E:\demo\src\component1\component_spec.yaml
Collected E:\demo\src\component1\conda.yaml
Collected E:\demo\src\component1\entry.py
Collected E:\demo\src\component1\run.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Ignored E:\demo\src\tests
Collected 7 files in 0.11 seconds, total size 0.84 KB
Collecting snapshot files to upload, only added or modified files will be uploaded.
Modified component1\run.py
Added component1\entry.py
Modified component1\component_spec.yaml
Collected 3 files to upload in 0.01 seconds, total size 0.38 KB
Uploaded snapshot in 1.32 seconds.
Compared to the local cache, 2 files were modified and 1 file was added, so only those 3 files were uploaded.
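The comparison against the local cache can be pictured as a diff between two manifests of file hashes. The sketch below illustrates that idea only. The real cache format and hash algorithm are not described in this doc, so SHA-256 and the dictionary layout are assumptions, and the file contents are placeholders.

```python
import hashlib


def file_digest(content: bytes) -> str:
    """Hash one file's content (SHA-256 is a stand-in for the real algorithm)."""
    return hashlib.sha256(content).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Map each relative path to the digest of its content."""
    return {path: file_digest(content) for path, content in files.items()}


def diff_snapshot(cached: dict[str, str], current: dict[str, bytes]) -> dict[str, list[str]]:
    """Classify current files as added or modified relative to the cached manifest."""
    added, modified = [], []
    for path, content in current.items():
        if path not in cached:
            added.append(path)
        elif cached[path] != file_digest(content):
            modified.append(path)
    return {"added": sorted(added), "modified": sorted(modified)}


# First create call: the six files of the example (contents are placeholders).
FIRST = {
    ".amlignore": b"tests/\n",
    "component1/component_spec.yaml": b"spec v1",
    "component1/conda.yaml": b"conda",
    "component1/run.py": b"run v1",
    "library1/hello.py": b"hello",
    "library2/greetings.py": b"greetings",
}

# Second create call: two files modified, one file added.
SECOND = dict(FIRST)
SECOND["component1/component_spec.yaml"] = b"spec v2"
SECOND["component1/run.py"] = b"run v2"
SECOND["component1/entry.py"] = b"entry"

if __name__ == "__main__":
    print(diff_snapshot(build_manifest(FIRST), SECOND))
```

Only the paths reported as added or modified would need to be uploaded, which matches the 3 files in the log above.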
Component code snapshot cache
The component cache helps avoid unnecessary uploads of component code snapshots. Once a component code snapshot has been created, creating a snapshot with the same content again does not trigger another upload.
The snapshot file contents and file names are used as the hash key to determine whether a snapshot with the same content already exists. Use the following table to check whether the snapshot cache hits.
| Snapshot folder name | Snapshot item content | Snapshot item file name | Reuse |
|---|---|---|---|
| Same/Not same | Same | Same | True |
| Same/Not same | Not same | Same/Not same | False |
| Same/Not same | Same | Not same | False |
For example, consider the following 4 components, assuming each component's code folder is the folder that contains its yaml files.
component1:
component/
conda.yaml, content="old content"
component_spec.yaml, content="old content"
run.py, content="old content"
component2:
component_new/
conda.yaml, content="old content"
component_spec.yaml, content="old content"
run.py, content="old content"
component3:
component/
conda.yaml, content="old content"
component_spec_new.yaml, content="old content"
run.py, content="old content"
component4:
component_new/
conda.yaml, content="old content"
component_spec.yaml, content="new content"
run.py, content="old content"
For component1 and component2, the snapshot folder names are different (“component” for component1, “component_new” for component2), but the snapshot item contents and file names are the same. They are considered the same snapshot.
For component1 and component3, a snapshot item file name is different (“component_spec.yaml” for component1, “component_spec_new.yaml” for component3), while the snapshot folder name and snapshot item contents are the same. They are not considered the same snapshot.
For component1 and component4, a snapshot item's content is different (for “component_spec.yaml”, the content is “old content” for component1 and “new content” for component4), while the snapshot item file names are the same. The folder names also differ, but that does not affect the result. They are not considered the same snapshot.
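The table and the four components can be reproduced with a small hash-key sketch. It assumes the key is derived from each item's file name (relative to the snapshot folder) and its content, as stated above. The concrete algorithm (SHA-256 over sorted name and content pairs) is an illustration, not the service's actual scheme.

```python
import hashlib


def snapshot_key(files: dict[str, str]) -> str:
    """Hash file names and contents; the snapshot folder name is never an input."""
    digest = hashlib.sha256()
    for name in sorted(files):  # sort so the key does not depend on listing order
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")  # separator between name and content
        digest.update(files[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


OLD = "old content"

# Keys are file names relative to the snapshot folder ("component" or "component_new").
component1 = {"conda.yaml": OLD, "component_spec.yaml": OLD, "run.py": OLD}
component2 = {"conda.yaml": OLD, "component_spec.yaml": OLD, "run.py": OLD}
component3 = {"conda.yaml": OLD, "component_spec_new.yaml": OLD, "run.py": OLD}
component4 = {"conda.yaml": OLD, "component_spec.yaml": "new content", "run.py": OLD}

if __name__ == "__main__":
    base = snapshot_key(component1)
    for label, files in [("component2", component2), ("component3", component3), ("component4", component4)]:
        print(label, "reuses component1 snapshot:", snapshot_key(files) == base)
```

Because the folder name is not part of the input, component1 and component2 produce the same key, while a renamed file or changed content produces a different one.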
Note: Windows and Unix use different line endings, so creating a snapshot with the “same” content on different operating systems produces different hash keys, and the snapshot cache will not hit. If you are using Git, refer to configuring Git to handle line endings for help with handling different line endings.
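When a cache hit was expected but did not happen, checking for mixed line endings is a quick first step. The sketch below shows that the same text hashes differently with LF and CRLF endings, and lists the files in a folder that contain CRLF. The helper names and the two demo file names are hypothetical, and SHA-256 stands in for whatever hash the service uses.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(data: bytes) -> str:
    """Hash raw bytes, so line endings are part of the input."""
    return hashlib.sha256(data).hexdigest()


def find_crlf_files(root: Path) -> list[str]:
    """Return relative paths of files under root that contain Windows line endings."""
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and b"\r\n" in path.read_bytes():
            hits.append(path.relative_to(root).as_posix())
    return hits


UNIX_TEXT = b"print('hello')\n"
WINDOWS_TEXT = b"print('hello')\r\n"


def demo() -> list[str]:
    """Write one LF file and one CRLF file, then report the CRLF ones."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "run_unix.py").write_bytes(UNIX_TEXT)        # hypothetical file name
        (root / "run_windows.py").write_bytes(WINDOWS_TEXT)  # hypothetical file name
        return find_crlf_files(root)


if __name__ == "__main__":
    print("Same hash:", digest(UNIX_TEXT) == digest(WINDOWS_TEXT))
    print("Files with CRLF:", demo())
```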
For example, if we copy the src/ folder from the example above to another location on the same machine or on another machine, and run az ml component create again against the same workspace, we get the following output.
>>>az ml component create --file component1\component_spec.yaml --verbose
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Collected E:\demo\src\component1\component_spec.yaml
Collected E:\demo\src\component1\conda.yaml
Collected E:\demo\src\component1\run.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Ignored E:\demo\src\tests
Collected 6 files in 0.19 seconds, total size 2.11 KB
Found remote cache of snapshot, reused remote cached snapshot.
The output shows the component code snapshot cache hits, so the snapshot won’t be uploaded again.
Use an additional aml ignore file to build a proper snapshot
In some cases, multiple components are developed in the same base folder.
For example, suppose we add another component to the example above and get the following project structure:
library1/
hello.py
library2/
greetings.py
component1/
conda.yaml
component_spec.yaml
run.py
component2/
component_spec.yaml
run.py
tests/
.amlignore
If we create “component2” in the workspace, the contents of src/component1 are also collected, as the following output shows.
>>>az ml component create --file component2\component_spec.yaml --verbose
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Collected E:\demo\src\component1\component_spec.yaml
Collected E:\demo\src\component1\conda.yaml
Collected E:\demo\src\component1\entry.py
Collected E:\demo\src\component1\run.py
Collected E:\demo\src\component2\component_spec.yaml
Collected E:\demo\src\component2\entry.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Ignored E:\demo\src\tests
Collected 9 files in 0.14 seconds, total size 1.01 KB
...
If that behavior is not wanted, we can use the --amlignore-file parameter of az ml component create.
Taking the previous project as an example, if we add a file src/component2/component2.amlignore with the following content, the contents of src/component1 are excluded.
component1/
>>>az ml component create --file component2\component_spec.yaml --verbose
Start collecting files in snapshot...
Collected E:\demo\src\.amlignore
Ignored E:\demo\src\component1
Collected E:\demo\src\component2\component2.amlignore
Collected E:\demo\src\component2\component_spec.yaml
Collected E:\demo\src\component2\entry.py
Collected E:\demo\src\library1\hello.py
Collected E:\demo\src\library2\greetings.py
Ignored E:\demo\src\tests
Collected 6 files in 0.14 seconds, total size 0.43 KB
...
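The effect of the additional ignore file can be reasoned about as the union of two pattern lists. In the log above, both tests (from .amlignore) and component1 (from component2.amlignore) are ignored. The sketch below models that union with simple folder-prefix matching. The matching rule and helper names are assumptions for illustration, and the file under tests/ is a hypothetical placeholder.

```python
def parse_ignore(text: str) -> list[str]:
    """Return folder or file entries, skipping blank lines and comments."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            entries.append(line.rstrip("/"))
    return entries


def is_ignored(rel_path: str, entries: list[str]) -> bool:
    """Simplified rule: an entry ignores itself and everything beneath it."""
    return any(rel_path == e or rel_path.startswith(e + "/") for e in entries)


def collect(paths: list[str], *ignore_texts: str) -> list[str]:
    """Keep the paths not matched by any of the given ignore files."""
    entries = [e for text in ignore_texts for e in parse_ignore(text)]
    return sorted(p for p in paths if not is_ignored(p, entries))


BASE_IGNORE = "tests/\n"        # content of src/.amlignore
EXTRA_IGNORE = "component1/\n"  # content of src/component2/component2.amlignore

# File names taken from the log above; the tests/ entry is a placeholder.
PROJECT = [
    ".amlignore",
    "component1/component_spec.yaml",
    "component1/conda.yaml",
    "component1/entry.py",
    "component1/run.py",
    "component2/component2.amlignore",
    "component2/component_spec.yaml",
    "component2/entry.py",
    "library1/hello.py",
    "library2/greetings.py",
    "tests/test_placeholder.py",
]

if __name__ == "__main__":
    for path in collect(PROJECT, BASE_IGNORE, EXTRA_IGNORE):
        print("Collected", path)
```

With both ignore files applied, 6 paths remain, which matches the file count in the log above.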