This document gives a quick overview of the basic usage of the Submarine platform. You can complete each step of the ML model lifecycle on the platform without having to deal with troublesome environment problems.
- Check the dependency page for the compatible versions
- helm (Helm v3 is the minimum requirement)
- Start minikube cluster
- Clone the project
- Install the resources with the Helm chart
- Use kubectl to query the status of the pods
- Make sure each pod is
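The pod status check above can also be scripted. The following is a minimal sketch, not part of Submarine: it assumes the target phase is `Running`, that `kubectl` is on the `PATH`, and that the pods live in the current namespace (add `-n <namespace>` to the command otherwise). The helper names and the pod names in the sample data are hypothetical.

```python
import json
import subprocess
from typing import List


def pods_not_in_phase(pod_list: dict, wanted: str = "Running") -> List[str]:
    """Return the names of pods whose status.phase differs from `wanted`.

    `pod_list` is the parsed output of `kubectl get pods -o json`.
    """
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") != wanted
    ]


def fetch_pods() -> dict:
    """Run `kubectl get pods -o json` and parse the result (needs a live cluster)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


# Offline demonstration with made-up pod names, so no cluster is required.
sample = {
    "items": [
        {"metadata": {"name": "example-server"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "example-database"}, "status": {"phase": "Pending"}},
    ]
}
print(pods_not_in_phase(sample))
```

Against a real cluster, `pods_not_in_phase(fetch_pods())` returns an empty list once every pod has reached the assumed phase.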
We put the code of this example here.
`train.py` is our training script, and `build.sh` is the script that builds the Docker image.
Take a simple MNIST TensorFlow script as an example. We choose `MultiWorkerMirroredStrategy` as our distributed strategy.
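`MultiWorkerMirroredStrategy` learns about its peers from the `TF_CONFIG` environment variable, which describes the cluster and the role of the current process. The sketch below only illustrates the shape of that value for three workers, matching the worker count used later in this guide. The host names, port, and the helper `build_tf_config` are hypothetical; on a cluster this variable is normally expected to be provided by the environment rather than written by hand.

```python
import json
from typing import List


def build_tf_config(worker_hosts: List[str], index: int) -> str:
    """Build a TF_CONFIG JSON string for the worker at position `index`."""
    if not 0 <= index < len(worker_hosts):
        raise ValueError("index must refer to one of the worker hosts")
    return json.dumps(
        {
            # Every worker sees the same cluster description...
            "cluster": {"worker": worker_hosts},
            # ...and differs only in its own task index.
            "task": {"type": "worker", "index": index},
        }
    )


# Hypothetical addresses for three workers.
workers = [
    "worker-0.example:2222",
    "worker-1.example:2222",
    "worker-2.example:2222",
]

for i in range(len(workers)):
    print(build_tf_config(workers, i))
```

With `TF_CONFIG` set in each process, a training script would typically create `tf.distribute.MultiWorkerMirroredStrategy()` and build and compile its model inside `strategy.scope()`.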
Build a Docker image that contains the dependencies the training script requires.
Open the Submarine workbench and click `+ New Experiment`.
Fill in the form accordingly. Here we set 3 workers.
- Step 1
- Step 2
- Step 3
- The experiment is successfully submitted
In our code, we use `submarine-sdk` to record the metrics. To see the result, click `MLflow UI` in the workbench.
To compare the metrics of each worker, you can select all workers and then click