

This document gives you a quick overview of the basic usage of the Submarine platform. You can finish every step of the ML model lifecycle on the platform without wrestling with troublesome environment problems.


Prepare a Kubernetes cluster

  1. Prerequisites: make sure minikube, istioctl, helm, and kubectl are installed.
  2. Start a minikube cluster and install Istio
minikube start --vm-driver=docker --cpus 8 --memory 8192 --kubernetes-version v1.21.2
istioctl install -y
# Or, if you want to support Pod Security Policy, you can use the following command to start the cluster
minikube start --extra-config=apiserver.enable-admission-plugins=PodSecurityPolicy --addons=pod-security-policy --vm-driver=docker --cpus 8 --memory 4096 --kubernetes-version v1.21.2

Launch submarine in the cluster

  1. Clone the project
git clone https://github.com/apache/submarine.git
cd submarine
  2. Create the necessary namespaces
kubectl create namespace submarine
kubectl create namespace submarine-user-test
kubectl label namespace submarine istio-injection=enabled
kubectl label namespace submarine-user-test istio-injection=enabled
  3. Install the Submarine operator and its dependencies with the Helm chart
helm install submarine ./helm-charts/submarine -n submarine
  4. Create a Submarine custom resource, and the operator will create the submarine server, database, etc. for us.
kubectl apply -f submarine-cloud-v2/artifacts/examples/example-submarine.yaml -n submarine-user-test

Ensure submarine is ready

$ kubectl get pods -n submarine
notebook-controller-deployment-66d85984bf-x562z 1/1 Running 0 7h7m
pytorch-operator-7d778f4859-g7xph 2/2 Running 0 7h7m
tf-job-operator-7d895bf77c-75n72 2/2 Running 0 7h7m

$ kubectl get pods -n submarine-user-test
submarine-database-bdcb77549-rq2ds 2/2 Running 0 7h6m
submarine-minio-686b8777ff-zg4d2 2/2 Running 0 7h6m
submarine-mlflow-68c5559dcb-lkq4g 2/2 Running 0 7h6m
submarine-server-7c6d7bcfd8-5p42w 2/2 Running 0 9m33s
submarine-tensorboard-57c5b64778-t4lww 2/2 Running 0 7h6m
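
If you prefer to script this readiness check instead of eyeballing the tables, the output can be parsed mechanically. A minimal sketch in Python, assuming the header row has been stripped (e.g. with `kubectl get pods --no-headers`); the sample lines are taken from the listing above:

```python
def all_pods_ready(kubectl_output: str) -> bool:
    """Return True if every pod line reports full readiness and Running status."""
    for line in kubectl_output.strip().splitlines():
        # Columns: NAME READY STATUS RESTARTS AGE
        name, ready, status = line.split()[:3]
        current, desired = ready.split("/")
        if status != "Running" or current != desired:
            return False
    return True

sample = """\
submarine-database-bdcb77549-rq2ds 2/2 Running 0 7h6m
submarine-server-7c6d7bcfd8-5p42w 2/2 Running 0 9m33s
"""
print(all_pods_ready(sample))  # True
```

In practice you would feed it `kubectl get pods --no-headers -n submarine-user-test` output and poll until it returns True.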

Connect to workbench

  1. Expose the service
kubectl port-forward --address 0.0.0.0 -n istio-system service/istio-ingressgateway 32080:80
  2. View the workbench

Go to http://localhost:32080 to open the workbench in your browser.

Example: Submit a distributed MNIST example

We put the code of this example in the repository; it contains our training script and a script to build the Docker image.

1. Write a Python script for distributed training

Take a simple MNIST TensorFlow script as an example. We choose MultiWorkerMirroredStrategy as our distribution strategy.


import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras import layers, models
import submarine

BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 64

def make_datasets_unbatched():

    # Scaling MNIST data from (0, 255] to (0., 1.]
    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255
        return image, label

    datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True)

    return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)

def build_and_compile_cnn_model():
    model = models.Sequential()
    model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))

    model.compile(
        loss=tf.keras.losses.sparse_categorical_crossentropy,
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001),
        metrics=['accuracy'])

    return model

def main():
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy(
        communication=tf.distribute.experimental.CollectiveCommunication.AUTO)

    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

    with strategy.scope():
        ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
        options = tf.data.Options()
        options.experimental_distribute.auto_shard_policy = \
            tf.data.experimental.AutoShardPolicy.DATA
        ds_train = ds_train.with_options(options)
        # Model building/compiling need to be within `strategy.scope()`.
        multi_worker_model = build_and_compile_cnn_model()

    class MyCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            # monitor the loss and accuracy
            submarine.log_metrics({"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)

    multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70, callbacks=[MyCallback()])

if __name__ == '__main__':
    main()
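
With MultiWorkerMirroredStrategy, workers discover each other through the TF_CONFIG environment variable, which the TensorFlow operator injects into every worker pod, so you never set it by hand. A sketch of what a 3-worker TF_CONFIG could look like, parsed with the standard library; the host names here are hypothetical:

```python
import json
import os

# Hypothetical TF_CONFIG for a 3-worker job; in the cluster the operator
# writes this into each pod's environment automatically.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {
        "worker": [
            "mnist-example-worker-0:2222",
            "mnist-example-worker-1:2222",
            "mnist-example-worker-2:2222",
        ]
    },
    "task": {"type": "worker", "index": 0},  # this pod is worker 0
})

tf_config = json.loads(os.environ["TF_CONFIG"])
num_workers = len(tf_config["cluster"]["worker"])
print(num_workers)  # 3
```

This is also why BATCH_SIZE is computed from `strategy.num_replicas_in_sync`: with 3 workers the effective global batch is three times the per-replica batch.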

2. Prepare an environment compatible with the training

Build a Docker image that satisfies the environment requirements of the training script.

eval $(minikube docker-env)
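
The image only needs TensorFlow, tensorflow_datasets, the Submarine SDK, and the training script. A minimal Dockerfile sketch; the base image tag and script name are assumptions, so adapt them to the actual example code:

```dockerfile
FROM tensorflow/tensorflow:2.6.0
RUN pip install tensorflow_datasets apache-submarine
COPY train.py /opt/train.py
```

Running `eval $(minikube docker-env)` first points your shell at minikube's Docker daemon, so an image built with `docker build -t mnist-example:0.1 .` is immediately visible to the cluster without pushing to a registry.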

3. Submit the experiment

  1. Open the Submarine workbench and click + New Experiment

  2. Choose Define your experiment

  3. Fill in the form accordingly. Here we set 3 workers.

    1. Step 1
    2. Step 2
    3. Step 3
    4. The experiment is successfully submitted
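
The workbench form ultimately produces an experiment spec that is sent to the submarine server. A sketch of building a comparable spec with the standard library; the schema and field values below are assumptions modeled on the form fields (name, image, 3 workers), not an authoritative API reference:

```python
import json

# Hypothetical experiment spec mirroring the workbench form above.
experiment_spec = {
    "meta": {
        "name": "mnist-example",
        "namespace": "submarine-user-test",
        "framework": "TensorFlow",
        "cmd": "python /opt/train.py",
    },
    "environment": {"image": "mnist-example:0.1"},
    "spec": {
        # 3 workers, matching the form in step 3
        "Worker": {"replicas": 3, "resources": "cpu=1,memory=1024M"},
    },
}

payload = json.dumps(experiment_spec)
print(json.loads(payload)["spec"]["Worker"]["replicas"])  # 3
```

Consult the Submarine server's REST API documentation for the authoritative endpoint and schema before submitting such a payload programmatically.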

4. Monitor the process

  1. In our code, we use the submarine package from submarine-sdk to record the metrics. To see the results, click the experiment named mnist-example in the workbench.
  2. To see the metrics of each worker, select a worker from the list at the top left.

5. Serve the model (In development)