Version: 0.7.0


This document gives a quick overview of the basic usage of the Submarine platform. You can complete each step of the ML model lifecycle on the platform without having to deal with environment setup problems.


Prepare a Kubernetes cluster

  1. Prerequisite
  2. Start minikube cluster
minikube start --vm-driver=docker --cpus 8 --memory 4096 --kubernetes-version v1.21.2

Launch submarine in the cluster

  1. Clone the project
git clone
  2. Install the Submarine operator and its dependencies with the Helm chart
cd submarine
helm install submarine ./helm-charts/submarine
  3. Create a Submarine custom resource. The operator then creates the Submarine server, database, and other components for us.
kubectl apply -f submarine-cloud-v2/artifacts/examples/example-submarine.yaml

Ensure submarine is ready

  1. Use kubectl to query the status of pods
kubectl get pods
  2. Make sure each pod is Running
NAME                                              READY   STATUS    RESTARTS   AGE
notebook-controller-deployment-5d4f5f874c-mnbc8   1/1     Running   0          61m
pytorch-operator-844c866d54-xm8nl                 1/1     Running   2          61m
submarine-database-85bd68dbc5-qggtm               1/1     Running   0          11m
submarine-minio-76465444f6-hdgdp                  1/1     Running   0          11m
submarine-mlflow-75f86d8f4d-rj2z7                 1/1     Running   0          11m
submarine-operator-5dd79cdf86-gpm2p               1/1     Running   0          61m
submarine-server-68985b767-vjdvx                  1/1     Running   0          11m
submarine-tensorboard-5df8499fd4-vnklf            1/1     Running   0          11m
submarine-traefik-7cbcfd4bd9-wbf8b                1/1     Running   0          61m
tf-job-operator-6bb69fd44-zmlmr                   1/1     Running   1          61m
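
The readiness check can also be scripted. The sketch below is one possible way to do it in Python: it parses the text printed by `kubectl get pods` and lists every pod whose STATUS is not Running or whose READY count is incomplete. The function name `pods_not_ready` is hypothetical, and the sample text is a shortened, partly invented listing (the Pending row is made up to show a failing case).

```python
from typing import List


def pods_not_ready(kubectl_output: str) -> List[str]:
    """Return the names of pods that are not Running or not fully ready."""
    not_ready = []
    lines = kubectl_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, total = ready.split("/")
        if status != "Running" or ready_count != total:
            not_ready.append(name)
    return not_ready


# Hypothetical sample: the second pod is shown as Pending for illustration.
sample = """\
NAME                                  READY   STATUS    RESTARTS   AGE
submarine-server-68985b767-vjdvx      1/1     Running   0          11m
submarine-database-85bd68dbc5-qggtm   0/1     Pending   0          11m
"""
print(pods_not_ready(sample))  # ['submarine-database-85bd68dbc5-qggtm']
```

Against a live cluster, the same function could be fed the output of `subprocess.run(["kubectl", "get", "pods"], capture_output=True, text=True).stdout`. An empty result means every listed pod is ready.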

Connect to workbench

  1. Expose the service

    # Method 1 -- use minikube ip
    minikube ip # prints the IP address of the minikube node

    # Method 2 -- use port-forwarding (listens on localhost by default;
    # add --address <IP> to listen on another interface)
    kubectl port-forward service/submarine-traefik 32080:80
  2. View the workbench. If you use method 1, go to http://{minikube ip}:32080. If you use method 2, go to http://localhost:32080 on the machine running the port-forward.
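
Before opening a browser, it can help to confirm that the workbench address answers at all. The sketch below is a minimal check using only the Python standard library. The helper names `workbench_url` and `is_reachable` are hypothetical, and `192.0.2.10` is a documentation-only address standing in for whatever `minikube ip` prints.

```python
import urllib.error
import urllib.request


def workbench_url(host: str, port: int = 32080) -> str:
    """Build the workbench URL; 32080 is the port used in this guide."""
    return f"http://{host}:{port}"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True  # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # refused, timed out, or host not found


# Placeholder host; substitute the output of `minikube ip` or "localhost".
print(workbench_url("192.0.2.10"))  # http://192.0.2.10:32080
```

Calling `is_reachable(workbench_url("localhost"))` while the port-forward is running should return `True` once the traefik pod is serving requests.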

Example: Submit a distributed MNIST experiment

The code for this example is in the Submarine repository. It consists of a training script and a script that builds a Docker image.

1. Write a Python script for distributed training

We use a simple MNIST TensorFlow script as an example, with MultiWorkerMirroredStrategy as the distribution strategy.


import tensorflow_datasets as tfds
import tensorflow as tf
from tensorflow.keras import layers, models
import submarine

# Assumed values: these definitions are missing from this page.
BUFFER_SIZE = 10000
BATCH_SIZE_PER_REPLICA = 64


def make_datasets_unbatched():
    # Scale MNIST pixel values from [0, 255] to [0., 1.]
    def scale(image, label):
        image = tf.cast(image, tf.float32)
        image /= 255
        return image, label

    datasets, _ = tfds.load(name='mnist', with_info=True, as_supervised=True)

    return datasets['train'].map(scale).cache().shuffle(BUFFER_SIZE)


def build_and_compile_cnn_model():
    model = models.Sequential()
    model.add(
        layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Conv2D(64, (3, 3), activation='relu'))
    # Flatten the feature maps before the dense layers.
    model.add(layers.Flatten())
    model.add(layers.Dense(64, activation='relu'))
    model.add(layers.Dense(10, activation='softmax'))

    # Assumed compile settings (missing from this page): integer labels with
    # a softmax output, and the 'accuracy' metric that the callback logs.
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

    return model


def main():
    # The argument list is truncated on this page; defaults are used here.
    strategy = tf.distribute.experimental.MultiWorkerMirroredStrategy()

    BATCH_SIZE = BATCH_SIZE_PER_REPLICA * strategy.num_replicas_in_sync

    with strategy.scope():
        ds_train = make_datasets_unbatched().batch(BATCH_SIZE).repeat()
        options = tf.data.Options()
        # Assumed policy (the value is missing from this page): shard by data.
        options.experimental_distribute.auto_shard_policy = \
            tf.data.experimental.AutoShardPolicy.DATA
        ds_train = ds_train.with_options(options)
        # Model building/compiling need to be within `strategy.scope()`.
        multi_worker_model = build_and_compile_cnn_model()

    class MyCallback(tf.keras.callbacks.Callback):
        def on_epoch_end(self, epoch, logs=None):
            # monitor the loss and accuracy
            submarine.log_metrics(
                {"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)

    multi_worker_model.fit(ds_train, epochs=10, steps_per_epoch=70,
                           callbacks=[MyCallback()])


if __name__ == '__main__':
    main()
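
The `BATCH_SIZE` line in `main()` is easier to follow with numbers. The sketch below is plain Python (no TensorFlow) and only reproduces the arithmetic: in synchronous data-parallel training each step consumes one global batch, which is split evenly across the replicas. The per-replica batch size of 64 is an assumed value. The 3 workers, 70 steps per epoch, and 10 epochs come from this guide, and one replica per worker is assumed.

```python
def global_batch_size(per_replica: int, num_replicas: int) -> int:
    """Global batch = per-replica batch times the replicas kept in sync."""
    return per_replica * num_replicas


BATCH_SIZE_PER_REPLICA = 64  # assumed value, not taken from this guide
NUM_WORKERS = 3              # worker count used when submitting the experiment
STEPS_PER_EPOCH = 70         # from the fit() call
EPOCHS = 10                  # from the fit() call

batch = global_batch_size(BATCH_SIZE_PER_REPLICA, NUM_WORKERS)
examples_per_epoch = batch * STEPS_PER_EPOCH
total_examples = examples_per_epoch * EPOCHS

print(batch)               # 192
print(examples_per_epoch)  # 13440
print(total_examples)      # 134400
```

Because the dataset is repeated with `.repeat()`, the number of examples seen per epoch is set by `steps_per_epoch` and the global batch size, not by the size of the MNIST training split.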

2. Prepare an environment compatible with the training

Build a Docker image that contains the dependencies the training script needs.

eval $(minikube docker-env)
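
`minikube docker-env` prints shell `export` statements, and `eval` applies them so that later `docker` commands talk to the Docker daemon inside minikube. An image built this way is available to the cluster without pushing it to a registry. The sketch below parses such output into a dictionary to show which variables are involved. The function name `parse_docker_env` is hypothetical, the sample values are placeholders, and the exact set of variables may differ between minikube versions.

```python
import re

# Matches lines such as: export DOCKER_HOST="tcp://192.0.2.10:2376"
EXPORT_LINE = re.compile(r'^export\s+(\w+)="?([^"]*)"?\s*$')


def parse_docker_env(text):
    """Collect NAME -> value pairs from `export NAME="value"` lines."""
    env = {}
    for line in text.splitlines():
        match = EXPORT_LINE.match(line.strip())
        if match:
            env[match.group(1)] = match.group(2)
    return env


# Placeholder output; real values depend on the local minikube setup.
sample = '''\
export DOCKER_TLS_VERIFY="1"
export DOCKER_HOST="tcp://192.0.2.10:2376"
export DOCKER_CERT_PATH="/home/user/.minikube/certs"
export MINIKUBE_ACTIVE_DOCKERD="minikube"
# To point your shell to minikube's docker-daemon, run:
# eval $(minikube -p minikube docker-env)
'''

env = parse_docker_env(sample)
print(env["DOCKER_HOST"])  # tcp://192.0.2.10:2376
```

Note that the variables only apply to the shell in which `eval` was run, so the image must be built in that same shell session.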

3. Submit the experiment

  1. Open submarine workbench and click + New Experiment

  2. Choose Define your experiment

  3. Fill the form accordingly. Here we set 3 workers.

    1. Step 1
    2. Step 2
    3. Step 3
    4. The experiment is successfully submitted
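
With 3 workers, `MultiWorkerMirroredStrategy` needs each process to know the full worker list and its own position in it. TensorFlow reads this from the `TF_CONFIG` environment variable, a JSON document with a `cluster` entry and a `task` entry. In a cluster, the training operator is expected to set it for each pod, so the training script does not build it itself. The sketch below only illustrates its shape; the function name `make_tf_config`, the host names, and the port are hypothetical.

```python
import json


def make_tf_config(worker_hosts, index):
    """Return the TF_CONFIG JSON string for the worker at `index`."""
    if not 0 <= index < len(worker_hosts):
        raise ValueError("index must refer to one of the workers")
    config = {
        "cluster": {"worker": list(worker_hosts)},
        "task": {"type": "worker", "index": index},
    }
    return json.dumps(config)


# Hypothetical addresses for the 3 workers of this example.
hosts = [f"mnist-example-worker-{i}:2222" for i in range(3)]

for i in range(len(hosts)):
    print(make_tf_config(hosts, i))
```

Every worker sees the same `cluster` section and differs only in `task.index`. When no separate chief task is defined, TensorFlow treats worker 0 as the chief.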

4. Monitor the process

  1. The training code uses the submarine module from submarine-sdk to record metrics. To see the results, click the experiment named mnist-example in the workbench.
  2. To see the metrics of a single worker, select it from the list at the top left.
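
To make the monitoring step concrete, here is a sketch of what the callback in the training script hands to the SDK at the end of every epoch: a dictionary of metric values plus the epoch number as the step. `InMemoryMetricLog` is a hypothetical stand-in for `submarine.log_metrics`, used so that the example runs without a Submarine server. The loss and accuracy values are made up for illustration.

```python
from collections import defaultdict


class InMemoryMetricLog:
    """Hypothetical stand-in that stores metrics instead of sending them."""

    def __init__(self):
        self.series = defaultdict(list)  # metric name -> [(step, value), ...]

    def log_metrics(self, metrics, step):
        for name, value in metrics.items():
            self.series[name].append((step, float(value)))


log = InMemoryMetricLog()

# Made-up per-epoch results, shaped like the Keras `logs` dictionary.
fake_epochs = [
    {"loss": 0.9, "accuracy": 0.70},
    {"loss": 0.5, "accuracy": 0.85},
    {"loss": 0.3, "accuracy": 0.91},
]
for epoch, logs in enumerate(fake_epochs):
    log.log_metrics({"loss": logs["loss"], "accuracy": logs["accuracy"]}, epoch)

print(log.series["accuracy"][-1])  # (2, 0.91)
```

Each worker runs the same callback, which is why the workbench can show one metric series per worker.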

5. Serve the model (In development)