Experiment REST API

Note: The Experiment API is in the alpha stage and is subject to incompatible changes in future releases.

Create Experiment (Using Anonymous/Embedded Environment)

POST /api/v1/experiment

Example Request:

curl -X POST -H "Content-Type: application/json" -d '
{
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
' http://127.0.0.1:32080/api/v1/experiment

Example Response:

{
"status": "OK",
"code": 200,
"result": {
"experimentId": "experiment_1586156073228_0001",
"name": "tf-mnist-json",
"uid": "28e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:59:29.000+08:00",
"spec": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
}
}
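The same request can be assembled from a script. The following is a minimal sketch using only the Python standard library; the helper name `build_create_request` is hypothetical, and the base URL is simply copied from the curl example, so adjust it for your deployment. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: host and port copied from the curl example above.
BASE_URL = "http://127.0.0.1:32080/api/v1/experiment"


def build_create_request(spec, base_url=BASE_URL):
    """Build (but do not send) the POST request that creates an experiment."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Same payload as the curl example (anonymous/embedded environment).
spec = {
    "meta": {
        "name": "tf-mnist-json",
        "namespace": "default",
        "framework": "TensorFlow",
        "cmd": "python /var/tf_mnist/mnist_with_summaries.py "
               "--log_dir=/train/log --learning_rate=0.01 --batch_size=150",
        "envVars": {"ENV_1": "ENV1"},
    },
    "environment": {"image": "apache/submarine:tf-mnist-with-summaries-1.0"},
    "spec": {
        "Ps": {"replicas": 1, "resources": "cpu=1,memory=1024M"},
        "Worker": {"replicas": 1, "resources": "cpu=1,memory=2048M"},
    },
}

request = build_create_request(spec)
```

To submit it, pass `request` to `urllib.request.urlopen` while a Submarine server is reachable at the assumed address.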

Create Experiment (Using Pre-defined/Stored Environment)

POST /api/v1/experiment

Example Request:

curl -X POST -H "Content-Type: application/json" -d '
{
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"name": "my-submarine-env"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
' http://127.0.0.1:32080/api/v1/experiment

The example above assumes that the environment "my-submarine-env" already exists in Submarine. Refer to the Environment API Reference for the REST APIs that create, update, delete, and list environments.

Example Response:

{
"status": "OK",
"code": 200,
"result": {
"experimentId": "experiment_1586156073228_0001",
"name": "tf-mnist-json",
"uid": "28e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:59:29.000+08:00",
"spec": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"name": "my-submarine-env"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
}
}
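The two create variants differ only in the "environment" section: an image for an anonymous environment, or a name for a stored one. As a rough sketch, a small hypothetical helper (`environment_section` is not part of Submarine) can build either form. Treating the two keys as mutually exclusive is an assumption made here for clarity; this page does not say how the server handles a request that carries both.

```python
def environment_section(image=None, name=None):
    """Return the "environment" part of an experiment spec.

    Assumption: exactly one of `image` (anonymous environment) or
    `name` (pre-defined environment) is given.
    """
    if (image is None) == (name is None):
        raise ValueError("provide exactly one of image or name")
    if image is not None:
        return {"image": image}
    return {"name": name}


anonymous = environment_section(image="apache/submarine:tf-mnist-with-summaries-1.0")
stored = environment_section(name="my-submarine-env")
```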

List experiment

GET /api/v1/experiment

Example Request:

curl -X GET http://127.0.0.1:32080/api/v1/experiment

Example Response:

{
"status": "OK",
"code": 200,
"result": [
{
"experimentId": "experiment_1592057447228_0001",
"name": "tf-mnist-json",
"uid": "28e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:59:29.000+08:00",
"spec": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
},
{
"experimentId": "experiment_1592057447228_0002",
"name": "mnist",
"uid": "38e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:19:29.000+08:00",
"spec": {
"meta": {
"name": "pytorch-mnist-json",
"namespace": "default",
"framework": "PyTorch",
"cmd": "python /var/mnist.py --backend gloo",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:pytorch-dist-mnist-1.0"
},
"spec": {
"Master": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
}
}
}
}
]
}
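A client will usually want only a few fields from each entry. The sketch below parses a response shaped like the one above and extracts the ID, framework, and status of every experiment. The function name `summarize_experiments` is hypothetical, the sample is a trimmed copy of the example response, and treating any "code" other than 200 as a failure is an assumption.

```python
import json


def summarize_experiments(response_text):
    """Return (experimentId, framework, status) for each listed experiment."""
    payload = json.loads(response_text)
    # Assumption: a non-200 "code" in the envelope signals an error.
    if payload.get("code") != 200:
        raise RuntimeError(f"unexpected response code: {payload.get('code')}")
    return [
        (item["experimentId"], item["spec"]["meta"]["framework"], item["status"])
        for item in payload["result"]
    ]


# Trimmed copy of the example response, keeping only the fields used here.
sample = json.dumps({
    "status": "OK",
    "code": 200,
    "result": [
        {
            "experimentId": "experiment_1592057447228_0001",
            "status": "Accepted",
            "spec": {"meta": {"framework": "TensorFlow"}},
        },
        {
            "experimentId": "experiment_1592057447228_0002",
            "status": "Accepted",
            "spec": {"meta": {"framework": "PyTorch"}},
        },
    ],
})

summary = summarize_experiments(sample)
```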

Get experiment

GET /api/v1/experiment/{id}

Example Request:

curl -X GET http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001

Example Response:

{
"status": "OK",
"code": 200,
"result": {
"experimentId": "experiment_1592057447228_0001",
"name": "tf-mnist-json",
"uid": "28e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:59:29.000+08:00",
"spec": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 1,
"resources": "cpu=1,memory=2048M"
}
}
}
}
}
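The {id} path segment is the "experimentId" value returned when the experiment was created. A minimal sketch of building this request in Python follows; `build_get_request` is a hypothetical helper, the base URL is taken from the curl example, and percent-encoding the ID is a defensive choice rather than something this page requires.

```python
import urllib.request
from urllib.parse import quote

# Assumption: host and port copied from the curl example above.
BASE_URL = "http://127.0.0.1:32080/api/v1/experiment"


def build_get_request(experiment_id, base_url=BASE_URL):
    """Build (but do not send) the GET request for a single experiment."""
    if not experiment_id:
        raise ValueError("experiment_id must be non-empty")
    # Percent-encode the ID so it cannot alter the path structure.
    url = f"{base_url}/{quote(experiment_id, safe='')}"
    return urllib.request.Request(url, method="GET")


request = build_get_request("experiment_1592057447228_0001")
```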

Patch experiment

PATCH /api/v1/experiment/{id}

Example Request:

curl -X PATCH -H "Content-Type: application/json" -d '
{
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 2,
"resources": "cpu=1,memory=2048M"
}
}
}
' http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001

Example Response:

{
"status": "OK",
"code": 200,
"success": true,
"result": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 2,
"resources": "cpu=1,memory=2048M"
}
}
}
}
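In the example, the only difference between the patch body and the original spec is the Worker replica count (1 to 2), and the full spec is sent. One way to derive such a body is to copy the current spec and change that single field, as sketched below. `with_worker_replicas` is a hypothetical helper; whether the server also accepts a partial body is not stated on this page, so the sketch sends the full spec as the example does.

```python
import copy
import json


def with_worker_replicas(spec, replicas):
    """Return a copy of `spec` with the Worker replica count replaced."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    patched = copy.deepcopy(spec)  # leave the caller's spec untouched
    patched["spec"]["Worker"]["replicas"] = replicas
    return patched


current = {
    "meta": {
        "name": "tf-mnist-json",
        "namespace": "default",
        "framework": "TensorFlow",
        "cmd": "python /var/tf_mnist/mnist_with_summaries.py "
               "--log_dir=/train/log --learning_rate=0.01 --batch_size=150",
        "envVars": {"ENV_1": "ENV1"},
    },
    "environment": {"image": "apache/submarine:tf-mnist-with-summaries-1.0"},
    "spec": {
        "Ps": {"replicas": 1, "resources": "cpu=1,memory=1024M"},
        "Worker": {"replicas": 1, "resources": "cpu=1,memory=2048M"},
    },
}

# JSON text suitable as the body of the PATCH request.
patch_body = json.dumps(with_worker_replicas(current, 2))
```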

Delete experiment

DELETE /api/v1/experiment/{id}

Example Request:

curl -X DELETE http://127.0.0.1:32080/api/v1/experiment/experiment_1592057447228_0001

Example Response:

{
"status": "OK",
"code": 200,
"result": {
"experimentId": "experiment_1586156073228_0001",
"name": "tf-mnist-json",
"uid": "28e39dcd-77d4-11ea-8dbb-0242ac110003",
"status": "Accepted",
"acceptedTime": "2020-06-13T22:59:29.000+08:00",
"spec": {
"meta": {
"name": "tf-mnist-json",
"namespace": "default",
"framework": "TensorFlow",
"cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
"envVars": {
"ENV_1": "ENV1"
}
},
"environment": {
"image": "apache/submarine:tf-mnist-with-summaries-1.0"
},
"spec": {
"Ps": {
"replicas": 1,
"resources": "cpu=1,memory=1024M"
},
"Worker": {
"replicas": 2,
"resources": "cpu=1,memory=2048M"
}
}
}
}
}

List experiment Log

GET /api/v1/experiment/logs

Example Request:

curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs

Example Response:

{
"status": "OK",
"code": 200,
"success": null,
"message": null,
"result": [
{
"experimentId": "experiment_1589199154923_0001",
"logContent": [
{
"podName": "mnist-worker-0",
"podLog": null
}
]
},
{
"experimentId": "experiment_1589199154923_0002",
"logContent": [
{
"podName": "pytorch-dist-mnist-gloo-master-0",
"podLog": null
},
{
"podName": "pytorch-dist-mnist-gloo-worker-0",
"podLog": null
}
]
}
],
"attributes": {}
}
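In the example response every "podLog" is null, so the listing shows which pods exist for each experiment but not their output. A short sketch for collecting those pods is shown below; `pods_without_logs` is a hypothetical name, and the sample is a trimmed copy of the response above.

```python
import json


def pods_without_logs(response_text):
    """Map each experimentId to the pod names whose podLog is null."""
    payload = json.loads(response_text)
    missing = {}
    for entry in payload["result"]:
        names = [
            pod["podName"]
            for pod in entry["logContent"]
            if pod["podLog"] is None
        ]
        if names:
            missing[entry["experimentId"]] = names
    return missing


# Trimmed copy of the example response.
sample = json.dumps({
    "status": "OK",
    "code": 200,
    "result": [
        {
            "experimentId": "experiment_1589199154923_0001",
            "logContent": [{"podName": "mnist-worker-0", "podLog": None}],
        },
        {
            "experimentId": "experiment_1589199154923_0002",
            "logContent": [
                {"podName": "pytorch-dist-mnist-gloo-master-0", "podLog": None},
                {"podName": "pytorch-dist-mnist-gloo-worker-0", "podLog": None},
            ],
        },
    ],
})

missing = pods_without_logs(sample)
```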

Get experiment Log

GET /api/v1/experiment/logs/{id}

Example Request:

curl -X GET http://127.0.0.1:32080/api/v1/experiment/logs/experiment_1589199154923_0002

Example Response:

{
"status": "OK",
"code": 200,
"success": null,
"message": null,
"result": {
"experimentId": "experiment_1589199154923_0002",
"logContent": [
{
"podName": "pytorch-dist-mnist-gloo-master-0",
"podLog": "Using distributed PyTorch with gloo backend\nDownloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\nProcessing...\nDone!\nTrain Epoch: 1 [0/60000 (0%)]\tloss=2.3000\nTrain Epoch: 1 [640/60000 (1%)]\tloss=2.2135\nTrain Epoch: 1 [1280/60000 (2%)]\tloss=2.1704\nTrain Epoch: 1 [1920/60000 (3%)]\tloss=2.0766\nTrain Epoch: 1 [2560/60000 (4%)]\tloss=1.8679\nTrain Epoch: 1 [3200/60000 (5%)]\tloss=1.4135\nTrain Epoch: 1 [3840/60000 (6%)]\tloss=1.0003\nTrain Epoch: 1 [4480/60000 (7%)]\tloss=0.7762\nTrain Epoch: 1 [5120/60000 (9%)]\tloss=0.4598\nTrain Epoch: 1 [5760/60000 (10%)]\tloss=0.4860\nTrain Epoch: 1 [6400/60000 (11%)]\tloss=0.4389\nTrain Epoch: 1 [7040/60000 (12%)]\tloss=0.4084\nTrain Epoch: 1 [7680/60000 (13%)]\tloss=0.4602\nTrain Epoch: 1 [8320/60000 (14%)]\tloss=0.4289\nTrain Epoch: 1 [8960/60000 (15%)]\tloss=0.3990\nTrain Epoch: 1 [9600/60000 (16%)]\tloss=0.3852\n"
},
{
"podName": "pytorch-dist-mnist-gloo-worker-0",
"podLog": "Using distributed PyTorch with gloo backend\nDownloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz\nDownloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz\nProcessing...\nDone!\nTrain Epoch: 1 [0/60000 (0%)]\tloss=2.3000\nTrain Epoch: 1 [640/60000 (1%)]\tloss=2.2135\nTrain Epoch: 1 [1280/60000 (2%)]\tloss=2.1704\nTrain Epoch: 1 [1920/60000 (3%)]\tloss=2.0766\nTrain Epoch: 1 [2560/60000 (4%)]\tloss=1.8679\nTrain Epoch: 1 [3200/60000 (5%)]\tloss=1.4135\nTrain Epoch: 1 [3840/60000 (6%)]\tloss=1.0003\nTrain Epoch: 1 [4480/60000 (7%)]\tloss=0.7762\nTrain Epoch: 1 [5120/60000 (9%)]\tloss=0.4598\nTrain Epoch: 1 [5760/60000 (10%)]\tloss=0.4860\nTrain Epoch: 1 [6400/60000 (11%)]\tloss=0.4389\nTrain Epoch: 1 [7040/60000 (12%)]\tloss=0.4084\nTrain Epoch: 1 [7680/60000 (13%)]\tloss=0.4602\nTrain Epoch: 1 [8320/60000 (14%)]\tloss=0.4289\nTrain Epoch: 1 [8960/60000 (15%)]\tloss=0.3990\nTrain Epoch: 1 [9600/60000 (16%)]\tloss=0.3852\n"
}
]
},
"attributes": {}
}
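Each "podLog" value is one string with embedded newlines and tabs. As an illustration, the sketch below extracts the training loss from a log in the format shown above. The regular expression is tailored to this particular MNIST script's output and is an assumption, not something the API guarantees; `parse_losses` is a hypothetical helper.

```python
import re

# Matches lines such as "Train Epoch: 1 [640/60000 (1%)]\tloss=2.2135".
LOSS_LINE = re.compile(
    r"Train Epoch: (\d+) \[(\d+)/\d+ \(\d+%\)\]\tloss=(\d+\.\d+)"
)


def parse_losses(pod_log):
    """Return (epoch, samples_seen, loss) tuples found in a pod log string."""
    return [
        (int(m.group(1)), int(m.group(2)), float(m.group(3)))
        for m in LOSS_LINE.finditer(pod_log)
    ]


# First lines of the master pod log from the example response, after JSON decoding.
sample_log = (
    "Processing...\nDone!\n"
    "Train Epoch: 1 [0/60000 (0%)]\tloss=2.3000\n"
    "Train Epoch: 1 [640/60000 (1%)]\tloss=2.2135\n"
)

losses = parse_losses(sample_log)
```

Lines that do not match the pattern, such as the download and "Processing..." messages, are skipped.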