Experiment Client
class ExperimentClient()
Client of a Submarine server that creates and manages experiments and logs.
create_experiment(experiment_spec) -> dict
Create an experiment.
Param | Type | Description | Default Value |
---|---|---|---|
experiment_spec | Dict | Submarine experiment spec. More detailed information can be found at Experiment API | x |
Returns
The detailed info about the submarine experiment.
Example
from submarine import *
client = ExperimentClient()
client.create_experiment({
    "meta": {
        "name": "tf-mnist-json",
        "namespace": "default",
        "framework": "TensorFlow",
        "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
        "envVars": {
            "ENV_1": "ENV1"
        }
    },
    "environment": {
        "image": "apache/submarine:tf-mnist-with-summaries-1.0"
    },
    "spec": {
        "Ps": {
            "replicas": 1,
            "resources": "cpu=1,memory=1024M"
        },
        "Worker": {
            "replicas": 1,
            "resources": "cpu=1,memory=1024M"
        }
    }
})
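When several experiments share the same layout, it can help to assemble the spec dict with a small function instead of writing it out each time. The following is a minimal sketch: `build_experiment_spec` and its parameters are hypothetical names introduced for illustration and are not part of the Submarine SDK. It only reproduces the dict shape shown in the example above.

```python
from typing import Dict, Optional


def build_experiment_spec(
    name: str,
    image: str,
    cmd: str,
    replicas: Dict[str, int],
    resources: str = "cpu=1,memory=1024M",
    namespace: str = "default",
    framework: str = "TensorFlow",
    env_vars: Optional[Dict[str, str]] = None,
) -> dict:
    """Assemble a spec dict in the same shape as the example above.

    Hypothetical helper, not part of the Submarine SDK.
    """
    return {
        "meta": {
            "name": name,
            "namespace": namespace,
            "framework": framework,
            "cmd": cmd,
            "envVars": dict(env_vars or {}),
        },
        "environment": {"image": image},
        # One entry per role (for example "Ps" or "Worker"), all with the
        # same resources string in this simplified sketch.
        "spec": {
            role: {"replicas": count, "resources": resources}
            for role, count in replicas.items()
        },
    }


spec = build_experiment_spec(
    name="tf-mnist-json",
    image="apache/submarine:tf-mnist-with-summaries-1.0",
    cmd="python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
    replicas={"Ps": 1, "Worker": 1},
    env_vars={"ENV_1": "ENV1"},
)
```

The resulting `spec` could then be passed as the `experiment_spec` argument of `create_experiment`.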
patch_experiment(id, experiment_spec) -> dict
Patch an experiment.
Param | Type | Description | Default Value |
---|---|---|---|
id | String | Submarine experiment id. | x |
experiment_spec | Dict | Submarine experiment spec. More detailed information can be found at Experiment API. | x |
Returns
The detailed info about the submarine experiment.
Example
client.patch_experiment("experiment_1626160071451_0008", {
    "meta": {
        "name": "tf-mnist-json",
        "namespace": "default",
        "framework": "TensorFlow",
        "cmd": "python /var/tf_mnist/mnist_with_summaries.py --log_dir=/train/log --learning_rate=0.01 --batch_size=150",
        "envVars": {
            "ENV_1": "ENV1"
        }
    },
    "environment": {
        "image": "apache/submarine:tf-mnist-with-summaries-1.0"
    },
    "spec": {
        "Worker": {
            "replicas": 2,
            "resources": "cpu=1,memory=1024M"
        }
    }
})
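A patch often differs from the original spec in only one field, such as the Worker replica count in the example above. One way to derive the patched spec without mutating the original dict is sketched here. `with_worker_replicas` is a hypothetical helper, not an SDK function, and the spec below is abbreviated for illustration.

```python
import copy


def with_worker_replicas(experiment_spec: dict, replicas: int) -> dict:
    """Return a copy of the spec with a different Worker replica count.

    Hypothetical helper, not part of the Submarine SDK.
    """
    # Deep copy so the nested dicts of the caller's spec stay untouched.
    patched = copy.deepcopy(experiment_spec)
    patched["spec"]["Worker"]["replicas"] = replicas
    return patched


# Abbreviated spec used only to demonstrate the helper.
original = {
    "meta": {"name": "tf-mnist-json", "namespace": "default"},
    "spec": {"Worker": {"replicas": 1, "resources": "cpu=1,memory=1024M"}},
}
patched = with_worker_replicas(original, 2)
```

The `patched` dict could then be passed as the `experiment_spec` argument of `patch_experiment`, while `original` remains available for comparison.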
get_experiment(id) -> dict
Get the experiment's detailed info by id.
Param | Type | Description | Default Value |
---|---|---|---|
id | String | Submarine experiment id. | x |
Returns
The detailed info about the submarine experiment.
Example
experiment = client.get_experiment("experiment_1626160071451_0008")
list_experiments(status) -> list[dict]
List all experiments for the user.
Param | Type | Description | Default Value |
---|---|---|---|
status | Optional[str] | Experiment status to filter by: Accepted, Created, Running, Succeeded, or Deleted. | None |
Returns
List of submarine experiments.
Example
experiments = client.list_experiments()
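Because `status` is a free-form string, a typo would only surface once the request reaches the server. A caller might check the value locally first. This is a sketch under assumptions: `check_status` is a hypothetical helper, and it assumes the status names are case-sensitive and limited to the values listed in the table above.

```python
from typing import Optional

# Status values listed in the parameter table above.
VALID_STATUSES = ("Accepted", "Created", "Running", "Succeeded", "Deleted")


def check_status(status: Optional[str]) -> Optional[str]:
    """Return status unchanged if it is None or a listed value, else raise.

    Hypothetical helper, not part of the Submarine SDK.
    """
    if status is not None and status not in VALID_STATUSES:
        raise ValueError(
            f"unknown status {status!r}; expected one of {VALID_STATUSES}"
        )
    return status


running_filter = check_status("Running")  # a listed value passes through
no_filter = check_status(None)            # None means "do not filter"
```

The checked value could then be passed to `list_experiments`, for example `client.list_experiments(check_status("Running"))`.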
delete_experiment(id) -> dict
Delete the submarine experiment.
Param | Type | Description | Default Value |
---|---|---|---|
id | String | Submarine experiment id. | x |
Returns
The detailed info about the deleted submarine experiment.
Example
client.delete_experiment("experiment_1626160071451_0008")
get_log(id, onlyMaster)
Print the training logs of all pods of the experiment. By default, the logs of every pod are printed.
Param | Type | Description | Default Value |
---|---|---|---|
id | String | Submarine experiment id. | x |
onlyMaster | Optional[bool] | By default, include the pod log of the "master", which might be the TensorFlow PS/Chief or the PyTorch master. | x |
Returns
The info of pod logs.
Example
client.get_log("experiment_1626160071451_0009")
list_log(status)
List experiment logs.
Param | Type | Description | Default Value |
---|---|---|---|
status | String | Accepted, Created, Running, Succeeded, Deleted. | x |
Returns
List of submarine experiment logs.
Example
logs = client.list_log("Succeeded")
wait_for_finish(id, polling_interval)
Wait until the experiment has finished or failed.
Param | Type | Description | Default Value |
---|---|---|---|
id | String | Submarine experiment id. | x |
polling_interval | Optional[int] | How many seconds between two polls for the status of the experiment. | 10 |
Returns
Submarine experiment logs.
Example
logs = client.wait_for_finish("experiment_1626160071451_0009", 5)
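To illustrate what `polling_interval` controls, here is a generic polling loop. It is a sketch of the technique only, not the SDK's implementation: `poll_until_finished`, the `get_status` callable, and the set of terminal statuses (including the name "Failed") are assumptions made for the example. The sleep function is injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable

# Assumed terminal states; "Failed" is inferred from the description above
# and may not match the server's actual status name.
TERMINAL_STATUSES = {"Succeeded", "Failed"}


def poll_until_finished(
    get_status: Callable[[], str],
    polling_interval: float = 10,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() repeatedly until it returns a terminal status.

    Hypothetical helper, not part of the Submarine SDK.
    """
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        # Not finished yet: wait polling_interval seconds before the next poll.
        sleep(polling_interval)


# Simulated status sequence standing in for a real experiment.
statuses = iter(["Accepted", "Running", "Running", "Succeeded"])
waits = []  # records each requested wait instead of actually sleeping
final = poll_until_finished(
    lambda: next(statuses), polling_interval=5, sleep=waits.append
)
```

A shorter interval detects completion sooner at the cost of more requests to the server. In this simulation the loop waits three times before the fourth poll reports a terminal status.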