[SageMaker] Built-in algorithms
Some resources
Instance types: https://aws.amazon.com/ec2/instance-types/
Elastic Inference: low-cost GPU acceleration.
Core steps
1. Built-in Docker images
import boto3
from sagemaker.amazon.amazon_estimator import get_image_uri

# Look up the built-in Linear Learner image for the current region
container = get_image_uri(boto3.Session().region_name, 'linear-learner')
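The algorithm name passed in (here 'linear-learner') corresponds to a JSON file in the SDK's image_uri_config directory: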
sh-4.2$ ls /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/
algorithm.py content_types.py git_utils.py logs.py parameter.py serializers.py tuner.py
amazon debugger image_uri_config metadata_properties.py pipeline.py session.py user_agent.py
analytics.py deprecations.py image_uris.py model_metrics.py predictor.py sklearn utils.py
apiutils deserializers.py __init__.py model_monitor processing.py spark vpc_utils.py
automl estimator.py inputs.py model.py __pycache__ sparkml workflow
chainer exceptions.py job.py multidatamodel.py pytorch _studio.py xgboost
clarify.py feature_store lineage mxnet rl tensorflow
cli fw_utils.py local network.py s3.py transformer.py
sh-4.2$ ls /home/ec2-user/anaconda3/envs/tensorflow_p36/lib/python3.6/site-packages/sagemaker/image_uri_config/ -l
total 304
-rw-r--r-- 1 ec2-user ec2-user 1300 Dec 15 22:46 blazingtext.json
-rw-r--r-- 1 ec2-user ec2-user 4062 Dec 15 22:46 chainer.json
-rw-r--r-- 1 ec2-user ec2-user 1323 Dec 15 22:46 clarify.json
-rw-r--r-- 1 ec2-user ec2-user 2728 Dec 15 22:46 coach-mxnet.json
-rw-r--r-- 1 ec2-user ec2-user 7536 Dec 15 22:46 coach-tensorflow.json
-rw-r--r-- 1 ec2-user ec2-user 1299 Dec 15 22:46 debugger.json
-rw-r--r-- 1 ec2-user ec2-user 1311 Dec 15 22:46 factorization-machines.json
-rw-r--r-- 1 ec2-user ec2-user 1307 Dec 15 22:46 forecasting-deepar.json
-rw-r--r-- 1 ec2-user ec2-user 1309 Dec 15 22:46 image-classification.json
-rw-r--r-- 1 ec2-user ec2-user 1257 Dec 15 22:46 image-classification-neo.json
-rw-r--r-- 1 ec2-user ec2-user 328 Dec 15 22:46 inferentia-mxnet.json
-rw-r--r-- 1 ec2-user ec2-user 330 Dec 15 22:46 inferentia-pytorch.json
-rw-r--r-- 1 ec2-user ec2-user 334 Dec 15 22:46 inferentia-tensorflow.json
-rw-r--r-- 1 ec2-user ec2-user 1299 Dec 15 22:46 ipinsights.json
-rw-r--r-- 1 ec2-user ec2-user 1295 Dec 15 22:46 kmeans.json
-rw-r--r-- 1 ec2-user ec2-user 1291 Dec 15 22:46 knn.json
-rw-r--r-- 1 ec2-user ec2-user 877 Dec 15 22:46 lda.json
-rw-r--r-- 1 ec2-user ec2-user 1303 Dec 15 22:46 linear-learner.json
-rw-r--r-- 1 ec2-user ec2-user 1211 Dec 15 22:46 model-monitor.json
-rw-r--r-- 1 ec2-user ec2-user 37785 Dec 15 22:46 mxnet.json
-rw-r--r-- 1 ec2-user ec2-user 1615 Dec 15 22:46 neo-mxnet.json
-rw-r--r-- 1 ec2-user ec2-user 1490 Dec 15 22:46 neo-pytorch.json
-rw-r--r-- 1 ec2-user ec2-user 1666 Dec 15 22:46 neo-tensorflow.json
-rw-r--r-- 1 ec2-user ec2-user 1292 Dec 15 22:46 ntm.json
-rw-r--r-- 1 ec2-user ec2-user 1299 Dec 15 22:46 object2vec.json
-rw-r--r-- 1 ec2-user ec2-user 1305 Dec 15 22:46 object-detection.json
-rw-r--r-- 1 ec2-user ec2-user 1292 Dec 15 22:46 pca.json
-rw-r--r-- 1 ec2-user ec2-user 25738 Dec 15 22:46 pytorch.json
-rw-r--r-- 1 ec2-user ec2-user 1304 Dec 15 22:46 randomcutforest.json
-rw-r--r-- 1 ec2-user ec2-user 909 Dec 15 22:46 ray-pytorch.json
-rw-r--r-- 1 ec2-user ec2-user 7008 Dec 15 22:46 ray-tensorflow.json
-rw-r--r-- 1 ec2-user ec2-user 1310 Dec 15 22:46 semantic-segmentation.json
-rw-r--r-- 1 ec2-user ec2-user 1296 Dec 15 22:46 seq2seq.json
-rw-r--r-- 1 ec2-user ec2-user 2664 Dec 15 22:46 sklearn.json
-rw-r--r-- 1 ec2-user ec2-user 2908 Dec 15 22:46 spark.json
-rw-r--r-- 1 ec2-user ec2-user 2553 Dec 15 22:46 sparkml-serving.json
-rw-r--r-- 1 ec2-user ec2-user 75456 Dec 15 22:46 tensorflow.json
-rw-r--r-- 1 ec2-user ec2-user 857 Dec 15 22:46 vw.json
-rw-r--r-- 1 ec2-user ec2-user 6536 Dec 15 22:46 xgboost.json
-rw-r--r-- 1 ec2-user ec2-user 1244 Dec 15 22:46 xgboost-neo.json
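The listing above suggests a simple naming convention: one <name>.json file per algorithm or framework. As a minimal sketch of that convention (the helper names available_image_configs and has_image_config are hypothetical, not part of the SageMaker SDK), the following lists the names found in such a directory and checks whether a given name has a config file:

```python
from pathlib import Path


def available_image_configs(config_dir):
    """Return the sorted config names (file stems) of all *.json files in config_dir."""
    return sorted(p.stem for p in Path(config_dir).glob("*.json"))


def has_image_config(config_dir, name):
    """Return True if config_dir contains a file called <name>.json."""
    return (Path(config_dir) / f"{name}.json").is_file()
```

Pointed at the image_uri_config directory shown above, has_image_config(config_dir, 'linear-learner') would be expected to return True, since linear-learner.json appears in the listing.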
2. Configure parameters and train
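Note: each algorithm has its own conventions for hyperparameters, so some of the arguments passed to set_hyperparameters() are common across algorithms, while others are specific to one algorithm.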
import sagemaker

# Pass the container, the instance type and count to use for training,
# the output path and the SageMaker session to the Estimator.
# Note: in sagemaker>=2, train_instance_count and train_instance_type are renamed
# to instance_count and instance_type (https://sagemaker.readthedocs.io/en/stable/v2.html).
linear = sagemaker.estimator.Estimator(container,
                                       role,
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sagemaker_session)

# We can tune parameters such as the number of features passed in, the type of
# predictor ('regressor' or 'classifier'), the mini-batch size and the number of epochs.
# num_models = 32 trains 32 different versions of the model and keeps the best one
# (built-in parameter optimization).
linear.set_hyperparameters(feature_dim=1,
                           predictor_type='regressor',
                           mini_batch_size=5,
                           epochs=5,
                           num_models=32,
                           loss='absolute_loss')

# Pass the training data from S3 to train the Linear Learner model
linear.fit({'train': s3_train_data})
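The training log further down shows how these values are consumed: the container reads a default configuration, merges the provided hyperparameters over it, and prints the final configuration. A minimal sketch of that merge, assuming a plain dictionary override (merge_hyperparameters is a hypothetical helper, and DEFAULTS is only a small subset of the defaults printed in the log, where all values appear as strings):

```python
# Subset of the default configuration printed in the training log
DEFAULTS = {
    "epochs": "15",
    "feature_dim": "auto",
    "mini_batch_size": "1000",
    "num_models": "auto",
    "loss": "auto",
    "use_bias": "true",
}

# Provided configuration, as printed in the training log
PROVIDED = {
    "loss": "absolute_loss",
    "mini_batch_size": "5",
    "predictor_type": "regressor",
    "epochs": "5",
    "feature_dim": "1",
    "num_models": "32",
}


def merge_hyperparameters(defaults, provided):
    """Return a new dict in which provided values override the defaults."""
    merged = dict(defaults)   # copy, so the defaults stay untouched
    merged.update(provided)   # provided keys win; new keys are added
    return merged


final = merge_hyperparameters(DEFAULTS, PROVIDED)
```

Keys that only exist in the provided set (predictor_type here) are added, and defaults that are not overridden (use_bias here) pass through unchanged, which matches the final configuration in the log for these keys.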
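With num_models = 32, the training log contains one "#metrics" line per model and epoch, and the early-stopping metric reported for an epoch appears to equal the lowest loss among those models. A minimal sketch of extracting that best model from log lines (best_model is a hypothetical helper; SAMPLE_LINES are abridged copies of three epoch-0 lines from the log below, keeping only the fields the helper reads):

```python
import json

PREFIX = "#metrics "

# Abridged epoch-0 entries for models 0, 4 and 5 from the training log
SAMPLE_LINES = [
    '#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "min": 0.8914727306365967}}, "Dimensions": {"model": 0, "epoch": 0}}',
    '#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "min": 0.7638106727600098}}, "Dimensions": {"model": 4, "epoch": 0}}',
    '#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "min": 0.7721358394622803}}, "Dimensions": {"model": 5, "epoch": 0}}',
]


def best_model(log_lines, metric="train_absolute_loss_objective"):
    """Return (model_index, loss) for the lowest loss, or None if no line matches."""
    best = None
    for line in log_lines:
        if not line.startswith(PREFIX):
            continue  # not a metrics entry
        record = json.loads(line[len(PREFIX):])
        stats = record.get("Metrics", {}).get(metric)
        dims = record.get("Dimensions", {})
        if stats is None or "model" not in dims:
            continue  # e.g. data-iterator metrics without a per-model loss
        if best is None or stats["min"] < best[1]:
            best = (dims["model"], stats["min"])
    return best


result = best_model(SAMPLE_LINES)
```

For these three lines the helper selects model 4, whose loss 0.7638106727600098 matches the epoch-0 value in the log's #early_stopping_criteria_metric line (0.76381067276).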
Spot instance
Running the same Estimator, set_hyperparameters() and fit() code as above prints the following output; progress can also be followed in CloudWatch Logs.
train_instance_count has been renamed in sagemaker>=2. See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
train_instance_type has been renamed in sagemaker>=2. See: https://sagemaker.readthedocs.io/en/stable/v2.html for details.
2021-01-08 01:51:33 Starting - Starting the training job...
2021-01-08 01:51:57 Starting - Launching requested ML instancesProfilerReport-1610070693: InProgress .........
2021-01-08 01:53:18 Starting - Preparing the instances for training.........
2021-01-08 01:55:03 Downloading - Downloading input data
2021-01-08 01:55:03 Training - Downloading the training image...
2021-01-08 01:55:34 Uploading - Uploading generated training model
Docker entrypoint called with argument(s): train
Running default environment configuration script
[01/08/2021 01:55:29 INFO 140319456966464] Reading default configuration from /opt/amazon/lib/python2.7/site-packages/algorithm/resources/default-input.json: {u'loss_insensitivity': u'0.01', u'epochs': u'15', u'feature_dim': u'auto', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'auto', u'target_recall': u'0.8', u'num_models': u'auto', u'early_stopping_patience': u'3', u'momentum': u'auto', u'unbias_label': u'auto', u'wd': u'auto', u'optimizer': u'auto', u'_tuning_objective_metric': u'', u'early_stopping_tolerance': u'0.001', u'learning_rate': u'auto', u'_kvstore': u'auto', u'normalize_data': u'true', u'binary_classifier_model_selection_criteria': u'accuracy', u'use_lr_scheduler': u'true', u'target_precision': u'0.8', u'unbias_data': u'auto', u'init_scale': u'0.07', u'bias_wd_mult': u'auto', u'f_beta': u'1.0', u'mini_batch_size': u'1000', u'huber_delta': u'1.0', u'num_classes': u'1', u'beta_1': u'auto', u'loss': u'auto', u'beta_2': u'auto', u'_enable_profiler': u'false', u'normalize_label': u'auto', u'_num_gpus': u'auto', u'balance_multiclass_weights': u'false', u'positive_example_weight_mult': u'1.0', u'l1': u'auto', u'margin': u'1.0'}
[01/08/2021 01:55:29 INFO 140319456966464] Merging with provided configuration from /opt/ml/input/config/hyperparameters.json: {u'loss': u'absolute_loss', u'mini_batch_size': u'5', u'predictor_type': u'regressor', u'epochs': u'5', u'feature_dim': u'1', u'num_models': u'32'}
[01/08/2021 01:55:29 INFO 140319456966464] Final configuration: {u'loss_insensitivity': u'0.01', u'epochs': u'5', u'feature_dim': u'1', u'init_bias': u'0.0', u'lr_scheduler_factor': u'auto', u'num_calibration_samples': u'10000000', u'accuracy_top_k': u'3', u'_num_kv_servers': u'auto', u'use_bias': u'true', u'num_point_for_scaler': u'10000', u'_log_level': u'info', u'quantile': u'0.5', u'bias_lr_mult': u'auto', u'lr_scheduler_step': u'auto', u'init_method': u'uniform', u'init_sigma': u'0.01', u'lr_scheduler_minimum_lr': u'auto', u'target_recall': u'0.8', u'num_models': u'32', u'early_stopping_patience': u'3', u'momentum': u'auto', u'unbias_label': u'auto', u'wd': u'auto', u'optimizer': u'auto', u'_tuning_objective_metric': u'', u'early_stopping_tolerance': u'0.001', u'learning_rate': u'auto', u'_kvstore': u'auto', u'normalize_data': u'true', u'binary_classifier_model_selection_criteria': u'accuracy', u'use_lr_scheduler': u'true', u'target_precision': u'0.8', u'unbias_data': u'auto', u'init_scale': u'0.07', u'bias_wd_mult': u'auto', u'f_beta': u'1.0', u'mini_batch_size': u'5', u'huber_delta': u'1.0', u'num_classes': u'1', u'predictor_type': u'regressor', u'beta_1': u'auto', u'loss': u'absolute_loss', u'beta_2': u'auto', u'_enable_profiler': u'false', u'normalize_label': u'auto', u'_num_gpus': u'auto', u'balance_multiclass_weights': u'false', u'positive_example_weight_mult': u'1.0', u'l1': u'auto', u'margin': u'1.0'}
[01/08/2021 01:55:29 WARNING 140319456966464] Loggers have already been setup.
Process 1 is a worker.
[01/08/2021 01:55:29 INFO 140319456966464] Using default worker.
[01/08/2021 01:55:29 INFO 140319456966464] Checkpoint loading and saving are disabled.
[2021-01-08 01:55:29.810] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 0, "duration": 14, "num_examples": 1, "num_bytes": 240} [01/08/2021 01:55:29 INFO 140319456966464] Create Store: local [2021-01-08 01:55:29.885] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 1, "duration": 73, "num_examples": 6, "num_bytes": 1344} [01/08/2021 01:55:29 INFO 140319456966464] Scaler algorithm parameters <algorithm.scaler.ScalerAlgorithmStable object at 0x7f9e542c4150> [01/08/2021 01:55:29 INFO 140319456966464] Scaling model computed with parameters: {'stdev_weight': [3.447675] <NDArray 1 @cpu(0)>, 'stdev_label': [30817.012] <NDArray 1 @cpu(0)>, 'mean_label': [86815.484] <NDArray 1 @cpu(0)>, 'mean_weight': [6.6559997] <NDArray 1 @cpu(0)>} [01/08/2021 01:55:29 INFO 140319456966464] nvidia-smi took: 0.0252740383148 secs to identify 0 gpus [01/08/2021 01:55:29 INFO 140319456966464] Number of GPUs being used: 0 #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Number of Records Since Last Reset": {"count": 1, "max": 0, "sum": 0.0, "min": 0}, "Total Batches Seen": {"count": 1, "max": 7, "sum": 7.0, "min": 7}, "Total Records Seen": {"count": 1, "max": 33, "sum": 33.0, "min": 33}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 2, "sum": 2.0, "min": 2}}, "EndTime": 1610070929.981967, "Dimensions": {"Host": "algo-1", "Meta": "init_train_data_iter", "Operation": "training", "Algorithm": "Linear Learner"}, "StartTime": 1610070929.981927} [2021-01-08 01:55:30.098] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 4, "duration": 115, "num_examples": 6, "num_bytes": 1344} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8914727306365967, "sum": 
0.8914727306365967, "min": 0.8914727306365967}}, "EndTime": 1610070930.098491, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098407} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9107946586608887, "sum": 0.9107946586608887, "min": 0.9107946586608887}}, "EndTime": 1610070930.098586, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098566} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8938133049011231, "sum": 0.8938133049011231, "min": 0.8938133049011231}}, "EndTime": 1610070930.098642, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098627} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9098072338104248, "sum": 0.9098072338104248, "min": 0.9098072338104248}}, "EndTime": 1610070930.09871, "Dimensions": {"model": 3, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098692} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7638106727600098, "sum": 0.7638106727600098, "min": 0.7638106727600098}}, "EndTime": 1610070930.098775, "Dimensions": {"model": 4, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098758} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7721358394622803, "sum": 0.7721358394622803, "min": 0.7721358394622803}}, "EndTime": 1610070930.098836, "Dimensions": {"model": 5, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.09882} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8552776432037353, "sum": 0.8552776432037353, "min": 
0.8552776432037353}}, "EndTime": 1610070930.098897, "Dimensions": {"model": 6, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.09888} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8259963607788086, "sum": 0.8259963607788086, "min": 0.8259963607788086}}, "EndTime": 1610070930.098959, "Dimensions": {"model": 7, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.098943} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9457238101959229, "sum": 0.9457238101959229, "min": 0.9457238101959229}}, "EndTime": 1610070930.099023, "Dimensions": {"model": 8, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099005} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8797975444793701, "sum": 0.8797975444793701, "min": 0.8797975444793701}}, "EndTime": 1610070930.099084, "Dimensions": {"model": 9, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099067} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8862988662719726, "sum": 0.8862988662719726, "min": 0.8862988662719726}}, "EndTime": 1610070930.099144, "Dimensions": {"model": 10, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099127} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8719225311279297, "sum": 0.8719225311279297, "min": 0.8719225311279297}}, "EndTime": 1610070930.099205, "Dimensions": {"model": 11, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099188} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7714014148712158, "sum": 0.7714014148712158, "min": 0.7714014148712158}}, "EndTime": 
1610070930.099264, "Dimensions": {"model": 12, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099247} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7968630409240722, "sum": 0.7968630409240722, "min": 0.7968630409240722}}, "EndTime": 1610070930.099322, "Dimensions": {"model": 13, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099306} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7701293849945068, "sum": 0.7701293849945068, "min": 0.7701293849945068}}, "EndTime": 1610070930.099384, "Dimensions": {"model": 14, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099367} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8032244873046875, "sum": 0.8032244873046875, "min": 0.8032244873046875}}, "EndTime": 1610070930.099443, "Dimensions": {"model": 15, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099426} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8564266204833985, "sum": 0.8564266204833985, "min": 0.8564266204833985}}, "EndTime": 1610070930.099502, "Dimensions": {"model": 16, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099485} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9061824989318847, "sum": 0.9061824989318847, "min": 0.9061824989318847}}, "EndTime": 1610070930.099561, "Dimensions": {"model": 17, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099544} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8827006435394287, "sum": 0.8827006435394287, "min": 0.8827006435394287}}, "EndTime": 1610070930.099618, 
"Dimensions": {"model": 18, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099601} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8800777053833008, "sum": 0.8800777053833008, "min": 0.8800777053833008}}, "EndTime": 1610070930.099676, "Dimensions": {"model": 19, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099661} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8192932605743408, "sum": 0.8192932605743408, "min": 0.8192932605743408}}, "EndTime": 1610070930.099734, "Dimensions": {"model": 20, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099717} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8341743469238281, "sum": 0.8341743469238281, "min": 0.8341743469238281}}, "EndTime": 1610070930.099798, "Dimensions": {"model": 21, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.09978} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7712790107727051, "sum": 0.7712790107727051, "min": 0.7712790107727051}}, "EndTime": 1610070930.099862, "Dimensions": {"model": 22, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099844} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8598332786560059, "sum": 0.8598332786560059, "min": 0.8598332786560059}}, "EndTime": 1610070930.09992, "Dimensions": {"model": 23, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099903} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8583495712280274, "sum": 0.8583495712280274, "min": 0.8583495712280274}}, "EndTime": 1610070930.099982, "Dimensions": {"model": 24, 
"Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.099965} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9238637828826904, "sum": 0.9238637828826904, "min": 0.9238637828826904}}, "EndTime": 1610070930.100066, "Dimensions": {"model": 25, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.100028} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9109923553466797, "sum": 0.9109923553466797, "min": 0.9109923553466797}}, "EndTime": 1610070930.100132, "Dimensions": {"model": 26, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.100114} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9007723808288575, "sum": 0.9007723808288575, "min": 0.9007723808288575}}, "EndTime": 1610070930.100193, "Dimensions": {"model": 27, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.100176} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9988768482208252, "sum": 0.9988768482208252, "min": 0.9988768482208252}}, "EndTime": 1610070930.100255, "Dimensions": {"model": 28, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.100238} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9991144561767578, "sum": 0.9991144561767578, "min": 0.9991144561767578}}, "EndTime": 1610070930.100318, "Dimensions": {"model": 29, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.1003} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9532950592041015, "sum": 0.9532950592041015, "min": 0.9532950592041015}}, "EndTime": 1610070930.100376, "Dimensions": {"model": 30, "Host": "algo-1", "Operation": 
"training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.10036} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9513899612426758, "sum": 0.9513899612426758, "min": 0.9513899612426758}}, "EndTime": 1610070930.100444, "Dimensions": {"model": 31, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070930.100427} [01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, epoch=0, train absolute_loss_objective <loss>=0.891472730637 [01/08/2021 01:55:30 INFO 140319456966464] #early_stopping_criteria_metric: host=algo-1, epoch=0, criteria=absolute_loss_objective, value=0.76381067276 [01/08/2021 01:55:30 INFO 140319456966464] Epoch 0: Loss improved. Updating best model [01/08/2021 01:55:30 INFO 140319456966464] Saving model for epoch: 0 [01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpdIq1JP/mx-mod-0000.params" [01/08/2021 01:55:30 INFO 140319456966464] #progress_metric: host=algo-1, completed 20 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Records Since Last Reset": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Total Batches Seen": {"count": 1, "max": 13, "sum": 13.0, "min": 13}, "Total Records Seen": {"count": 1, "max": 61, "sum": 61.0, "min": 61}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 3, "sum": 3.0, "min": 3}}, "EndTime": 1610070930.113387, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 0}, "StartTime": 1610070929.982211} [01/08/2021 01:55:30 INFO 140319456966464] #throughput_metric: host=algo-1, train throughput=213.25872216 records/second [2021-01-08 01:55:30.202] [tensorio] [info] 
epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 6, "duration": 88, "num_examples": 6, "num_bytes": 1344} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.863304328918457, "sum": 0.863304328918457, "min": 0.863304328918457}}, "EndTime": 1610070930.202355, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202271} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8826262474060058, "sum": 0.8826262474060058, "min": 0.8826262474060058}}, "EndTime": 1610070930.202428, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202411} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8656449127197265, "sum": 0.8656449127197265, "min": 0.8656449127197265}}, "EndTime": 1610070930.202499, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20248} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8816387939453125, "sum": 0.8816387939453125, "min": 0.8816387939453125}}, "EndTime": 1610070930.202566, "Dimensions": {"model": 3, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202549} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.4536254262924194, "sum": 0.4536254262924194, "min": 0.4536254262924194}}, "EndTime": 1610070930.202636, "Dimensions": {"model": 4, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202618} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.46048373222351074, "sum": 0.46048373222351074, "min": 0.46048373222351074}}, "EndTime": 1610070930.202707, "Dimensions": {"model": 5, "Host": "algo-1", "Operation": 
"training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20269} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7776957988739014, "sum": 0.7776957988739014, "min": 0.7776957988739014}}, "EndTime": 1610070930.202778, "Dimensions": {"model": 6, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202761} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7680116891860962, "sum": 0.7680116891860962, "min": 0.7680116891860962}}, "EndTime": 1610070930.202849, "Dimensions": {"model": 7, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20283} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9175552082061768, "sum": 0.9175552082061768, "min": 0.9175552082061768}}, "EndTime": 1610070930.202917, "Dimensions": {"model": 8, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.2029} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.851629867553711, "sum": 0.851629867553711, "min": 0.851629867553711}}, "EndTime": 1610070930.202986, "Dimensions": {"model": 9, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.202969} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8581311321258545, "sum": 0.8581311321258545, "min": 0.8581311321258545}}, "EndTime": 1610070930.203046, "Dimensions": {"model": 10, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20303} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8437549877166748, "sum": 0.8437549877166748, "min": 0.8437549877166748}}, "EndTime": 1610070930.203103, "Dimensions": {"model": 11, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", 
"epoch": 1}, "StartTime": 1610070930.203087} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.46013286590576175, "sum": 0.46013286590576175, "min": 0.46013286590576175}}, "EndTime": 1610070930.203159, "Dimensions": {"model": 12, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203144} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7585850334167481, "sum": 0.7585850334167481, "min": 0.7585850334167481}}, "EndTime": 1610070930.203216, "Dimensions": {"model": 13, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203201} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.45908591270446775, "sum": 0.45908591270446775, "min": 0.45908591270446775}}, "EndTime": 1610070930.203264, "Dimensions": {"model": 14, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203251} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7606857442855834, "sum": 0.7606857442855834, "min": 0.7606857442855834}}, "EndTime": 1610070930.203319, "Dimensions": {"model": 15, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203302} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8283766746520996, "sum": 0.8283766746520996, "min": 0.8283766746520996}}, "EndTime": 1610070930.203376, "Dimensions": {"model": 16, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203361} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8780510807037354, "sum": 0.8780510807037354, "min": 0.8780510807037354}}, "EndTime": 1610070930.203431, "Dimensions": {"model": 17, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 
1610070930.203415} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8546057891845703, "sum": 0.8546057891845703, "min": 0.8546057891845703}}, "EndTime": 1610070930.203491, "Dimensions": {"model": 18, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203474} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8519870853424072, "sum": 0.8519870853424072, "min": 0.8519870853424072}}, "EndTime": 1610070930.203548, "Dimensions": {"model": 19, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203532} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8046707248687744, "sum": 0.8046707248687744, "min": 0.8046707248687744}}, "EndTime": 1610070930.203603, "Dimensions": {"model": 20, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203587} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8087980937957764, "sum": 0.8087980937957764, "min": 0.8087980937957764}}, "EndTime": 1610070930.203661, "Dimensions": {"model": 21, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203644} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7922470092773437, "sum": 0.7922470092773437, "min": 0.7922470092773437}}, "EndTime": 1610070930.203722, "Dimensions": {"model": 22, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203705} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8162001514434815, "sum": 0.8162001514434815, "min": 0.8162001514434815}}, "EndTime": 1610070930.203787, "Dimensions": {"model": 23, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203769} #metrics 
{"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8707226181030273, "sum": 0.8707226181030273, "min": 0.8707226181030273}}, "EndTime": 1610070930.203844, "Dimensions": {"model": 24, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20383} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8967666244506836, "sum": 0.8967666244506836, "min": 0.8967666244506836}}, "EndTime": 1610070930.203897, "Dimensions": {"model": 25, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203883} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.886590166091919, "sum": 0.886590166091919, "min": 0.886590166091919}}, "EndTime": 1610070930.203927, "Dimensions": {"model": 26, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20392} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8860177230834961, "sum": 0.8860177230834961, "min": 0.8860177230834961}}, "EndTime": 1610070930.203973, "Dimensions": {"model": 27, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.203958} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9558624076843262, "sum": 0.9558624076843262, "min": 0.9558624076843262}}, "EndTime": 1610070930.204027, "Dimensions": {"model": 28, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.20401} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.955223560333252, "sum": 0.955223560333252, "min": 0.955223560333252}}, "EndTime": 1610070930.204115, "Dimensions": {"model": 29, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.204103} #metrics {"Metrics": 
{"train_absolute_loss_objective": {"count": 1, "max": 1.1211384010314942, "sum": 1.1211384010314942, "min": 1.1211384010314942}}, "EndTime": 1610070930.204166, "Dimensions": {"model": 30, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.204152}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 1.124903163909912, "sum": 1.124903163909912, "min": 1.124903163909912}}, "EndTime": 1610070930.20422, "Dimensions": {"model": 31, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.204205}
[01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, epoch=1, train absolute_loss_objective <loss>=0.863304328918
[01/08/2021 01:55:30 INFO 140319456966464] #early_stopping_criteria_metric: host=algo-1, epoch=1, criteria=absolute_loss_objective, value=0.453625426292
[01/08/2021 01:55:30 INFO 140319456966464] Epoch 1: Loss improved. Updating best model
[01/08/2021 01:55:30 INFO 140319456966464] Saving model for epoch: 1
[01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpP1PqeI/mx-mod-0000.params"
[01/08/2021 01:55:30 INFO 140319456966464] #progress_metric: host=algo-1, completed 40 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Records Since Last Reset": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Total Batches Seen": {"count": 1, "max": 19, "sum": 19.0, "min": 19}, "Total Records Seen": {"count": 1, "max": 89, "sum": 89.0, "min": 89}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 4, "sum": 4.0, "min": 4}}, "EndTime": 1610070930.210985, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 1}, "StartTime": 1610070930.113669}
[01/08/2021 01:55:30 INFO 140319456966464] #throughput_metric: host=algo-1, train throughput=287.364746587 records/second
[2021-01-08 01:55:30.325] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 8, "duration": 114, "num_examples": 6, "num_bytes": 1344}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8409023666381836, "sum": 0.8409023666381836, "min": 0.8409023666381836}}, "EndTime": 1610070930.325966, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.325867}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8601664543151856, "sum": 0.8601664543151856, "min": 0.8601664543151856}}, "EndTime": 1610070930.326072, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326051}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8432429027557373, "sum": 0.8432429027557373, "min": 0.8432429027557373}}, "EndTime": 1610070930.326134, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326117}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8591790199279785, "sum": 0.8591790199279785, "min": 0.8591790199279785}}, "EndTime": 1610070930.326204, "Dimensions": {"model": 3, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326186}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.363139181137085, "sum": 0.363139181137085, "min": 0.363139181137085}}, "EndTime": 1610070930.326267, "Dimensions": {"model": 4, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.32625}
#metrics {"Metrics": 
{"train_absolute_loss_objective": {"count": 1, "max": 0.3625786876678467, "sum": 0.3625786876678467, "min": 0.3625786876678467}}, "EndTime": 1610070930.326332, "Dimensions": {"model": 5, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326314} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.3816796064376831, "sum": 0.3816796064376831, "min": 0.3816796064376831}}, "EndTime": 1610070930.326393, "Dimensions": {"model": 6, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326376} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.3859238004684448, "sum": 0.3859238004684448, "min": 0.3859238004684448}}, "EndTime": 1610070930.326455, "Dimensions": {"model": 7, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326437} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8951538372039795, "sum": 0.8951538372039795, "min": 0.8951538372039795}}, "EndTime": 1610070930.326518, "Dimensions": {"model": 8, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.3265} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8291716480255127, "sum": 0.8291716480255127, "min": 0.8291716480255127}}, "EndTime": 1610070930.326579, "Dimensions": {"model": 9, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326562} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8357306098937989, "sum": 0.8357306098937989, "min": 0.8357306098937989}}, "EndTime": 1610070930.326639, "Dimensions": {"model": 10, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326622} #metrics {"Metrics": {"train_absolute_loss_objective": 
{"count": 1, "max": 0.8212968921661377, "sum": 0.8212968921661377, "min": 0.8212968921661377}}, "EndTime": 1610070930.326701, "Dimensions": {"model": 11, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326684} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.358167200088501, "sum": 0.358167200088501, "min": 0.358167200088501}}, "EndTime": 1610070930.326763, "Dimensions": {"model": 12, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326746} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.3888401031494141, "sum": 0.3888401031494141, "min": 0.3888401031494141}}, "EndTime": 1610070930.326822, "Dimensions": {"model": 13, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326805} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.358795747756958, "sum": 0.358795747756958, "min": 0.358795747756958}}, "EndTime": 1610070930.326885, "Dimensions": {"model": 14, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326868} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.38814703464508055, "sum": 0.38814703464508055, "min": 0.38814703464508055}}, "EndTime": 1610070930.326943, "Dimensions": {"model": 15, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326926} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8061903381347656, "sum": 0.8061903381347656, "min": 0.8061903381347656}}, "EndTime": 1610070930.327002, "Dimensions": {"model": 16, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.326986} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 
0.8557177829742432, "sum": 0.8557177829742432, "min": 0.8557177829742432}}, "EndTime": 1610070930.327063, "Dimensions": {"model": 17, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327046} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8323697471618652, "sum": 0.8323697471618652, "min": 0.8323697471618652}}, "EndTime": 1610070930.327121, "Dimensions": {"model": 18, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327104} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8296988773345947, "sum": 0.8296988773345947, "min": 0.8296988773345947}}, "EndTime": 1610070930.32718, "Dimensions": {"model": 19, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327163} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5758429336547851, "sum": 0.5758429336547851, "min": 0.5758429336547851}}, "EndTime": 1610070930.327238, "Dimensions": {"model": 20, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327222} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5748823738098144, "sum": 0.5748823738098144, "min": 0.5748823738098144}}, "EndTime": 1610070930.327296, "Dimensions": {"model": 21, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327279} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5792332744598389, "sum": 0.5792332744598389, "min": 0.5792332744598389}}, "EndTime": 1610070930.327357, "Dimensions": {"model": 22, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.32734} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5735988473892212, "sum": 
0.5735988473892212, "min": 0.5735988473892212}}, "EndTime": 1610070930.327416, "Dimensions": {"model": 23, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.3274} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8856345367431641, "sum": 0.8856345367431641, "min": 0.8856345367431641}}, "EndTime": 1610070930.327478, "Dimensions": {"model": 24, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.32746} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8813429832458496, "sum": 0.8813429832458496, "min": 0.8813429832458496}}, "EndTime": 1610070930.327539, "Dimensions": {"model": 25, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327522} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.880330810546875, "sum": 0.880330810546875, "min": 0.880330810546875}}, "EndTime": 1610070930.327599, "Dimensions": {"model": 26, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327583} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8892419147491455, "sum": 0.8892419147491455, "min": 0.8892419147491455}}, "EndTime": 1610070930.327651, "Dimensions": {"model": 27, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327637} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.947380781173706, "sum": 0.947380781173706, "min": 0.947380781173706}}, "EndTime": 1610070930.3277, "Dimensions": {"model": 28, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327687} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.948472547531128, "sum": 0.948472547531128, "min": 
0.948472547531128}}, "EndTime": 1610070930.327749, "Dimensions": {"model": 29, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327736}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 1.0670235443115235, "sum": 1.0670235443115235, "min": 1.0670235443115235}}, "EndTime": 1610070930.327799, "Dimensions": {"model": 30, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327785}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 1.0600868606567382, "sum": 1.0600868606567382, "min": 1.0600868606567382}}, "EndTime": 1610070930.32785, "Dimensions": {"model": 31, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.327836}
[01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, epoch=2, train absolute_loss_objective <loss>=0.840902366638
[01/08/2021 01:55:30 INFO 140319456966464] #early_stopping_criteria_metric: host=algo-1, epoch=2, criteria=absolute_loss_objective, value=0.358167200089
[01/08/2021 01:55:30 INFO 140319456966464] Epoch 2: Loss improved. Updating best model
[01/08/2021 01:55:30 INFO 140319456966464] Saving model for epoch: 2
[01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpimFBbW/mx-mod-0000.params"
[01/08/2021 01:55:30 INFO 140319456966464] #progress_metric: host=algo-1, completed 60 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Records Since Last Reset": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Total Batches Seen": {"count": 1, "max": 25, "sum": 25.0, "min": 25}, "Total Records Seen": {"count": 1, "max": 117, "sum": 117.0, "min": 117}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1610070930.337564, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 2}, "StartTime": 1610070930.211231}
[01/08/2021 01:55:30 INFO 140319456966464] #throughput_metric: host=algo-1, train throughput=221.420850498 records/second
[2021-01-08 01:55:30.440] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 10, "duration": 102, "num_examples": 6, "num_bytes": 1344}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8205788230895996, "sum": 0.8205788230895996, "min": 0.8205788230895996}}, "EndTime": 1610070930.44091, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.440814}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8395682048797607, "sum": 0.8395682048797607, "min": 0.8395682048797607}}, "EndTime": 1610070930.441011, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.44099}
#metrics 
{"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.822910099029541, "sum": 0.822910099029541, "min": 0.822910099029541}}, "EndTime": 1610070930.441085, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441065} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8385847091674805, "sum": 0.8385847091674805, "min": 0.8385847091674805}}, "EndTime": 1610070930.441151, "Dimensions": {"model": 3, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441132} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.27286155700683595, "sum": 0.27286155700683595, "min": 0.27286155700683595}}, "EndTime": 1610070930.441216, "Dimensions": {"model": 4, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441198} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.2707618999481201, "sum": 0.2707618999481201, "min": 0.2707618999481201}}, "EndTime": 1610070930.44128, "Dimensions": {"model": 5, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441262} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.24053574323654175, "sum": 0.24053574323654175, "min": 0.24053574323654175}}, "EndTime": 1610070930.441351, "Dimensions": {"model": 6, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441333} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.21084488153457642, "sum": 0.21084488153457642, "min": 0.21084488153457642}}, "EndTime": 1610070930.441419, "Dimensions": {"model": 7, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.4414} #metrics {"Metrics": 
{"train_absolute_loss_objective": {"count": 1, "max": 0.8746162700653076, "sum": 0.8746162700653076, "min": 0.8746162700653076}}, "EndTime": 1610070930.441481, "Dimensions": {"model": 8, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441464} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8086997032165527, "sum": 0.8086997032165527, "min": 0.8086997032165527}}, "EndTime": 1610070930.441544, "Dimensions": {"model": 9, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441527} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8154304599761963, "sum": 0.8154304599761963, "min": 0.8154304599761963}}, "EndTime": 1610070930.441606, "Dimensions": {"model": 10, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.44159} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8008563995361329, "sum": 0.8008563995361329, "min": 0.8008563995361329}}, "EndTime": 1610070930.441669, "Dimensions": {"model": 11, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441652} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.23946476936340333, "sum": 0.23946476936340333, "min": 0.23946476936340333}}, "EndTime": 1610070930.441732, "Dimensions": {"model": 12, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441715} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.22254417181015015, "sum": 0.22254417181015015, "min": 0.22254417181015015}}, "EndTime": 1610070930.441793, "Dimensions": {"model": 13, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441777} #metrics {"Metrics": 
{"train_absolute_loss_objective": {"count": 1, "max": 0.2682810807228088, "sum": 0.2682810807228088, "min": 0.2682810807228088}}, "EndTime": 1610070930.441854, "Dimensions": {"model": 14, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441837} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.21968680381774902, "sum": 0.21968680381774902, "min": 0.21968680381774902}}, "EndTime": 1610070930.441915, "Dimensions": {"model": 15, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441897} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.786369857788086, "sum": 0.786369857788086, "min": 0.786369857788086}}, "EndTime": 1610070930.441984, "Dimensions": {"model": 16, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.441965} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8354051971435547, "sum": 0.8354051971435547, "min": 0.8354051971435547}}, "EndTime": 1610070930.442045, "Dimensions": {"model": 17, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442028} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8123897171020508, "sum": 0.8123897171020508, "min": 0.8123897171020508}}, "EndTime": 1610070930.442107, "Dimensions": {"model": 18, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.44209} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8095406723022461, "sum": 0.8095406723022461, "min": 0.8095406723022461}}, "EndTime": 1610070930.442166, "Dimensions": {"model": 19, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.44215} #metrics {"Metrics": {"train_absolute_loss_objective": 
{"count": 1, "max": 0.5520488548278809, "sum": 0.5520488548278809, "min": 0.5520488548278809}}, "EndTime": 1610070930.442229, "Dimensions": {"model": 20, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442212} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5546118736267089, "sum": 0.5546118736267089, "min": 0.5546118736267089}}, "EndTime": 1610070930.442293, "Dimensions": {"model": 21, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442275} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5631085777282715, "sum": 0.5631085777282715, "min": 0.5631085777282715}}, "EndTime": 1610070930.442353, "Dimensions": {"model": 22, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442336} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5496264362335205, "sum": 0.5496264362335205, "min": 0.5496264362335205}}, "EndTime": 1610070930.442411, "Dimensions": {"model": 23, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442395} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8959474754333496, "sum": 0.8959474754333496, "min": 0.8959474754333496}}, "EndTime": 1610070930.442473, "Dimensions": {"model": 24, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442456} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8775205802917481, "sum": 0.8775205802917481, "min": 0.8775205802917481}}, "EndTime": 1610070930.442537, "Dimensions": {"model": 25, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442519} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 
0.8854738807678223, "sum": 0.8854738807678223, "min": 0.8854738807678223}}, "EndTime": 1610070930.442597, "Dimensions": {"model": 26, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.44258}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8897949504852295, "sum": 0.8897949504852295, "min": 0.8897949504852295}}, "EndTime": 1610070930.442659, "Dimensions": {"model": 27, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442641}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9039207935333252, "sum": 0.9039207935333252, "min": 0.9039207935333252}}, "EndTime": 1610070930.442719, "Dimensions": {"model": 28, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442701}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9046483516693116, "sum": 0.9046483516693116, "min": 0.9046483516693116}}, "EndTime": 1610070930.442783, "Dimensions": {"model": 29, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442764}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8604156303405762, "sum": 0.8604156303405762, "min": 0.8604156303405762}}, "EndTime": 1610070930.442842, "Dimensions": {"model": 30, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442825}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8566427040100097, "sum": 0.8566427040100097, "min": 0.8566427040100097}}, "EndTime": 1610070930.442903, "Dimensions": {"model": 31, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.442887}
[01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, epoch=3, train absolute_loss_objective <loss>=0.82057882309
[01/08/2021 01:55:30 INFO 140319456966464] #early_stopping_criteria_metric: host=algo-1, epoch=3, criteria=absolute_loss_objective, value=0.210844881535
[01/08/2021 01:55:30 INFO 140319456966464] Epoch 3: Loss improved. Updating best model
[01/08/2021 01:55:30 INFO 140319456966464] Saving model for epoch: 3
[01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpnmOEYx/mx-mod-0000.params"
[01/08/2021 01:55:30 INFO 140319456966464] #progress_metric: host=algo-1, completed 80 % of epochs
#metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Records Since Last Reset": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Total Batches Seen": {"count": 1, "max": 31, "sum": 31.0, "min": 31}, "Total Records Seen": {"count": 1, "max": 145, "sum": 145.0, "min": 145}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 6, "sum": 6.0, "min": 6}}, "EndTime": 1610070930.453254, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 3}, "StartTime": 1610070930.337824}
[01/08/2021 01:55:30 INFO 140319456966464] #throughput_metric: host=algo-1, train throughput=242.296258701 records/second
[2021-01-08 01:55:30.610] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 12, "duration": 157, "num_examples": 6, "num_bytes": 1344}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8000025177001953, "sum": 0.8000025177001953, "min": 0.8000025177001953}}, "EndTime": 1610070930.611077, "Dimensions": {"model": 0, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.610907}
#metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, 
"max": 0.8187112617492676, "sum": 0.8187112617492676, "min": 0.8187112617492676}}, "EndTime": 1610070930.611199, "Dimensions": {"model": 1, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611154} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8023337745666503, "sum": 0.8023337745666503, "min": 0.8023337745666503}}, "EndTime": 1610070930.611274, "Dimensions": {"model": 2, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611254} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8177277374267579, "sum": 0.8177277374267579, "min": 0.8177277374267579}}, "EndTime": 1610070930.611376, "Dimensions": {"model": 3, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611323} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.3305640697479248, "sum": 0.3305640697479248, "min": 0.3305640697479248}}, "EndTime": 1610070930.611447, "Dimensions": {"model": 4, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611428} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.27910862445831297, "sum": 0.27910862445831297, "min": 0.27910862445831297}}, "EndTime": 1610070930.611539, "Dimensions": {"model": 5, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611493} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.27654102325439456, "sum": 0.27654102325439456, "min": 0.27654102325439456}}, "EndTime": 1610070930.611635, "Dimensions": {"model": 6, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611614} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.2291654682159424, 
"sum": 0.2291654682159424, "min": 0.2291654682159424}}, "EndTime": 1610070930.611701, "Dimensions": {"model": 7, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611683} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8540436172485352, "sum": 0.8540436172485352, "min": 0.8540436172485352}}, "EndTime": 1610070930.611803, "Dimensions": {"model": 8, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611783} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7878474044799805, "sum": 0.7878474044799805, "min": 0.7878474044799805}}, "EndTime": 1610070930.611869, "Dimensions": {"model": 9, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.61185} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7948585796356201, "sum": 0.7948585796356201, "min": 0.7948585796356201}}, "EndTime": 1610070930.611966, "Dimensions": {"model": 10, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.611946} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7800041866302491, "sum": 0.7800041866302491, "min": 0.7800041866302491}}, "EndTime": 1610070930.612029, "Dimensions": {"model": 11, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612011} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.2517468237876892, "sum": 0.2517468237876892, "min": 0.2517468237876892}}, "EndTime": 1610070930.612144, "Dimensions": {"model": 12, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612124} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.25843575716018674, "sum": 0.25843575716018674, "min": 
0.25843575716018674}}, "EndTime": 1610070930.612207, "Dimensions": {"model": 13, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612189} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.2720328712463379, "sum": 0.2720328712463379, "min": 0.2720328712463379}}, "EndTime": 1610070930.612307, "Dimensions": {"model": 14, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612287} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.23000093698501586, "sum": 0.23000093698501586, "min": 0.23000093698501586}}, "EndTime": 1610070930.612367, "Dimensions": {"model": 15, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.61235} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7663582992553711, "sum": 0.7663582992553711, "min": 0.7663582992553711}}, "EndTime": 1610070930.612457, "Dimensions": {"model": 16, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612439} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8150055885314942, "sum": 0.8150055885314942, "min": 0.8150055885314942}}, "EndTime": 1610070930.612519, "Dimensions": {"model": 17, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612501} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.792313928604126, "sum": 0.792313928604126, "min": 0.792313928604126}}, "EndTime": 1610070930.612621, "Dimensions": {"model": 18, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612601} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.7892002582550048, "sum": 0.7892002582550048, "min": 0.7892002582550048}}, 
"EndTime": 1610070930.612686, "Dimensions": {"model": 19, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612668} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5276792049407959, "sum": 0.5276792049407959, "min": 0.5276792049407959}}, "EndTime": 1610070930.612776, "Dimensions": {"model": 20, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612757} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5239265823364258, "sum": 0.5239265823364258, "min": 0.5239265823364258}}, "EndTime": 1610070930.612835, "Dimensions": {"model": 21, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612818} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5456685256958008, "sum": 0.5456685256958008, "min": 0.5456685256958008}}, "EndTime": 1610070930.61293, "Dimensions": {"model": 22, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.612907} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.5144282531738281, "sum": 0.5144282531738281, "min": 0.5144282531738281}}, "EndTime": 1610070930.612997, "Dimensions": {"model": 23, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.61298} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8984403991699219, "sum": 0.8984403991699219, "min": 0.8984403991699219}}, "EndTime": 1610070930.613057, "Dimensions": {"model": 24, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613041} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8811629104614258, "sum": 0.8811629104614258, "min": 0.8811629104614258}}, "EndTime": 1610070930.613147, 
"Dimensions": {"model": 25, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613129} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.8903963279724121, "sum": 0.8903963279724121, "min": 0.8903963279724121}}, "EndTime": 1610070930.613203, "Dimensions": {"model": 26, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613187} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.885976848602295, "sum": 0.885976848602295, "min": 0.885976848602295}}, "EndTime": 1610070930.613296, "Dimensions": {"model": 27, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613274} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9245861530303955, "sum": 0.9245861530303955, "min": 0.9245861530303955}}, "EndTime": 1610070930.613362, "Dimensions": {"model": 28, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613345} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 0.9248851108551025, "sum": 0.9248851108551025, "min": 0.9248851108551025}}, "EndTime": 1610070930.613422, "Dimensions": {"model": 29, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613406} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 1.001427755355835, "sum": 1.001427755355835, "min": 1.001427755355835}}, "EndTime": 1610070930.613513, "Dimensions": {"model": 30, "Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613493} #metrics {"Metrics": {"train_absolute_loss_objective": {"count": 1, "max": 1.0884630012512206, "sum": 1.0884630012512206, "min": 1.0884630012512206}}, "EndTime": 1610070930.613571, "Dimensions": {"model": 31, "Host": 
"algo-1", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.613554} [01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, epoch=4, train absolute_loss_objective <loss>=0.8000025177 [01/08/2021 01:55:30 INFO 140319456966464] #early_stopping_criteria_metric: host=algo-1, epoch=4, criteria=absolute_loss_objective, value=0.229165468216 [01/08/2021 01:55:30 INFO 140319456966464] Saving model for epoch: 4 [01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpFIrV0r/mx-mod-0000.params" [01/08/2021 01:55:30 INFO 140319456966464] #progress_metric: host=algo-1, completed 100 % of epochs #metrics {"Metrics": {"Max Batches Seen Between Resets": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Batches Since Last Reset": {"count": 1, "max": 6, "sum": 6.0, "min": 6}, "Number of Records Since Last Reset": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Total Batches Seen": {"count": 1, "max": 37, "sum": 37.0, "min": 37}, "Total Records Seen": {"count": 1, "max": 173, "sum": 173.0, "min": 173}, "Max Records Seen Between Resets": {"count": 1, "max": 28, "sum": 28.0, "min": 28}, "Reset Count": {"count": 1, "max": 7, "sum": 7.0, "min": 7}}, "EndTime": 1610070930.62195, "Dimensions": {"Host": "algo-1", "Meta": "training_data_iter", "Operation": "training", "Algorithm": "Linear Learner", "epoch": 4}, "StartTime": 1610070930.453524} [01/08/2021 01:55:30 INFO 140319456966464] #throughput_metric: host=algo-1, train throughput=166.12303292 records/second [01/08/2021 01:55:30 WARNING 140319456966464] wait_for_all_workers will not sync workers since the kv store is not running distributed [01/08/2021 01:55:30 WARNING 140319456966464] wait_for_all_workers will not sync workers since the kv store is not running distributed [2021-01-08 01:55:30.623] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 14, "duration": 0, "num_examples": 1, "num_bytes": 240} [2021-01-08 
01:55:30.632] [tensorio] [info] epoch_stats={"data_pipeline": "/opt/ml/input/data/train", "epoch": 16, "duration": 7, "num_examples": 6, "num_bytes": 1344} [01/08/2021 01:55:30 INFO 140319456966464] #train_score (algo-1) : ('absolute_loss_objective', 6244.388811383928) [01/08/2021 01:55:30 INFO 140319456966464] #train_score (algo-1) : ('mse', 59303201.71428572) [01/08/2021 01:55:30 INFO 140319456966464] #train_score (algo-1) : ('absolute_loss', 6244.388811383928) [01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, train absolute_loss_objective <loss>=6244.38881138 [01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, train mse <loss>=59303201.7143 [01/08/2021 01:55:30 INFO 140319456966464] #quality_metric: host=algo-1, train absolute_loss <loss>=6244.38881138 [01/08/2021 01:55:30 INFO 140319456966464] Best model found for hyperparameters: {"lr_scheduler_step": 100, "wd": 0.0001, "optimizer": "adam", "lr_scheduler_factor": 0.99, "l1": 0.0, "learning_rate": 0.1, "lr_scheduler_minimum_lr": 0.0001} [01/08/2021 01:55:30 INFO 140319456966464] Saved checkpoint to "/tmp/tmpNWFXWb/mx-mod-0000.params" [01/08/2021 01:55:30 INFO 140319456966464] Test data is not provided. 
#metrics {"Metrics": {"totaltime": {"count": 1, "max": 1200.800895690918, "sum": 1200.800895690918, "min": 1200.800895690918}, "finalize.time": {"count": 1, "max": 11.447906494140625, "sum": 11.447906494140625, "min": 11.447906494140625}, "initialize.time": {"count": 1, "max": 185.23406982421875, "sum": 185.23406982421875, "min": 185.23406982421875}, "check_early_stopping.time": {"count": 5, "max": 1.3442039489746094, "sum": 4.8656463623046875, "min": 0.23293495178222656}, "setuptime": {"count": 1, "max": 25.728940963745117, "sum": 25.728940963745117, "min": 25.728940963745117}, "update.time": {"count": 5, "max": 165.38596153259277, "sum": 619.9460029602051, "min": 95.11899948120117}, "epochs": {"count": 1, "max": 5, "sum": 5.0, "min": 5}}, "EndTime": 1610070930.639218, "Dimensions": {"Host": "algo-1", "Operation": "training", "Algorithm": "Linear Learner"}, "StartTime": 1610070929.794508} 2021-01-08 01:56:00 Completed - Training job completed ProfilerReport-1610070693: NoIssuesFound Training seconds: 52 Billable seconds: 52
Ref: Managed Spot Training: Save Up to 90% On Your Amazon SageMaker Training Jobs
linear = sagemaker.estimator.Estimator(container,
                                       role,
                                       train_instance_count=1,
                                       train_instance_type='ml.c4.xlarge',
                                       output_path=output_location,
                                       sagemaker_session=sagemaker_session,
                                       train_use_spot_instances=True,
                                       train_max_run=300,
                                       train_max_wait=600)
The resulting cost savings:
2021-01-08 02:32:33 Uploading - Uploading generated training model
2021-01-08 02:32:33 Completed - Training job completed
Training seconds: 48
Billable seconds: 21
Managed Spot Training savings: 56.2%
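The savings figure can be checked by hand. The sketch below assumes the percentage is the share of training seconds that was not billed, and that the spot wait limit has to be at least as large as the run limit; both helper names are made up for illustration.

```python
def spot_savings_percent(training_seconds, billable_seconds):
    """Share of training time that was not billed, as a percentage."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


def check_spot_limits(max_run, max_wait):
    """Return True when the wait limit covers at least the run limit."""
    return max_wait >= max_run


# Values taken from the log above: 48 training seconds, 21 billable seconds.
savings = spot_savings_percent(training_seconds=48, billable_seconds=21)
print(f"{savings:.2f}%")

# Values taken from the estimator above: train_max_run=300, train_max_wait=600.
print(check_spot_limits(max_run=300, max_wait=600))
```

With 48 training seconds and 21 billable seconds this gives 56.25%, which the log reports as 56.2%.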
3. Deploy and Run Inference
However, nothing has been "triggered" yet.
Step 1: deploy the model. This takes a while.
# Deploying the model to perform inference
linear_regressor = linear.deploy(initial_instance_count=1,
                                 instance_type='ml.m4.xlarge')
Step 2: set the parameters that control how input data is handled.
from sagemaker.predictor import csv_serializer, json_deserializer

# Content type overrides the data that will be passed to the deployed model,
# since the deployed model expects data in text/csv format.
# The serializer accepts a single argument, the input data, and returns a
# sequence of bytes in the specified content type.
# The deserializer accepts two arguments, the result data and the response
# content type, and returns a sequence of bytes in the specified content type.
# Reference: https://sagemaker.readthedocs.io/en/stable/predictors.html

# linear_regressor.content_type = 'text/csv'
linear_regressor.serializer = csv_serializer
linear_regressor.deserializer = json_deserializer
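To make the two roles concrete, here is a minimal standalone sketch of what a CSV serializer and a JSON deserializer do. It is not the SDK's implementation; `to_csv_bytes` and `from_json_bytes` are hypothetical names used only for illustration.

```python
import csv
import io
import json


def to_csv_bytes(rows):
    """Serialize a list of rows into text/csv bytes."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


def from_json_bytes(body, content_type="application/json"):
    """Decode a JSON response body into Python objects."""
    if content_type != "application/json":
        raise ValueError(f"unexpected content type: {content_type}")
    return json.loads(body.decode("utf-8"))


# Two single-feature rows, as they would be sent to the endpoint.
payload = to_csv_bytes([[1.5], [3.2]])
print(payload)
```

The serializer turns Python data into the bytes the endpoint expects, and the deserializer turns the endpoint's response bytes back into Python data.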
Step 3: run inference.
# Making predictions on the test data
result = linear_regressor.predict(X_test)
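After JSON deserialization, `result` is a nested structure rather than a flat array. The sketch below assumes the Linear Learner regression response has the shape `{"predictions": [{"score": ...}, ...]}`; the sample values are made up purely to show the shape.

```python
def extract_scores(result):
    """Pull the numeric score out of each prediction record."""
    return [record["score"] for record in result["predictions"]]


# Made-up response used only to illustrate the assumed shape.
sample_result = {"predictions": [{"score": 41250.0}, {"score": 63010.5}]}
scores = extract_scores(sample_result)
print(scores)
```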
Step 4: delete the deployed endpoint.
# Delete the endpoint
linear_regressor.delete_endpoint()
Object Detection
Ref: amazon-sagemaker-examples/introduction_to_amazon_algorithms/object_detection_birds
The same built-in pattern applies here. Because the algorithm is built in, it uses the framework AWS itself favors: MXNet.
sagemaker.estimator.Estimator(container, ... )
As you can see, it follows the same pattern as the earlier example.
from sagemaker.amazon.amazon_estimator import get_image_uri
# Get the built-in image
training_image = get_image_uri(sess.boto_region_name, 'object-detection', repo_version='latest')
print(training_image)

# Training output path
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)

# Hardware-related settings
od_model = sagemaker.estimator.Estimator(training_image,
                                         role,
                                         train_instance_count=1,
                                         train_instance_type='ml.p3.2xlarge',
                                         train_volume_size=50,
                                         train_max_run=360000,
                                         input_mode='File',
                                         output_path=s3_output_location,
                                         sagemaker_session=sess)

# Network details (the software side)
def set_hyperparameters(num_epochs, lr_steps):
    num_classes = classes_df.shape[0]
    num_training_samples = train_df.shape[0]
    print('num classes: {}, num training images: {}'.format(num_classes, num_training_samples))
    od_model.set_hyperparameters(base_network='resnet-50',
                                 use_pretrained_model=1,
                                 num_classes=num_classes,
                                 mini_batch_size=16,
                                 epochs=num_epochs,
                                 learning_rate=0.001,
                                 lr_scheduler_step=lr_steps,
                                 lr_scheduler_factor=0.1,
                                 optimizer='sgd',
                                 momentum=0.9,
                                 weight_decay=0.0005,
                                 overlap_threshold=0.5,
                                 nms_threshold=0.45,
                                 image_shape=512,
                                 label_width=350,
                                 num_training_samples=num_training_samples)

set_hyperparameters(10, '33,67')

# Data already prepared on S3
train_data = sagemaker.session.s3_input(s3_train_data,
                                        distribution='FullyReplicated',
                                        content_type='application/x-recordio',
                                        s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3_validation_data,
                                             distribution='FullyReplicated',
                                             content_type='application/x-recordio',
                                             s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}
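The `lr_scheduler_step` and `lr_scheduler_factor` pair describes a step schedule for the learning rate. The sketch below is one reading of it, assuming the rate is multiplied by the factor once for each listed epoch that has been reached; the exact boundary behavior inside the built-in algorithm is an assumption, and `learning_rate_at` is a hypothetical helper.

```python
def learning_rate_at(epoch, base_lr, step_spec, factor):
    """Learning rate in effect at `epoch` for a step-based schedule.

    Assumption: the rate is multiplied by `factor` once for every
    step epoch that has already been reached (step <= epoch).
    """
    steps = [int(s) for s in step_spec.split(",") if s.strip()]
    reductions = sum(1 for step in steps if step <= epoch)
    return base_lr * factor ** reductions


# Same settings as above: learning_rate=0.001, steps '33,67', factor 0.1.
for epoch in (0, 9, 33, 67):
    print(epoch, learning_rate_at(epoch, 0.001, "33,67", 0.1))
```

Under this reading, a 10-epoch run with steps at 33 and 67 never reaches either step, so the learning rate stays at 0.001 for the whole job.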
Image Classification
Step 5: Train a Model (overview of the two APIs)
Create and Run a Training Job (Amazon SageMaker Python SDK) (high-level)
Create and Run a Training Job (AWS SDK for Python (Boto3))
The full-training version of the sample code already includes the step that prepares the data on S3.
1. SageMaker Python SDK
-
Basic pattern
Get the built-in image.
from sagemaker.amazon.amazon_estimator import get_image_uri

training_image = get_image_uri(sess.boto_region_name, 'image-classification', repo_version="latest")
print(training_image)
Hardware settings.
s3_output_location = 's3://{}/{}/output'.format(bucket, prefix)
ic = sagemaker.estimator.Estimator(training_image,
                                   role,
                                   train_instance_count=1,
                                   train_instance_type='ml.p2.xlarge',  # the estimator needs an instance type
                                   train_volume_size=50,
                                   train_max_run=360000,
                                   input_mode='File',
                                   output_path=s3_output_location,
                                   sagemaker_session=sess)
Software (hyperparameter) settings.
ic.set_hyperparameters(num_layers=18,
                       image_shape="3,224,224",
                       num_classes=257,
                       num_training_samples=15420,
                       mini_batch_size=128,
                       epochs=5,
                       learning_rate=0.01,
                       top_k=2,
                       precision_dtype='float32')
Start training.
train_data = sagemaker.session.s3_input(s3train,
                                        distribution='FullyReplicated',
                                        content_type='application/x-recordio',
                                        s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3validation,
                                             distribution='FullyReplicated',
                                             content_type='application/x-recordio',
                                             s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}

ic.fit(inputs=data_channels, logs=True)
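Two of the hyperparameters above are easier to sanity-check with a little arithmetic. This sketch assumes `image_shape` is read as channels, height, width, and that one epoch covers every sample with a possibly partial final batch; whether the algorithm keeps or drops that last batch is not stated here. Both helpers are hypothetical.

```python
import math


def parse_image_shape(shape):
    """Split a 'channels,height,width' string into three integers."""
    channels, height, width = (int(part) for part in shape.split(","))
    return channels, height, width


def batches_per_epoch(num_training_samples, mini_batch_size):
    """Mini-batches needed to cover the data once (the last may be partial)."""
    return math.ceil(num_training_samples / mini_batch_size)


# Values from the hyperparameters above.
print(parse_image_shape("3,224,224"))
print(batches_per_epoch(15420, 128))
```

With 15420 samples and a mini-batch size of 128, one pass over the data takes 121 batches.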
-
Amazon SageMaker Neo
Amazon SageMaker Neo uses Apache TVM and partner-provided compilers and acceleration libraries to deliver the best available performance for a given model and hardware target.
In other words, it takes the trained model artifact and converts (optimizes) it once more.
optimized_ic = ic
if ic.create_model().check_neo_region(boto3.Session().region_name) is False:
    print('Neo is not currently supported in', boto3.Session().region_name)
else:
    # Neo is supported in this region
    output_path = '/'.join(ic.output_path.split('/')[:-1])
    optimized_ic = ic.compile_model(target_instance_family='ml_m4',
                                    input_shape={'data': [1, 3, 224, 224]},  # Batch size 1, 3 channels, 224x224 images.
                                    output_path=output_path,
                                    framework='mxnet',
                                    framework_version='1.2.1')
    optimized_ic.image = get_image_uri(sess.boto_region_name, 'image-classification-neo', repo_version="latest")
    optimized_ic.name = 'deployed-image-classification'
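The `'/'.join(... .split('/')[:-1])` expression in the Neo snippet simply drops the last path segment of the training output location, so the compiled model lands next to the `output` folder. A small sketch with a made-up bucket and prefix (`parent_s3_prefix` is a hypothetical helper):

```python
def parent_s3_prefix(s3_uri):
    """Drop the last '/'-separated segment of an S3 URI."""
    return "/".join(s3_uri.split("/")[:-1])


# Hypothetical output location, shaped like 's3://{bucket}/{prefix}/output'.
example = "s3://example-bucket/ic-fulltraining/output"
print(parent_s3_prefix(example))

# A trailing slash changes the result: only the empty last segment is removed.
print(parent_s3_prefix(example + "/"))
```

The second call shows a pitfall: with a trailing slash, the expression only strips the slash and the `output` segment stays in place.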
-
Possible problems
With this API, you may run into the following problem.
ic_classifier.content_type = 'application/x-image' # AttributeError: can't set attribute
result = json.loads(ic_classifier.predict(payload))
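One plausible cause, assuming the installed SDK version exposes `content_type` as a read-only property derived from the serializer: Python raises `AttributeError` whenever code assigns to a property that has no setter. The class below is a hypothetical stand-in, not the SDK's predictor, and only reproduces that mechanism.

```python
class ReadOnlyContentType:
    """Stand-in for a predictor whose content type cannot be assigned."""

    def __init__(self, content_type):
        self._content_type = content_type

    @property
    def content_type(self):
        # No setter is defined, so assignment raises AttributeError.
        return self._content_type


predictor = ReadOnlyContentType("application/json")
try:
    predictor.content_type = "application/x-image"
    assignment_failed = False
except AttributeError:
    assignment_failed = True

print(assignment_failed)
```

If that is indeed the cause, the content type would have to be chosen through the serializer given to the predictor rather than by assigning the attribute afterwards.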
2. AWS SDK (Boto3)
The core task is to fill in the training_params structure.
# Create the Amazon SageMaker training job
sagemaker = boto3.client(service_name='sagemaker')
sagemaker.create_training_job(**training_params)

# Once the job is created, start monitoring it:
# confirm that the training job has started
status = sagemaker.describe_training_job(TrainingJobName=job_name)['TrainingJobStatus']
print('Training job current status: {}'.format(status))
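For orientation, here is a minimal sketch of what `training_params` might contain, using the top-level keys of the `CreateTrainingJob` request. Every value is a placeholder or borrowed from the SDK example earlier in these notes, and `build_training_params` is a hypothetical helper, not part of Boto3.

```python
def build_training_params(job_name, image, role_arn, s3_train, s3_output):
    """Assemble a request dictionary for create_training_job."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "OutputDataConfig": {"S3OutputPath": s3_output},
        "ResourceConfig": {
            "InstanceCount": 1,
            "InstanceType": "ml.p2.xlarge",
            "VolumeSizeInGB": 50,
        },
        "StoppingCondition": {"MaxRuntimeInSeconds": 360000},
        # Hyperparameter values are passed as strings at this API level.
        "HyperParameters": {"num_layers": "18", "epochs": "5"},
        "InputDataConfig": [
            {
                "ChannelName": "train",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": s3_train,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
                "ContentType": "application/x-recordio",
            }
        ],
    }


# Placeholder arguments; substitute real names, ARNs and S3 locations.
training_params = build_training_params(
    job_name="example-job",
    image="example-image-uri",
    role_arn="example-role-arn",
    s3_train="s3://example-bucket/train/",
    s3_output="s3://example-bucket/output",
)
print(sorted(training_params))
```

Compared with the Python SDK estimator, the same information is present; it is just spelled out as one nested dictionary instead of constructor arguments and `set_hyperparameters`.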
3. Incremental Training
Ref: Incremental Training in Amazon SageMaker
You don't need to train a new model from scratch; you can simply continue training from the existing one.
Also pay attention to the use_pretrained_model hyperparameter.
train_data = sagemaker.session.s3_input(s3train,
                                        distribution='FullyReplicated',
                                        content_type='application/x-recordio',
                                        s3_data_type='S3Prefix')
validation_data = sagemaker.session.s3_input(s3validation,
                                             distribution='FullyReplicated',
                                             content_type='application/x-recordio',
                                             s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data}
As shown below, the second run adds one more channel: 'model': model_data.
ic.fit(inputs=data_channels, logs=True)
# Print the location of the model data from previous training
print(ic.model_data)  # this is an S3 path

# Prepare model channel in addition to train and validation
model_data = sagemaker.session.s3_input(ic.model_data,
                                        distribution='FullyReplicated',
                                        content_type='application/x-sagemaker-model',
                                        s3_data_type='S3Prefix')
data_channels = {'train': train_data, 'validation': validation_data, 'model': model_data}
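The only difference between the two runs is the extra `model` entry in the channel dictionary. A minimal sketch of that step, with plain strings standing in for the `s3_input` objects and a hypothetical helper name:

```python
def add_model_channel(channels, model_input):
    """Return a copy of `channels` with an extra 'model' entry."""
    updated = dict(channels)
    updated["model"] = model_input
    return updated


# Strings stand in for the s3_input objects used in the real code.
first_run = {"train": "train-input", "validation": "validation-input"}
second_run = add_model_channel(first_run, "model-input")
print(sorted(second_run))
```

Copying the dictionary keeps the first run's channels unchanged, so the same `train` and `validation` inputs can be reused for both jobs.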
End.