[ERROR: json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182]
# in file pipeline.yaml
- name: TF_CONFIG
value: "{
\"cluster\": {
\"worker\": [\"dist-strat-example-worker-0:5000\",\"dist-strat-example-worker-1:5000\"],
\"ps\": [\"dist-strat-example-ps-0:5000\"],
### BUG on the next line: in '],}' the ',' between ']' and '}' must not be there
\"chief\": [\"dist-strat-example-chief:5000\"],},
\"task\": {
\"type\": \"worker\",
\"index\": \"0\"
}
}"
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ kubectl logs dist-strat-example-worker-0-7mqqg
2024-02-13 16:32:23.777522: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
Traceback (most recent call last):
File "/tf_std_server.py", line 67, in <module>
main()
File "/tf_std_server.py", line 41, in main
if cluster_resolver.task_type in ("worker", "ps"):
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 104, in task_type
task_info = _get_value_in_tfconfig(_TASK_KEY, {})
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 43, in _get_value_in_tfconfig
tf_config = _load_tf_config()
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/tensorflow/python/distribute/cluster_resolver/tfconfig_cluster_resolver.py", line 39, in _load_tf_config
return json.loads(os.environ.get(_TF_CONFIG_ENV, '{}'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 182 (char 181)
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$ kubectl describe pod dist-strat-example-worker-0-7mqqg
Name: dist-strat-example-worker-0-7mqqg
Namespace: default
Priority: 0
Service Account: default
Node: maye-inspiron-5547/192.168.0.104
Start Time: Wed, 14 Feb 2024 00:32:12 +0800
Labels: job=worker
name=dist-strat-example
task=0
Annotations: <none>
Status: Running
IP: 10.244.0.179
IPs:
IP: 10.244.0.179
Controlled By: ReplicationController/dist-strat-example-worker-0
Containers:
tensorflow:
Container ID: containerd://613170ce2079886726f3984679a45f8387b0bd9d49b3fe7b97c48b08db905655
Image: tf_std_server:v1
Image ID: sha256:117ff425f04f86b62e85a1a7ca654d0c36e9c8ac3bcc78f413984e5cbddb8421
Port: 5000/TCP
Host Port: 0/TCP
Command:
/usr/bin/python
/tf_std_server.py
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 14 Feb 2024 00:33:08 +0800
Finished: Wed, 14 Feb 2024 00:33:11 +0800
Ready: False
Restart Count: 3
Environment:
TF_CONFIG: { "cluster": { "worker": ["dist-strat-example-worker-0:5000","dist-strat-example-worker-1:5000"], "ps": ["dist-strat-example-ps-0:5000"], "chief": ["dist-strat-example-chief:5000"],}, "task": { "type": "worker", "index": "0" } }
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-khlnz (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-khlnz:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 86s default-scheduler Successfully assigned default/dist-strat-example-worker-0-7mqqg to maye-inspiron-5547
Normal Pulled 30s (x4 over 85s) kubelet Container image "tf_std_server:v1" already present on machine
Normal Created 30s (x4 over 83s) kubelet Created container tensorflow
Normal Started 30s (x4 over 83s) kubelet Started container tensorflow
Warning BackOff 0s (x6 over 71s) kubelet Back-off restarting failed container tensorflow in pod dist-strat-example-worker-0-7mqqg_default(5a1300ff-81b5-44bf-9232-f5d335b9e6fc)
(base) maye@maye-Inspiron-5547:~/github_repository/tensorflow_ecosystem/distribution_strategy$
[SOLUTION]
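The error is caused by a trailing comma in the TF_CONFIG value in "pipeline.yaml": the "chief" entry ends with "],}". JSON does not allow a "," directly before a closing "}", so json.loads() fails when TensorFlow reads TF_CONFIG. Remove that "," so the entry ends with "]}" and the value parses.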
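
To catch this kind of mistake before deploying, the TF_CONFIG string can be run through json.loads() locally, which is the same call the traceback shows TensorFlow making. The following is a minimal sketch, not part of the original example: the helper name check_tf_config and the two constants are hypothetical, and the broken string is copied from the "kubectl describe pod" output above. The exact wording of the parser message differs between Python versions, so the sketch reports it rather than relying on it.

```python
import json

# TF_CONFIG as shown in the pod's Environment section (note the "],}" after "chief").
BROKEN_TF_CONFIG = (
    '{ "cluster": { '
    '"worker": ["dist-strat-example-worker-0:5000","dist-strat-example-worker-1:5000"], '
    '"ps": ["dist-strat-example-ps-0:5000"], '
    '"chief": ["dist-strat-example-chief:5000"],}, '
    '"task": { "type": "worker", "index": "0" } }'
)

# Same value with the trailing comma before "}" removed.
FIXED_TF_CONFIG = BROKEN_TF_CONFIG.replace('],}', ']}')


def check_tf_config(raw):
    """Hypothetical helper: return (parsed, None) on success, (None, message) on failure."""
    try:
        return json.loads(raw), None
    except json.JSONDecodeError as exc:
        # exc.pos is the 0-based character offset where parsing failed;
        # include some surrounding text to make the bad spot easy to find.
        start = max(exc.pos - 20, 0)
        context = raw[start:exc.pos + 20]
        return None, f"{exc.msg} at char {exc.pos}, near: {context!r}"


if __name__ == "__main__":
    for label, raw in (("broken", BROKEN_TF_CONFIG), ("fixed", FIXED_TF_CONFIG)):
        parsed, error = check_tf_config(raw)
        if error:
            print(f"{label}: invalid JSON: {error}")
        else:
            print(f"{label}: OK, task type = {parsed['task']['type']}")
```

Running a check like this on the value taken from pipeline.yaml (after removing the YAML backslash escapes) points at the offending position without waiting for the pod to go into CrashLoopBackOff.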