[ERROR:absl:[StatisticsGen] Input resolution error: Error while resolving inputs for StatisticsGen: InputSpec min_count has not met: inputs[examples] has min_count = 1 but only got 0 artifacts. (Artifact IDs: [])]

When running the TFX pipeline on Kubeflow, the StatisticsGen pod fails. Checking its logs:

kubectl logs StatisticsGen-pod-name -n kubeflow  

-->
ERROR:absl:[StatisticsGen] Input resolution error: Error while resolving inputs for StatisticsGen: InputSpec min_count has not met: inputs[examples] has min_count = 1 but only got 0 artifacts. (Artifact IDs: [])
Traceback (most recent call last):
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/launcher.py", line 259, in _prepare_execution
resolved_inputs = inputs_utils.resolve_input_artifacts(
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/inputs_utils.py", line 66, in resolve_input_artifacts
resolved = node_inputs_resolver.resolve(metadata_handler, node_inputs)
File "/usr/local/lib/python3.8/dist-packages/tfx/orchestration/portable/input_resolution/node_inputs_resolver.py", line 479, in resolve
raise exceptions.InsufficientInputError(
tfx.orchestration.portable.input_resolution.exceptions.InsufficientInputError: Error while resolving inputs for StatisticsGen: InputSpec min_count has not met: inputs[examples] has min_count = 1 but only got 0 artifacts. (Artifact IDs: [])

[ANALYSIS]

1. Read the Python source code of the statement that raised the exception, to find out how TFX resolves input artifacts.

# tfx/orchestration/portable/input_resolution/node_inputs_resolver.py
def resolve(
    handle_like: mlmd_cm.HandleLike,
    node_inputs: pipeline_pb2.NodeInputs,
) -> List[typing_utils.ArtifactMultiMap]:
  """Resolve a NodeInputs."""
  resolved: Dict[str, List[_Entry]] = collections.defaultdict(list)

  for input_key in _topologically_sorted_input_keys(
      node_inputs.inputs, node_inputs.input_graphs):
    # This input_key may have been already resolved while resolving for another
    # input key.
    if input_key in resolved:
      continue

    input_spec = node_inputs.inputs[input_key]

    if input_spec.channels:
      artifacts = channel_resolver.resolve_union_channels(     ####(1)
          handle_like, input_spec.channels)
      resolved[input_key] = [
          (partition_utils.NO_PARTITION, _filter_live(artifacts))
      ]
    
    ...  #### unrelated code omitted

    if input_spec.min_count:
      for _, artifacts in resolved[input_key]:
        if len(artifacts) < input_spec.min_count:
          raise exceptions.InsufficientInputError(
              'InputSpec min_count has not met: '
              f'inputs[{input_key}] has min_count = {input_spec.min_count} '
              f'but only got {len(artifacts)} artifacts. '
              f'(Artifact IDs: {[a.id for a in artifacts]})')
  ...

# tfx/orchestration/portable/input_resolution/channel_resolver.py  
def resolve_union_channels(
    mlmd_handle: mlmd_cm.HandleLike,
    channels: Sequence[pipeline_pb2.InputSpec.Channel],
) -> List[types.Artifact]:
  """Evaluate InputSpec.channels."""
  seen = set()
  result = []
  for channel in channels:
    for artifact in resolve_single_channel(mlmd_handle, channel):     ####(2)
      if artifact.id not in seen:
        seen.add(artifact.id)
        result.append(artifact)
  return result

# tfx/orchestration/portable/input_resolution/channel_resolver.py 
# TODO(b/234806996): Migrate to MLMD filter query.
def resolve_single_channel(
    handle_like: mlmd_cm.HandleLike,
    channel: pipeline_pb2.InputSpec.Channel,
) -> List[types.Artifact]:
  """Evaluate a single InputSpec.Channel."""
  store = mlmd_cm.get_handle(handle_like).store
  contexts = []
  for context_query in channel.context_queries:
    maybe_context = _get_context_from_context_query(store, context_query)
    if maybe_context is None:
      # If the context does not exist, it means no artifacts satisfy the given
      # context query. Returns empty.
      return []
    else:
      contexts.append(maybe_context)
  executions = _get_executions_by_all_contexts(store, contexts)
  if not executions:
    return []
  artifacts = _get_output_artifacts(store, executions, channel.output_key)    #### (3)
  if not artifacts:
    return []
  artifacts, artifact_type = _filter_by_artifact_query(
      store, artifacts, channel.artifact_query)

  return artifact_utils.deserialize_artifacts(artifact_type, artifacts)

# tfx/orchestration/portable/input_resolution/channel_resolver.py 
# TODO(b/233044350): Move to a general metadata utility.
def _get_output_artifacts(
    store: mlmd.MetadataStore,
    executions: Sequence[metadata_store_pb2.Execution],
    output_key: Optional[str] = None,
) -> List[metadata_store_pb2.Artifact]:
  """Get all output artifacts of the given executions."""
  executions_ids = [e.id for e in executions]
  events = [
      v for v in store.get_events_by_execution_ids(executions_ids)
      if event_lib.is_valid_output_event(v, output_key)
  ]
  artifact_ids = [v.artifact_id for v in events]     #### (4) 
  return store.get_artifacts_by_id(artifact_ids)

In summary, from the call chain above, the way TFX resolves input artifacts is:

input_spec = node_inputs.inputs[input_key]
channels = input_spec.channels
store = mlmd_cm.get_handle(handle_like).store
contexts = []
for context_query in channel.context_queries:
    maybe_context = _get_context_from_context_query(store, context_query)
    contexts.append(maybe_context)
executions = _get_executions_by_all_contexts(store, contexts)

executions_ids = [e.id for e in executions]
events = [
    v for v in store.get_events_by_execution_ids(executions_ids)
    if event_lib.is_valid_output_event(v, output_key)
]
artifact_ids = [v.artifact_id for v in events]
artifacts = store.get_artifacts_by_id(artifact_ids)
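
To make the chain concrete, here is a minimal sketch that replays the same steps directly against MLMD, so you can see which step comes back empty. It is simplified to the producer node context only (the real resolution intersects the pipeline, pipeline_run and node contexts), and the connection values are assumptions taken from the MySQL pod used below:

# standalone sketch: replay the input-resolution steps against MLMD
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.mysql.host = "10.244.0.119"   # assumed MySQL pod IP, as in step 4 below
config.mysql.port = 3306
config.mysql.database = "metadb"
config.mysql.user = "root"

store = metadata_store.MetadataStore(config)

# context query of StatisticsGen's "examples" channel (producer node context)
ctx = store.get_context_by_type_and_name(
    "node", "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen")
executions = store.get_executions_by_context(ctx.id)
events = store.get_events_by_execution_ids([e.id for e in executions])
output_events = [e for e in events if e.type == metadata_store_pb2.Event.OUTPUT]
artifacts = store.get_artifacts_by_id([e.artifact_id for e in output_events])
print(len(executions), len(events), len(artifacts))   # in this failure: events is empty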

2. Since artifact_ids are ultimately obtained from Event records, check whether the Event records in database metadb are OK.

kubectl exec -it mysql-pod-name -n kubeflow -- bash   

--> entered pod mysql:

mysql> use metadb;
mysql> select * from Event;

Empty set (0.00 sec) #### [EXCEPTION]: table Event is empty, so resolving
                     #### artifact_ids from events yields nothing.

mysql> select * from EventPath;

+----------+---------------+------------+----------+
| event_id | is_index_step | step_index | step_key |
+----------+---------------+------------+----------+
|       -1 |             0 |       NULL | examples |  #### [EXCEPTION]: event_id
|       -1 |             1 |          0 | NULL     |  #### values are all -1.
|       -1 |             0 |       NULL | examples |
|       -1 |             1 |          0 | NULL     |
| ...      | ...           | ...        | ...      |
+----------+---------------+------------+----------+
82 rows in set (0.00 sec)

Compare this with running the same TFX pipeline using LocalDagRunner, which works correctly:

tfx.orchestration.LocalDagRunner().run(
    _create_schema_pipeline(
        pipeline_name=SCHEMA_PIPELINE_NAME,
        pipeline_root=SCHEMA_PIPELINE_ROOT,
        data_root=DATA_ROOT,
        schema_path=SCHEMA_PATH,
        metadata_path=SCHEMA_METADATA_PATH,
        module_file=_trainer_module_file,
        serving_model_dir=SERVING_MODEL_DIR,
    )
)
sqlite> select * from Event;

1|1|2|4|1703656727002
2|1|3|3|1703656727354
3|2|3|4|1703656755803
4|2|4|3|1703656756122
5|3|4|4|1703656756758
6|4|5|4|1703658256917
7|4|6|3|1703658257028
8|5|6|4|1703658281239

sqlite> PRAGMA table_info(Event);

0|id|INTEGER|0||1
1|artifact_id|INT|1||0
2|execution_id|INT|1||0
3|type|INT|1||0
4|milliseconds_since_epoch|INT|0||0

sqlite> select * from EventPath;

...
745|0||examples
745|1|0|
sqlite>

sqlite> PRAGMA table_info(EventPath);

0|event_id|INT|1||0
1|is_index_step|TINYINT(1)|1||0
2|step_index|INT|0||0
3|step_key|TEXT|0||0
sqlite>

It can be seen that for LocalDagRunner, table Event is not empty, and event_id in table EventPath is the corresponding id in table Event, not -1. So it can be inferred that the cause of the ERROR is the empty table Event.

3. So when are events inserted into table Event?

A reasonable inference is that after component ImportExampleGen (the upstream component of StatisticsGen) completes, its metadata, including the metadata of its output artifacts, execution, contexts, and events, is stored in database metadb.

kubectl logs ImportExampleGen-pod-name -n kubeflow

-->

$ kubectl logs detect-anomolies-on-wafer-tfdv-schema-9xwkk-2791313833 -n kubeflow

time="2024-01-27T10:00:51.999Z" level=info msg="capturing logs" argo=true
2024-01-27 10:01:01.688802: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
INFO:root:Component ImportExampleGen is running.
INFO:absl:Running launcher for node_info {
type {
name: "tfx.components.example_gen.import_example_gen.component.ImportExampleGen"
}
id: "ImportExampleGen"
}
contexts {
contexts {
type {
name: "pipeline"
}
name {
field_value {
string_value: "detect_anomolies_on_wafer_tfdv_schema"
}
}
}
contexts {
type {
name: "pipeline_run"
}
name {
field_value {
string_value: "detect-anomolies-on-wafer-tfdv-schema-9xwkk"
}
}
}
contexts {
type {
name: "node"
}
name {
field_value {
string_value: "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
}
}
}
}
outputs {
outputs {
key: "examples"
value {
artifact_spec {
type {
name: "Examples"
properties {
key: "span"
value: INT
}
properties {
key: "split_names"
value: STRING
}
properties {
key: "version"
value: INT
}
base_type: DATASET
}
}
}
}
}
parameters {
parameters {
key: "input_base"
value {
field_value {
string_value: "/maye/trainEvalData"
}
}
}
parameters {
key: "input_config"
value {
field_value {
string_value: "{\n "splits": [\n {\n "name": "train",\n "pattern": "train"\n },\n {\n "name": "eval",\n "pattern": "eval"\n }\n ]\n}"
}
}
}
parameters {
key: "output_config"
value {
field_value {
string_value: "{}"
}
}
}
parameters {
key: "output_data_format"
value {
field_value {
int_value: 6
}
}
}
parameters {
key: "output_file_format"
value {
field_value {
int_value: 5
}
}
}
}
downstream_nodes: "StatisticsGen"
downstream_nodes: "Trainer"
downstream_nodes: "Transform"
execution_options {
caching_options {
}
}

INFO:absl:MetadataStore with gRPC connection initialized
INFO:absl:[ImportExampleGen] Resolved inputs: ({},)
INFO:root:Adding KFP pod name detect-anomolies-on-wafer-tfdv-schema-9xwkk-2791313833 to execution
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:select span and version = (0, None)
INFO:absl:latest span and version = (0, None)
INFO:absl:MetadataStore with gRPC connection initialized
INFO:absl:Going to run a new execution 84
INFO:absl:Going to run a new execution: ExecutionInfo(execution_id=84, input_dict={}, output_dict=defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/examples/84"
custom_properties {
key: "input_fingerprint"
value {
string_value: "split:train,num_files:1,total_bytes:48638225,xor_checksum:1703857824,sum_checksum:1703857824\nsplit:eval,num_files:1,total_bytes:12654996,xor_checksum:1703857858,sum_checksum:1703857858"
}
}
custom_properties {
key: "span"
value {
int_value: 0
}
}
, artifact_type: name: "Examples"
properties {
key: "span"
value: INT
}
properties {
key: "split_names"
value: STRING
}
properties {
key: "version"
value: INT
}
base_type: DATASET
)]}), exec_properties={'input_config': '{\n "splits": [\n {\n "name": "train",\n "pattern": "train"\n },\n {\n "name": "eval",\n "pattern": "eval"\n }\n ]\n}', 'output_config': '{}', 'output_data_format': 6, 'input_base': '/maye/trainEvalData', 'output_file_format': 5, 'span': 0, 'version': None, 'input_fingerprint': 'split:train,num_files:1,total_bytes:48638225,xor_checksum:1703857824,sum_checksum:1703857824\nsplit:eval,num_files:1,total_bytes:12654996,xor_checksum:1703857858,sum_checksum:1703857858'}, execution_output_uri='/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/.system/executor_execution/84/executor_output.pb', stateful_working_dir='/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/.system/stateful_working_dir/detect-anomolies-on-wafer-tfdv-schema-9xwkk', tmp_dir='/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/.system/executor_execution/84/.temp/', pipeline_node=node_info {
type {
name: "tfx.components.example_gen.import_example_gen.component.ImportExampleGen"
}
id: "ImportExampleGen"
}
contexts {
contexts {
type {
name: "pipeline"
}
name {
field_value {
string_value: "detect_anomolies_on_wafer_tfdv_schema"
}
}
}
contexts {
type {
name: "pipeline_run"
}
name {
field_value {
string_value: "detect-anomolies-on-wafer-tfdv-schema-9xwkk"
}
}
}
contexts {
type {
name: "node"
}
name {
field_value {
string_value: "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
}
}
}
}
outputs {
outputs {
key: "examples"
value {
artifact_spec {
type {
name: "Examples"
properties {
key: "span"
value: INT
}
properties {
key: "split_names"
value: STRING
}
properties {
key: "version"
value: INT
}
base_type: DATASET
}
}
}
}
}
parameters {
parameters {
key: "input_base"
value {
field_value {
string_value: "/maye/trainEvalData"
}
}
}
parameters {
key: "input_config"
value {
field_value {
string_value: "{\n "splits": [\n {\n "name": "train",\n "pattern": "train"\n },\n {\n "name": "eval",\n "pattern": "eval"\n }\n ]\n}"
}
}
}
parameters {
key: "output_config"
value {
field_value {
string_value: "{}"
}
}
}
parameters {
key: "output_data_format"
value {
field_value {
int_value: 6
}
}
}
parameters {
key: "output_file_format"
value {
field_value {
int_value: 5
}
}
}
}
downstream_nodes: "StatisticsGen"
downstream_nodes: "Trainer"
downstream_nodes: "Transform"
execution_options {
caching_options {
}
}
, pipeline_info=id: "detect_anomolies_on_wafer_tfdv_schema"
, pipeline_run_id='detect-anomolies-on-wafer-tfdv-schema-9xwkk')
INFO:absl:Generating examples.
INFO:root:Missing pipeline option (runner). Executing pipeline using the default runner: DirectRunner.
INFO:absl:Reading input TFRecord data /maye/trainEvalData/train.
INFO:absl:Reading input TFRecord data /maye/trainEvalData/eval.
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function annotate_downstream_side_inputs at 0x7fb846cf8550> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function fix_side_input_pcoll_coders at 0x7fb846cf8670> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function pack_combiners at 0x7fb846cf8b80> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function lift_combiners at 0x7fb846cf8c10> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_sdf at 0x7fb846cf8dc0> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function expand_gbk at 0x7fb846cf8e50> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sink_flattens at 0x7fb846cf8f70> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function greedily_fuse at 0x7fb846cf9040> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function read_to_impulse at 0x7fb846cf90d0> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function impulse_to_input at 0x7fb846cf9160> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function sort_stages at 0x7fb846cf93a0> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function add_impulse_to_dangling_transforms at 0x7fb846cf94c0> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function setup_timer_mapping at 0x7fb846cf9310> ====================
INFO:apache_beam.runners.portability.fn_api_runner.translations:==================== <function populate_data_channel_coders at 0x7fb846cf9430> ====================
INFO:apache_beam.runners.worker.statecache:Creating state cache with size 104857600
INFO:apache_beam.runners.portability.fn_api_runner.worker_handlers:Created Worker handler <apache_beam.runners.portability.fn_api_runner.worker_handlers.EmbeddedWorkerHandler object at 0x7fb844962400> for environment ref_Environment_default_environment_1 (beam:env:embedded_python:v1, b'')
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.09 seconds.
INFO:root:Default Python SDK image for environment is apache/beam_python3.8_sdk:2.46.0
INFO:apache_beam.io.filebasedsink:Starting finalize_write threads with num_shards: 1 (skipped: 0), batches: 1, num_threads: 1
INFO:apache_beam.io.filebasedsink:Renamed 1 shards in 0.02 seconds.
INFO:absl:Examples generated.
INFO:absl:Value type <class 'NoneType'> of key version in exec_properties is not supported, going to drop it
INFO:absl:Value type <class 'list'> of key _beam_pipeline_args in exec_properties is not supported, going to drop it
INFO:absl:Cleaning up stateless execution info.
INFO:absl:Execution 84 succeeded.
INFO:absl:Cleaning up stateful execution info.
INFO:absl:Publishing output artifacts defaultdict(<class 'list'>, {'examples': [Artifact(artifact: uri: "/tfx/tfx_pv/pipelines/detect_anomolies_on_wafer_tfdv_schema/ImportExampleGen/examples/84"
custom_properties {
key: "input_fingerprint"
value {
string_value: "split:train,num_files:1,total_bytes:48638225,xor_checksum:1703857824,sum_checksum:1703857824\nsplit:eval,num_files:1,total_bytes:12654996,xor_checksum:1703857858,sum_checksum:1703857858"
}
}
custom_properties {
key: "span"
value {
int_value: 0
}
}
, artifact_type: name: "Examples"
properties {
key: "span"
value: INT
}
properties {
key: "split_names"
value: STRING
}
properties {
key: "version"
value: INT
}
base_type: DATASET
)]}) for execution 84
INFO:absl:MetadataStore with gRPC connection initialized
INFO:root:Component ImportExampleGen is finished.
time="2024-01-27T10:04:49.143Z" level=info msg="sub-process exited" argo=true error=""
time="2024-01-27T10:04:49.143Z" level=info msg="/mlpipeline-ui-metadata.json -> /var/run/argo/outputs/artifacts/mlpipeline-ui-metadata.json.tgz" argo=true
time="2024-01-27T10:04:49.144Z" level=info msg="Taring /mlpipeline-ui-metadata.json"
(base) maye@maye-Inspiron-5547:~$

It can be seen from the ImportExampleGen log that, after the component's execution completes, it logs "INFO:absl:Publishing output artifacts...", so this should be where metadata is written to the database.

In PyCharm, search for "Publishing output artifacts" in the tfx project:

# tfx/orchestration/portable/launcher.py
  def launch(self) -> Optional[data_types.ExecutionInfo]:
    """Executes the component, includes driver, executor and publisher.

    Returns:
      The metadata of this execution that is registered in MLMD. It can be None
      if the driver decides not to run the execution.

    Raises:
      Exception: If the executor fails.
    """
    logging.info('Running launcher for %s', self._pipeline_node)
    if self._system_node_handler:
      # If this is a system node, runs it and directly return.
      return self._system_node_handler.run(self._mlmd_connection,
                                           self._pipeline_node,
                                           self._pipeline_info,
                                           self._pipeline_runtime_spec)

    # Runs as a normal node.
    execution_preparation_result = self._prepare_execution()
    (execution_info, contexts,
     is_execution_needed) = (execution_preparation_result.execution_info,
                             execution_preparation_result.contexts,
                             execution_preparation_result.is_execution_needed)
    if is_execution_needed:
      executor_watcher = None
      try:
        if self._executor_operator:
          # Create an execution watcher and save an in memory copy of the
          # Execution object to execution to it. Launcher calls executor
          # operator in process, thus there won't be race condition between the
          # execution watcher and the launcher to write to MLMD.
          executor_watcher = execution_watcher.ExecutionWatcher(
              port=portpicker.pick_unused_port(),
              mlmd_connection=self._mlmd_connection,
              execution=execution_preparation_result.execution_metadata,
              creds=grpc.local_server_credentials())
          self._executor_operator.with_execution_watcher(
              executor_watcher.address)
          executor_watcher.start()
        executor_output = self._run_executor(execution_info)
      except Exception as e:  # pylint: disable=broad-except
        execution_output = (
            e.executor_output if isinstance(e, _ExecutionFailedError) else None)
        self._publish_failed_execution(execution_info.execution_id, contexts,
                                       execution_output)
        logging.error('Execution %d failed.', execution_info.execution_id)
        raise
      finally:
        self._clean_up_stateless_execution_info(execution_info)
        if executor_watcher:
          executor_watcher.stop()

      logging.info('Execution %d succeeded.', execution_info.execution_id)
      self._clean_up_stateful_execution_info(execution_info)

      logging.info('Publishing output artifacts %s for execution %s',
                   execution_info.output_dict, execution_info.execution_id)

      self._publish_successful_execution(execution_info.execution_id, contexts,  ## (1) 
                                         execution_info.output_dict,
                                         executor_output)
    return execution_info

# tfx/orchestration/portable/launcher.py
  def _publish_successful_execution(
      self, execution_id: int, contexts: List[metadata_store_pb2.Context],
      output_dict: typing_utils.ArtifactMultiMap,
      executor_output: execution_result_pb2.ExecutorOutput) -> None:
    """Publishes succeeded execution result to ml metadata."""
    with self._mlmd_connection as m:
      execution_publish_utils.publish_succeeded_execution(     ## (2)
          metadata_handler=m,
          execution_id=execution_id,
          contexts=contexts,
          output_artifacts=output_dict,
          executor_output=executor_output)

# tfx/orchestration/portable/execution_publish_utils.py
def publish_succeeded_execution(
    metadata_handler: metadata.Metadata,
    execution_id: int,
    contexts: Sequence[metadata_store_pb2.Context],
    output_artifacts: Optional[typing_utils.ArtifactMultiMap] = None,
    executor_output: Optional[execution_result_pb2.ExecutorOutput] = None
) -> Optional[typing_utils.ArtifactMultiMap]:
  """Marks an existing execution as success.

  Also publishes the output artifacts produced by the execution. This method
  will also merge the executor produced info into system generated output
  artifacts. The `last_know_state` of the execution will be changed to
  `COMPLETE` and the output artifacts will be marked as `LIVE`.
  """
  unpacked_output_artifacts = None if executor_output is None else (
      data_types_utils.unpack_executor_output_artifacts(
          executor_output.output_artifacts))
  output_artifacts_to_publish = merge_utils.merge_updated_output_artifacts(
      output_artifacts, unpacked_output_artifacts)

  for artifact in itertools.chain(*output_artifacts_to_publish.values()):
    # Mark output artifact as PUBLISHED (LIVE in MLMD) if it was not in state
    # REFERENCE.
    if artifact.state != types.artifact.ArtifactState.REFERENCE:
      artifact.state = types.artifact.ArtifactState.PUBLISHED

  [execution] = metadata_handler.store.get_executions_by_id([execution_id])
  execution.last_known_state = metadata_store_pb2.Execution.COMPLETE
  if executor_output:
    for key, value in executor_output.execution_properties.items():
      execution.custom_properties[key].CopyFrom(value)
  set_execution_result_if_not_empty(executor_output, execution)

  execution_lib.put_execution(    ##(3)
      metadata_handler,
      execution,
      contexts,
      output_artifacts=output_artifacts_to_publish)

  return output_artifacts_to_publish


def put_execution(
    metadata_handler: metadata.Metadata,
    execution: metadata_store_pb2.Execution,
    contexts: Sequence[metadata_store_pb2.Context],
    input_artifacts: Optional[typing_utils.ArtifactMultiMap] = None,
    output_artifacts: Optional[typing_utils.ArtifactMultiMap] = None,
    input_event_type: metadata_store_pb2.Event.Type = metadata_store_pb2.Event
    .INPUT,
    output_event_type: metadata_store_pb2.Event.Type = metadata_store_pb2.Event
    .OUTPUT
) -> metadata_store_pb2.Execution:
  """Writes an execution-centric subgraph to MLMD.

  This function mainly leverages metadata.put_execution() method to write the
  execution centric subgraph to MLMD.
  """
  start_time = time.time()
  artifact_and_events = []
  ...
  if output_artifacts:
    outputs_utils.tag_output_artifacts_with_version(output_artifacts)
    artifact_and_events.extend(
        _create_artifact_and_event_pairs(
            metadata_handler=metadata_handler,
            artifact_dict=output_artifacts,
            event_type=output_event_type))
  execution_id, artifact_ids, contexts_ids = (
      metadata_handler.store.put_execution(        ## (4)
          execution=execution,
          artifact_and_events=artifact_and_events,
          contexts=contexts,
          reuse_context_if_already_exist=True,
          reuse_artifact_if_already_exist_by_external_id=True))
  ...
 
 # ml-metadata/ml_metadata/metadata_store/metadata_store.py 
  def put_execution(
      self,
      execution: proto.Execution,
      artifact_and_events: Sequence[
          Tuple[proto.Artifact, Optional[proto.Event]]
      ],
      contexts: Optional[Sequence[proto.Context]],
      reuse_context_if_already_exist: bool = False,
      reuse_artifact_if_already_exist_by_external_id: bool = False,
      force_reuse_context: bool = False,
      force_update_time: bool = False,
      extra_options: Optional[ExtraOptions] = None,
  ) -> Tuple[int, List[int], List[int]]:
    """Inserts or updates an Execution with artifacts, events and contexts.

    In contrast with other put methods, the method update an
    execution atomically with its input/output artifacts and events and adds
    attributions and associations to related contexts.

    If an execution_id, artifact_id or context_id is specified, it is an update,
    otherwise it does an insertion.

    It is not guaranteed that the created or updated executions, artifacts,
    contexts and events will share the same `create_time_since_epoch`,
    `last_update_time_since_epoch`, or `milliseconds_since_epoch` timestamps.
    """
    del extra_options
    request = metadata_store_service_pb2.PutExecutionRequest(
        execution=execution,
        contexts=contexts,
        options=metadata_store_service_pb2.PutExecutionRequest.Options(
            reuse_context_if_already_exist=reuse_context_if_already_exist,
            reuse_artifact_if_already_exist_by_external_id=(
                reuse_artifact_if_already_exist_by_external_id
            ),
            force_reuse_context=force_reuse_context,
            force_update_time=force_update_time,
        ),
    )
    # Add artifact_and_event pairs to the request.
    for pair in artifact_and_events:
      if pair:
        request.artifact_event_pairs.add(
            artifact=pair[0], event=pair[1] if len(pair) == 2 else None)
    response = metadata_store_service_pb2.PutExecutionResponse()
    self._call('PutExecution', request, response)                 ## (5)
    artifact_ids = list(response.artifact_ids)
    context_ids = list(response.context_ids)
    return response.execution_id, artifact_ids, context_ids
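
As a side note, the relationship between an execution, its output artifact, and the OUTPUT event (including the 'examples' step_key that later shows up in EventPath) can be seen with a small, self-contained sketch against an in-memory fake MLMD database. This is an illustration only; the Kubeflow run goes through the same PutExecution path but against MySQL via gRPC:

# sketch: MetadataStore.put_execution() writes execution, output artifact and
# OUTPUT event in one call (in-memory fake DB, types created on the fly)
from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.fake_database.SetInParent()          # in-memory store for illustration

store = metadata_store.MetadataStore(config)

artifact_type_id = store.put_artifact_type(
    metadata_store_pb2.ArtifactType(name="Examples"))
execution_type_id = store.put_execution_type(
    metadata_store_pb2.ExecutionType(name="ImportExampleGen"))

artifact = metadata_store_pb2.Artifact(type_id=artifact_type_id, uri="/tmp/examples")
execution = metadata_store_pb2.Execution(type_id=execution_type_id)
event = metadata_store_pb2.Event(type=metadata_store_pb2.Event.OUTPUT)
step = event.path.steps.add()
step.key = "examples"                       # becomes step_key in table EventPath

execution_id, artifact_ids, _ = store.put_execution(
    execution=execution,
    artifact_and_events=[(artifact, event)],
    contexts=[])

print(store.get_events_by_execution_ids([execution_id]))   # one OUTPUT event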


# ml-metadata/ml_metadata/metadata_store/metadata_store.py 
  def _call(self, method_name, request, response, extra_options=None):
    """Calls method with retry when Aborted error is returned.
    """
    del extra_options
    num_retries = self._max_num_retries
    avg_delay_sec = 2
    while True:
      try:
        return self._call_method(method_name, request, response)      ## (6)
      except errors.AbortedError:
        num_retries -= 1
        if num_retries == 0:
          logging.log(logging.ERROR, '%s failed after retrying %d times.',
                      method_name, self._max_num_retries)
          raise
        wait_seconds = random.expovariate(1.0 / avg_delay_sec)
        logging.log(logging.INFO, 'mlmd client retry in %f secs', wait_seconds)
        time.sleep(wait_seconds)


  def _call_method(self, method_name, request, response) -> None:
    """Calls method using wrapped C++ library or gRPC.

    Args:
      method_name: the method to call in wrapped C++ library or gRPC.
      request: a protobuf message, serialized and sent to the method.
      response: a protobuf message, filled from the return value of the method.
    """
    if self._using_db_connection:
      cc_method = getattr(metadata_store_serialized, method_name)     ## (7)
      self._pywrap_cc_call(cc_method, request, response)
    else:
      grpc_method = getattr(self._metadata_store_stub, method_name)   ## (7)
      try:
        response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec))
      except grpc.RpcError as e:
        # RpcError code uses a tuple to specify error code and short
        # description.
        # https://grpc.github.io/grpc/python/_modules/grpc.html#StatusCode
        raise errors.make_exception(e.details(), e.code().value[0]) from e  # pytype: disable=attribute-error


# github-repository  ml-metadata/ml_metadata/metadata_store/metadata_store.cc
absl::Status MetadataStore::PutExecution(const PutExecutionRequest& request,   ## (7)
                                         PutExecutionResponse* response) {
  return transaction_executor_->Execute([this, &request,
                                         &response]() -> absl::Status {
    response->Clear();
    if (!request.has_execution()) {
      return absl::InvalidArgumentError(
          absl::StrCat("No execution is found: ", request.DebugString()));
    }

    std::vector<PutExecutionRequest::ArtifactAndEvent> artifact_event_pairs(
        request.artifact_event_pairs().begin(),
        request.artifact_event_pairs().end());

    // 1. Upsert Artifacts.
    for (PutExecutionRequest::ArtifactAndEvent& artifact_and_event :
         artifact_event_pairs) {
      if (!artifact_and_event.has_artifact()) continue;

      int64_t artifact_id = -1;
      MLMD_RETURN_IF_ERROR(UpsertArtifact(
          artifact_and_event.artifact(), metadata_access_object_.get(),
          /*skip_type_and_property_validation=*/false,
          google::protobuf::FieldMask(),
          request.options().reuse_artifact_if_already_exist_by_external_id(),
          &artifact_id));
      artifact_and_event.mutable_artifact()->set_id(artifact_id);
    }

    // 2. Upsert Execution.
    int64_t execution_id = -1;
    MLMD_RETURN_IF_ERROR(
        UpsertExecution(request.execution(), metadata_access_object_.get(),
                        /*skip_type_and_property_validation=*/false,
                        request.options().force_update_time(),
                        google::protobuf::FieldMask(), &execution_id));
    response->set_execution_id(execution_id);

    // 3. Insert events.
    for (const PutExecutionRequest::ArtifactAndEvent& artifact_and_event :
         artifact_event_pairs) {
      MLMD_RETURN_IF_ERROR(InsertEvent(artifact_and_event, execution_id,      ## (8)
                                       metadata_access_object_.get()));

      if (artifact_and_event.has_artifact()) {
        response->add_artifact_ids(artifact_and_event.artifact().id());
      } else if (artifact_and_event.has_event()) {
        response->add_artifact_ids(artifact_and_event.event().artifact_id());
      } else {
        // It is valid to have empty artifact and event pair, i.e. both artifact
        // and event are missing. In such a case, we return -1.
        response->add_artifact_ids(-1);
      }
    }

    // 4. Upsert contexts and insert associations and attributions.
    absl::flat_hash_set<int64_t> artifact_ids(response->artifact_ids().begin(),
                                              response->artifact_ids().end());
    for (const Context& context : request.contexts()) {
      int64_t context_id = -1;

      if (context.has_id() && request.options().force_reuse_context()) {
        context_id = context.id();
        std::vector<Context> contexts;
        const absl::Status status =
            metadata_access_object_->FindContextsById({context_id}, &contexts);
        MLMD_RETURN_IF_ERROR(status);
        if ((contexts.size() != 1) || (contexts[0].id() != context_id)) {
          return absl::NotFoundError(absl::StrCat(
              "Context with ID ", context_id, " was not found in MLMD"));
        }
      } else {
        const absl::Status status = UpsertContextWithOptions(
            context, metadata_access_object_.get(),
            request.options().reuse_context_if_already_exist(),
            /*skip_type_and_property_validation=*/false, &context_id);
        MLMD_RETURN_IF_ERROR(status);
      }

      response->add_context_ids(context_id);
      MLMD_RETURN_IF_ERROR(InsertAssociationIfNotExist(
          context_id, response->execution_id(), /*is_already_validated=*/true,
          metadata_access_object_.get()));
      for (const int64_t artifact_id : artifact_ids) {
        MLMD_RETURN_IF_ERROR(InsertAttributionIfNotExist(
            context_id, artifact_id, /*is_already_validated=*/true,
            metadata_access_object_.get()));
      }
    }
    return absl::OkStatus();
  },
  request.transaction_options());
}


# github-repository  ml-metadata/ml_metadata/metadata_store/metadata_store.cc
absl::Status InsertEvent(
    const PutExecutionRequest::ArtifactAndEvent& artifact_and_event,
    const int64_t execution_id, MetadataAccessObject* metadata_access_object) {
  ...
  // Validate execution and event.
  Event event(artifact_and_event.event());
  ...
  event.set_execution_id(execution_id);

  // Validate artifact and event.
  if (artifact_and_event.has_artifact()) {
    ...  
    event.set_artifact_id(artifact_and_event.artifact().id());
  } 
  ... 
  // Create an Event.
  int64_t dummy_event_id = -1;               #### event_id = -1 comes from here.
  MLMD_RETURN_IF_ERROR(metadata_access_object->CreateEvent(    ## (9)
      event,
      /*is_already_validated=*/true, &dummy_event_id));

  return absl::OkStatus();
}


absl::Status RDBMSMetadataAccessObject::CreateEvent(    ## (9)
    const Event& event, const bool is_already_validated, int64_t* event_id) {
  // validate the given event
  ...

  // insert an event and get its given id
  int64_t event_time = event.has_milliseconds_since_epoch()
                           ? event.milliseconds_since_epoch()
                           : absl::ToUnixMillis(absl::Now());

  const absl::Status status =
      executor_->InsertEvent(event.artifact_id(), event.execution_id(),   ## (10)
                             event.type(), event_time, event_id);
  if (IsUniqueConstraintViolated(status)) {
    return absl::AlreadyExistsError(
        absl::StrCat("Given event already exists: ", event.DebugString(),
                     status.ToString()));
  }
  // insert event paths
  for (const Event::Path::Step& step : event.path().steps()) {
    // step value oneof
    MLMD_RETURN_IF_ERROR(executor_->InsertEventPath(*event_id, step));
  }
  return absl::OkStatus();
}



# ml-metadata/ml_metadata/metadata_store/query_config_executor.h
 absl::Status InsertEvent(int64_t artifact_id, int64_t execution_id,
                           int event_type, int64_t event_time_milliseconds,
                           int64_t* event_id) final {
    return ExecuteQuerySelectLastInsertID(
        query_config_.insert_event(),
        {Bind(artifact_id), Bind(execution_id), Bind(event_type),
         Bind(event_time_milliseconds)},
        event_id);
  }
 
 
  
# ml-metadata/ml_metadata/util/metadata_source_query_config.cc
  insert_event {
    query: " INSERT INTO Event( "
           "   artifact_id, execution_id, type, "
           "   milliseconds_since_epoch "
           ") VALUES($0, $1, $2, $3);"
    parameter_num: 4
  }

  

Note:

const absl::Status status =
    executor_->InsertEvent(event.artifact_id(), event.execution_id(),   ## (10)
                           event.type(), event_time, event_id);

This statement is the one that fails, and its status carries the database error message. Only IsUniqueConstraintViolated(status) is checked; every other kind of database error is ignored and execution just continues with the next statement. That is why in table EventPath all event_id values are -1 while the other columns are fine.

All the other statements succeed, since the tables Artifact, Context, Execution, Association, and EventPath in database metadb contain the expected records.

4. Put a print in self._call(), to check whether it is called by both execution_lib.put_execution() and tfx.orchestration.LocalDagRunner().run().

# in pycharm
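# (assumption: mysql_metadata_connection_config, get_mlmd_handle, artifact_lib,
#  and put_execution are imported from the tfx/MLMD modules shown above, or are
#  small local helper wrappers around them)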
if __name__ == "__main__":
    host = "10.244.0.119"
    port = 3306
    database="metadb"
    username="root"
    password=""

    mysql_metadb_connection_config = mysql_metadata_connection_config(host, port, database, username, password)
   
    metadata_handler = get_mlmd_handle(mysql_metadb_connection_config)

    execution_id = 84
    with metadata_handler:
        [execution] = metadata_handler.store.get_executions_by_id([execution_id])

    context_pipeline = metadata_store_pb2.Context()

    context_pipeline.type = "pipeline"
    context_pipeline.name = "detect_anomolies_on_wafer_tfdv_schema"
    context_pipeline.type_id = 11

    context_pipeline_run = metadata_store_pb2.Context()
    context_pipeline_run.type = "pipeline_run"
    context_pipeline_run.name = "detect-anomolies-on-wafer-tfdv-schema-9xwkk"
    context_pipeline_run.type_id = 12

    context_node = metadata_store_pb2.Context()
    context_node.type = "node"
    context_node.name = "detect_anomolies_on_wafer_tfdv_schema.ImportExampleGen"
    context_node.type_id = 13

    contexts = [context_pipeline, context_pipeline_run, context_node]

    artifact_ids = [20]
    with metadata_handler:
        artifacts = artifact_lib.get_artifacts_by_ids(
            metadata_handler, artifact_ids
        )

    artifact_multimap = {'examples': artifacts}

    with metadata_handler:
        put_execution(                    #### this is execution_lib.put_execution()
            metadata_handler=metadata_handler,
            execution=execution,
            contexts=contexts,
            output_artifacts=artifact_multimap,
        )

maye: response after self._call('PutExecution', request, response):
execution_id: 84
artifact_ids: 20
context_ids: 16
context_ids: 59
context_ids: 19
Process finished with exit code 0

Even though in this case self._using_db_connection is True (the script connects to MySQL directly instead of through gRPC), the same symptoms appear: table Event is empty, the event_id values in table EventPath are all -1, and the other tables in database metadb are fine.

Note:

if self._using_db_connection:
  cc_method = getattr(metadata_store_serialized, method_name)     ## (7)
  self._pywrap_cc_call(cc_method, request, response)
else:
  grpc_method = getattr(self._metadata_store_stub, method_name)   ## (7)
  try:
    response.CopyFrom(grpc_method(request, timeout=self._grpc_timeout_sec))

vs

tfx.orchestration.LocalDagRunner().run(
    _create_schema_pipeline(
        pipeline_name=SCHEMA_PIPELINE_NAME,
        pipeline_root=SCHEMA_PIPELINE_ROOT,
        data_root=DATA_ROOT,
        schema_path=SCHEMA_PATH,
        metadata_path=SCHEMA_METADATA_PATH,
        module_file=_trainer_module_file,
        serving_model_dir=SERVING_MODEL_DIR,
    )
) 

maye: response after self._call('PutExecution', request, response):
execution_id: 293
artifact_ids: 452
context_ids: 1
context_ids: 258
context_ids: 59
context_ids: 60

In summary, both execution_lib.put_execution() (which is what running the TFX pipeline on Kubeflow ends up calling) and tfx.orchestration.LocalDagRunner().run() go through the same self._call(), and although _call_method() chooses between the wrapped C++ library and gRPC, both paths end in the same C++ MetadataStore::PutExecution. So the ERROR must come from MySQL, the metadata database server used by Kubeflow (tfx.orchestration.LocalDagRunner().run() uses SQLite).

5. Insert an event into table Event manually with an SQL statement, to see whether it succeeds.

mysql> INSERT INTO Event (artifact_id, execution_id, type, milliseconds_since_epoch) VALUES(20, 84, 4, 1706351771717);

ERROR 1264 (22003): Out of range value for column 'milliseconds_since_epoch' at row 1
mysql>

This error occurs because milliseconds_since_epoch holds a millisecond-precision epoch timestamp, which needs a 64-bit integer to store, but in my table Event the column milliseconds_since_epoch was created as INT (32-bit).
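
A quick sanity check of the magnitudes involved (illustrative Python):

INT32_MAX = 2**31 - 1        # 2147483647, the largest value a signed INT column can hold
BIGINT_MAX = 2**63 - 1
ts_ms = 1706351771717        # the value from the failing INSERT above
print(ts_ms > INT32_MAX)     # True  -> "Out of range value" for an INT column
print(ts_ms < BIGINT_MAX)    # True  -> fits comfortably in BIGINT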

And since only IsUniqueConstraintViolated(status) is checked and every other kind of database error is silently ignored, the "ERROR 1264 (22003): Out of range value for column 'milliseconds_since_epoch'" error is never surfaced (here, status carries the database error message).

Note:

const absl::Status status =
    executor_->InsertEvent(event.artifact_id(), event.execution_id(),   ## (10)
                           event.type(), event_time, event_id);

if (IsUniqueConstraintViolated(status)) {
  return absl::AlreadyExistsError(
      absl::StrCat("Given event already exists: ", event.DebugString(),
                   status.ToString()));
}

[SOLUTION]

mysql> ALTER TABLE Event MODIFY COLUMN milliseconds_since_epoch BIGINT;
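
After the ALTER TABLE, re-run the pipeline (or the put_execution test script from step 4) and table Event should no longer be empty. A small verification sketch, using the same assumed MySQL connection values as above:

from ml_metadata import metadata_store
from ml_metadata.proto import metadata_store_pb2

config = metadata_store_pb2.ConnectionConfig()
config.mysql.host = "10.244.0.119"   # assumed MySQL pod IP, as in step 4
config.mysql.port = 3306
config.mysql.database = "metadb"
config.mysql.user = "root"

store = metadata_store.MetadataStore(config)
events = store.get_events_by_execution_ids([84])   # use the execution id of the re-run (84 in the earlier log)
for e in events:
    print(e.artifact_id, e.execution_id, e.type, e.milliseconds_since_epoch)
# Expected: an OUTPUT event linking the Examples artifact to the execution, after
# which StatisticsGen can resolve its 'examples' input.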