DFS Pro Fusion Tasks

Fusion tasks combine data from multiple datasets. Use them when one workflow needs to match, merge, compare, enrich, or reconcile records from different sources.

Examples:

combine sensor history with maintenance history;
match inspection findings to work orders;
align source-system asset IDs with FactVerse assets;
merge operational records from several sites;
prepare a reviewed output dataset for predictive maintenance or AI Agent workflows.

Prerequisites

Before creating a fusion task, confirm:

input datasets exist and are accessible;
each input dataset has a known steward or source owner;
key fields, timestamp fields, and identity fields are understood;
expected output dataset name and owner are defined;
matching mode and review threshold are agreed;
reviewers are available when the task can produce conflicts or low-confidence results.

Fusion task flow

Open Data Fusion

Go to:

Data Integration > Data Fusion

The page shows fusion tasks, mode, status, output dataset, and run actions.

Fusion modes

Mode	Use when
Rule Matching	Matching logic is deterministic, such as same asset ID, same timestamp window, or known key columns.
Semantic Matching	Names, aliases, descriptions, or relationships need to be compared.
LLM Assisted	The task needs language-based assistance and every uncertain result will be reviewed.

Use rule matching first when stable keys are available. Use semantic or LLM-assisted modes when source records use different names, aliases, or descriptions.
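As a rough illustration of deterministic matching, the sketch below pairs records that share an asset ID and fall within the same timestamp window. It is not DFS Pro's matching engine; the function name rule_match, the field names asset_id, ts, and id, and the one-hour window are all assumptions made for this example.

```python
from datetime import datetime, timedelta

def rule_match(readings, work_orders, window=timedelta(hours=1)):
    """Pair readings and work orders that share an asset ID within a time window."""
    # Index work orders by asset ID so each reading only scans its own asset.
    orders_by_asset = {}
    for order in work_orders:
        orders_by_asset.setdefault(order["asset_id"], []).append(order)

    matches = []
    for reading in readings:
        for order in orders_by_asset.get(reading["asset_id"], []):
            # Deterministic rule: same key and timestamps no further apart than the window.
            if abs(reading["ts"] - order["ts"]) <= window:
                matches.append((reading["id"], order["id"]))
    return matches

# Hypothetical sample records.
readings = [
    {"id": "r1", "asset_id": "PUMP-7", "ts": datetime(2024, 5, 1, 10, 0)},
    {"id": "r2", "asset_id": "PUMP-9", "ts": datetime(2024, 5, 1, 10, 0)},
]
work_orders = [
    {"id": "wo1", "asset_id": "PUMP-7", "ts": datetime(2024, 5, 1, 10, 30)},
    {"id": "wo2", "asset_id": "PUMP-7", "ts": datetime(2024, 5, 2, 9, 0)},
]
matches = rule_match(readings, work_orders)
```

Because the rule depends only on key equality and a fixed window, the same inputs always produce the same pairs, which is why rule matching is the preferred starting point when stable keys exist.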

Large-source fusion controls

Large operational datasets can be fused through background execution instead of returning the full result to the browser. For supported methods such as merge_by_natural_key, DFS Pro can run asynchronously, stream source records in chunks, and persist output rows directly into the target dataset.

In the UI this still appears as one fusion run. The run may stay in queued or running status while source rows are processed. Review the run history after completion for totals, conflict counts, persisted row counts, and any error message.

Use this path when:

source tables are too large for preview-style execution;
the output should be written to a governed dataset;
reviewers need run history and conflict counts rather than a downloaded result file;
the same task will be scheduled or rerun after source refresh.
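The chunked, persist-as-you-go pattern described in this section can be sketched as follows. This is a simplified model and not the merge_by_natural_key implementation: the names chunks and stream_merge, the in-memory dictionary standing in for the target dataset, and the first-value-wins survivorship rule are assumptions chosen for brevity.

```python
from itertools import islice

def chunks(rows, size):
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch

def stream_merge(sources, key_fields, chunk_size=1000):
    """Merge rows from several sources on a natural key, one chunk at a time."""
    target = {}  # stands in for the persisted target dataset
    stats = {"total": 0, "conflicts": 0}
    for rows in sources.values():
        for batch in chunks(rows, chunk_size):
            for row in batch:
                stats["total"] += 1
                key = tuple(row[field] for field in key_fields)
                merged = target.setdefault(key, {})
                for field, value in row.items():
                    if field in merged and merged[field] != value:
                        # Assumed survivorship rule: the first value seen is kept.
                        stats["conflicts"] += 1
                    else:
                        merged[field] = value
    stats["persisted"] = len(target)
    return target, stats

# Hypothetical sources keyed by source label.
sources = {
    "A": [{"asset_id": "P-1", "status": "ok"}],
    "B": [{"asset_id": "P-1", "status": "down", "site": "S1"},
          {"asset_id": "P-2", "status": "ok"}],
}
target, stats = stream_merge(sources, ["asset_id"], chunk_size=1)
```

Only one chunk of source rows is held at a time, and the totals, conflict count, and persisted row count collected along the way correspond to the figures you review in run history.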

Source row filters

Some sources contain rows outside the scope of a specific fusion task. A fusion method can carry source_row_filters in its method configuration so the run keeps only the intended source slice before matching.

Example:

{
  "source_row_filters": {
    "APCM": {
      "any": [
        { "field": "告警类型", "in": ["MMSG告警"] },
        { "field": "告警等级", "in": ["中高", "高"] }
      ]
    }
  }
}

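Filters are keyed by source label, and a source without a matching filter passes through unchanged. An any group keeps a row when at least one clause matches; an all group keeps a row only when every clause matches. Each clause tests a field value with in or not_in. In the example above, the APCM source keeps only rows whose alarm type (告警类型) is MMSG告警 or whose alarm level (告警等级) is 中高 (medium-high) or 高 (high).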
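The any, all, in, and not_in semantics can be modelled with a few lines of Python. This is a sketch of the documented behavior, not the DFS Pro dispatcher: the function names are hypothetical, a missing field is assumed to evaluate as None, and a filter that carries both any and all is assumed to require both groups to pass.

```python
def clause_matches(row, clause):
    """Evaluate one clause: 'in' or 'not_in' against a field value."""
    value = row.get(clause["field"])  # assumption: a missing field reads as None
    if "in" in clause:
        return value in clause["in"]
    if "not_in" in clause:
        return value not in clause["not_in"]
    raise ValueError("clause needs 'in' or 'not_in'")

def keep_row(row, source_filter):
    """Return True when the row survives the filter for its source."""
    if not source_filter:
        return True  # no filter for this source: pass through unchanged
    keep = True
    if "any" in source_filter:
        keep = keep and any(clause_matches(row, c) for c in source_filter["any"])
    if "all" in source_filter:
        keep = keep and all(clause_matches(row, c) for c in source_filter["all"])
    return keep

def apply_source_row_filters(rows_by_source, filters):
    """Filter each source's rows using the filter keyed by its source label."""
    return {
        label: [row for row in rows if keep_row(row, filters.get(label))]
        for label, rows in rows_by_source.items()
    }

filters = {
    "APCM": {
        "any": [
            {"field": "告警类型", "in": ["MMSG告警"]},
            {"field": "告警等级", "in": ["中高", "高"]},
        ]
    }
}
# Hypothetical rows; "MES" is an invented second source without a filter.
rows_by_source = {
    "APCM": [
        {"id": 1, "告警类型": "MMSG告警", "告警等级": "低"},
        {"id": 2, "告警类型": "其他", "告警等级": "高"},
        {"id": 3, "告警类型": "其他", "告警等级": "低"},
    ],
    "MES": [{"id": 4}],
}
filtered = apply_source_row_filters(rows_by_source, filters)
```

Rows 1 and 2 each satisfy one clause of the any group and are kept, row 3 satisfies neither and is dropped, and the unfiltered source is returned as-is.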

Treat source row filters as governed task configuration:

document the business rule behind each filter;
sample the source rows before and after filtering;
rerun baseline totals before enabling the filter on a scheduled task;
keep the raw source data available for audit and later review.

Deployments may keep source_row_filters disabled until the data owner approves the scope and capacity settings. If the environment setting is not enabled, the filter setting is ignored during dispatch.

Published rulesets and conflict fields

For methods backed by a published DFS ruleset, review the live ruleset before changing an operating task. The ruleset is the operational source for field extraction, matching rules, survivorship rules, confidence weights, and any AI-assist threshold used by the workflow.

Choose conflict fields that reflect business disagreement. Structured fields such as governed identity, asset class, operating status, severity, batch context, equipment state, timestamp bucket, or maintained object usually make better conflict signals than free-text messages or source-specific codes. Keep verbose message text and raw source codes in the evidence record so reviewers can audit the decision without inflating the conflict count.
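One way to picture the split between conflict signals and evidence is the sketch below. The function compare_records and every field name in it are hypothetical; the point is only that disagreement is counted on the chosen structured fields while other fields travel with the record for audit.

```python
def compare_records(left, right, conflict_fields):
    """Return (conflicts, evidence) for two source records describing one object."""
    # Only the chosen structured fields can raise a conflict.
    conflicts = {
        field: (left.get(field), right.get(field))
        for field in conflict_fields
        if left.get(field) != right.get(field)
    }
    # Everything else is carried as evidence and never counted.
    evidence = {
        field: (left.get(field), right.get(field))
        for field in sorted(set(left) | set(right))
        if field not in conflict_fields
    }
    return conflicts, evidence

# Hypothetical records from two source systems.
left = {"asset_class": "pump", "severity": "high", "message": "Bearing temp alarm #4471"}
right = {"asset_class": "pump", "severity": "medium", "message": "TEMP_HI code=17"}
conflicts, evidence = compare_records(left, right, ["asset_class", "severity"])
```

Here the differing message text does not raise a conflict, but it remains available to the reviewer alongside the single severity disagreement.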

Async execution and recovery

Fusion runs are dispatched in the background. For large streaming runs, DFS Pro accepts the job and checks for the result after dispatch, so the user action stays responsive.

If a service restart or dependency failure leaves an old run in RUNNING, the scheduler can mark the stale run as failed and unblock the task for retry. Operators should use run history to confirm the failure reason, then retry after the source, method, or capacity issue has been addressed.
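The stale-run recovery described above can be illustrated with a small sweep function. This is a sketch under stated assumptions, not the DFS Pro scheduler: the last_heartbeat field, the two-hour limit, the FAILED status label, and the error text are all invented for the example.

```python
from datetime import datetime, timedelta, timezone

def fail_stale_runs(runs, now, max_age=timedelta(hours=2)):
    """Mark RUNNING runs with no recent heartbeat as failed; return their IDs."""
    recovered = []
    for run in runs:
        if run["status"] == "RUNNING" and now - run["last_heartbeat"] > max_age:
            run["status"] = "FAILED"
            run["error"] = "stale run marked failed by scheduler"
            recovered.append(run["id"])
    return recovered

now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
# Hypothetical run records.
runs = [
    {"id": "run-1", "status": "RUNNING", "last_heartbeat": now - timedelta(hours=4)},
    {"id": "run-2", "status": "RUNNING", "last_heartbeat": now - timedelta(minutes=30)},
    {"id": "run-3", "status": "COMPLETED", "last_heartbeat": now - timedelta(days=1)},
]
recovered = fail_stale_runs(runs, now)
```

Recording an explicit error on the stale run gives operators a failure reason to confirm in run history before they retry.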

Create a fusion task

1. Open Data Fusion.
2. Select Create Fusion Task.
3. Enter a task name.
4. Add a description.
5. Choose a fusion mode.
6. Select the input datasets.
7. Select a method when the task requires reusable processing logic.
8. Set the output dataset name, or select the output dataset.
9. Configure the conflict threshold when available.
10. Save the task.

Use a name that describes the business output.

Examples:

Asset sensor and work order alignment
Inspection finding to maintenance record match
Equipment alias reconciliation
Predictive maintenance signal feature merge

Run the task

Use Run from the fusion task list or detail page.

During execution, a task may move through statuses such as queued, running, completed, failed, cancelled, or review.

After starting a task:

1. Watch the status.
2. Open run history.
3. Review the total, matched, and conflict counts.
4. Open the review queue if the status indicates review.
5. Use the output dataset only after the required review is complete.

Review run history

Run history helps users understand what happened during execution.

Check:

triggered by;
started at;
duration;
total records;
matched records;
conflict records;
error message when failed.

If a task fails, fix the dataset, method, mapping, or output dataset issue before retrying.

Review uncertain output

Fusion can produce conflicts, source disagreements, low-confidence matches, or manual flags. These should go through the review queue.

Reviewers should compare:

input dataset records;
matching keys;
source timestamps;
confidence;
conflict reason;
output record;
downstream impact.

Retry or cancel

Use retry after fixing a failed task. Use cancel when a queued or running task should stop because the input or configuration is wrong.

Before retry:

confirm input datasets exist and are accessible;
confirm output dataset is writable;
confirm method status is usable;
confirm any source_row_filters still match the intended source labels and field names;
check the last error message;
check whether review items remain open.
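The source_row_filters check in this list can be partly automated. The sketch below compares a filter configuration against the current source labels and field names; check_filters and the source_schemas mapping are hypothetical helpers, not part of DFS Pro.

```python
def check_filters(filters, source_schemas):
    """List filter entries that no longer match a source label or field name."""
    problems = []
    for label, spec in filters.items():
        if label not in source_schemas:
            problems.append(f"unknown source label: {label}")
            continue
        for group in ("any", "all"):
            for clause in spec.get(group, []):
                if clause["field"] not in source_schemas[label]:
                    problems.append(f"{label}: unknown field {clause['field']}")
    return problems

# Hypothetical configuration and schemas (source label -> set of field names).
filters = {
    "APCM": {"any": [{"field": "alarm_type", "in": ["MMSG"]}]},
    "LEGACY": {"all": [{"field": "site", "not_in": ["TEST"]}]},
}
source_schemas = {"APCM": {"alarm_level", "asset_id"}}
problems = check_filters(filters, source_schemas)
```

An empty list means every filter still points at an existing source and field; anything else should be resolved before the retry.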

Output dataset

A completed fusion task can produce an output dataset. Treat that output as governed data:

preview rows;
profile columns;
assign a steward if it will be reused;
validate the dataset after review;
check lineage before replacing or deprecating it.

When governed identity is part of a fusion output, include the stable MDM entity ID in the output records. For example, a reliability workflow can resolve a normalized registration, tag, serial number, or maintainable-object ID through the MDM alias ledger and write the resulting entity ID beside the fused event or reliability record. Unresolved or ambiguous rows should remain visible as exceptions for steward review.
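A minimal sketch of that resolution step follows. It assumes the alias ledger can be read as a mapping from a normalized alias to a set of entity IDs, which may not match the real MDM interface; normalize, resolve_entities, the mdm_entity_id and exception field names, and the sample IDs are all hypothetical.

```python
def normalize(alias):
    """Uppercase the alias and drop everything except letters and digits."""
    return "".join(ch for ch in alias.upper() if ch.isalnum())

def resolve_entities(records, alias_ledger, alias_field):
    """Attach a stable entity ID where exactly one candidate exists."""
    resolved, exceptions = [], []
    for record in records:
        candidates = alias_ledger.get(normalize(record[alias_field]), set())
        if len(candidates) == 1:
            resolved.append({**record, "mdm_entity_id": next(iter(candidates))})
        else:
            # Keep unresolved and ambiguous rows visible for steward review.
            reason = "ambiguous" if candidates else "unresolved"
            exceptions.append({**record, "exception": reason})
    return resolved, exceptions

# Hypothetical ledger: normalized alias -> candidate entity IDs.
alias_ledger = {
    "PUMP7": {"ENT-001"},
    "TAG42": {"ENT-002", "ENT-003"},
}
records = [
    {"tag": "pump-7", "event": "vibration high"},
    {"tag": "Tag 42", "event": "seal leak"},
    {"tag": "XJ-9", "event": "overheat"},
]
resolved, exceptions = resolve_entities(records, alias_ledger, "tag")
```

Rows that cannot be resolved to exactly one entity are returned separately rather than dropped, so they can be routed to steward review.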

Next step

Use Review Queue to resolve fusion conflicts and rejected rows.
