Details & weights

Click any node in the Tree and the Blade slides in with a Details tab. This tab shows the shape of what you’re about to send into a semantic dataset build, including how much weight each leaf carries in the final vector set.

What the Details tab shows

Field | What it means
Label 1 … N | The breadcrumb path of labels leading to the clicked node, with generic root markers stripped out.
Selected | The label of the node you clicked.
Type | The node’s type, when set (e.g. column data type).
Weight | This node’s share of the current selection across the whole active tree, as a percentage.
# of Variables | How many leaves are selected under this node, out of the total leaves under it.

How weight is calculated

The platform looks at the entire active tree and counts how many leaf nodes are currently ticked. Call that N. Then for the node you clicked:

  • If it’s a leaf and it’s selected: weight = 1 / N.
  • If it’s a leaf and it’s not selected: weight = 0.
  • If it’s a branch: weight = (selected leaves under this branch) / N.
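
The three rules above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the platform's code: the Node class, the helper names, and the example labels are all hypothetical, and returning 0 when nothing is ticked is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; `selected` is only meaningful on leaves."""
    label: str
    selected: bool = False
    children: list["Node"] = field(default_factory=list)


def leaves(node: Node) -> list[Node]:
    """Return every leaf at or below `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def selected_count(node: Node) -> int:
    """Count the ticked leaves at or below `node`."""
    return sum(1 for leaf in leaves(node) if leaf.selected)


def weight(node: Node, root: Node) -> float:
    """Selected leaves under `node` divided by N, the selected leaves in the whole tree."""
    n = selected_count(root)
    if n == 0:
        return 0.0  # assumption: with nothing ticked, every weight is 0
    return selected_count(node) / n


# Example tree: two branches, four ticked leaves in total (N = 4).
amount = Node("amount", selected=True)
notes = Node("notes")
status = Node("status", selected=True)
orders = Node("orders", children=[amount, notes, status])
customers = Node("customers", children=[
    Node("region", selected=True),
    Node("customer_id"),
    Node("segment", selected=True),
])
root = Node("root", children=[orders, customers])

print(weight(amount, root))  # selected leaf: 1 / 4 = 0.25
print(weight(notes, root))   # unselected leaf: 0.0
print(weight(orders, root))  # branch with 2 of the 4 ticked leaves: 0.5
```

A leaf and a branch go through the same function here, because a selected leaf has exactly one selected leaf "under" it and an unselected leaf has none.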

The display reads as: 25.0% (1 / 4 selected). It gives the percentage plus the raw counts so you can sanity-check at a glance. As long as at least one leaf is ticked, the weights of all selected leaves in the tree sum to 100%.
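
As a rough sketch of how such a display string could be produced, the snippet below formats a weight from the two raw counts. The function name format_weight is hypothetical, and showing 0.0% when nothing is selected is an assumption, since the text above does not describe that case.

```python
import math


def format_weight(selected_under_node: int, total_selected: int) -> str:
    """Render a weight as 'P% (k / N selected)' with one decimal place."""
    if total_selected == 0:
        share = 0.0  # assumption: empty selection is shown as 0.0%
    else:
        share = selected_under_node / total_selected
    return f"{share * 100:.1f}% ({selected_under_node} / {total_selected} selected)"


print(format_weight(1, 4))  # 25.0% (1 / 4 selected)
print(format_weight(2, 4))  # 50.0% (2 / 4 selected)

# Each of the 4 selected leaves weighs 1 / 4, so together they account for 100%.
leaf_weights = [1 / 4] * 4
print(math.isclose(sum(leaf_weights), 1.0))  # True
```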

Why weighting matters

When a semantic dataset is built, every selected leaf becomes part of the vector representation. The signal each leaf carries into that representation is proportional to its weight.

In other words: the more columns you select, the more diluted each individual column becomes. That’s the trade-off curation is asking you to make.

4 columns selected → each carries 25% of the meaning
8 columns selected → each carries 12.5%
20 columns selected → each carries 5%
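
The figures above are simply 100 / N. A minimal sketch, using a hypothetical helper name, reproduces them and makes it easy to try other selection sizes:

```python
def per_leaf_share(n_selected: int) -> float:
    """Percentage carried by each selected leaf when n_selected leaves are ticked."""
    if n_selected <= 0:
        raise ValueError("select at least one leaf")
    return 100 / n_selected


for n in (4, 8, 20):
    print(f"{n} columns selected: each carries {per_leaf_share(n):g}%")
# 4 columns selected: each carries 25%
# 8 columns selected: each carries 12.5%
# 20 columns selected: each carries 5%
```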

A semantic dataset built from a wide selection covers more ground but treats every column as roughly equal. A narrow selection lets a few high-signal columns dominate the vectors — better when you know exactly what matters and want the agent to lean on it.

When fewer is better

Pick a small selection when:

  • The dataset has a few critical columns and a long tail of noise (free-text descriptions, dates, IDs).
  • You want the agent to answer questions specifically about those critical fields.
  • You’ve used the AI Data Advisor and it has flagged the rest as low-signal.

A selection of 3 to 6 carefully chosen columns is often more useful for downstream inference than a selection of 30 columns “just in case”.

When more is better

Pick a wider selection when:

  • You don’t yet know which columns matter — you want the agent to discover patterns.
  • The data is reasonably uniform and every field carries some signal.
  • You need broad coverage for an exploratory session.

In those cases the dilution is the point: no one column dominates and the agent’s responses balance everything in the set.

Reading weight as you curate

The numbers update live as you tick and untick leaves anywhere in the tree. Click a leaf to see what its share looks like in the current selection. Click a parent to see how much of the overall selection is concentrated in that branch — useful when you’re weighing whether one source dataset is over-represented compared to another.
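
To make the branch comparison concrete, here is a small sketch that recomputes each branch's share of the selection after a tick. The data layout (a dict of branch label to leaf ticks), the function name, and the example labels are hypothetical; only the arithmetic follows the weight rule described earlier.

```python
def branch_shares(selection: dict[str, dict[str, bool]]) -> dict[str, float]:
    """Share of all ticked leaves that sits under each top-level branch."""
    counts = {branch: sum(ticks.values()) for branch, ticks in selection.items()}
    total = sum(counts.values())
    if total == 0:
        return {branch: 0.0 for branch in counts}  # assumption: nothing ticked
    return {branch: count / total for branch, count in counts.items()}


# Two source datasets; True means the leaf is ticked.
selection = {
    "orders": {"amount": True, "status": True, "created_at": True, "notes": False},
    "customers": {"region": True, "segment": False, "customer_id": False},
}
print(branch_shares(selection))  # {'orders': 0.75, 'customers': 0.25}

# Ticking one more leaf under customers shifts every share, not just that branch's.
selection["customers"]["segment"] = True
print(branch_shares(selection))  # {'orders': 0.6, 'customers': 0.4}
```

In this example the orders branch starts with three quarters of the selection, which is the kind of imbalance that clicking a parent node is meant to reveal.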

What’s next