Semantic dataset
A semantic dataset is what RebelCore™ builds when you press Build Semantic Dataset on a Tree node. It’s the artefact that turns curated, labelled gold data into something the RebelCore™ Agent can actually reason over.
Where it sits in the flow
Imported data (bronze/silver) → Tree (gold) → Semantic dataset (vectorised gold) → RebelCore™ Agent (RAG inference)
- Imported data is the raw upload: files, sheets, columns, rows.
- The Tree is the gold abstraction over that data — labelled, curated, hierarchical.
- The semantic dataset is the gold layer turned into vectors.
- The Agent queries those vectors to answer questions.
If the Tree is the map of your data, the semantic dataset is the territory the agent walks across.
What it actually represents
A semantic dataset is a named, versioned bundle of:
- A spec — the curation choices you made: which sheets, which columns, which labels, and the weight each one carries.
- The vectors — embedding rows produced from your selected data.
- Metadata — when it was built, who built it, how many rows it covers, and its run status.
Each project can have multiple semantic datasets — different curated slices for different questions. They live side-by-side in the Tree’s Data tab.
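To make the bundle concrete, here is a minimal sketch of how the three parts could be modelled. It is an illustration only: the class and field names (`SemanticDataset`, `LeafSelection`, `RunStatus`, and so on) are hypothetical and are not taken from RebelCore™'s actual schema or API.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import List, Optional


class RunStatus(Enum):
    # State names follow the lifecycle described in this page.
    PENDING = "Pending"
    RUNNING = "Running"
    READY = "Ready"
    FAILED = "Failed"


@dataclass(frozen=True)
class LeafSelection:
    """One curation choice in the spec (hypothetical shape)."""
    sheet: str
    column: str
    label: str
    weight: float


@dataclass
class SemanticDataset:
    """A named, versioned bundle of spec, vectors, and metadata."""
    name: str
    version: int
    spec: List[LeafSelection]                                  # the curation choices
    vectors: List[List[float]] = field(default_factory=list)   # embedding rows
    built_at: Optional[datetime] = None                        # metadata: when
    built_by: Optional[str] = None                             # metadata: who
    row_count: int = 0                                         # metadata: rows covered
    status: RunStatus = RunStatus.PENDING                      # metadata: run status


# Made-up example values, for illustration only.
dataset = SemanticDataset(
    name="q3-revenue",
    version=1,
    spec=[
        LeafSelection("Sales", "net_amount", "Revenue", 0.7),
        LeafSelection("Sales", "region", "Region", 0.3),
    ],
    built_by="analyst@example.com",
)
```

A freshly created dataset in this sketch has no vectors and starts as Pending; the vectors and the remaining metadata would be filled in by the build.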
How to build one
- Open a project in the Tree.
- Tick the leaves you want to include — use the Details panel to watch the weights as you go.
- Click any node to open the Blade, then switch to the Data tab.
- Type a name and press Build Semantic Dataset.
- Watch the status pill change as the run progresses.
The build runs as a background workflow, so you can keep working in the Tree while it executes.
Lifecycle
Each semantic dataset moves through a small set of states, visible as a coloured pill on the card:
| State | Meaning |
|---|---|
| Pending / Running | The build workflow is in flight. The card shows progress and you can’t query it yet. |
| Ready | Vectors are written and the dataset is queryable. The agent will pick it up automatically. |
| Failed | Something went wrong during build. The card shows the error message; rebuild after fixing the upstream problem. |
The cards on the Data tab show name, short ID, relative time, and current state at a glance.
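The lifecycle above can be read as a small state machine. The sketch below is one way to express it, under stated assumptions: the page only names the states, so the allowed transitions in `ALLOWED` are a guess, and `advance` and `is_queryable` are hypothetical helpers rather than part of any RebelCore™ API.

```python
from enum import Enum


class State(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    READY = "Ready"
    FAILED = "Failed"


# Assumed transitions. Ready and Failed are treated as terminal here;
# whether a rebuild reuses the same card or creates a new one is not
# stated in the source.
ALLOWED = {
    State.PENDING: {State.RUNNING, State.FAILED},
    State.RUNNING: {State.READY, State.FAILED},
    State.READY: set(),
    State.FAILED: set(),
}


def advance(current: State, target: State) -> State:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


def is_queryable(state: State) -> bool:
    """Only a Ready dataset can be queried by the agent."""
    return state is State.READY
```

For example, `advance(State.PENDING, State.RUNNING)` succeeds, while `advance(State.READY, State.RUNNING)` raises `ValueError` under these assumed rules.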
Querying a semantic dataset
Once a semantic dataset is Ready, the RebelCore™ Agent can query it. From the Tree:
- The "Question in RebelCore™ Agent" button on each Ready card opens the agent in a new tab with the right project and dataset preselected.
- The agent’s responses are grounded in the vectors of that dataset — that’s the RAG pattern.
You can also reach the agent directly at https://agent.rebelcore.ai and select the dataset from the agent’s UI.
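For readers new to the RAG pattern, the following is a generic sketch of retrieval-augmented generation: rank stored vectors by similarity to the question, then pass the best-matching rows to the model as context. It is not RebelCore™'s implementation. The similarity measure (cosine), the function names, and the toy rows and vectors are all assumptions made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, rows, k=2):
    """Return the k rows whose vectors are most similar to the query."""
    ranked = sorted(rows, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:k]


def build_prompt(question, context_rows):
    """Ground the question in the retrieved rows."""
    context = "\n".join(f"- {r['text']}" for r in context_rows)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy data: real embeddings have hundreds of dimensions and come from a model.
rows = [
    {"text": "Q3 revenue, EMEA: 1.2M", "vector": [0.9, 0.1, 0.0]},
    {"text": "Headcount, EMEA: 40", "vector": [0.1, 0.9, 0.0]},
    {"text": "Q3 revenue, APAC: 0.8M", "vector": [0.8, 0.2, 0.1]},
]
query_vec = [1.0, 0.0, 0.0]  # stands in for the embedded question

top = retrieve(query_vec, rows, k=2)
prompt = build_prompt("What was Q3 revenue?", top)
```

In this toy run the two revenue rows are retrieved and the headcount row is left out, so the model only sees context relevant to the question.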
Deleting a semantic dataset
Each card has a hover-only X that deletes the dataset, its spec, and its vector data. The action is gated by the Delete Semantic Dataset permission — see Roles & permissions for the full permission map.
Deleting is permanent at the gold layer. The underlying silver datasets and bronze imports are untouched.
Why this design
Splitting “what the data means” (the Tree) from “what the agent queries” (the semantic dataset) gives you three useful things:
- Reproducibility. The spec records exactly what went into a vector set. Rebuilding from the same selection gives you the same dataset.
- Multiple slices. A project can carry several semantic datasets — one tightly scoped, one broad, one experimental — and the agent can switch between them per session.
- A clean audit boundary. Every agent prompt is tied to a specific semantic dataset and is captured by the Audit trail. You always know which curated slice produced which answer.
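One way to think about the reproducibility and audit points is as a fingerprint of the spec: the same curation choices always produce the same identifier, and that identifier can be attached to each agent prompt. The sketch below assumes a spec that serialises to JSON; `spec_fingerprint`, the spec layout, and the audit entry fields are hypothetical and do not describe how RebelCore™ stores specs or audit records.

```python
import hashlib
import json


def spec_fingerprint(spec: dict) -> str:
    """Hash a spec in canonical form so key order does not matter."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same selection written in two different key orders (made-up values).
spec_a = {"sheets": ["Sales"], "columns": {"net_amount": 0.7, "region": 0.3}}
spec_b = {"columns": {"region": 0.3, "net_amount": 0.7}, "sheets": ["Sales"]}

# A different selection: one weight has changed.
spec_c = {"sheets": ["Sales"], "columns": {"net_amount": 0.6, "region": 0.3}}

# Hypothetical audit entry tying a prompt to the slice that answered it.
audit_entry = {
    "prompt": "What was Q3 revenue?",
    "dataset_fingerprint": spec_fingerprint(spec_a),
}
```

Here `spec_a` and `spec_b` hash to the same value, while `spec_c` does not. Note that this only shows the spec is identical; identical vectors additionally depend on the source data and embedding model being unchanged.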
What’s next
- Details & weights — how to read the selection weight before pressing Build.
- Tree chat & AI suggestions — let the advisor refine which leaves go into the semantic dataset.
- RebelCore™ Agent — how the agent consumes the dataset once it’s Ready.