POST /infer
Send a natural-language prompt and one or more spec IDs. Get back a curated answer.
Endpoint
POST /api/v1/infer
Headers
| Header | Value |
|---|---|
| Content-Type | application/json |
| Authorization | Bearer <token> (from /api/v1/authorize) |
Request body

    {
      "prompt": "Brief affordability assessment for the customer.",
      "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
      "session_id": "optional-string"
    }

| Field | Type | Required | Notes |
|---|---|---|---|
| prompt | string | yes | The question or instruction for the agent. At least one character. |
| spec_ids | string array | yes | One or more semantic spec IDs the agent should query against. All specs must belong to your customer. |
| session_id | string | no | Optional client-supplied identifier for a logical session. If omitted, the API generates one per call. |
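
The field rules in the table above can be checked on the client before a request is sent, so malformed bodies fail fast instead of coming back as a 400. The following is a minimal sketch, not part of the API: the helper name `build_infer_payload` is hypothetical, and only the field names and constraints come from this page.

```python
from typing import Optional, Sequence


def build_infer_payload(
    prompt: str,
    spec_ids: Sequence[str],
    session_id: Optional[str] = None,
) -> dict:
    """Build a /infer request body, enforcing the documented field rules.

    Hypothetical helper: the API still performs its own validation.
    """
    if not isinstance(prompt, str) or len(prompt) < 1:
        raise ValueError("prompt must be a string of at least one character")
    # A bare string is a sequence too, so reject it explicitly.
    if isinstance(spec_ids, str) or len(spec_ids) < 1:
        raise ValueError("spec_ids must be a non-empty list of spec ID strings")
    if not all(isinstance(s, str) and s for s in spec_ids):
        raise ValueError("every spec ID must be a non-empty string")

    payload = {"prompt": prompt, "spec_ids": list(spec_ids)}
    # session_id is optional; omit it to let the API generate one per call.
    if session_id is not None:
        payload["session_id"] = session_id
    return payload
```

The returned dictionary can be passed directly as the JSON body of the request.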
Where do spec_ids come from?
Spec IDs are created when a Customer Super Admin trains a semantic spec from a project’s data in the portal. The spec ID is the unique identifier of that trained model. Your integration either:
- Hardcodes specific spec IDs that match your use case (e.g. a “credit risk” spec).
- Discovers them dynamically — your operator hands you the spec ID at integration time.
Spec IDs are stable; they don’t change once a spec is trained.
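Because spec IDs are stable and handed over at integration time, one option is to keep them in configuration rather than in code. This is a sketch under assumptions: the environment variable name `INFER_SPEC_IDS` and the comma-separated format are illustrative choices, not something the API defines.

```python
import os
from typing import List, Mapping


def load_spec_ids(
    env: Mapping[str, str] = os.environ,
    var: str = "INFER_SPEC_IDS",  # hypothetical variable name
) -> List[str]:
    """Read a comma-separated list of spec IDs from configuration."""
    raw = env.get(var, "")
    ids = [part.strip() for part in raw.split(",") if part.strip()]
    if not ids:
        raise RuntimeError(f"{var} is empty; ask your operator for the spec IDs")
    # Drop duplicates while preserving the configured order.
    return list(dict.fromkeys(ids))
```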
About session_id
Each call is treated as stateless. Including a session_id doesn’t automatically prepend prior turns to the prompt — if you want multi-turn continuity, include the relevant prior context in the prompt field yourself.
The session_id is still useful for grouping calls under one logical conversation in the audit panel.
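Since each call is stateless, multi-turn continuity has to be assembled on the client. The sketch below folds recent turns into the next prompt; the function name and the "Previous question / Previous answer" layout are illustrative assumptions, as the API does not prescribe a format for prior context.

```python
from typing import List, Tuple


def prompt_with_context(
    history: List[Tuple[str, str]],
    question: str,
    max_turns: int = 3,
) -> str:
    """Fold the most recent (prompt, result) pairs into a new prompt.

    The text layout is an illustrative choice, not an API requirement.
    """
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = []
    for asked, answered in recent:
        lines.append(f"Previous question: {asked}")
        lines.append(f"Previous answer: {answered}")
    lines.append(f"Current question: {question}")
    return "\n".join(lines)
```

Sending the same session_id with each of these calls keeps them grouped under one conversation in the audit panel.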
Successful response
200 OK

    {
      "correlation_id": "api:7f1ee5...",
      "result": "<the agent's answer, plain text>",
      "exposure": "advisory",
      "spec_count": 1
    }

| Field | Type | Notes |
|---|---|---|
| correlation_id | string | Unique identifier for this call. Quote it when filing support tickets; it ties to a specific entry in the audit panel. |
| result | string | The agent’s curated answer. Plain text; render however your application needs. |
| exposure | string | The active LLM exposure level for this call: full, limited, or advisory. |
| spec_count | integer | Number of specs the call was scoped to (after access checks). |
The exposure field tells your application what to expect in result:
- full: exact values, records, and aggregates may appear in the answer.
- limited: qualitative summaries; specific values are masked.
- advisory: qualitative business guidance only; no numbers or record details.
The agent already gates its output to match the active exposure — you don’t need to filter the response. Use the field to decide how to present the answer (e.g. don’t try to extract numbers from an advisory response).
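One way to act on the exposure field is to branch on it before any post-processing and to label the answer when it is displayed. This is a minimal sketch: the function names and the note wording are hypothetical, while the three levels and their meanings come from the list above.

```python
EXPOSURE_NOTES = {
    "full": "Answer may include exact values, records, and aggregates.",
    "limited": "Answer is a qualitative summary; specific values are masked.",
    "advisory": "Answer is qualitative business guidance only.",
}


def may_contain_exact_values(exposure: str) -> bool:
    """True only for `full`, the one level where exact values may appear."""
    if exposure not in EXPOSURE_NOTES:
        raise ValueError(f"unexpected exposure level: {exposure!r}")
    return exposure == "full"


def render_answer(response: dict) -> str:
    """Label the answer with its exposure level before showing it to a user."""
    exposure = response["exposure"]
    note = EXPOSURE_NOTES.get(exposure, "Unknown exposure level.")
    return f"[{exposure}] {note}\n{response['result']}"
```

An application would skip number extraction whenever `may_contain_exact_values` returns False.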
Errors
| Status | Body | When |
|---|---|---|
| 400 | { "detail": "spec_ids resolved to empty after access scoping" } | The supplied spec_ids were all invalid or filtered out. |
| 400 | validation error | The request body is missing a field or has the wrong type. |
| 401 | { "detail": "missing Authorization header" } | No Authorization header on the request. |
| 401 | { "detail": "invalid Authorization header" } | Header present but not in Bearer <token> format. |
| 401 | { "detail": "invalid token" } | Token doesn’t validate. Re-authorise. |
| 401 | { "detail": "token expired" } | The token is past its 24-hour lifetime. Call /api/v1/authorize again and retry. |
| 403 | { "detail": "one or more spec_ids are not owned by your customer" } | At least one spec ID belongs to a different customer. The whole call is rejected; no partial answers are returned. |
| 503 | { "detail": "database unavailable" } | Transient infrastructure error. Retry with backoff. |
A 403 is loud on purpose. Unlike the portal — which silently drops specs the user can’t access — the API rejects any cross-customer attempt outright. If you see this, audit which spec IDs your application is sending.
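The error table can be turned into a small dispatch function so that every caller reacts to failures the same way. The sketch below is one possible mapping, not an official client: the `Action` names are hypothetical, and treating a missing or malformed Authorization header as a client bug (rather than a reason to re-authorise) is an assumption.

```python
from enum import Enum


class Action(Enum):
    REAUTHORISE = "reauthorise"        # get a new token, then retry once
    RETRY_BACKOFF = "retry_backoff"    # transient; retry with backoff
    FIX_REQUEST = "fix_request"        # caller bug; retrying will not help
    AUDIT_SPEC_IDS = "audit_spec_ids"  # a cross-customer spec ID was sent


def classify_error(status: int, detail: str = "") -> Action:
    """Map a documented error status to a client-side action."""
    if status == 401:
        if detail in ("invalid token", "token expired"):
            return Action.REAUTHORISE
        # Missing or malformed header: assumed to be a client bug.
        return Action.FIX_REQUEST
    if status == 403:
        return Action.AUDIT_SPEC_IDS
    if status == 503:
        return Action.RETRY_BACKOFF
    if status == 400:
        return Action.FIX_REQUEST
    raise ValueError(f"status {status} is not in the documented error table")
```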
Latency
A single call typically takes between 5 and 60 seconds. The agent runs a multi-step inference workflow (router, data tools, evaluation, curation) so latency varies with prompt complexity and the size of the data being queried.
Set generous client-side timeouts (90 seconds is a safe default for production). Don’t retry mid-flight — the same correlation ID won’t appear in audit if you abort and restart.
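The timeout guidance combines with the 503 "retry with backoff" rule roughly as follows: retry only when the server has answered with a 503, and let a client-side timeout propagate instead of re-sending the call. This is a hedged sketch; `infer_with_backoff`, the `send` callable, and the delay schedule are all assumptions, and `send` is expected to wrap the actual HTTP request with a 90-second timeout.

```python
import time
from typing import Callable, Tuple

# Performs one /infer call and returns (status_code, parsed_body).
Send = Callable[[], Tuple[int, dict]]


def infer_with_backoff(
    send: Send,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, dict]:
    """Call `send`, retrying only on 503 with exponential backoff.

    Timeouts and other exceptions raised by `send` propagate unchanged,
    so an in-flight call is never abandoned and re-sent by this helper.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 0
    while True:
        status, body = send()
        attempt += 1
        if status != 503 or attempt >= max_attempts:
            return status, body
        # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
```

Injecting `sleep` keeps the helper easy to test without real waiting.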
Example: end-to-end
curl (using the token from authorize)

    TOKEN=$(curl -s https://api.rebelcore.ai/api/v1/authorize \
      -H "Content-Type: application/json" \
      -d '{"username":"...@api.rebelcore.local","password":"..."}' \
      | jq -r .token)

    curl https://api.rebelcore.ai/api/v1/infer \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "prompt": "Brief affordability assessment for ABC500",
        "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"]
      }'

Python (httpx)

    import httpx

    BASE = "https://api.rebelcore.ai"

    with httpx.Client(timeout=90.0) as client:
        auth = client.post(
            f"{BASE}/api/v1/authorize",
            json={"username": "...", "password": "..."},
        )
        auth.raise_for_status()
        token = auth.json()["token"]

        r = client.post(
            f"{BASE}/api/v1/infer",
            headers={"Authorization": f"Bearer {token}"},
            json={
                "prompt": "Brief affordability assessment for ABC500",
                "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
            },
        )
        if r.status_code == 401:
            # Token expired or invalid: re-authorise and retry once.
            ...
        r.raise_for_status()
        print(r.json()["result"])

Node.js (fetch)

    const BASE = "https://api.rebelcore.ai";

    const auth = await fetch(`${BASE}/api/v1/authorize`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ username: "...", password: "..." }),
    }).then((r) => r.json());

    const out = await fetch(`${BASE}/api/v1/infer`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${auth.token}`,
      },
      body: JSON.stringify({
        prompt: "Brief affordability assessment for ABC500",
        spec_ids: ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
      }),
    }).then((r) => r.json());

    console.log(out.result);

Best practices
- Cache the token. Store the bearer in memory and reuse it for the full 24 hours. Don’t call /authorize on every request.
- Re-authorise on 401, not before. Trying to predict expiry is fragile; just react to the status code.
- Send tight spec_ids. Include only the specs relevant to the question. Fewer specs = faster inference and cleaner audit trail.
- Persist the correlation_id. Log it alongside your application’s own request id. When something needs investigating, the audit panel and your logs can be cross-referenced in seconds.
- Surface the exposure field to your end-users. If you build a UI, telling the user “this answer is based on advisory-level access” prevents misinterpretation.
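Several of these practices fit in one small wrapper: cache the token, re-authorise only after a 401, and log the correlation_id next to your own request ID. The sketch below is an assumption-laden illustration, not an official client. `InferSession` and its two injected callables are hypothetical: `authorize` would wrap POST /api/v1/authorize and return the token, and `infer` would wrap POST /api/v1/infer and return the status code with the parsed body.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("infer_client")


class InferSession:
    """Caches the bearer token and re-authorises only after a 401."""

    def __init__(
        self,
        authorize: Callable[[], str],
        infer: Callable[[str, dict], Tuple[int, dict]],
    ) -> None:
        self._authorize = authorize  # returns a fresh token
        self._infer = infer          # (token, payload) -> (status, body)
        self._token: Optional[str] = None

    def call(self, payload: dict, request_id: str) -> Tuple[int, dict]:
        if self._token is None:
            self._token = self._authorize()
        status, body = self._infer(self._token, payload)
        if status == 401:
            # React to the status code instead of predicting expiry; retry once.
            self._token = self._authorize()
            status, body = self._infer(self._token, payload)
        if status == 200:
            # Persist the correlation ID next to the application's request ID.
            log.info(
                "request_id=%s correlation_id=%s exposure=%s",
                request_id, body.get("correlation_id"), body.get("exposure"),
            )
        return status, body
```

Keeping the HTTP calls behind injected callables lets the same wrapper sit on top of httpx or any other client.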
Next
- Provisioning more API users — one per environment / service is a good baseline.
- LLM exposure levels — full breakdown of what each level allows.
- Audit panel — where API calls show up for review.