
POST /infer

Send a natural-language prompt and one or more spec IDs. Get back a curated answer.

Endpoint

POST /api/v1/infer

Headers

| Header | Value |
| --- | --- |
| Content-Type | application/json |
| Authorization | Bearer <token> (from /api/v1/authorize) |

Request body

{
  "prompt": "Brief affordability assessment for the customer.",
  "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
  "session_id": "optional-string"
}

| Field | Type | Required | Notes |
| --- | --- | --- | --- |
| prompt | string | yes | The question or instruction for the agent. At least one character. |
| spec_ids | string array | yes | One or more semantic spec IDs the agent should query against. All specs must belong to your customer. |
| session_id | string | no | Optional client-supplied identifier for a logical session. If omitted, the API generates one per call. |
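
The field rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: `build_infer_payload` is a hypothetical helper name, and the checks only mirror the constraints listed in the table (a prompt of at least one character, one or more spec IDs, an optional session_id).

```python
from typing import Optional


def build_infer_payload(
    prompt: str,
    spec_ids: list[str],
    session_id: Optional[str] = None,
) -> dict:
    """Build a request body for POST /api/v1/infer (hypothetical helper)."""
    # prompt: required string, at least one character.
    if not isinstance(prompt, str) or len(prompt) < 1:
        raise ValueError("prompt must be a string of at least one character")
    # spec_ids: required, one or more non-empty strings.
    if not spec_ids or not all(isinstance(s, str) and s for s in spec_ids):
        raise ValueError("spec_ids must contain one or more non-empty strings")
    payload = {"prompt": prompt, "spec_ids": list(spec_ids)}
    # session_id: optional; leave it out so the API generates one per call.
    if session_id is not None:
        payload["session_id"] = session_id
    return payload
```

Ownership of the spec IDs cannot be checked locally; the server still decides that and may answer with 400 or 403.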

Where do spec_ids come from?

Spec IDs are created when a Customer Super Admin trains a semantic spec from a project’s data in the portal. The spec ID is the unique identifier of that trained model. Your integration either:

  • Hardcodes specific spec IDs that match your use case (e.g. a “credit risk” spec).
  • Discovers them dynamically — your operator hands you the spec ID at integration time.

Spec IDs are stable; they don’t change once a spec is trained.

About session_id

Each call is treated as stateless. Including a session_id doesn’t automatically prepend prior turns to the prompt — if you want multi-turn continuity, include the relevant prior context in the prompt field yourself.

The session_id is still useful for grouping calls under one logical conversation in the audit panel.
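
Because the API does not replay earlier turns, multi-turn continuity has to be assembled by the caller. One way to do that is sketched below. The helper name `prompt_with_history` and the "Q:/A:" layout are assumptions made for illustration; the API does not prescribe any format for prior context.

```python
def prompt_with_history(history: list[tuple[str, str]], question: str) -> str:
    """Fold prior (question, answer) turns into one prompt string.

    The layout is an arbitrary choice for this sketch, not an API requirement.
    """
    if not history:
        return question
    lines = ["Earlier turns in this conversation:"]
    for asked, answered in history:
        lines.append(f"Q: {asked}")
        lines.append(f"A: {answered}")
    lines.append("")
    lines.append(f"Current question: {question}")
    return "\n".join(lines)


# Usage: reuse one session_id so the calls are grouped in the audit panel,
# and carry the context in the prompt field itself.
history = [("Is the customer affordable?", "Broadly yes, with caveats.")]
payload = {
    "prompt": prompt_with_history(history, "What are the caveats?"),
    "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
    "session_id": "conversation-42",  # hypothetical client-chosen value
}
```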

Successful response

200 OK

{
  "correlation_id": "api:7f1ee5...",
  "result": "<the agent's answer, plain text>",
  "exposure": "advisory",
  "spec_count": 1
}

| Field | Type | Notes |
| --- | --- | --- |
| correlation_id | string | Unique identifier for this call. Quote it when filing support tickets; it ties to a specific entry in the audit panel. |
| result | string | The agent’s curated answer. Plain text; render however your application needs. |
| exposure | string | The active LLM exposure level for this call: full, limited, or advisory. |
| spec_count | integer | Number of specs the call was scoped to (after access checks). |

The exposure field tells your application what to expect in result:

  • full — exact values, records, and aggregates may appear in the answer.
  • limited — qualitative summaries; specific values are masked.
  • advisory — qualitative business guidance only; no numbers or record details.

The agent already gates its output to match the active exposure — you don’t need to filter the response. Use the field to decide how to present the answer (e.g. don’t try to extract numbers from an advisory response).
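
A client might branch on the exposure field along the following lines. This is a sketch under assumptions: the function name `present`, the note wording, and the `parse_numbers` flag are all hypothetical, and treating only full as safe for number extraction is an interpretation of the three levels described above.

```python
# Human-readable notes per exposure level (wording is illustrative only).
EXPOSURE_NOTES = {
    "full": "This answer may include exact values, records, and aggregates.",
    "limited": "This answer is a qualitative summary; specific values are masked.",
    "advisory": "This answer is qualitative business guidance only.",
}


def present(response: dict) -> dict:
    """Decide how to show an /infer response based on its exposure level."""
    exposure = response["exposure"]
    return {
        "text": response["result"],
        # Unknown levels fall back to a neutral note rather than failing.
        "note": EXPOSURE_NOTES.get(exposure, f"Exposure level: {exposure}"),
        # Assumption: only attempt to extract figures when exposure is full.
        "parse_numbers": exposure == "full",
    }
```

The response text is passed through unchanged, since the agent has already gated it; only the presentation differs.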

Errors

| Status | Body | When |
| --- | --- | --- |
| 400 | { "detail": "spec_ids resolved to empty after access scoping" } | The supplied spec_ids were all invalid or filtered out. |
| 400 | validation error | The request body is missing a field or has the wrong type. |
| 401 | { "detail": "missing Authorization header" } | No Authorization header on the request. |
| 401 | { "detail": "invalid Authorization header" } | Header present but not in Bearer <token> format. |
| 401 | { "detail": "invalid token" } | Token doesn’t validate. Re-authorise. |
| 401 | { "detail": "token expired" } | The token is past its 24-hour lifetime. Call /api/v1/authorize again and retry. |
| 403 | { "detail": "one or more spec_ids are not owned by your customer" } | At least one spec ID belongs to a different customer. The whole call is rejected; no partial answers are returned. |
| 503 | { "detail": "database unavailable" } | Transient infrastructure error. Retry with backoff. |

A 403 is loud on purpose. Unlike the portal — which silently drops specs the user can’t access — the API rejects any cross-customer attempt outright. If you see this, audit which spec IDs your application is sending.
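
The error table can be turned into a small dispatch routine on the client side. The sketch below is one possible reading of it: the action names, the function names `classify_error` and `backoff_delays`, and the backoff parameters (1 second base, 30 second cap) are assumptions, not values defined by the API.

```python
def classify_error(status: int, detail: str = "") -> str:
    """Map an /infer error to a client-side action (names are illustrative)."""
    if status == 401:
        # Only token problems are fixed by calling /api/v1/authorize again.
        if detail in ("invalid token", "token expired"):
            return "reauthorize"
        return "fix_request"  # missing or malformed Authorization header
    if status == 403:
        return "audit_spec_ids"  # a spec ID belongs to another customer
    if status == 503:
        return "retry_with_backoff"  # transient infrastructure error
    if status == 400:
        return "fix_request"  # bad body, or no usable spec_ids
    return "unexpected"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, capped."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]
```

For example, `backoff_delays(4)` gives waits of 1, 2, 4, and 8 seconds between successive retries of a 503.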

Latency

A single call typically takes between 5 and 60 seconds. The agent runs a multi-step inference workflow (router, data tools, evaluation, curation) so latency varies with prompt complexity and the size of the data being queried.

Set generous client-side timeouts (90 seconds is a safe default for production). Don’t abort and retry a call that is still in flight: the restarted call will not appear in the audit panel under the same correlation ID.
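
For clients that cannot use httpx, the same timeout advice can be followed with the Python standard library. The sketch below makes exactly one attempt with a 90-second timeout. The helper name `infer_once` and the injectable `opener` parameter are assumptions introduced here so that the function can be exercised without network access.

```python
import json
import urllib.request

INFER_URL = "https://api.rebelcore.ai/api/v1/infer"
TIMEOUT_SECONDS = 90.0  # generous default suggested above


def infer_once(token: str, payload: dict, opener=urllib.request.urlopen) -> dict:
    """Send a single /infer request and wait up to the timeout for the reply."""
    req = urllib.request.Request(
        INFER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
    # The timeout applies to blocking socket operations. Non-2xx responses
    # raise urllib.error.HTTPError; no retry is attempted here.
    with opener(req, timeout=TIMEOUT_SECONDS) as resp:
        return json.loads(resp.read().decode("utf-8"))
```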

Example: end-to-end

curl (using the token from authorize)

TOKEN=$(curl -s https://api.rebelcore.ai/api/v1/authorize \
  -H "Content-Type: application/json" \
  -d '{"username":"...@api.rebelcore.local","password":"..."}' \
  | jq -r .token)

curl https://api.rebelcore.ai/api/v1/infer \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Brief affordability assessment for ABC500",
    "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"]
  }'

Python (httpx)

import httpx

BASE = "https://api.rebelcore.ai"

# 90-second timeout: a single /infer call can take up to about a minute.
with httpx.Client(timeout=90.0) as client:
    # Exchange credentials for a bearer token.
    auth = client.post(
        f"{BASE}/api/v1/authorize",
        json={"username": "...", "password": "..."},
    )
    auth.raise_for_status()
    token = auth.json()["token"]

    # Run the inference call with the token.
    r = client.post(
        f"{BASE}/api/v1/infer",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "prompt": "Brief affordability assessment for ABC500",
            "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
        },
    )
    if r.status_code == 401:
        # Token expired or invalid: re-authorise and retry once.
        ...
    r.raise_for_status()
    print(r.json()["result"])

Node.js (fetch)

const BASE = "https://api.rebelcore.ai";

const auth = await fetch(`${BASE}/api/v1/authorize`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ username: "...", password: "..." }),
}).then((r) => r.json());

const out = await fetch(`${BASE}/api/v1/infer`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${auth.token}`,
  },
  body: JSON.stringify({
    prompt: "Brief affordability assessment for ABC500",
    spec_ids: ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
  }),
}).then((r) => r.json());

console.log(out.result);

Best practices

  • Cache the token. Store the bearer in memory and reuse it for the full 24 hours. Don’t call /authorize on every request.
  • Re-authorise on 401, not before. Trying to predict expiry is fragile; just react to the status code.
  • Send tight spec_ids. Include only the specs relevant to the question. Fewer specs = faster inference and cleaner audit trail.
  • Persist the correlation_id. Log it alongside your application’s own request id. When something needs investigating, the audit panel and your logs can be cross-referenced in seconds.
  • Surface the exposure field to your end-users. If you build a UI, telling the user “this answer is based on advisory-level access” prevents misinterpretation.
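
The first, second, and fourth practices can be combined in one small wrapper. The sketch below is illustrative only: the class name `InferClient`, the injected `authorize` and `send` callables, and the log format are assumptions, and the transport itself (httpx, fetch, or anything else) is deliberately left out.

```python
import logging
from typing import Callable, Optional

log = logging.getLogger("infer-client")


class InferClient:
    """Caches the bearer token, re-authorises on 401, logs correlation IDs."""

    def __init__(
        self,
        authorize: Callable[[], str],                     # returns a fresh token
        send: Callable[[str, dict], tuple[int, dict]],    # -> (status, JSON body)
    ):
        self._authorize = authorize
        self._send = send
        self._token: Optional[str] = None  # kept in memory between calls

    def infer(self, payload: dict, request_id: str = "") -> dict:
        if self._token is None:
            self._token = self._authorize()
        status, body = self._send(self._token, payload)
        if status == 401:
            # React to the status code instead of predicting expiry;
            # refresh the token and retry exactly once.
            self._token = self._authorize()
            status, body = self._send(self._token, payload)
        if status != 200:
            raise RuntimeError(f"infer failed with {status}: {body.get('detail')}")
        # Log the correlation_id next to the application's own request id.
        log.info("infer ok correlation_id=%s request_id=%s",
                 body.get("correlation_id"), request_id)
        return body
```

Injecting `authorize` and `send` keeps the caching and retry logic independent of the HTTP library and makes it easy to test with stubs.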
