POST /infer

Send a natural-language prompt and one or more spec IDs. Get back a curated answer.

Endpoint

POST /api/v1/infer

Headers

Header	Value
`Content-Type`	`application/json`
`Authorization`	`Bearer <token>` (from `/api/v1/authorize`)

Request body

{
  "prompt": "Brief affordability assessment for the customer.",
  "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
  "session_id": "optional-string"
}

Field	Type	Required	Notes
`prompt`	string	yes	The question or instruction for the agent. At least one character.
`spec_ids`	string array	yes	One or more semantic spec IDs the agent should query against. All specs must belong to your customer.
`session_id`	string	no	Optional client-supplied identifier for a logical session. If omitted, the API generates one per call.
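
A minimal sketch of assembling the request body on the client, assuming you want to catch obviously invalid input before it reaches the API. The helper name `build_infer_payload` is hypothetical and not part of the API. Its checks mirror the table above, and the server remains the authority on validation.

```python
from typing import List, Optional


def build_infer_payload(
    prompt: str,
    spec_ids: List[str],
    session_id: Optional[str] = None,
) -> dict:
    """Build a JSON-serialisable body for POST /api/v1/infer.

    Hypothetical helper for illustration only.
    """
    if not isinstance(prompt, str) or len(prompt) < 1:
        raise ValueError("prompt must be a string of at least one character")
    if not spec_ids or not all(isinstance(s, str) and s for s in spec_ids):
        raise ValueError("spec_ids must contain one or more non-empty strings")

    payload = {"prompt": prompt, "spec_ids": list(spec_ids)}
    # Leave session_id out entirely when not supplied, so the API generates one.
    if session_id is not None:
        payload["session_id"] = session_id
    return payload
```

The returned dict can be passed straight to the `json=` argument of an HTTP client such as httpx.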

Where do `spec_ids` come from?

Spec IDs are created when a Customer Super Admin trains a semantic spec from a project’s data in the portal. The spec ID is the unique identifier of that trained model. Your integration either:

Hardcodes specific spec IDs that match your use case (e.g. a “credit risk” spec).
Takes them as configuration: your operator hands you the spec ID at integration time.

Spec IDs are stable; they don’t change once a spec is trained.

About `session_id`

Each call is treated as stateless. Including a session_id doesn’t automatically prepend prior turns to the prompt — if you want multi-turn continuity, include the relevant prior context in the prompt field yourself.

The session_id is still useful for grouping calls under one logical conversation in the audit panel.
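
Because each call is stateless, multi-turn continuity has to be carried in the prompt itself. The sketch below shows one way to fold earlier turns into a new prompt. The function name and the text layout are assumptions for illustration; the API does not prescribe a format for prior context.

```python
from typing import List, Tuple


def prompt_with_history(history: List[Tuple[str, str]], question: str) -> str:
    """Fold prior (prompt, result) turns into a single prompt string.

    Hypothetical helper: the layout is illustrative, not an API requirement.
    """
    if not history:
        return question
    lines = ["Earlier in this conversation:"]
    for asked, answered in history:
        lines.append(f"Q: {asked}")
        lines.append(f"A: {answered}")
    lines.append("")
    lines.append(f"Current question: {question}")
    return "\n".join(lines)
```

Send the returned string as the prompt field and reuse the same session_id across the calls so they are grouped together in the audit panel.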

Successful response

200 OK

{
  "correlation_id": "api:7f1ee5...",
  "result": "<the agent's answer, plain text>",
  "exposure": "advisory",
  "spec_count": 1
}

Field	Type	Notes
`correlation_id`	string	Unique identifier for this call. Quote it when filing support tickets — it ties to a specific entry in the audit panel.
`result`	string	The agent’s curated answer. Plain text; render however your application needs.
`exposure`	string	The active LLM exposure level for this call: `full`, `limited`, or `advisory`.
`spec_count`	integer	Number of specs the call was scoped to (after access checks).
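
A small sketch of turning the decoded 200 OK body into a typed object, assuming you prefer attribute access over raw dict lookups. `InferResult` and `parse_infer_response` are hypothetical names, not part of any SDK.

```python
from dataclasses import dataclass

# Exposure levels listed in the response table above.
KNOWN_EXPOSURES = {"full", "limited", "advisory"}


@dataclass(frozen=True)
class InferResult:
    correlation_id: str
    result: str
    exposure: str
    spec_count: int


def parse_infer_response(body: dict) -> InferResult:
    """Convert the decoded JSON body of a successful call into InferResult."""
    parsed = InferResult(
        correlation_id=str(body["correlation_id"]),
        result=str(body["result"]),
        exposure=str(body["exposure"]),
        spec_count=int(body["spec_count"]),
    )
    if parsed.exposure not in KNOWN_EXPOSURES:
        raise ValueError(f"unexpected exposure level: {parsed.exposure!r}")
    return parsed
```

Logging `parsed.correlation_id` next to your own request ID at this point makes later cross-referencing with the audit panel straightforward.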

The exposure field tells your application what to expect in result:

full — exact values, records, and aggregates may appear in the answer.
limited — qualitative summaries; specific values are masked.
advisory — qualitative business guidance only; no numbers or record details.

The agent already gates its output to match the active exposure — you don’t need to filter the response. Use the field to decide how to present the answer (e.g. don’t try to extract numbers from an advisory response).
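
As an illustration of using the exposure field for presentation rather than filtering, the sketch below attaches a short note to the answer and decides whether number extraction is worth attempting. The function names and note wording are assumptions; only the three level names come from the API.

```python
# Short user-facing notes, paraphrasing the level descriptions above.
EXPOSURE_NOTES = {
    "full": "Exact values, records, and aggregates may appear.",
    "limited": "Qualitative summary; specific values are masked.",
    "advisory": "Qualitative business guidance only; no numbers or record details.",
}


def may_contain_exact_values(exposure: str) -> bool:
    """Only a `full` response may carry exact values worth extracting."""
    return exposure == "full"


def present(result: str, exposure: str) -> str:
    """Prefix the answer with a note describing its exposure level."""
    note = EXPOSURE_NOTES.get(exposure, "Unknown exposure level.")
    return f"[{exposure}] {note}\n\n{result}"
```

The answer text itself is passed through unchanged, since the agent has already gated it to the active exposure.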

Errors

Status	Body	When
`400`	`{ "detail": "spec_ids resolved to empty after access scoping" }`	The supplied `spec_ids` were all invalid or filtered out.
`400`	validation error	The request body is missing a field or has the wrong type.
`401`	`{ "detail": "missing Authorization header" }`	No `Authorization` header on the request.
`401`	`{ "detail": "invalid Authorization header" }`	Header present but not in `Bearer <token>` format.
`401`	`{ "detail": "invalid token" }`	Token doesn’t validate. Re-authorise.
`401`	`{ "detail": "token expired" }`	The token is past its 24-hour lifetime. Call `/api/v1/authorize` again and retry.
`403`	`{ "detail": "one or more spec_ids are not owned by your customer" }`	At least one spec ID belongs to a different customer. The whole call is rejected — no partial answers.
`503`	`{ "detail": "database unavailable" }`	Transient infrastructure error. Retry with backoff.

A 403 is loud on purpose. Unlike the portal — which silently drops specs the user can’t access — the API rejects any cross-customer attempt outright. If you see this, audit which spec IDs your application is sending.
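
One way to act on the error table is to map each status and detail string to a next step, and to space out retries after a 503. The sketch below is a hedged illustration: the action names, the base delay, and the cap are assumptions, and the jittered exponential backoff is a common general technique rather than something the API mandates.

```python
import random
from typing import List

# 401 details that a fresh token can fix; the other 401s indicate a client bug.
TOKEN_DETAILS = {"invalid token", "token expired"}


def next_action(status: int, detail: str = "") -> str:
    """Map an error response to a hypothetical next step for the client."""
    if status == 401:
        return "reauthorize" if detail in TOKEN_DETAILS else "fix_request"
    if status == 403:
        return "audit_spec_ids"
    if status == 503:
        return "retry_with_backoff"
    if status == 400:
        return "fix_request"
    return "raise"


def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff with full jitter: one delay in seconds per retry."""
    return [random.uniform(0, min(cap, base * 2 ** i)) for i in range(attempts)]
```

A caller would sleep for each delay in turn before re-sending a request that returned 503, and give up once the list is exhausted.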

Latency

A single call typically takes between 5 and 60 seconds. The agent runs a multi-step inference workflow (router, data tools, evaluation, curation), so latency varies with prompt complexity and the size of the data being queried.

Set generous client-side timeouts (90 seconds is a safe default for production). Don’t abort and re-send a request that is still in flight: the restarted call won’t appear in the audit panel under the same correlation ID.

Example: end-to-end

curl (using the token from authorize)

TOKEN=$(curl -s https://api.rebelcore.ai/api/v1/authorize \
  -H "Content-Type: application/json" \
  -d '{"username":"...@api.rebelcore.local","password":"..."}' \
  | jq -r .token)

curl https://api.rebelcore.ai/api/v1/infer \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Brief affordability assessment for ABC500",
    "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"]
  }'

Python (httpx)

import httpx

BASE = "https://api.rebelcore.ai"

with httpx.Client(timeout=90.0) as client:
    auth = client.post(
        f"{BASE}/api/v1/authorize",
        json={"username": "...", "password": "..."},
    )
    auth.raise_for_status()
    token = auth.json()["token"]

    r = client.post(
        f"{BASE}/api/v1/infer",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "prompt": "Brief affordability assessment for ABC500",
            "spec_ids": ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
        },
    )
    if r.status_code == 401:
        # Token expired or invalid — re-authorise and retry once.
        ...
    r.raise_for_status()
    print(r.json()["result"])

Node.js (fetch)

const BASE = "https://api.rebelcore.ai";

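const authRes = await fetch(`${BASE}/api/v1/authorize`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ username: "...", password: "..." }),
});
if (!authRes.ok) throw new Error(`authorize failed: ${authRes.status}`);
const auth = await authRes.json();

const res = await fetch(`${BASE}/api/v1/infer`, {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${auth.token}`,
  },
  body: JSON.stringify({
    prompt: "Brief affordability assessment for ABC500",
    spec_ids: ["3f1a8c91-2b6e-4f3d-9a12-cd456ef78901"],
  }),
  // Inference can take up to a minute; give up after 90 seconds.
  signal: AbortSignal.timeout(90_000),
});
// Surface 4xx/5xx responses instead of reading `result` from an error body.
if (!res.ok) throw new Error(`infer failed: ${res.status}`);
const out = await res.json();

console.log(out.result);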

Best practices

Cache the token. Store the bearer token in memory and reuse it for the full 24 hours. Don’t call /api/v1/authorize on every request.
Re-authorise on 401, not before. Trying to predict expiry is fragile; just react to the status code.
Send tight spec_ids. Include only the specs relevant to the question. Fewer specs mean faster inference and a cleaner audit trail.
Persist the correlation_id. Log it alongside your application’s own request id. When something needs investigating, the audit panel and your logs can be cross-referenced in seconds.
Surface the exposure field to your end-users. If you build a UI, telling the user “this answer is based on advisory-level access” prevents misinterpretation.
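
The first two practices can be combined in a thin wrapper that keeps the token in memory and re-authorises only when the API answers 401. This is a sketch under stated assumptions: `InferClient`, `authorize`, and `send` are hypothetical names, and the two callables stand in for whatever HTTP client you use, which also keeps the retry logic testable without a network.

```python
from typing import Callable, Optional, Tuple


class InferClient:
    """Caches the bearer token and re-authorises once on a 401.

    authorize() returns a fresh token string.
    send(token, payload) performs POST /api/v1/infer and returns
    (status_code, decoded_json_body). Both are hypothetical callables.
    """

    def __init__(
        self,
        authorize: Callable[[], str],
        send: Callable[[str, dict], Tuple[int, dict]],
    ) -> None:
        self._authorize = authorize
        self._send = send
        self._token: Optional[str] = None

    def infer(self, payload: dict) -> dict:
        if self._token is None:
            self._token = self._authorize()
        status, body = self._send(self._token, payload)
        if status == 401:
            # React to the status code rather than predicting expiry,
            # then retry exactly once with the new token.
            self._token = self._authorize()
            status, body = self._send(self._token, payload)
        if status != 200:
            raise RuntimeError(f"infer failed with {status}: {body}")
        return body
```

Retrying only once avoids looping forever when the credentials themselves are wrong.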

See also

Provisioning more API users: one per environment / service is a good baseline.
LLM exposure levels: full breakdown of what each level allows.
Audit panel: where API calls show up for review.