Predict API
The Predict API lets any HTTP caller hand a trained brain its in-context data and get predictions back. It’s the same endpoint the basic-mode Try-it form calls.
Endpoint
    POST /projects/<projectId>/runs/<runIdOrSlug>/predict

- projectId: the UUID of the project containing the run
- runIdOrSlug: either the run’s UUID or its slug (e.g. v0_1)
For share-link callers (no auth needed):
    POST /shares/<token>/predict

Auth

    Authorization: Bearer <PFNSTUDIO_TOKEN>

Get a token from /api-tokens (see API tokens).
Share-link callers omit the Authorization header; the token in the URL is the credential.
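The two calling modes differ only in the URL and in whether an Authorization header is sent. A minimal sketch of that choice follows; the helper name `predict_target` is hypothetical, and it assumes share links are served from the same host as the examples on this page.

```python
from typing import Dict, Optional, Tuple

# Host taken from the examples on this page; assumed to also serve share links.
BASE_URL = "https://cloud.pfnstudio.com"


def predict_target(
    project_id: Optional[str] = None,
    run: Optional[str] = None,
    api_token: Optional[str] = None,
    share_token: Optional[str] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for an authenticated or share-link predict call."""
    if share_token is not None:
        # Share-link callers: the token in the URL is the credential, no header.
        return f"{BASE_URL}/shares/{share_token}/predict", {}
    if not (project_id and run and api_token):
        raise ValueError("need project_id, run and api_token, or a share_token")
    url = f"{BASE_URL}/projects/{project_id}/runs/{run}/predict"
    return url, {"Authorization": f"Bearer {api_token}"}
```

The returned pair can be passed straight to an HTTP client, for example `requests.post(url, headers=headers, json=payload)`.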
Request body
The shape varies by task type. The Try-it form picks the right shape automatically; HTTP callers must choose it themselves.
Regression
    {
      "context": {
        "x": [[1.0, 2.5], [3.0, 1.8], [4.5, 0.9]],
        "y": [5.1, 4.3, 2.0]
      },
      "query": {
        "x": [[2.2, 1.4], [3.8, 0.7]]
      }
    }

context.x is a 2-D array (rows × features). context.y is a 1-D array (one scalar per context row). query.x is rows of the same width as context.x.
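Because these shape rules are easy to violate when assembling a payload by hand, it can help to check them locally before sending. The following is a client-side sketch only; `check_regression_payload` is a hypothetical helper and does not reproduce the server's own validation.

```python
def check_regression_payload(payload: dict) -> int:
    """Validate the shape rules above and return the feature width.

    Raises ValueError when row counts or row widths disagree.
    """
    ctx_x = payload["context"]["x"]
    ctx_y = payload["context"]["y"]
    qry_x = payload["query"]["x"]
    if not ctx_x:
        raise ValueError("context.x must contain at least one row")
    if len(ctx_x) != len(ctx_y):
        raise ValueError("context.x and context.y have different lengths")
    width = len(ctx_x[0])
    # Every context row and every query row must share the same width.
    for name, rows in (("context.x", ctx_x), ("query.x", qry_x)):
        for i, row in enumerate(rows):
            if len(row) != width:
                raise ValueError(
                    f"{name}[{i}] has {len(row)} features, expected {width}"
                )
    return width
```

Called on the example payload above, the function returns the shared row width.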
Classification
    {
      "context": {
        "x": [[0.1, 0.4, ...], [0.9, 0.2, ...], ...],
        "labels": [0, 1, 1, 0, 1]
      },
      "query": {
        "x": [[0.2, 0.5, ...]]
      }
    }

labels is a 1-D array of category indices, one per context row (the rows of context.x are abbreviated here).
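Callers whose labels are strings need to turn them into category indices first. The sketch below does that in first-seen order and keeps the reverse mapping so predicted indices can be translated back. Both function names are hypothetical, and it assumes any consistent index assignment is acceptable, which this page does not state.

```python
def encode_labels(raw_labels):
    """Map hashable labels to integer indices in first-seen order."""
    index_of = {}
    encoded = []
    for label in raw_labels:
        if label not in index_of:
            index_of[label] = len(index_of)
        encoded.append(index_of[label])
    classes = list(index_of)  # classes[i] is the original label for index i
    return encoded, classes


def classification_payload(context_rows, raw_labels, query_rows):
    """Build a classification request body plus the index-to-label lookup."""
    if len(context_rows) != len(raw_labels):
        raise ValueError("need exactly one label per context row")
    labels, classes = encode_labels(raw_labels)
    payload = {
        "context": {"x": context_rows, "labels": labels},
        "query": {"x": query_rows},
    }
    return payload, classes
```

A predicted index `i` in the response can then be mapped back with `classes[i]`.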
Time-series (forecast)
For prior types like AR(2) or TabPFN-TS, the basic-mode form sends a single column of past values; the studio’s predict packer transforms it into the model’s expected feature shape (lags + calendar features for TabPFN-TS style).
If you’re calling the API directly for a time-series brain, send the already-engineered features:
    {
      "context": {
        "x": [[lag_1, lag_2, ..., sin_phase, cos_phase, step_pct]],
        "y": [target_value]
      },
      "query": {
        "x": [[...]]
      }
    }

Inspect the brain’s model spec via GET /projects/<id>/models/<modelId> to see the exact d_in.
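To illustrate what "already-engineered features" can mean in the simplest case, here is a sketch that turns a single column of past values into lag rows, as an AR(2)-style brain might expect. It covers lags only: calendar features such as sin_phase, cos_phase, and step_pct are left out, the most-recent-first ordering is an assumption, and `make_lag_rows` is a hypothetical helper rather than the studio's predict packer. The resulting row width must still match the model's d_in.

```python
def make_lag_rows(series, n_lags):
    """Turn a 1-D series into a predict payload built from lag features.

    Each context row holds the n_lags values preceding its target,
    most recent first: [lag_1, lag_2, ..., lag_n].
    """
    if n_lags < 1 or len(series) <= n_lags:
        raise ValueError("need n_lags >= 1 and more than n_lags observations")
    x, y = [], []
    for t in range(n_lags, len(series)):
        x.append([series[t - k] for k in range(1, n_lags + 1)])
        y.append(series[t])
    # Single query row: the lags that precede the next, unseen step.
    query = [[series[len(series) - k] for k in range(1, n_lags + 1)]]
    return {"context": {"x": x, "y": y}, "query": {"x": query}}
```

With `series = [1, 2, 3, 4, 5]` and `n_lags = 2`, the context rows are `[2, 1]`, `[3, 2]`, `[4, 3]` with targets `3`, `4`, `5`, and the query row is `[5, 4]`.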
Discovery (causal)
Not supported via the Predict API. Use the advanced view at /projects/<id>/runs/<runId> directly.
Response
    {
      "predictions": [5.1, 4.3, 2.0, ...]
    }

predictions is a 1-D array, one entry per query row. Type depends on the task:
- Regression → number
- Classification → number (the class index, e.g. 0 or 1 for a binary task)
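Since predictions come back in query order, they can be lined up with the rows that produced them. A small sketch, where `pair_predictions` is a hypothetical helper and the numbers used with it are illustrative rather than real model output:

```python
def pair_predictions(query_rows, response_json):
    """Pair each query row with its prediction, checking the one-per-row rule."""
    preds = response_json["predictions"]
    if len(preds) != len(query_rows):
        raise ValueError(
            f"got {len(preds)} predictions for {len(query_rows)} query rows"
        )
    return list(zip(query_rows, preds))
```

For example, `pair_predictions([[2.2, 1.4], [3.8, 0.7]], {"predictions": [4.6, 2.4]})` yields one `(row, prediction)` tuple per query row.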
Confidence bands — not yet exposed
Brains trained with the Show confidence capability internally produce a full posterior (the defining feature of PFNs), but the predict endpoint currently returns only the point estimate in predictions. Surfacing the posterior bounds (lower / upper fields, or full quantile arrays) is on the roadmap — when it lands, the response shape will extend additively, so callers reading only predictions won’t break.
If you need the posterior today, use the advanced run-detail Try-it (/projects/<id>/runs/<runId>), which can render the model’s raw output, or call POST /priors/<id>/sample to inspect the prior’s full output schema.
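A client can be written now so that it keeps working if the response is extended additively. The sketch below reads predictions as required and treats bounds as optional. The field names lower and upper are only the possibilities mentioned above; they are not part of the current response and may never exist under those names. `read_response` is a hypothetical helper.

```python
def read_response(body: dict):
    """Return (predictions, lower, upper); bounds are None when absent."""
    predictions = body["predictions"]  # the only field returned today
    # Hypothetical future fields: missing keys simply yield None.
    lower = body.get("lower")
    upper = body.get("upper")
    return predictions, lower, upper
```

Code that only uses the first element of the returned tuple behaves the same whether or not bounds are ever added.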
Errors
    {
      "error": "Model expects 30 features per row but got 28."
    }

Common errors:
| Error | Cause | Fix |
|---|---|---|
| Model expects N features per row but got M | context.x row width doesn’t match the model’s d_in | Add/remove columns to match N |
| context.x and context.y have different lengths | Mismatched row counts | Make sure len(x) == len(y) |
| Run is not in completed state | The brain hasn’t finished training | Wait for status → completed |
| Unauthorized | Invalid or missing token | Generate a fresh one at /api-tokens |
The basic-mode Try-it form parses Model expects N features and uses N to re-template the form with the right number of columns.
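HTTP callers can apply the same trick to recover the expected width from the error string. This is a sketch of one way to do it, not the form's actual code; `parse_feature_mismatch` is a hypothetical helper and the pattern assumes the message wording shown above.

```python
import re
from typing import Optional, Tuple

# Pattern follows the message wording shown in the error table above.
_FEATURE_ERROR = re.compile(r"Model expects (\d+) features per row but got (\d+)")


def parse_feature_mismatch(error_message: str) -> Optional[Tuple[int, int]]:
    """Return (expected, got) from a feature-width error, or None if no match."""
    match = _FEATURE_ERROR.search(error_message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Given the example error body above, the function returns `(30, 28)`; for unrelated messages such as `Unauthorized` it returns `None`.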
SDK / CLI
A Python SDK is in development; for now, use any HTTP client:
    import requests

    # Send two context rows and one query row to the run with slug v0_1.
    resp = requests.post(
        "https://cloud.pfnstudio.com/projects/<id>/runs/v0_1/predict",
        headers={"Authorization": "Bearer <token>"},
        json={
            "context": {"x": [[1.0, 2.5], [3.0, 1.8]], "y": [5.1, 4.3]},
            "query": {"x": [[2.2, 1.4]]},
        },
    )
    print(resp.json()["predictions"])

With curl (the query row must have the same width as the context rows):

    curl -X POST "https://cloud.pfnstudio.com/projects/<id>/runs/v0_1/predict" \
      -H "Authorization: Bearer <token>" \
      -H "Content-Type: application/json" \
      -d '{"context": {"x": [[1.0, 2.5]], "y": [5.1]}, "query": {"x": [[2.2, 1.4]]}}'

Latency expectations
- Single-row predict (small context, single query): ~50-200 ms
- 100-row context, 10 queries: ~100-300 ms
- Larger payloads scale roughly with context_size + query_size
Predictions run on the brain’s hosted worker pool, not on Vast.ai. Marathon-trained brains predict at the same speed as Standard-trained brains of the same architecture.
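To compare these figures against your own setup, you can time calls from the client side. The sketch below wraps any callable; `timed_ms` is a hypothetical helper, and what it measures includes network round-trip time, so it will read higher than server-side latency.

```python
import time


def timed_ms(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms
```

For instance, `resp, ms = timed_ms(requests.post, url, headers=headers, json=payload)` times a single predict call made with the requests library.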