AI Studio Inference & Playground
Hyperstack AI Studio provides tools for running inference, allowing you to use trained models to generate predictions or text from input prompts. This guide covers how to use the interactive Playground and the API to test base and fine-tuned models, adjust generation parameters, deploy custom adapters, and make programmatic inference requests.
Playground – Model Inference Using the UI
The Hyperstack AI Studio Playground is an interactive chat-style interface for testing and exploring model behavior. You can run inference with supported base models or your own fine-tuned deployments, apply system prompts, and tweak generation parameters in real time. The Playground also supports side-by-side model comparisons, making it easy to evaluate the effects of fine-tuning or prompt changes. It’s a versatile tool for iterating, validating, and refining model performance directly in the UI.
How to Use the Playground
1. Open the Playground
   Navigate to the Playground page in Hyperstack AI Studio.
2. Select a Model
   From the dropdown menu, select either a fine-tuned model you've trained and deployed, or choose one of the available base models.
3. (Optional) Enter a System Prompt
   Use the System Prompt field to guide the model's behavior.
4. (Optional) Adjust Playground Parameters
   Adjust the model's generation behavior using the sliders and fields below. If you don't modify a parameter, the default value shown in the table is used.
| Parameter | Description | Default Value | Value Range |
| --- | --- | --- | --- |
| Max Tokens | Limits the maximum number of tokens in the model's response. Combined with the input, the total must stay within the model's 8192-token context limit. | `null` | 1–4095 |
| Temperature | Controls randomness in the model's output. Lower values (e.g. `0.1`) produce more focused, deterministic responses, while higher values (e.g. `0.9`) produce more creative or varied outputs. | `1.0` | 0.0–2.0 |
| Top-P | Enables nucleus sampling by restricting token selection to a subset with cumulative probability ≤ `top_p`. | `null` | 0.0–1.0 |
| Top-K | Limits token sampling to the top `K` most likely options. | `-1` | -1–200 |
| Presence Penalty | Penalizes tokens based on whether they appear in the prompt. Higher values encourage the model to introduce new concepts. | `0.0` | -2.0–2.0 |
| Repetition Penalty | Penalizes repeated tokens in the model's output. Higher values reduce the likelihood of repeating the same phrases. | `1.0` | -2.0–2.0 |
5. Enter a Prompt
   Type your query into the text input box and press Enter to get a response.
   The total number of tokens in your input (system prompt, prior messages, and current prompt) plus the model's generated output must not exceed the maximum context length of 8192 tokens. For example, if your input totals 5,000 tokens, the response can be at most 3,192 tokens long.
Compare Side-by-Side
Click Compare Side-by-Side to evaluate how your fine-tuned model responds to the same input compared to another model.
- Select the two models from the dropdowns.
- Enter your prompt.
- View and compare both outputs side-by-side in real time.
Model Inference API
POST https://console.hyperstack.cloud/ai/api/v1/chat/completions
Use this endpoint to perform text generation with your chosen model in Hyperstack AI Studio. Requests must include your API key and a sequence of chat messages. Responses can be returned as a full message or streamed in real time. See below for request structure and available parameters.
Replace the following variables before running the command:
- `API_KEY`: Your API key.
- `model`: Specifies the model to use for the operation.
  - For fine-tuned models, provide the `model_name` obtained from the `/models` API.
  - For base models, provide the Hugging Face repository name as the `hf_repo` (e.g., `"mistralai/Mistral-Small-24B-Instruct-2501"`).

  Use the `/models` API endpoint to retrieve valid `model_name` values for fine-tuned models or `hf_repo` values for base models.
- `stream`: Set to `true` to return the response as a stream of data chunks as they are generated. Set to `false` to receive a single complete message once generation is finished.
- `messages`: The prompt or user input for inference. For the expected format, see here.
- To control model behavior, see the Optional Parameters section below.
The total number of tokens in your input (system prompt, prior messages, and current prompt) plus the model’s generated output must not exceed the maximum context length of 8192 tokens.
```bash
curl -X POST "https://console.hyperstack.cloud/ai/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "api_key: API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "stream": true,
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
```
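The same request can be issued from Python. The sketch below streams the response using the `requests` library; it assumes the stream is delivered as OpenAI-style `data: {...}` server-sent-event lines with a `choices[0].delta.content` field and a `[DONE]` sentinel, which is common for chat-completions endpoints but should be verified against the chunks your deployment actually returns.

```python
import json
import requests

API_KEY = "YOUR_API_KEY"  # replace with your Hyperstack AI Studio API key
URL = "https://console.hyperstack.cloud/ai/api/v1/chat/completions"

payload = {
    "model": "your-model-name",  # model_name or hf_repo from the /models API
    "messages": [
        {"role": "system", "content": "You are a concise technical assistant."},
        {"role": "user", "content": "Explain nucleus sampling in one paragraph."},
    ],
    "stream": True,
    "max_tokens": 200,
    "temperature": 0.7,
}

# stream=True keeps the HTTP connection open so chunks can be read as they arrive
with requests.post(URL, headers={"api_key": API_KEY}, json=payload,
                   stream=True, timeout=120) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        if not line or not line.startswith("data:"):
            continue  # skip keep-alive blank lines
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(data)
        # assumed chunk schema: OpenAI-style delta objects
        choices = chunk.get("choices") or [{}]
        print(choices[0].get("delta", {}).get("content", ""), end="", flush=True)
print()
```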
Required Parameters
- `model` (string) – The name of the model you want to use.
- `messages` (array of objects) – A sequence of message objects representing user/system dialogue history. Each message must include a `role` and a `content` field.
- `stream` (boolean) – If set to `true`, the API will return the response as a stream of data chunks as they are generated. If `false`, the response will be returned as a single complete message once generation is finished.
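For reference, a request body containing only the required fields might look like this (illustrative values; `your-model-name` stands in for a real `model_name` or `hf_repo`):

```python
# Minimal request body: only the required model, messages, and stream fields.
payload = {
    "model": "your-model-name",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "YOUR TEXT HERE"},
    ],
    "stream": False,  # return one complete message instead of a chunk stream
}
```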
Optional Parameters
- `max_tokens` (integer|null) – Maximum number of tokens to generate in the model's response. Default: `null`. Range: 1–4095.
- `temperature` (float) – Controls randomness in the model's output. Default: `1.0`. Range: 0.0–2.0.
- `top_p` (float|null) – Nucleus sampling parameter. Default: `null`. Range: 0.0–1.0.
- `top_k` (integer) – Limits token sampling to the top `K` most likely options. Default: `-1`. Range: -1–200.
- `presence_penalty` (float) – Penalizes tokens based on whether they appear in the prompt. Higher values encourage the model to introduce new concepts. Default: `0.0`. Range: -2.0–2.0.
- `repetition_penalty` (float) – Penalizes repeated tokens in the model's output. Higher values reduce the likelihood of repeating the same phrases. Default: `1.0`. Range: -2.0–2.0.
Response
- 200 OK: Success. Returns a JSON body or a streamed response.
- 400 Bad Request: Required fields are missing or the model/server is invalid.
- 401 Unauthorized: The API key is invalid or missing.
- 404 Not Found: The specified model was not found.
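When wiring this into an application, the status codes above can be handled explicitly. A minimal sketch follows; the error messages are illustrative, and the raw response body should be logged if your deployment returns additional error details.

```python
import requests

def chat_completion(url: str, api_key: str, payload: dict) -> dict:
    """Send a chat-completions request and translate the documented error codes."""
    resp = requests.post(url, headers={"api_key": api_key}, json=payload, timeout=120)

    if resp.status_code == 200:
        return resp.json()
    if resp.status_code == 400:
        raise ValueError(f"Bad request - check required fields and the model name: {resp.text}")
    if resp.status_code == 401:
        raise PermissionError("Unauthorized - the API key is missing or invalid.")
    if resp.status_code == 404:
        raise LookupError("Model not found - verify the name against the /models API.")
    resp.raise_for_status()  # surface any other unexpected status
    return resp.json()
```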