AI Studio Inference & Playground

Hyperstack AI Studio provides tools for running inference, allowing you to use trained models to generate predictions or text from input prompts. This guide covers how to use the interactive Playground and the API to test base and fine-tuned models, adjust generation parameters, deploy custom adapters, and make programmatic inference requests.

Playground – Model Inference Using the UI

The Hyperstack AI Studio Playground is an interactive chat-style interface for testing and exploring model behavior. You can run inference with supported base models or your own fine-tuned deployments, apply system prompts, and tweak generation parameters in real time. The Playground also supports side-by-side model comparisons, making it easy to evaluate the effects of fine-tuning or prompt changes. It’s a versatile tool for iterating, validating, and refining model performance directly in the UI.

How to Use the Playground

  1. Open the Playground

    Navigate to the Playground page in Hyperstack AI Studio.

  2. Select a Model

    From the dropdown menu, select either a fine-tuned model you’ve trained and deployed, or choose from the available base models.

  3. (Optional) Enter a System Prompt

    Use the System Prompt field to guide the model’s behavior.

  4. (Optional) Adjust Playground Parameters

    Adjust the model’s generation behavior using the sliders and fields below. If you don’t modify a parameter, the default value shown in the table is used. (For intuition on how Top-K and Top-P narrow sampling, see the sketch after these steps.)

    | Parameter | Description | Default Value | Value Range |
    | --- | --- | --- | --- |
    | Max Tokens | Limits the maximum number of tokens in the model’s response. Combined with the input, the total must stay within the model’s 8192-token context limit. | null | 1–4095 |
    | Temperature | Controls randomness in the model’s output. Lower values (e.g. 0.1) produce more focused, deterministic responses, while higher values (e.g. 0.9) produce more creative or varied outputs. | 1.0 | 0.0–2.0 |
    | Top-P | Enables nucleus sampling by restricting token selection to a subset with cumulative probability ≤ top_p. | null | 0.0–1.0 |
    | Top-K | Limits token sampling to the top K most likely options. | -1 | -1–200 |
    | Presence Penalty | Penalizes tokens based on whether they appear in the prompt. Higher values encourage the model to introduce new concepts. | 0.0 | -2.0–2.0 |
    | Repetition Penalty | Penalizes repeated tokens in the model’s output. Higher values reduce the likelihood of repeating the same phrases. | 1.0 | -2.0–2.0 |
  5. Enter a Prompt

    Type your query into the text input box and press Enter to get a response.
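
To build intuition for how Top-K and Top-P narrow the token distribution before sampling, here is a minimal, self-contained Python sketch of both filters applied to a toy distribution. It illustrates the standard technique, not Hyperstack’s internal implementation:

# Illustrative top-k then top-p (nucleus) filtering over a toy distribution.
# This mirrors the standard sampling technique; it is not Hyperstack's code.
def filter_top_k_top_p(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:  # top_k = -1 means "no limit", matching the Playground default
        ranked = ranked[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}

toy = {"the": 0.5, "a": 0.3, "dog": 0.15, "xylophone": 0.05}
print(filter_top_k_top_p(toy, top_k=3, top_p=0.9))
# "xylophone" is cut by top_k; "the", "a", and "dog" survive and are renormalized.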

Context Length Limit

The total number of tokens in your input (system prompt, prior messages, and current prompt) plus the model’s generated output must not exceed the maximum context length of 8192 tokens.
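
As a quick illustration of this budget, the sketch below computes the largest max_tokens value that still fits alongside a given input. The token count is a placeholder; use the model’s actual tokenizer to measure real inputs:

# Sketch: budgeting max_tokens against the 8192-token context window.
CONTEXT_LIMIT = 8192

def max_output_tokens(input_token_count, limit=CONTEXT_LIMIT):
    """Largest max_tokens value that keeps input + output within the limit."""
    remaining = limit - input_token_count
    if remaining <= 0:
        raise ValueError("Input already fills the context window; shorten the prompt.")
    return remaining

# Example: a 7,000-token conversation leaves room for 1,192 generated tokens.
print(max_output_tokens(7000))  # -> 1192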

Compare Side-by-Side

Click Compare Side-by-Side to evaluate how your fine-tuned model responds to the same input compared to another model.

  1. Select the two models from the dropdowns.
  2. Enter your prompt.
  3. View and compare both outputs side-by-side in real time.

Model Inference API

POST https://console.hyperstack.cloud/ai/api/v1/chat/completions

Use this endpoint to perform text generation with your chosen model in Hyperstack AI Studio. Requests must include your API key and a sequence of chat messages. Responses can be returned as a full message or streamed in real time. See below for request structure and available parameters.

Replace the following variables before running the command:

  • API_KEY: Your API key.

  • model: Specifies the model to use for the operation.

    • For fine-tuned models, provide the model_name obtained from the /models API.
    • For base models, provide the Hugging Face repository name as the hf_repo (e.g., "mistralai/Mistral-Small-24B-Instruct-2501").

    Use the /models API endpoint to retrieve valid model_name values for fine-tuned models or hf_repo values for base models; a sketch of this call appears after this list.

  • stream: Set to true to return the response as a stream of data chunks as they are generated, or to false to receive a single complete message once generation is finished.

  • messages: The prompt or user input for inference, supplied as a list of message objects with role and content fields (see the example below).

  • To control model behavior, see the Optional Parameters section below.
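
As a sketch, the request below lists available models using Python’s requests library. The /models path shown here is inferred from the chat/completions base URL and is an assumption; check your environment if it differs:

import requests

# Sketch: retrieve valid model identifiers. The exact /models path is an
# assumption based on the chat/completions base URL. Replace API_KEY.
resp = requests.get(
    "https://console.hyperstack.cloud/ai/api/v1/models",
    headers={"api_key": "API_KEY"},
)
resp.raise_for_status()
print(resp.json())  # look for model_name (fine-tuned) and hf_repo (base) values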

Context Length Limit

The total number of tokens in your input (system prompt, prior messages, and current prompt) plus the model’s generated output must not exceed the maximum context length of 8192 tokens.

curl -X POST "https://console.hyperstack.cloud/ai/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "api_key: API_KEY" \
  -d '{
    "model": "your-model-name",
    "messages": [
      {"role": "user", "content": "YOUR TEXT HERE"}
    ],
    "stream": true,
    "max_tokens": 100,
    "temperature": 0.5,
    "top_p": 0.5,
    "top_k": 40,
    "presence_penalty": 0,
    "repetition_penalty": 0.5
  }'
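
The same request can be made programmatically. The sketch below uses Python’s requests library and assumes OpenAI-style server-sent-event chunks when stream is true; adjust the parsing if your deployment returns a different chunk format:

import json
import requests

URL = "https://console.hyperstack.cloud/ai/api/v1/chat/completions"
HEADERS = {"Content-Type": "application/json", "api_key": "API_KEY"}  # replace API_KEY

payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "YOUR TEXT HERE"}],
    "stream": True,
    "max_tokens": 100,
    "temperature": 0.5,
}

with requests.post(URL, headers=HEADERS, json=payload, stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        text = line.decode("utf-8")
        if text.startswith("data: "):  # assumed SSE framing
            text = text[len("data: "):]
        if text.strip() == "[DONE]":  # assumed end-of-stream sentinel
            break
        chunk = json.loads(text)
        # Assumed OpenAI-style delta layout; print tokens as they arrive.
        delta = chunk.get("choices", [{}])[0].get("delta", {})
        print(delta.get("content") or "", end="", flush=True)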

Response

  • 200 OK: Success. Returns JSON or streamed response.
  • 400 Bad Request: If required fields are missing or the specified model is invalid.
  • 401 Unauthorized: If the API key is invalid or missing.
  • 404 Not Found: If the model is not found.
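
A client can map these codes to actionable errors. This sketch sends a non-streaming request and raises on each documented failure mode (the error messages are illustrative, not the API’s own):

import requests

URL = "https://console.hyperstack.cloud/ai/api/v1/chat/completions"
HEADERS = {"Content-Type": "application/json", "api_key": "API_KEY"}  # replace API_KEY
payload = {
    "model": "your-model-name",
    "messages": [{"role": "user", "content": "YOUR TEXT HERE"}],
    "stream": False,
}

resp = requests.post(URL, headers=HEADERS, json=payload)
if resp.status_code == 200:
    print(resp.json())  # complete message returned in one response
elif resp.status_code == 400:
    raise ValueError(f"Bad request; check required fields and model name: {resp.text}")
elif resp.status_code == 401:
    raise PermissionError("Invalid or missing API key.")
elif resp.status_code == 404:
    raise LookupError("Model not found; verify it via the /models endpoint.")
else:
    resp.raise_for_status()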
