Models Overview - Base Models, Aliases, and Rankings
This guide introduces the core model management concepts in Hyperstack AI Studio: base models, model aliasing, and model rankings. Understanding how these elements work together will help you iterate faster, streamline deployment, and select an appropriate base model for your fine-tuning or inference needs.
Base Models
Hyperstack AI Studio provides several open-source foundation models that you can use out of the box or fine-tune to meet your application needs. These models are production-ready and optimized for various use cases such as summarization, reasoning, code generation, and instruction following.
Managing Base Models
Base models can be accessed and managed directly from the Base Models tab on the My Models page of AI Studio. Here, you can initiate fine-tuning for your specific use case, explore detailed information about each model, view costs per 1 million tokens, and more.
Available Base Models
The table below summarizes the base models available in AI Studio along with descriptions and links to their official documentation and Hugging Face model pages.
| Model Name | About the Model |
|---|---|
| OpenAI gpt-oss-120b | A powerful 120B parameter model released by OpenAI as its first open-weight model since GPT‑2. Designed for advanced reasoning and tool use, it performs well on instruction-following, long-context understanding, and math tasks. Ideal for building high-performance generative applications in research or production environments. |
| Mistral Small 24B Instruct 2501 | A compact 24B parameter model designed for low-latency inference and resource-constrained environments. Despite its size, it delivers strong results in instruction following, math, and code generation tasks. Ideal for real-time applications like chatbots, support agents, and embedded systems. |
| Llama 3.3 70B Instruct | Meta's flagship 70B parameter model, fine-tuned for instruction-based tasks. It is optimized for complex reasoning, tool use, and long-context generation (up to 128K tokens). Suitable for production use in advanced multilingual chatbots, coding assistants, and text analysis systems. |
| Llama 3.1 8B Instruct | A smaller alternative to the 70B variant, this 8B model provides a good balance of performance and efficiency. It supports a broad range of general-purpose tasks while being more cost-effective for development and experimentation. |
Context Length Limit
The maximum supported context length for all models in Hyperstack AI Studio is 8192 tokens. This includes both the input (system prompt, prior messages, and user prompt) and the model’s generated output. Be sure to structure your inputs and configure generation parameters accordingly to stay within this limit.
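The sketch below shows one way to budget a request against this limit before sending it. It is not part of AI Studio: the 4-characters-per-token estimate is only a rough heuristic, and exact counts depend on the model's own tokenizer.

```python
# Rough sketch: keeping prompt + completion inside the 8192-token context window.
# The chars-per-token heuristic is an approximation; use the model's tokenizer
# for exact counts.

CONTEXT_LIMIT = 8192

def estimate_tokens(text: str) -> int:
    """Very rough token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def max_completion_tokens(system_prompt: str, messages: list[str],
                          reserve: int = 64) -> int:
    """Return a max_tokens value that keeps the request under the context limit.

    `reserve` leaves headroom for chat-formatting tokens added by the server.
    """
    used = estimate_tokens(system_prompt) + sum(estimate_tokens(m) for m in messages)
    remaining = CONTEXT_LIMIT - used - reserve
    if remaining <= 0:
        raise ValueError("Prompt alone exceeds the 8192-token context window; "
                         "trim prior messages or the system prompt.")
    return remaining

# Example usage
budget = max_completion_tokens(
    "You are a concise support assistant.",
    ["Summarize our refund policy in three bullet points."],
)
print(f"Set max_tokens to at most {budget}")
```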
Model Aliasing
Model aliasing allows you to assign a custom, stable name (alias) to a specific base or fine-tuned model version. Instead of using the exact model name or ID in your deployment or API calls, you can reference an alias like chatbot-production.
Aliases simplify referencing specific models in deployment and inference, allowing for more intuitive naming. This can be useful for:
- Fine-tuned and base models: clearly identifying models for deployment and usage.
- Inference: referencing models consistently across API calls or tools.
Manage your aliases on the Model Aliases page.
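As an illustration of referencing an alias at inference time, the minimal sketch below sends a chat completion request that names the alias rather than a raw model ID. It assumes an OpenAI-compatible chat completions endpoint; the base URL, environment variable name, and alias are placeholders to replace with values from your own AI Studio account.

```python
# Minimal sketch: calling an inference endpoint by alias instead of a raw model ID.
# The base URL, API key variable, and alias below are placeholders.
import os
import requests

API_BASE = "https://example-ai-studio-endpoint/v1"   # placeholder base URL
API_KEY = os.environ["AI_STUDIO_API_KEY"]            # placeholder env var name

response = requests.post(
    f"{API_BASE}/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        # The alias resolves to whichever model version it currently points at,
        # so callers do not need updating when the underlying model changes.
        "model": "chatbot-production",
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 256,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```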
Creating a Model Alias
- Visit the Model Aliasing page.
- Select a currently deployed model.
- Optionally, add a suffix to the alias name.
- Click Create Alias.
After creating an alias, the model will appear on the Model Aliasing page with options to update or remove the linked model, or delete the alias entirely.
Model Rankings
You can access the Model Rankings page to explore benchmark-based evaluation scores for a wide range of foundation models, including both open-source and proprietary options. This page helps you compare performance across standardized tasks to make informed decisions when selecting a base model for fine-tuning or inference.
Each model is evaluated on a variety of popular benchmarks such as:
- AGIEval, ARC, MMLU – General academic and reasoning tasks
- GSM8K, MATH, DROP – Math and numerical reasoning
- BoolQ, PIQA, SIQA – Commonsense and logic
For a complete list of benchmark datasets used in evaluation, see the Benchmark Datasets section.
Scores range from 0 to 1, with higher values indicating stronger performance.
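As a rough illustration of how these scores might feed into model selection, the sketch below averages per-benchmark scores to shortlist candidates. The scores shown are placeholder values for demonstration only, not figures from the Model Rankings page.

```python
# Illustrative only: averaging 0-1 benchmark scores to shortlist a base model.
# The scores below are placeholder values, not actual Model Rankings data.
candidates = {
    "llama-3.3-70b-instruct": {"MMLU": 0.86, "GSM8K": 0.91, "ARC": 0.93},
    "llama-3.1-8b-instruct":  {"MMLU": 0.68, "GSM8K": 0.80, "ARC": 0.82},
    "mistral-small-24b":      {"MMLU": 0.78, "GSM8K": 0.85, "ARC": 0.88},
}

def mean_score(scores: dict[str, float]) -> float:
    """Average a model's benchmark scores (all on the same 0-1 scale)."""
    return sum(scores.values()) / len(scores)

# Higher is better: rank candidates by their average benchmark score.
ranked = sorted(candidates.items(), key=lambda kv: mean_score(kv[1]), reverse=True)
for name, scores in ranked:
    print(f"{name}: {mean_score(scores):.2f}")
```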