Reliable and Powerful Inference as a Service
Managed Inference and Agents simplifies AI integration by providing access to powerful foundation models, including text, embedding, and diffusion models. Easily attach model resources to your Heroku app, and the add-on will automatically configure environment variables, enabling seamless API calls. Invoke models using the CLI plug-in or with API endpoints.
Extend Agents with tools that allow Large Language Models (LLMs) to execute actions within Heroku’s trusted environment. Deploy autonomous agents that can call APIs, run code, or interact with your app through tools like code_exec, http, or custom ones. Move from prototyping to production with optimized inference latency and minimal infrastructure management.
The Model Context Protocol (MCP) is an open standard that helps you extend Agents by connecting large language models to tools, services, and data sources. You can bring your own custom tools by deploying them as a heroku app and registering them by attaching the addon. Access all you mcp servers through a single toolkit.
Text Generation & Chat Use models like Claude-Sonnet to generate text, write code, or chat intelligently.
Retrieval-Augmented Generation (RAG) Bring your own data to power LLMs with up-to-date, domain-specific knowledge.
Personalize User Experiences: Leverage agents to deliver tailored content, recommendations, or support.
Data Analysis and Business Intelligence: Deploy agents that can analyze large datasets, identify trends, generate reports, and provide actionable insights.
For those customers paying by credit card, Heroku Managed Inference and Agents uses metered billing, as set forth in the Plans & Pricing tables below
For enterprise customers, your usage of Heroku Managed Inference and Agents will consume your General Add-on Credits and/or Data Add-on Credits as set forth in the Plans & Pricing tables below.
Heroku Managed Inference and Agents doesn’t store or log your prompts and completions. Heroku Managed Inference and Agents doesn’t use your prompts and completions to train any models and doesn’t distribute them to third parties for training.
Claude-3-5-haiku (us)
A faster, more affordable large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
Claude-3-5-sonnet-latest (us)
A state-of-the-art large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
Claude-3-7-sonnet (eu)
A state-of-the-art large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model is hosted in both the us
and eu
regions.
By default, apps with a us
region provision the us
plan,
and apps with the eu
region provision the eu
plan.
To create your model resource, run:
This model is hosted in both the us
and eu
regions. oregon
, virginia
,
and montreal
private space apps provision the us
plan by default.
All other private space apps provision the eu
plan by default.
To create your model resource, run:
To provision, copy the snippet into your CLI or use the install button above.
Claude-3-7-sonnet (us)
A state-of-the-art large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model is hosted in both the us
and eu
regions.
By default, apps with a us
region provision the us
plan,
and apps with the eu
region provision the eu
plan.
To create your model resource, run:
This model is hosted in both the us
and eu
regions. oregon
, virginia
,
and montreal
private space apps provision the us
plan by default.
All other private space apps provision the eu
plan by default.
To create your model resource, run:
To provision, copy the snippet into your CLI or use the install button above.
Claude-3-haiku (eu)
A faster, more affordable large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the eu
region.
Apps with a us
region by default cannot provision this model.
To override this, apply the --region=eu
flag:
This model only runs in the eu
region.
Private space apps in the oregon
, virginia
, or montreal
regions by default cannot provision this model.
To override this, apply the --region=eu
flag:
To provision, copy the snippet into your CLI or use the install button above.
Claude-4-sonnet (eu)
A state-of-the-art large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the eu
region.
Apps with a us
region by default cannot provision this model.
To override this, apply the --region=eu
flag:
This model only runs in the eu
region.
Private space apps in the oregon
, virginia
, or montreal
regions by default cannot provision this model.
To override this, apply the --region=eu
flag:
To provision, copy the snippet into your CLI or use the install button above.
Claude-4-sonnet (us)
A state-of-the-art large language model that supports chat and tool-calling.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
Cohere-embed-multilingual (eu)
A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model is hosted in both the us
and eu
regions.
By default, apps with a us
region provision the us
plan,
and apps with the eu
region provision the eu
plan.
To create your model resource, run:
This model is hosted in both the us
and eu
regions. oregon
, virginia
,
and montreal
private space apps provision the us
plan by default.
All other private space apps provision the eu
plan by default.
To create your model resource, run:
To provision, copy the snippet into your CLI or use the install button above.
Cohere-embed-multilingual (us)
A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model is hosted in both the us
and eu
regions.
By default, apps with a us
region provision the us
plan,
and apps with the eu
region provision the eu
plan.
To create your model resource, run:
This model is hosted in both the us
and eu
regions. oregon
, virginia
,
and montreal
private space apps provision the us
plan by default.
All other private space apps provision the eu
plan by default.
To create your model resource, run:
To provision, copy the snippet into your CLI or use the install button above.
Nova Lite
A fast and highly cost-effective model, perfect for applications requiring rapid text generation, summarization, and copywriting. It's optimized for high-throughput tasks and general-purpose use cases.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the eu
region.
Apps with a us
region by default cannot provision this model.
To override this, apply the --region=eu
flag:
This model only runs in the eu
region.
Private space apps in the oregon
, virginia
, or montreal
regions by default cannot provision this model.
To override this, apply the --region=eu
flag:
To provision, copy the snippet into your CLI or use the install button above.
Nova Lite
A fast and highly cost-effective model, perfect for applications requiring rapid text generation, summarization, and copywriting. It's optimized for high-throughput tasks and general-purpose use cases.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
Nova Pro
A high-performance model designed for more complex tasks, including advanced question-answering, detailed content creation, and nuanced data extraction. It provides a significant step up in capability for applications that demand higher quality and deeper understanding.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the eu
region.
Apps with a us
region by default cannot provision this model.
To override this, apply the --region=eu
flag:
This model only runs in the eu
region.
Private space apps in the oregon
, virginia
, or montreal
regions by default cannot provision this model.
To override this, apply the --region=eu
flag:
To provision, copy the snippet into your CLI or use the install button above.
Nova Pro
A high-performance model designed for more complex tasks, including advanced question-answering, detailed content creation, and nuanced data extraction. It provides a significant step up in capability for applications that demand higher quality and deeper understanding.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
gpt-oss-120b (us)
The gpt-oss-120b model is a powerful, open-source large language model developed by OpenAI, designed for a wide range of generative AI applications. It offers advanced capabilities in natural language understanding, generation, and complex problem-solving, making it a versatile tool for developers and enterprises.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
Stable-image-ultra (us)
A state-of-the-art diffusion (image generation) model.
Metered usage amounts
Availability
Region | Available |
---|---|
Dublin | Available |
Frankfurt | Available |
London | Available |
Montreal | Available |
Mumbai | Available |
Oregon | Available |
Singapore | Available |
Sydney | Available |
Tokyo | Available |
Virginia | Available |
This model only runs in the us
region. Apps with a eu
region by default cannot provision this model.
To override this, apply the --region=us
flag:
This model only runs in the us
region.
Only private space apps in the oregon
, virginia
, or montreal
regions can provision this model by default.
To override this for apps in other private space regions, apply the --region=us
flag:
To provision, copy the snippet into your CLI or use the install button above.
The Heroku Managed Inference and Agent add-on may employ third-party generative AI models to provide the Service. Due to the nature of generative AI, the output that it generates may be unpredictable, and may include inaccurate or harmful responses. Customer assumes all responsibility for such output, including ensuring its accuracy, safety, and compliance with applicable laws and third-party acceptable use policies. For more information, please see the Heroku Notices and License Information Documentation.