Heroku Managed Inference and Agents

Heroku AI provides access to to top models and built-in tools for agents.

Managed Inference

Managed Inference and Agents simplifies AI integration by providing access to powerful foundation models, including text, embedding, and diffusion models. Easily attach model resources to your Heroku app, and the add-on will automatically configure environment variables, enabling seamless API calls. Invoke models using the CLI plug-in or with API endpoints.

Agents

Extend Agents with tools that allow Large Language Models (LLMs) to execute actions within Heroku’s trusted environment. Deploy autonomous agents that can call APIs, run code, or interact with your app through tools like code_exec, http, or custom ones. Move from prototyping to production with optimized inference latency and minimal infrastructure management.

Model Context protocol

The Model Context Protocol (MCP) is an open standard that helps you extend Agents by connecting large language models to tools, services, and data sources. You can bring your own custom tools by deploying them as a heroku app and registering them by attaching the addon. Access all you mcp servers through a single toolkit.

Use Cases

Text Generation Use models like Claude-Sonnet to generate text, write code, or chat intelligently. Retrieval-Augmented Generation (RAG) Bring your own data to power LLMs with up-to-date, domain-specific knowledge. Personalize User Experiences: Leverage agents to deliver tailored content, recommendations, or support. Data Analysis and Business Intelligence: Deploy agents that can analyze large datasets, identify trends, generate reports, and provide actionable insights.

Metered Billing

For those customers paying by credit card, Heroku Managed Inference and Agents uses metered billing, as set forth in the Plans & Pricing tables below For enterprise customers, your usage of Heroku Managed Inference and Agents will consume your General Add-on Credits and/or Data Add-on Credits as set forth in the Plans & Pricing tables below.

Data protection

Heroku Managed Inference and Agents doesn’t store or log your prompts and completions. Heroku Managed Inference and Agents doesn’t use your prompts and completions to train any models and doesn’t distribute them to third parties for training.

The models are available for provisioning in the region shown below but are available to be accessed by apps in all regions including private spaces. The models by default are provisioned based on the region of your app. You can use the Heroku CLI when provisioning to override the model's default region.

Model	United States	European Union
Claude-4-5-haiku New!	Available	Available
Claude-4-5-sonnet New!	Available	Available
Claude-3-5-haiku	Available
Claude-3-5-sonnet-latest	Available
Claude-3-7-sonnet	Available	Available
Claude-3-haiku		Available
Claude-4-sonnet	Available	Available
Cohere embed multilingual	Available	Available
GPT-OSS-120B	Available	Available
Nova Lite	Available	Available
Nova Pro	Available	Available
Stable-image-ultra	Available	Available

Claude-4-5-haiku New! Metered
Claude-4-5-sonnet New! Metered
Claude-3-5-haiku Metered
Claude-3-5-sonnet-latest Metered
Claude-3-7-sonnet Metered
Claude-3-haiku Metered
Claude-4-sonnet Metered
Cohere embed multilingual Metered
GPT-OSS-120B Metered
Nova Lite Metered
Nova Pro Metered
Stable-image-ultra Metered

Claude-4-5-haiku
New!

A fast and highly cost-effective model, perfect for applications requiring rapid responses, content moderation, and inventory management. It's optimized for high-throughput tasks and real-time interactions.

Managed Inference and Agents API with Claude 4.5 Haiku
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $1 per million tokens
- Output Token $5 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-4-5-haiku -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-4-5-sonnet
New!

A high-performance model that balances intelligence and speed, designed for more complex tasks including data processing, sales forecasting, and nuanced content generation. It provides a significant step up in capability for enterprise applications that demand high endurance and quality.

Managed Inference and Agents API with Claude 4.5 Sonnet
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $3 per million tokens
- Output Token $15 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-4-5-sonnet -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-3-5-haiku

A faster, more affordable large language model that supports chat and tool-calling.

Managed Inference and Agent API with Claude 3.5 Haiku
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $0.8 per million tokens
- Output Token $4 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-3-5-haiku -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-3-5-sonnet-latest

A state-of-the-art large language model that supports chat and tool-calling.

Managed Inference and Agent API with Latest Claude 3.5 Sonnet
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $3 per million tokens
- Output Token $15 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-3-5-sonnet-latest -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-3-7-sonnet

A state-of-the-art large language model that supports chat and tool-calling.

Managed Inference and Agent API with Claude 3.7 Sonnet
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $3 per million tokens
- Output Token $15 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-3-7-sonnet -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-3-haiku

A faster, more affordable large language model that supports chat and tool-calling.

Managed Inference and Agent API with Claude 3.0 Haiku
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $0.25 per million tokens
- Output Token $1.25 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-3-haiku -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Claude-4-sonnet

A state-of-the-art large language model that supports chat and tool-calling.

Managed Inference and Agent API with Claude 4 Sonnet
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Anthropic
Metered usage amounts
- Input Token $3 per million tokens
- Output Token $15 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:claude-4-sonnet -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Cohere embed multilingual

A state-of-the-art embedding model that supports multiple languages. This model is helpful for developing Retrieval Augmented Generation (RAG) search.

Managed Inference and Agent API with Cohere Embed Multilingual
- Type Text → Embedding
- API endpoint v1/embeddings
- Model Source Cohere
Metered usage amounts
- Input Token $0.10 per million tokens
- Output Token N/A
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:cohere-embed-multilingual -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
GPT-OSS-120B

A powerful, open-source large language model developed by OpenAI, designed for a wide range of generative AI applications. It offers advanced capabilities in natural language understanding, generation, and complex problem-solving, making it a versatile tool for developers and enterprises.

Managed Inference and Agent API with GPT-OSS-120B
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source OpenAI
Metered usage amounts
- Input Token $0.15 per million tokens
- Output Token $0.60 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:gpt-oss-120b -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Nova Lite

A fast and highly cost-effective model, perfect for applications requiring rapid text generation, summarization, and copywriting. It's optimized for high-throughput tasks and general-purpose use cases.

Managed Inference and Agent API with Nova Lite
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Amazon
Metered usage amounts
- Input Token $0.30 per million tokens
- Output Token $0.40 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:nova-lite -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Nova Pro

A high-performance model designed for more complex tasks, including advanced question-answering, detailed content creation, and nuanced data extraction. It provides a significant step up in capability for applications that demand higher quality and deeper understanding.

Managed Inference and Agent API with Nova Pro
- Type Text → Text
- API endpoint v1/chat/completions
- v1/agents/heroku
- Model Source Amazon
Metered usage amounts
- Input Token $0.80 per million tokens
- Output Token $1.60 per million tokens
- Per image N/A
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:nova-pro -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.
Stable-image-ultra

A state-of-the-art diffusion (image generation) model.

Managed Inference and Agent API with Stability AI Stable Image Ultra
- Type Text → Image
- API endpoint v1/images/generations
- Model Source Stability AI
Metered usage amounts
- Input Token N/A
- Output Token N/A
- Per image $0.14
This model is available to apps in all regions. Override the region in which the model is provisioned by adding the --region flag. Refer to Region Availability for supported regions by model.

Install Add-on

heroku addons:create heroku-inference:stable-image-ultra -a $APP_NAME -- --region=$REGION

To provision, copy the snippet into your CLI or use the install button above.

The Heroku Managed Inference and Agent add-on may employ third-party generative AI models to provide the Service. Due to the nature of generative AI, the output that it generates may be unpredictable, and may include inaccurate or harmful responses. Customer assumes all responsibility for such output, including ensuring its accuracy, safety, and compliance with applicable laws and third-party acceptable use policies. For more information, please see the Heroku Notices and License Information Documentation.

Heroku Managed Inference and Agents

Managed Inference

Agents

Model Context protocol

Use Cases

Metered Billing

Data protection

Region Availability

Plans & Pricing

Documentation

Legal Notices

Quick Links

Addon Sharing

Add-on Category

Supported Languages

Generation Support