Inference Activity logs key details about requests and responses between your application and the Heroku AI Inference API. Monitor token usage, model performance, and API calls in real time with detailed logs, graphs, and metrics such as speed, latency, throughput, and resource utilization.
Continuously monitoring tokens in and tokens out is critical for identifying inefficiencies, minimizing waste, and keeping expenses in check. Optimize prompts, choose cost-effective models, and manage API usage to prevent unexpected costs and maximize efficiency.
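As a sketch of how token counts translate into spend, the snippet below estimates per-request cost from tokens in and tokens out. The prices are placeholders, not Heroku's actual rates, and `estimate_cost` is a hypothetical helper, not part of the Inference Activity API:

```python
# Hypothetical per-1K-token prices; real rates depend on the model you provision.
PRICE_PER_1K_INPUT = 0.003
PRICE_PER_1K_OUTPUT = 0.015

def estimate_cost(tokens_in: int, tokens_out: int) -> float:
    """Estimate request cost (USD) from input and output token counts."""
    return (tokens_in / 1000) * PRICE_PER_1K_INPUT \
         + (tokens_out / 1000) * PRICE_PER_1K_OUTPUT
```

Summing this estimate across requests gives a running spend figure you can compare against the usage graphs the add-on provides.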
Set threshold-based alerts on API usage, total cost, and response times to catch issues before they escalate. Get instant notifications via email, Slack, or webhooks to prevent overruns and maintain control.
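The threshold check behind such alerts can be sketched as follows; the metric names and limits are illustrative assumptions, not Inference Activity's actual schema:

```python
# Illustrative alert thresholds (hypothetical metric names and limits).
THRESHOLDS = {
    "total_cost_usd": 50.0,     # monthly spend ceiling
    "p95_latency_ms": 2000,     # response-time budget
    "requests_per_min": 600,    # API usage rate cap
}

def breached(metrics: dict) -> list[str]:
    """Return the names of all metrics that exceed their threshold."""
    return [name for name, limit in THRESHOLDS.items()
            if metrics.get(name, 0) > limit]
```

Each breach would then trigger a notification, for example an email, a Slack message, or a POST to a configured webhook URL.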
The available application locations for this add-on are shown below and depend on whether the application is deployed to a Common Runtime region or a Private Space.
**Common Runtime**

| Region | Available |
| --- | --- |
| United States | |
| Europe | |
**Private Spaces**

| Region | Available | Installable in Space |
| --- | --- | --- |
| Dublin | | |
| Frankfurt | | |
| London | | |
| Montreal | | |
| Mumbai | | |
| Oregon | | |
| Singapore | | |
| Sydney | | |
| Tokyo | | |
| Virginia | | |
This add-on is in alpha and can only be provisioned if you have been invited by the add-on partner. To provision, copy the snippet into your CLI.
View add-on docs on DevCenter