Pricing AI features is tricky: a viral day can quickly turn profit into loss if your token economics are off. We'll explore per-request vs. per-token models, how to account for the long tail of complex queries, and setting a price floor that keeps your Indian AI product sustainable.
A practical, jargon-free guide for Indian engineering teams and founders — part of the Learn AI with Reeturaj series on InBharat AI.
Traditional software often has predictable, relatively fixed costs per user or per feature. AI, especially with large language models (LLMs), introduces a consumption-based model where you pay per token processed. A simple query might cost a few paise, while a complex, multi-turn conversation or a detailed report generation could cost several rupees. If your pricing doesn't account for this variability, you're setting yourself up for losses.
Imagine a healthcare startup using Sahayaak Seva to assist field workers. A worker might ask, "What are the common symptoms of dengue?" (short, cheap). Another might ask, "Summarize the patient's last five visits, noting all prescribed medications and follow-up actions needed, then draft a discharge summary in Hindi." (long, expensive). If both are priced as a single "request," the latter will eat into your margins.
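To make the gap between those two requests concrete, here is a minimal sketch of a per-request cost estimate. The rates and token counts are hypothetical numbers chosen for illustration, not real provider prices or measured usage.

```python
from dataclasses import dataclass

# Hypothetical provider rates in rupees per 1,000 tokens (assumed, not real prices).
INPUT_RATE_PER_1K = 0.25
OUTPUT_RATE_PER_1K = 1.00


@dataclass
class Request:
    label: str
    input_tokens: int
    output_tokens: int


def request_cost(req: Request) -> float:
    """Provider cost of one request in rupees, billed separately for input and output."""
    input_cost = req.input_tokens / 1000 * INPUT_RATE_PER_1K
    output_cost = req.output_tokens / 1000 * OUTPUT_RATE_PER_1K
    return input_cost + output_cost


# Token counts below are guesses for the two example questions.
short = Request("dengue symptoms", input_tokens=40, output_tokens=200)
long_ = Request("five-visit summary plus Hindi discharge draft",
                input_tokens=6000, output_tokens=1500)

for req in (short, long_):
    print(f"{req.label}: Rs {request_cost(req):.2f}")
```

Under these assumed numbers the short question costs about ₹0.21 and the long one about ₹3.00, so a single flat "request" price has to absorb roughly a 14x spread.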
There are two primary ways to charge for AI features:
Per-request pricing is simpler for users to understand: "Pay ₹5 per query." It feels familiar, like API calls. The challenge is that a "request" isn't a uniform unit of work for an LLM. To make this work, you need to:
Per-token pricing is closer to how you pay the underlying LLM provider (e.g., OpenAI, Anthropic, or even your self-hosted model). You charge users based on the number of input and output tokens, for example "₹0.001 per 1,000 tokens." This is more accurate, but it can be harder for users to predict and understand.
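A per-token bill could be computed along these lines. This is a sketch under assumptions: the user-facing prices are hypothetical, and `Decimal` is used so that amounts round predictably to the paisa.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical user-facing prices in rupees per 1,000 tokens.
PRICE_IN_PER_1K = Decimal("0.50")
PRICE_OUT_PER_1K = Decimal("2.00")


def bill(input_tokens: int, output_tokens: int) -> Decimal:
    """Amount to charge for one request, rounded to the nearest paisa."""
    raw = (Decimal(input_tokens) * PRICE_IN_PER_1K
           + Decimal(output_tokens) * PRICE_OUT_PER_1K) / Decimal(1000)
    return raw.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(bill(1200, 800))
```

Pricing input and output tokens separately mirrors how many providers bill, which keeps your revenue tied to the same quantities that drive your cost.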
Most requests will be short and cheap. But a small percentage of users will make very long, complex, or repetitive requests that consume a disproportionate number of tokens. This is the p95 long tail: the requests at or beyond the 95th percentile of your usage. If your pricing only covers the average, these edge cases will quickly drain your budget.
Example: In a content generation tool like KathaKitaab, most users might generate a 500-word story. But a few might try to generate a 5,000-word novel in one go. If you charge per story, the 5,000-word one costs roughly 10x as much to generate but brings in the same revenue.
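One way to see the long tail in your own data is to compare the mean cost per request with the 95th percentile. The sketch below uses a nearest-rank percentile and a made-up cost sample (18 typical requests and 2 very long ones); real logs would replace the hypothetical `costs` list.

```python
import statistics


def percentile(values, pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


# Hypothetical per-request costs in rupees.
costs = [0.20] * 18 + [2.00] * 2

mean_cost = statistics.mean(costs)
p95_cost = percentile(costs, 95)
# Share of total spend caused by requests at or above the p95 cost.
tail_share = sum(c for c in costs if c >= p95_cost) / sum(costs)

print(f"mean Rs {mean_cost:.2f}, p95 Rs {p95_cost:.2f}, tail share {tail_share:.0%}")
```

In this sample the mean is about ₹0.38 while the p95 is ₹2.00, and the two long requests account for just over half of total spend. A flat price set only slightly above the mean would lose money on every tail request.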
To handle the p95 long tail:
What happens when your AI feature goes viral? Suddenly, you have 10x, 100x, or even 1000x the usage. If your pricing is too low, or if your long-tail problem isn't addressed, a viral day becomes a financial disaster. This is especially true in India, where a product can gain rapid traction through word-of-mouth or social media.
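The effect of a usage spike is simple arithmetic: whatever you earn or lose per request gets multiplied by volume. The sketch below assumes a flat per-request price and a blended cost per request; all figures are hypothetical.

```python
def daily_margin(requests: int, price: float, cost: float) -> float:
    """Rupees earned in a day (negative means a loss) at a flat per-request price."""
    return requests * (price - cost)


BASELINE_REQUESTS = 1_000  # hypothetical normal day
BLENDED_COST = 0.38        # hypothetical average cost per request, long tail included

for price in (0.30, 0.50):
    for multiplier in (1, 10, 100, 1000):
        requests = BASELINE_REQUESTS * multiplier
        margin = daily_margin(requests, price, BLENDED_COST)
        print(f"price Rs {price:.2f}, {multiplier:>4}x traffic: Rs {margin:,.0f}")
```

With the price below cost, a small daily loss on a normal day becomes a loss a hundred times larger at 100x traffic. With the price above cost, the same spike multiplies profit instead.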
Your price floor is the absolute minimum you can charge to cover your direct costs (LLM API calls, inference compute, data storage) without losing money. It's not your profit margin; it's your break-even point.
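As a rough illustration, the break-even price per request is the sum of the direct costs named above divided by the number of billable requests. The monthly figures here are hypothetical, and the function name `price_floor` is just a label for this sketch.

```python
def price_floor(llm_api_cost: float, compute_cost: float,
                storage_cost: float, billable_requests: int) -> float:
    """Break-even price per request: total direct costs spread over billable requests."""
    if billable_requests <= 0:
        raise ValueError("billable_requests must be positive")
    return (llm_api_cost + compute_cost + storage_cost) / billable_requests


# Hypothetical monthly figures in rupees.
floor = price_floor(llm_api_cost=38_000, compute_cost=6_000,
                    storage_cost=1_000, billable_requests=100_000)
print(f"price floor: Rs {floor:.2f} per request")
```

With these assumed figures the floor works out to ₹0.45 per request. Anything you charge above it is margin; anything below it is a loss that grows with usage.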
Steps to set a robust price floor:
Building for Bharat means specific constraints and opportunities:
At InBharat AI, we're always optimizing our model choices and prompt strategies to keep costs down. Sometimes, a series of smaller, cheaper models chained together can outperform a single, expensive behemoth for specific tasks. This is a core tenet of building Desh Ka AI.
Don't let the promise of AI blind you to its real costs. Understanding token economics, accounting for the long tail of usage, and setting a robust price floor are non-negotiable for building sustainable AI products in India. Price for the worst-case viral day, not just the average, and you'll be in a much stronger position. For more on building robust AI systems, consider our insights on What Agentic AI Really Means and how it impacts your system design.
Q1: How do I explain token pricing to my users who are used to fixed prices?
A1: The best approach is often to offer token bundles or credits. Instead of saying "you pay per token," say "buy 10,000 credits for ₹100, where each credit is roughly X tokens." Provide a simple calculator or estimate for common actions so users can gauge their usage. Transparency is key, even if simplified.
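A simple calculator of the kind described in A1 might look like the sketch below. The bundle (10,000 credits for ₹100) comes from the answer; the value of X, set here to 100 tokens per credit, is an assumption for illustration.

```python
CREDITS_PER_BUNDLE = 10_000
BUNDLE_PRICE_RUPEES = 100
TOKENS_PER_CREDIT = 100  # assumed value for "X"; choose it from your own cost data


def credits_for(tokens: int) -> int:
    """Credits consumed by an action, rounded up so partial credits are not given away."""
    return -(-tokens // TOKENS_PER_CREDIT)


def rupees_for(credits: int) -> float:
    """Rupee value of a number of credits at the bundle price."""
    return credits * BUNDLE_PRICE_RUPEES / CREDITS_PER_BUNDLE


# Hypothetical token estimates for common actions.
for action, tokens in {"short question": 250, "long report": 7500}.items():
    used = credits_for(tokens)
    print(f"{action}: about {used} credits (Rs {rupees_for(used):.2f})")
```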
Q2: What if my AI feature uses multiple LLMs with different token costs?
A2: You'll need to track token usage per model. When calculating the cost for a user's request, sum up the token costs from all models involved. This can get complex, so consider abstracting it behind a single "credit" system where different actions consume different amounts of credits based on their underlying model costs.
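Summing costs across models, as A2 describes, can be sketched as follows. The model names and rates are placeholders, not real products or prices.

```python
# Hypothetical (input, output) rates in rupees per 1,000 tokens for two placeholder models.
RATES = {
    "small-model": (0.05, 0.20),
    "large-model": (0.50, 2.00),
}


def multi_model_cost(usage: dict[str, tuple[int, int]]) -> float:
    """Total cost of one user request, given (input_tokens, output_tokens) per model."""
    total = 0.0
    for model, (input_tokens, output_tokens) in usage.items():
        in_rate, out_rate = RATES[model]
        total += input_tokens / 1000 * in_rate + output_tokens / 1000 * out_rate
    return total


# One request that first calls the small model, then the large one.
usage = {"small-model": (2000, 500), "large-model": (1000, 1000)}
print(f"request cost: Rs {multi_model_cost(usage):.2f}")
```

Logging usage per model in this shape also gives you the data needed to decide how many credits each action should consume.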
Q3: Is it better to start with per-request or per-token pricing for a new AI product?
A3: For initial launch and user adoption, per-request pricing is often simpler for users to grasp. However, internally, you must model your costs on a per-token basis. As your product matures and users become more sophisticated, you can introduce per-token bundles or advanced tiers. Always start with a solid understanding of your token costs, regardless of your external pricing model.