Shipping Small but Useful AI Tools: A Practical Stack for 0 → 1 Without Heavy MLOps
I've built three AI products in the last year. None of them use Kubernetes, vector databases, or custom-trained models. They all make money, handle thousands of users, and cost less than $500/month to run.
The AI tooling landscape makes you think you need a complex stack to ship something useful. You don't.
Here's the stack I actually use to go from idea to production in days, not months.
The Stack
- Backend: Python + FastAPI
- AI: OpenAI/Anthropic APIs
- Database: Postgres (via Supabase)
- Queue: Simple in-process, or Redis if needed
- Hosting: Railway or Render
- Frontend: Next.js (but honestly, anything works)
Total complexity: Low
Total capability: High enough for 95% of AI tools
Let me break down why I chose each piece and how to use it.
FastAPI: The AI Tool Backend
FastAPI is perfect for AI products because:
- Async by default (good for LLM API calls)
- Automatic API docs
- Type hints prevent bugs
- Fast to develop, fast to run
Basic Setup:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import Optional
from openai import AsyncOpenAI

app = FastAPI()
client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

class GenerateRequest(BaseModel):
    prompt: str
    user_id: str
    max_tokens: Optional[int] = 500

class GenerateResponse(BaseModel):
    result: str
    tokens_used: int

@app.post("/generate", response_model=GenerateResponse)
async def generate_content(request: GenerateRequest):
    try:
        response = await client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=request.max_tokens,
        )
        result = response.choices[0].message.content
        tokens = response.usage.total_tokens

        # Log usage for billing/analytics (log_usage is your own helper)
        log_usage(request.user_id, tokens)

        return GenerateResponse(result=result, tokens_used=tokens)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
This is the core pattern for 90% of AI tools:
- Accept user input
- Call LLM API
- Return result
- Log for analytics/billing
You can ship this in an hour.
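If you want a quick sanity check before wiring up a frontend, something like this works once the server is running locally (a sketch: the prompt, the user_id, and uvicorn's default port 8000 are placeholder assumptions):

import httpx

# Hypothetical local smoke test against the /generate endpoint above
resp = httpx.post(
    "http://localhost:8000/generate",
    json={"prompt": "Write a tagline for a coffee subscription", "user_id": "user_123"},
    timeout=60,
)
resp.raise_for_status()
print(resp.json())  # {"result": "...", "tokens_used": ...}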
LLM APIs: Don't Train Your Own (Yet)
The biggest mistake I see: people trying to fine-tune or train models before validating their product.
Use GPT-4/Claude API until:
- You have 10,000+ users
- You can't achieve quality with prompts alone
- You have proprietary data that provides a real advantage
- You've calculated the cost/quality tradeoff
Before that, just use the API. It's fast, cheap enough, and incredibly capable.
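Because the stack treats OpenAI and Anthropic as interchangeable, it's worth keeping the provider behind one small function so endpoints never care which model answered. A minimal sketch, assuming the current openai and anthropic Python SDKs; the model names are illustrative:

from openai import AsyncOpenAI
from anthropic import AsyncAnthropic

openai_client = AsyncOpenAI()        # reads OPENAI_API_KEY
anthropic_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY

async def complete(prompt: str, provider: str = "openai", max_tokens: int = 500) -> str:
    """One seam for swapping LLM providers; endpoints only call this."""
    if provider == "openai":
        response = await openai_client.chat.completions.create(
            model="gpt-4",  # illustrative model name
            messages=[{"role": "user", "content": prompt}],
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content
    else:
        response = await anthropic_client.messages.create(
            model="claude-3-5-sonnet-20241022",  # illustrative model name
            max_tokens=max_tokens,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.content[0].text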
Prompt Engineering > Fine-Tuning:
def create_system_prompt(user_profile):
    """Customize behavior through prompts, not training."""
    return f"""
You are a content assistant for {user_profile['name']}, a {user_profile['niche']} creator.

Their style:
- {user_profile['tone']} tone
- {user_profile['length']} content length
- Focuses on {user_profile['topics']}

Examples of their past content:
{format_examples(user_profile['past_content'])}

Generate new content matching this exact style.
"""

async def generate_for_user(user_id, prompt):
    # get_user_profile and format_examples are your own helpers
    user_profile = get_user_profile(user_id)
    system_prompt = create_system_prompt(user_profile)

    response = await client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
This gives you personalization without training. It works remarkably well.
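For reference, here's the rough shape of profile data that prompt assumes; the values are hypothetical and would normally come from your users table:

# Hypothetical profile record matching the keys used in create_system_prompt
example_profile = {
    "name": "Dana",
    "niche": "personal finance",
    "tone": "casual but direct",
    "length": "short-form",
    "topics": "budgeting, index funds, and side income",
    "past_content": [
        "Your budget isn't a punishment. It's a permission slip.",
        "Index funds are boring. That's the point.",
    ],
}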
Cost Management:
from datetime import datetime
from fastapi import HTTPException

# Simple in-memory rate limiting (resets on restart; move to Redis/Postgres once it matters)
user_limits = {}

async def check_usage_limit(user_id: str, tier: str = "free"):
    limits = {
        "free": {"requests_per_day": 10, "tokens_per_day": 10000},
        "pro": {"requests_per_day": 1000, "tokens_per_day": 1000000},
    }

    today = datetime.utcnow().date()
    key = f"{user_id}_{today}"
    usage = user_limits.setdefault(key, {"requests": 0, "tokens": 0})
    limit = limits[tier]

    if usage["requests"] >= limit["requests_per_day"]:
        raise HTTPException(status_code=429, detail="Daily request limit reached")
    if usage["tokens"] >= limit["tokens_per_day"]:
        raise HTTPException(status_code=429, detail="Daily token limit reached")

    usage["requests"] += 1  # count this request; add tokens after the LLM call returns
    return True
This prevents runaway costs. Critical when you're charging $10/month but GPT-4 costs add up.
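It also helps to know roughly what each request costs before you pick a price. A back-of-the-envelope helper is plenty; the per-token rates below are placeholders, so plug in your provider's current pricing:

# Placeholder per-1K-token rates; check your provider's current pricing
PRICE_PER_1K = {
    "gpt-4": {"input": 0.03, "output": 0.06},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Rough USD cost of a single request; good enough for margin math."""
    rates = PRICE_PER_1K[model]
    return (input_tokens / 1000) * rates["input"] + (output_tokens / 1000) * rates["output"]

# At these placeholder rates, a 1,000-in / 500-out request is ~$0.06,
# so roughly 165 of them eat an entire $10/month subscription.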
Postgres: Your AI Tool Database
You don't need a vector database, graph database, or time-series database for most AI tools.
Postgres handles:
- User data
- Prompts and results
- Usage tracking
- Simple search (full-text is good enough)
Schema for AI Tool:
-- Users and auth
CREATE TABLE users (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
email TEXT UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
tier TEXT DEFAULT 'free',
settings JSONB DEFAULT '{}'
);
-- AI generations
CREATE TABLE generations (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
user_id UUID REFERENCES users(id),
prompt TEXT NOT NULL,
result TEXT NOT NULL,
tokens_used INTEGER,
model TEXT,
created_at TIMESTAMP DEFAULT NOW()
);
-- Usage tracking
CREATE TABLE daily_usage (
user_id UUID REFERENCES users(id),
date DATE,
requests INTEGER DEFAULT 0,
tokens_used INTEGER DEFAULT 0,
PRIMARY KEY (user_id, date)
);
-- Simple analytics
CREATE INDEX idx_generations_user_date ON generations(user_id, created_at);
-- daily_usage lookups are already covered by its composite primary key (user_id, date)
That's it. Covers 95% of AI tools.
Why Not Vector Databases?
You only need vector databases if you're doing semantic search or RAG at scale.
For most AI tools, you're:
- Storing user inputs/outputs → Postgres
- Calling LLM APIs → No storage needed
- Maybe searching past generations → Postgres full-text search works fine (see the sketch below)
If you're actually building RAG (see my other post on pragmatic RAG), then yes, add the pgvector extension to Postgres or use Pinecone. But not on day one.
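To make "full-text search works fine" concrete: searching past generations needs nothing beyond the tables above. A sketch using the Supabase Python client's text_search filter (the query string is Postgres tsquery syntax; without a dedicated tsvector column this won't use an index, which is fine at small scale):

import os
from supabase import create_client

supabase = create_client(os.getenv("SUPABASE_URL"), os.getenv("SUPABASE_KEY"))

def search_generations(user_id: str, query: str):
    """Full-text search over stored generations; no extra search infrastructure."""
    return (
        supabase.table("generations")
        .select("*")
        .eq("user_id", user_id)
        .text_search("result", query, options={"config": "english"})
        .limit(20)
        .execute()
        .data
    )

# e.g. search_generations(user_id, "'pricing' & 'email'")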
Queues: When and Why
Most AI tools don't need a queue initially. But when you do, here's the pattern:
Without Queue (Simple, Start Here):
@app.post("/generate")
async def generate(request: GenerateRequest):
result = await call_llm(request.prompt)
return {"result": result}
Client waits for response. Works fine if LLM calls are <10 seconds.
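If you stay synchronous, cap how long that wait can be; the current OpenAI SDK (1.x) accepts client-wide timeout and retry settings, roughly like this:

from openai import AsyncOpenAI

# Fail fast instead of letting a stuck call hang the request indefinitely
client = AsyncOpenAI(timeout=30.0, max_retries=2)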
With Queue (When Needed):
from redis import Redis
from rq import Queue

redis_conn = Redis()
queue = Queue(connection=redis_conn)

@app.post("/generate")
async def generate(request: GenerateRequest):
    # call_llm should be a regular (sync) function here; RQ workers run jobs synchronously
    job = queue.enqueue(call_llm, request.prompt, job_timeout=60)
    return {"job_id": job.id}

@app.get("/status/{job_id}")
async def check_status(job_id: str):
    job = queue.fetch_job(job_id)
    if job is None:
        raise HTTPException(status_code=404, detail="Job not found")
    if job.is_finished:
        return {"status": "complete", "result": job.result}
    elif job.is_failed:
        return {"status": "failed", "error": str(job.exc_info)}
    else:
        return {"status": "processing"}
Now long-running jobs don't block the API. (You'll also need an RQ worker process running alongside the web app to actually execute them.)
When to Add a Queue:
- LLM calls take >30 seconds
- You're doing batch processing
- You need retry logic for failures (a simple queue-free version is sketched below)
- You want background jobs
Before that, async/await is enough.
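Even without a queue, a few lines of retry-with-backoff cover most transient API failures. A minimal sketch, reusing the same call_llm helper as the endpoints above:

import asyncio

async def call_llm_with_retry(prompt: str, attempts: int = 3):
    """Retry transient LLM API failures with exponential backoff; no queue needed."""
    for attempt in range(attempts):
        try:
            return await call_llm(prompt)
        except Exception:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # 1s after the first failure, 2s after the second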
Logging and Monitoring: Simple But Critical
You don't need Datadog or New Relic. You need structured logs and basic alerts.
Logging Pattern:
import json
import logging
from datetime import datetime

logging.basicConfig(level=logging.INFO)  # make sure INFO-level logs reach stdout

def log_generation(user_id, prompt, result, tokens, latency_ms):
    log_entry = {
        "event": "generation",
        "user_id": user_id,
        "prompt_length": len(prompt),
        "result_length": len(result),
        "tokens": tokens,
        "latency_ms": latency_ms,
        "timestamp": datetime.utcnow().isoformat(),
    }
    logging.info(json.dumps(log_entry))
# This logs to stdout → Railway/Render/whatever captures it
# Now you can search: "show me all generations for user X"
# Or: "show me slow requests (latency_ms > 5000)"
Basic Metrics Dashboard:
@app.get("/admin/metrics")
async def get_metrics(admin_token: str):
# Verify admin
if admin_token != os.getenv("ADMIN_TOKEN"):
raise HTTPException(status_code=403)
# Query simple metrics
today = datetime.utcnow().date()
metrics = {
"users_active_today": await db.count_users_active_since(today),
"generations_today": await db.count_generations_since(today),
"tokens_used_today": await db.sum_tokens_since(today),
"error_rate": await db.error_rate_since(today)
}
return metrics
Check this once a day. If errors spike or usage drops, investigate.
That's enough monitoring for the first 6 months.
Deployment: Railway or Render
Don't overthink hosting. Both Railway and Render work great for AI tools:
Railway:
- Connect GitHub repo
- Add Postgres and Redis (if needed)
- Deploy on push
- Cost: ~$20-50/month
Render:
- Similar to Railway
- Slightly more configuration
- Cost: ~$20-50/month
Both handle:
- Auto-deploy from Git
- Environment variables
- SSL certificates
- Scaling (when you need it)
My deployment setup:
# railway.toml
[build]
builder = "NIXPACKS"
[deploy]
startCommand = "uvicorn main:app --host 0.0.0.0 --port $PORT"
healthcheckPath = "/health"
restartPolicyType = "ON_FAILURE"
Push to main, it deploys. That's it.
The Complete Minimal AI Tool
Putting it all together, here's a working AI tool in ~100 lines:
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from openai import AsyncOpenAI
from supabase import create_client
import os
from datetime import datetime
import json
import logging

logging.basicConfig(level=logging.INFO)

app = FastAPI()

# Setup
client = AsyncOpenAI(api_key=os.getenv("OPENAI_API_KEY"))
supabase = create_client(
    os.getenv("SUPABASE_URL"),
    os.getenv("SUPABASE_KEY"),
)

# Models
class GenerateRequest(BaseModel):
    prompt: str
    user_id: str

class GenerateResponse(BaseModel):
    result: str
    tokens_used: int

# Rate limiting
async def check_rate_limit(user_id: str):
    today = datetime.utcnow().date()
    result = supabase.table("daily_usage").select("*").match({
        "user_id": user_id,
        "date": today.isoformat(),
    }).execute()

    if result.data:
        usage = result.data[0]
        if usage["requests"] >= 10:  # Free tier limit
            raise HTTPException(429, "Daily limit reached")

# Main endpoint
@app.post("/generate", response_model=GenerateResponse)
async def generate(request: GenerateRequest):
    start_time = datetime.utcnow()

    # Check limits
    await check_rate_limit(request.user_id)

    # Generate
    try:
        response = await client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": request.prompt}],
            max_tokens=500,
        )
        result = response.choices[0].message.content
        tokens = response.usage.total_tokens
    except Exception as e:
        logging.error(f"OpenAI error: {e}")
        raise HTTPException(500, "Generation failed")

    # Save generation
    supabase.table("generations").insert({
        "user_id": request.user_id,
        "prompt": request.prompt,
        "result": result,
        "tokens_used": tokens,
    }).execute()

    # Update usage (increment_usage is a small Postgres function you define in Supabase)
    today = datetime.utcnow().date()
    supabase.rpc("increment_usage", {
        "user_id": request.user_id,
        "date": today.isoformat(),
        "tokens": tokens,
    }).execute()

    # Log
    latency_ms = (datetime.utcnow() - start_time).total_seconds() * 1000
    logging.info(json.dumps({
        "event": "generation",
        "user_id": request.user_id,
        "tokens": tokens,
        "latency_ms": latency_ms,
    }))

    return GenerateResponse(result=result, tokens_used=tokens)

@app.get("/health")
async def health():
    return {"status": "ok"}
This handles:
- User requests
- Rate limiting
- LLM calls
- Database storage
- Usage tracking
- Logging
Total lines: ~100. Total capability: a production-ready AI tool.
When to Add Complexity
Start with this simple stack. Add complexity only when:
Add Redis Queue When:
- LLM calls take >30 seconds
- You're doing batch jobs
- You need better retry logic
Add Vector DB When:
- You're building RAG with 10K+ documents
- You're doing semantic search at scale
- Postgres full-text search isn't good enough
Add Custom Model When:
- GPT-4 can't achieve the quality you need (rare)
- You have 10K+ users and cost is significant
- You have proprietary data that creates a moat
Add Kubernetes When:
- You have multiple services with different scaling needs
- You have DevOps expertise
- Simple hosting is actually becoming expensive
Most AI tools never need these. The ones that do can add them later.
The Real Stack: Speed to Ship
The best stack is the one you can ship with.
I've watched founders spend 3 months setting up:
- Kubernetes
- Custom vector databases
- Fine-tuned models
- Complex MLOps pipelines
Then realize their product idea doesn't work and they have to start over.
Meanwhile, I ship in a week with:
- FastAPI
- OpenAI API
- Postgres
- Railway
If it works, I iterate. If it doesn't, I pivot without being tied to infrastructure.
The goal isn't "best practices" or "production-grade architecture." It's learning if your product solves a real problem.
Ship the simplest thing that works. Add complexity when simplicity breaks.
That's how you go from 0 to 1.