What Big Tech Got Right (and Wrong) About Data: Lessons I Actually Use as a Founder
When I left Meta to start a company, I thought I'd bring all the best practices with me. Strong schemas, comprehensive testing, data validation pipelines, proper monitoring, the works.
Two weeks in, I realized I was building a data platform for a product that didn't exist yet.
Big tech teaches you patterns that work at massive scale. Most of them are overkill before you have product-market fit. But some of them—the ones nobody talks about—are critical from day one.
Here's what I kept, what I ditched, and what I wish someone had told me earlier.
What Big Tech Got Right
1. Logging Everything (But Not How You Think)
At Meta, we logged every event, every error, every user action. But the value wasn't in the volume—it was in the structure.
The Right Pattern:
Every log had:
- user_id (who)
- event_name (what)
- timestamp (when)
- event_properties (context as JSON)
- session_id (which session)
That's it. Five fields. Consistent across every team, every service, every log.
The magic wasn't sophisticated analysis—it was being able to answer "show me everything user X did in the last hour" with a single query.
What I Use Now:
```python
import logging
import json
from datetime import datetime

def log_event(user_id, event_name, properties=None, session_id=None):
    """
    Simple structured logging that works at any scale
    """
    log_entry = {
        "user_id": user_id,
        "event_name": event_name,
        "timestamp": datetime.utcnow().isoformat(),
        "properties": properties or {},
        "session_id": session_id
    }
    logging.info(json.dumps(log_entry))

# Usage
log_event(
    user_id="user_123",
    event_name="content_created",
    properties={"content_type": "video", "duration_sec": 45},
    session_id="session_xyz"
)
```
This logs to stdout, which goes to CloudWatch/Datadog/whatever. No fancy pipeline needed. But because it's structured, I can search, filter, and analyze it.
Cost: $0 infrastructure. Value: infinite when debugging.
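As a concrete sketch, here's what "show me everything user X did in the last hour" looks like as a filter over those JSON-lines logs in plain Python (in practice this would be a CloudWatch Insights or Datadog query; `events_for_user` is an illustrative name, not a real API):

```python
import json
from datetime import datetime, timedelta, timezone

def events_for_user(log_lines, user_id, within=timedelta(hours=1), now=None):
    """Filter structured JSON log lines down to one user's recent events."""
    now = now or datetime.now(timezone.utc)
    matches = []
    for line in log_lines:
        event = json.loads(line)
        ts = datetime.fromisoformat(event["timestamp"])
        if ts.tzinfo is None:
            # utcnow().isoformat() emits naive timestamps; treat them as UTC
            ts = ts.replace(tzinfo=timezone.utc)
        if event["user_id"] == user_id and now - ts <= within:
            matches.append(event)
    return matches
```

Because every log shares the same five fields, this one function (or one saved query) answers the question for any user, any service.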
What NOT to Do:
Don't build a custom event pipeline before you have 1,000 users. Use a structured logging format and stdout. You can always build something fancier later.
2. The "One Source of Truth" Principle
At Meta, every data type had exactly one authoritative source. User data lived in one service. Content data lived in another. You never duplicated the source of truth.
This sounds obvious, but I see startups break this rule constantly.
Bad Pattern:
- User signup info in auth database
- User profile info in application database
- User preferences in separate microservice
- Now you have three sources of truth and they diverge
Good Pattern:
- User data lives in ONE place (auth service/database)
- Everything else references it by user_id
- If you need user data elsewhere, you fetch it or cache it, but the source of truth is clear
Real Example:
In my product, creators have profiles. I was tempted to store profile data in:
- Auth service (name, email)
- Content service (creator bio, links)
- Analytics service (creator stats)
Instead, I keep ONE user table with all profile fields. Other services reference user_id but never duplicate profile data.
When a creator updates their bio, there's no sync logic, no eventual consistency issues. It just works.
The Rule: Before storing data, ask "Is this the source of truth or a reference?" If it's a reference, don't store it—fetch it.
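Here's a minimal sketch of the reference-vs-source distinction, with a dict standing in for the user database (the names `ContentItem` and `get_author_profile` are illustrative, not from my codebase):

```python
# Stand-in for the auth/user database: the ONE source of truth for profiles
USERS = {"user_123": {"name": "Ada", "bio": "Creator"}}

class ContentItem:
    """A record in another service: stores only a reference, never a copy."""
    def __init__(self, content_id, author_id):
        self.content_id = content_id
        self.author_id = author_id  # reference by user_id only

def get_author_profile(item):
    # Fetch from the source of truth on demand; nothing to keep in sync
    return USERS[item.author_id]

item = ContentItem("c1", "user_123")
USERS["user_123"]["bio"] = "Updated bio"  # profile edit needs no sync logic
assert get_author_profile(item)["bio"] == "Updated bio"
```

Because the content service never copied the bio, the update is visible everywhere immediately.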
3. Feature Flags for Everything
This is the best practice I brought from Meta that paid off immediately.
Every new feature goes behind a flag:
```python
import hashlib

# Simple feature flag system
FEATURE_FLAGS = {
    "new_editor_ui": {"enabled": False, "rollout_percent": 0},
    "ai_suggestions": {"enabled": True, "rollout_percent": 100},
    "advanced_analytics": {"enabled": True, "rollout_percent": 10}
}

def is_feature_enabled(feature_name, user_id=None):
    flag = FEATURE_FLAGS.get(feature_name)
    if not flag or not flag["enabled"]:
        return False
    if flag["rollout_percent"] == 100:
        return True
    # Gradual rollout: bucket users 0-99 with a stable hash. Python's built-in
    # hash() is salted per process, so it would reshuffle buckets on restart.
    if user_id:
        bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
        return bucket < flag["rollout_percent"]
    return False

# Usage in code
if is_feature_enabled("new_editor_ui", user_id):
    return render_new_editor()
else:
    return render_old_editor()
```
Why this matters:
- Ship to production anytime, turn features on when ready
- Gradual rollout (10% of users, then 50%, then 100%)
- Instant rollback if something breaks (flip flag to false)
- A/B test without complex infrastructure
I use this for EVERY new feature, even as a team of one. It's saved me countless times.
4. Idempotency for All State Changes
At Meta, every write operation was idempotent: calling it twice produced the same result as calling it once.
This pattern prevents so many bugs:
- Duplicate API calls (user double-clicks button)
- Retries on failure
- Race conditions
- Event replay issues
Example Pattern:
```python
# Bad: not idempotent
def create_user(email, name):
    user = User(email=email, name=name)
    db.save(user)
    send_welcome_email(email)
    # If this is called twice, you get two users and two emails

# Good: idempotent
def create_user(email, name, idempotency_key):
    # Check if we've already processed this request
    if db.exists(idempotency_key):
        return db.get_user_by_idempotency_key(idempotency_key)
    user = User(email=email, name=name, idempotency_key=idempotency_key)
    db.save(user)
    send_welcome_email(email)
    return user

# Now calling it twice with the same idempotency_key returns the same user
```
The idempotency key can be a request ID, a hash of the inputs, or a UUID generated client-side.
This pattern from Meta has prevented so many production issues in my startup.
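The key-generation options mentioned above can be sketched like this (function names are illustrative):

```python
import hashlib
import json
import uuid

def idempotency_key_from_inputs(payload: dict) -> str:
    # Deterministic: the same inputs always produce the same key,
    # so accidental duplicate requests dedupe themselves.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def idempotency_key_client_side() -> str:
    # Random: the client generates one key per logical request
    # and reuses it on every retry of that request.
    return str(uuid.uuid4())
```

Use the input-hash variant when "same inputs" should mean "same request"; use the client-generated UUID when the same inputs can legitimately occur twice (e.g. two separate orders for the same item).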
What Big Tech Got Wrong (For Startups)
1. Microservices Before You Need Them
Meta had hundreds of services. For good reason—different teams, different scale requirements, different technologies.
Startups copy this pattern because "that's how big tech does it." It's a disaster.
What Doesn't Work:
- Auth service
- User service
- Content service
- Analytics service
- Notification service
- ...
Each with its own repo, deployment pipeline, and database. You now have five-plus things to maintain, deploy, and monitor.
What Actually Works:
Start with a monolith. One service, one database, one deployment.
Add microservices only when you have a clear reason:
- This component needs to scale independently
- This team needs to deploy independently
- This functionality needs different technology
Before 10 engineers, you almost never have these constraints.
I started with one FastAPI service. It handles auth, content, analytics, everything. When analytics queries started slowing down the API, THEN I split it out.
2. Exhaustive Testing Suites
At Meta, we had:
- Unit tests (90%+ coverage)
- Integration tests
- End-to-end tests
- Performance tests
- Security tests
- Accessibility tests
For a stable product serving billions of users, this makes sense.
For a startup finding product-market fit, it's overhead that slows you down.
What Actually Works:
Test the critical paths, skip the rest:
- Auth flow (signup, login, password reset)
- Core user actions (create content, publish, share)
- Payment processing (if applicable)
That's it. As you stabilize, add more tests.
My Testing Setup:
```python
# I test critical flows, not every function
def test_user_signup_flow():
    # User signs up
    response = client.post("/signup", json={
        "email": "test@example.com",
        "password": "secure_password"
    })
    assert response.status_code == 200

    # User gets welcome email
    assert email_sent("test@example.com", subject="Welcome")

    # User can log in
    response = client.post("/login", json={
        "email": "test@example.com",
        "password": "secure_password"
    })
    assert response.status_code == 200
    assert "access_token" in response.json()
```
I don't test every edge case. I test that the main flows work. If something breaks, I add a test for it, then fix it.
- Test coverage at Meta: 90%+
- Test coverage in my startup: 40%
- Bugs in production: about the same
The difference is I shipped in 3 months instead of 9.
3. Complex Data Pipelines
At Meta, data flowed through:
- Real-time streaming (Kafka)
- Batch processing (Spark)
- Data warehouse (Presto)
- OLAP cubes
- Dashboards (custom internal tools)
This architecture handled petabytes per day. It required 50+ engineers to maintain.
What Actually Works for Startups:
Your database + a cron job.
Seriously. Unless you're processing millions of events per day, you don't need Kafka, Spark, or a data warehouse.
My Data Architecture:
- Events get logged (structured logging shown earlier)
- Logs go to CloudWatch
- Daily cron job processes yesterday's logs, writes summary to Postgres
- Dashboard queries Postgres
- Cost: $20/month
- Maintenance: 0 hours/week
- Latency: 24 hours (good enough for most metrics)
When I hit 1M+ events/day, I'll upgrade. Until then, simple wins.
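The daily rollup step can be sketched as a small aggregation over the day's log lines (assuming JSON-lines events as in the logging section; `summarize_events` is an illustrative name, and the Postgres INSERT is omitted):

```python
import json
from collections import Counter

def summarize_events(log_lines):
    """Aggregate one day of structured log lines into per-event counts."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-event log noise mixed into the stream
        counts[event["event_name"]] += 1
    return counts

# The cron job would fetch yesterday's lines from CloudWatch, run this,
# and INSERT the resulting (date, event_name, count) rows into Postgres.
```

The whole "pipeline" is one function, one cron entry, and one summary table.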
4. Premature Optimization
At Meta, we optimized everything. Database queries, API latency, frontend bundle size, image compression, you name it.
Because at Meta's scale, a 10ms improvement saves thousands of dollars.
At startup scale, a 10ms improvement saves... nothing.
What Actually Matters:
- Is the product fast enough that users don't complain?
  - Yes → stop optimizing, ship features
  - No → optimize the slowest thing, then stop
I spent 2 weeks optimizing my API response time from 200ms to 50ms. Know how many users noticed? Zero.
I should've spent that time talking to users.
The Middle Ground: What to Copy from Day One
These patterns from Meta are worth using from the start:
1. Structured Logging
- Uses: debugging, understanding user behavior, building features
- Cost: free (it's just a logging format)
- Setup time: 30 minutes
2. Feature Flags
- Uses: gradual rollout, A/B testing, instant rollback
- Cost: free (simple in-code implementation)
- Setup time: 1 hour
3. Idempotency Keys
- Uses: preventing duplicate actions, safe retries
- Cost: free (just a pattern)
- Setup time: 30 minutes per endpoint
4. One Source of Truth
- Uses: data consistency, avoiding sync bugs
- Cost: free (it's a design principle)
- Setup time: 0 (just follow it)
These four patterns prevent entire classes of bugs and cost almost nothing to implement.
The Framework: Scale-Appropriate Engineering
Here's how I decide what to adopt from big tech:
Ask three questions:
1. Does this solve a problem I have today?
   - No → Don't build it
   - Yes → Continue
2. What's the simplest version that solves it?
   - Build that version
3. Will this still work at 10x scale?
   - Yes → Ship it
   - No → That's okay, you'll refactor later
Most big tech patterns are over-engineered for startups. But some patterns prevent bugs that are expensive to fix later (idempotency, structured logging, feature flags).
Learn to tell the difference.
What I'd Tell My Past Self
If I could go back to day one:
Do this immediately:
- Structured logging with user_id, event_name, timestamp, properties
- Feature flags for new features
- Idempotency keys for state changes
- One source of truth for each data type
Wait until you need it:
- Microservices (wait until you have 5+ engineers or clear scaling need)
- Complex data pipelines (wait until 1M+ events/day)
- Comprehensive testing (test critical paths only)
- Performance optimization (wait until users complain)
Never do it:
- Copy big tech architecture because "best practices"
- Build platforms before products
- Optimize before measuring
- Add complexity without clear ROI
The best thing about working at Meta wasn't learning how to build at scale. It was learning which problems only exist at scale.
Most startups die because they built a data platform instead of a product.
Build the product. Add the platform later, when you need it.
That's the lesson big tech won't teach you—because for them, the platform came first.