What Big Tech Got Right (and Wrong) About Data: Lessons I Actually Use as a Founder
When I left Meta to start a company, I thought I'd bring all the best practices with me. Strong schemas, comprehensive testing, data validation pipelines, proper monitoring, the works.
Two weeks in, I realized I was building a data platform for a product that didn't exist yet.
Big tech teaches you patterns that work at massive scale. Most of them are overkill before you have product-market fit. But some of them—the ones nobody talks about—are critical from day one.
Here's what I kept, what I ditched, and what I wish someone had told me earlier.
What Big Tech Got Right
1. Logging Everything (But Not How You Think)
At Meta, we logged every event, every error, every user action. But the value wasn't in the volume—it was in the structure.
The Right Pattern:
Every log had:
- user_id (who)
- event_name (what)
- timestamp (when)
- event_properties (context as JSON)
- session_id (which session)
That's it. Five fields. Consistent across every team, every service, every log.
The magic wasn't sophisticated analysis—it was being able to answer "show me everything user X did in the last hour" with a single query.
What I Use Now:
```python
import logging
import json
from datetime import datetime

def log_event(user_id, event_name, properties=None, session_id=None):
    """
    Simple structured logging that works at any scale
    """
    log_entry = {
        "user_id": user_id,
        "event_name": event_name,
        "timestamp": datetime.utcnow().isoformat(),
        "properties": properties or {},
        "session_id": session_id
    }
    logging.info(json.dumps(log_entry))

# Usage
log_event(
    user_id="user_123",
    event_name="content_created",
    properties={"content_type": "video", "duration_sec": 45},
    session_id="session_xyz"
)
```
This logs to stdout, which goes to CloudWatch/Datadog/whatever. No fancy pipeline needed. But because it's structured, I can search, filter, and analyze it.
Cost: $0 infrastructure. Value: infinite when debugging.
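As a concrete sketch, here's what "show me everything user X did in the last hour" looks like as a filter over those JSON-lines logs in plain Python (in practice this would be a CloudWatch Insights or Datadog query; `events_for_user` is an illustrative name, not a real API):

```python
import json
from datetime import datetime, timedelta, timezone

def events_for_user(log_lines, user_id, within=timedelta(hours=1), now=None):
    """Filter structured JSON log lines down to one user's recent events."""
    now = now or datetime.now(timezone.utc)
    matches = []
    for line in log_lines:
        event = json.loads(line)
        ts = datetime.fromisoformat(event["timestamp"])
        if ts.tzinfo is None:
            # utcnow().isoformat() emits naive timestamps; treat them as UTC
            ts = ts.replace(tzinfo=timezone.utc)
        if event["user_id"] == user_id and now - ts <= within:
            matches.append(event)
    return matches
```

Because every log shares the same five fields, this one function (or one saved query) answers the question for any user, any service.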
What NOT to Do:
Don't build a custom event pipeline before you have 1,000 users. Use a structured logging format and stdout. You can always build something fancier later.
2. The "One Source of Truth" Principle
At Meta, every data type had exactly one authoritative source. User data lived in one service. Content data lived in another. You never duplicated the source of truth.
This sounds obvious, but I see startups break this rule constantly.
Bad Pattern:
- User signup info in auth database
- User profile info in application database
- User preferences in separate microservice
- Now you have three sources of truth and they diverge
Good Pattern:
- User data lives in ONE place (auth service/database)
- Everything else references it by user_id
- If you need user data elsewhere, you fetch it or cache it, but the source of truth is clear
Real Example:
In my product, creators have profiles. I was tempted to store profile data in:
- Auth service (name, email)
- Content service (creator bio, links)
- Analytics service (creator stats)
Instead, I keep ONE user table with all profile fields. Other services reference user_id but never duplicate profile data.
When a creator updates their bio, there's no sync logic, no eventual consistency issues. It just works.
The Rule: Before storing data, ask "Is this the source of truth or a reference?" If it's a reference, don't store it—fetch it.
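Here's a minimal sketch of the reference-vs-source distinction, with a dict standing in for the user database (the names `ContentItem` and `get_author_profile` are illustrative, not from my codebase):

```python
# Stand-in for the auth/user database: the ONE source of truth for profiles
USERS = {"user_123": {"name": "Ada", "bio": "Creator"}}

class ContentItem:
    """A record in another service: stores only a reference, never a copy."""
    def __init__(self, content_id, author_id):
        self.content_id = content_id
        self.author_id = author_id  # reference by user_id only

def get_author_profile(item):
    # Fetch from the source of truth on demand; nothing to keep in sync
    return USERS[item.author_id]

item = ContentItem("c1", "user_123")
USERS["user_123"]["bio"] = "Updated bio"  # profile edit needs no sync logic
assert get_author_profile(item)["bio"] == "Updated bio"
```

Because the content service never copied the bio, the update is visible everywhere immediately.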
3. Feature Flags for Everything
This is the best practice I brought from Meta that paid off immediately.
Every new feature goes behind a flag:
```python
import hashlib

# Simple feature flag system
FEATURE_FLAGS = {
    "new_editor_ui": {"enabled": False, "rollout_percent": 0},
    "ai_suggestions": {"enabled": True, "rollout_percent": 100},
    "advanced_analytics": {"enabled": True, "rollout_percent": 10}
}

def is_feature_enabled(feature_name, user_id=None):
    flag = FEATURE_FLAGS.get(feature_name)
    if not flag or not flag["enabled"]:
        return False
    if flag["rollout_percent"] == 100:
        return True
    # Gradual rollout: bucket users 0-99 with a stable hash. Python's built-in
    # hash() is salted per process, so it would reshuffle buckets on restart.
    if user_id:
        bucket = int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100
        return bucket < flag["rollout_percent"]
    return False

# Usage in code
if is_feature_enabled("new_editor_ui", user_id):
    return render_new_editor()
else:
    return render_old_editor()
```
Why this matters:
- Ship to production anytime, turn features on when ready
- Gradual rollout (10% of users, then 50%, then 100%)
- Instant rollback if something breaks (flip flag to false)
- A/B test without complex infrastructure
I use this for EVERY new feature, even as a team of one. It's saved me countless times.
4. Idempotency for All State Changes
At Meta, every write operation was idempotent: calling it twice produced the same result as calling it once.
This pattern prevents so many bugs:
- Duplicate API calls (user double-clicks button)
- Retries on failure
- Race conditions
- Event replay issues
Example Pattern:
```python
# Bad: not idempotent
def create_user(email, name):
    user = User(email=email, name=name)
    db.save(user)
    send_welcome_email(email)
    # If this is called twice, you get two users and two emails

# Good: idempotent
def create_user(email, name, idempotency_key):
    # Check if we've already processed this request
    if db.exists(idempotency_key):
        return db.get_user_by_idempotency_key(idempotency_key)
    user = User(email=email, name=name, idempotency_key=idempotency_key)
    db.save(user)
    send_welcome_email(email)
    return user

# Now calling it twice with the same idempotency_key returns the same user
```
The idempotency key can be a request ID, a hash of the inputs, or a UUID generated client-side.
This pattern from Meta has prevented so many production issues in my startup.
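The key-generation options mentioned above can be sketched like this (function names are illustrative):

```python
import hashlib
import json
import uuid

def idempotency_key_from_inputs(payload: dict) -> str:
    # Deterministic: the same inputs always produce the same key,
    # so accidental duplicate requests dedupe themselves.
    canonical = json.dumps(payload, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def idempotency_key_client_side() -> str:
    # Random: the client generates one key per logical request
    # and reuses it on every retry of that request.
    return str(uuid.uuid4())
```

Use the input-hash variant when "same inputs" should mean "same request"; use the client-generated UUID when the same inputs can legitimately occur twice (e.g. two separate orders for the same item).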
What Big Tech Got Wrong (For Startups)
1. Microservices Before You Need Them
Meta had hundreds of services. For good reason—different teams, different scale requirements, different technologies.
Startups copy this pattern because "that's how big tech does it." It's a disaster.
What Doesn't Work:
- Auth service
- User service
- Content service
- Analytics service
- Notification service
- ...
Each with its own repo, deployment pipeline, and database. You now have five-plus things to maintain, deploy, and monitor.
What Actually Works:
Start with a monolith. One service, one database, one deployment.
Add microservices only when you have a clear reason:
- This component needs to scale independently
- This team needs to deploy independently
- This functionality needs different technology
Before 10 engineers, you almost never have these constraints.
I started with one FastAPI service. It handles auth, content, analytics, everything. When analytics queries started slowing down the API, THEN I split it out.
2. Exhaustive Testing Suites
At Meta, we had:
- Unit tests (90%+ coverage)
- Integration tests
- End-to-end tests
- Performance tests
- Security tests
- Accessibility tests
For a stable product serving billions of users, this makes sense.
For a startup finding product-market fit, it's overhead that slows you down.
What Actually Works:
Test the critical paths, skip the rest:
- Auth flow (signup, login, password reset)
- Core user actions (create content, publish, share)
- Payment processing (if applicable)
That's it. As you stabilize, add more tests.
My Testing Setup:
```python
# I test critical flows, not every function
def test_user_signup_flow():
    # User signs up
    response = client.post("/signup", json={
        "email": "test@example.com",
        "password": "secure_password"
    })
    assert response.status_code == 200

    # User gets welcome email
    assert email_sent("test@example.com", subject="Welcome")

    # User can log in
    response = client.post("/login", json={
        "email": "test@example.com",
        "password": "secure_password"
    })
    assert response.status_code == 200
    assert "access_token" in response.json()
```
I don't test every edge case. I test that the main flows work. If something breaks, I add a test for it, then fix it.
- Test coverage at Meta: 90%+
- Test coverage in my startup: 40%
- Bugs in production: about the same
The difference is I shipped in 3 months instead of 9.
3. Complex Data Pipelines
At Meta, data flowed through:
- Real-time streaming (Kafka)
- Batch processing (Spark)
- Data warehouse (Presto)
- OLAP cubes
- Dashboards (custom internal tools)
This architecture handled petabytes per day. It required 50+ engineers to maintain.
What Actually Works for Startups:
Your database + a cron job.
Seriously. Unless you're processing millions of events per day, you don't need Kafka, Spark, or a data warehouse.
My Data Architecture:
- Events get logged (structured logging shown earlier)
- Logs go to CloudWatch
- Daily cron job processes yesterday's logs, writes summary to Postgres
- Dashboard queries Postgres
- Cost: $20/month
- Maintenance: 0 hours/week
- Latency: 24 hours (good enough for most metrics)
When I hit 1M+ events/day, I'll upgrade. Until then, simple wins.
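The daily rollup step can be sketched as a small aggregation over the day's log lines (assuming JSON-lines events as in the logging section; `summarize_events` is an illustrative name, and the Postgres INSERT is omitted):

```python
import json
from collections import Counter

def summarize_events(log_lines):
    """Aggregate one day of structured log lines into per-event counts."""
    counts = Counter()
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-event log noise mixed into the stream
        counts[event["event_name"]] += 1
    return counts

# The cron job would fetch yesterday's lines from CloudWatch, run this,
# and INSERT the resulting (date, event_name, count) rows into Postgres.
```

The whole "pipeline" is one function, one cron entry, and one summary table.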
4. Premature Optimization
At Meta, we optimized everything. Database queries, API latency, frontend bundle size, image compression, you name it.
Because at Meta's scale, a 10ms improvement saves thousands of dollars.
At startup scale, a 10ms improvement saves... nothing.
What Actually Matters:
- Is the product fast enough that users don't complain?
  - Yes → stop optimizing, ship features
  - No → optimize the slowest thing, then stop
I spent 2 weeks optimizing my API response time from 200ms to 50ms. Know how many users noticed? Zero.
I should've spent that time talking to users.
The Middle Ground: What to Copy from Day One
These patterns from Meta are worth using from the start:
1. Structured Logging
- Uses: debugging, understanding user behavior, building features
- Cost: free (it's just a logging format)
- Setup time: 30 minutes
2. Feature Flags
- Uses: gradual rollout, A/B testing, instant rollback
- Cost: free (simple in-code implementation)
- Setup time: 1 hour
3. Idempotency Keys
- Uses: preventing duplicate actions, safe retries
- Cost: free (just a pattern)
- Setup time: 30 minutes per endpoint
4. One Source of Truth
- Uses: data consistency, avoiding sync bugs
- Cost: free (it's a design principle)
- Setup time: 0 (just follow it)
These four patterns prevent entire classes of bugs and cost almost nothing to implement.
The Framework: Scale-Appropriate Engineering
Here's how I decide what to adopt from big tech:
Ask three questions:
1. Does this solve a problem I have today?
   - No → Don't build it
   - Yes → Continue
2. What's the simplest version that solves it?
   - Build that version
3. Will this still work at 10x scale?
   - Yes → Ship it
   - No → That's okay, you'll refactor later
Most big tech patterns are over-engineered for startups. But some patterns prevent bugs that are expensive to fix later (idempotency, structured logging, feature flags).
Learn to tell the difference.
What I'd Tell My Past Self
If I could go back to day one:
Do this immediately:
- Structured logging with user_id, event_name, timestamp, properties
- Feature flags for new features
- Idempotency keys for state changes
- One source of truth for each data type
Wait until you need it:
- Microservices (wait until you have 5+ engineers or clear scaling need)
- Complex data pipelines (wait until 1M+ events/day)
- Comprehensive testing (test critical paths only)
- Performance optimization (wait until users complain)
Never do it:
- Copy big tech architecture because "best practices"
- Build platforms before products
- Optimize before measuring
- Add complexity without clear ROI
The best thing about working at Meta wasn't learning how to build at scale. It was learning which problems only exist at scale.
Most startups die because they built a data platform instead of a product.
Build the product. Add the platform later, when you need it.
That's the lesson big tech won't teach you—because for them, the platform came first.