Respect your elders and databases.

I am a vibe coder, and I really didn’t understand how nuanced and explicit you have to be with your database. Like everything has to be very deliberately planned and with a good reason you have to design everything. Each choice should carry a lot of weight behind it.

But as you can guess, I wipe-coded the database schema also and now I realize how wrong of a decision that was. Like I should let it code, but not take my hands off the wheel. I have to be very involved in the planning from the next time.

The two cardinals I committed were:

  1. Duplication of data
  2. Storing what could be calculated

Basically instead of a single table handling some unique value and assigning some foreign ID to it, I had multiple places doing the same thing, different things triggering their calculations and also storing things that should be derived from basic arithmetic instead of storing them like number of scans by users and stuff like that for Rohkun.

And of course, while wiping coding, as you’re fixing bugs, it has no respect for your existing architecture. So it kept going on and editing the code while not taking care of these duplicate data storage points, and everything got out of sync.I remember one day creating an admin dashboard for myself to monitor the usage rate and stuff like that. At one place I noticed the same metric being shown differently. And I realized at that time that yes, the database is cooked. I need to be better.

That’s what I’m doing. I’m learning the principles of database design.

From the Rokin experience, I got to learn two things:

  1. You gotta have good database design because bad quality here flows downstream, and your whole application will be cooked.
  2. To have proper API design and integration in the frontend. That’s where you gotta have good system design, properly defined contracts, so that you can design effectively and personally validate that “hey, this thing is done”. If you are just doing it off the top of your head or letting AI take the command, everything will be out of sync, or orphaned, or duplicated.

I got the intuition behind why we got to represent unique information using the specific tables. The specific table will give a foreign key to whatever unique value you got. It might be user’s name. It might be user’s post or whatever. The point is the underlying thing can change. User’s name can change, user’s email ID can change, the content of the post can change. But the ID will not change, so with the ID you can always reference to that thing. But like if you are trying to edit something and it fails midway, then how do you know what is real and what is not? Or sometimes you can even have two people with the same name. What if you end up making changes to both of them instead of just one? Now I get why we absolutely cannot have even a single duplication. Because we are going to make changes, and duplicates really make the data unusable.

Next time I am working on any application, I will always think about the database. Like if I am making any changes to the database schema, is it creating any duplicates? Or am I doing something wrong? And also when I am editing my code, I will make sure that proper changes are being made to the database. That the event is actually hitting up all the necessary tables. That I must keep the sync and accuracy of the database at the forefront of my mind before I even touch code.

There should be only one source of truth, and to maintain that, we should not have any duplicates here or there. Because then it can go out of sync. But how do you decide what to include in the table and what not? How do you establish that boundary? Because not including some anything anywhere is also bad. Very confused about what to include and what not to include in the table.

Answer from claude:

The Golden Rule: What “Belongs To” This Thing?

Ask yourself: “If this thing didn’t exist, would this piece of information make sense on its own?”

Example: Users Table

User exists as: A person with an account

What belongs to a user?

✅ email - This IS the user's email, it's a property of them
✅ name - This IS their name
✅ password_hash - This IS their password
✅ created_at - When THIS user was created
✅ subscription_tier - What tier THIS user is on

What doesn’t belong to a user?

❌ total_posts - This is a COUNT of posts (calculate it!)
❌ last_post_title - This belongs to a POST, not the user
❌ subscription_tier_price - This belongs to the TIER definition, not the user
❌ total_comments_received - This is a COUNT (calculate it!)

The Decision Framework

Question 1: Is This Describing THE Thing, or Describing ACTIVITY?

Describing the thing = Store it

  • User’s email
  • Post’s title
  • Product’s price

Describing activity = Don’t store it, calculate it

  • How many posts user has written
  • How many times product was viewed
  • Total money user has spent

Question 2: Does This Data “Own” Itself, or Is It Derived?

Owns itself = Store it

user.email = "alice@email.com"
↑ This is alice's email, it exists independently

Derived from other data = Calculate it

user.total_posts = 47
↑ This is counting rows in the posts table where author_id = alice

Question 3: If I Change This, What Else Needs to Change?

Nothing else changes = Store it

Change user.email
→ Just update this one field

Multiple things change = You probably stored something you shouldn’t

User writes a new post
→ Update posts table ✓
→ Update user.total_posts ✗ (now you're maintaining two places)
→ Update daily_aggregates.total_posts ✗ (three places!)

If changing one thing requires updating multiple tables, you’ve violated single source of truth.

The Checklist Before You Deploy

Before you push any code to production:

  • Did I change the database? If yes, did I document why?
  • Did I add any fields? If yes, are they derived or base facts?
  • Did I create duplicates? (Search for the same data in different tables)
  • If a feature requires multiple table updates, did I test that they all happen?
  • Did I test the failure case? (What if one update fails?)
  • Can I query the database and verify the data is correct?
  • Does my admin dashboard still show consistent numbers?

You won’t always pass this checklist perfectly, but asking these questions every time means you’ll catch problems before they become disasters.

The database is not just a storage mechanism. It’s the source of truth for your entire application.

If your database is wrong, your application is wrong—no matter how good your code is. Your frontend can be perfect, your API can be beautiful, but if the database is duplicating data and drifting out of sync, none of it matters.

That’s why you think about the database first