I was reading about information theory and came across Shannon’s entropy formula, the one with the log in it.
I’m still a novice at reading mathematical representations.
So when I saw the log in the formula, I understood that there is some kind of curve that will be involved.
For a simple two-outcome case like a coin flip, it turns out it’s just an upside-down U.
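The upside-down U is easy to check by hand. Here is a minimal sketch, assuming the formula in question is Shannon entropy for a source with two outcomes (say, a biased coin). The function name binary_entropy is my own, not something from a library.

```python
import math

def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a source with two outcomes of probability p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log2(0) is treated as 0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

# Print a rough sideways bar chart: the bars grow toward p = 0.5 and shrink again.
for p in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0):
    h = binary_entropy(p)
    print(f"p={p:.1f}  H={h:.3f}  " + "#" * round(h * 40))
```

The curve is zero at both ends (a coin that always lands the same way tells you nothing), peaks at p = 0.5, and is symmetric around that peak. That is the whole shape.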
And then I thought, there is no way someone physically sat down and recorded this.
I think I should get comfortable with the fact that a lot of these formulas are approximations, and just think about the shape without getting lost or intimidated, because this guy was not reading it off reality either.
This is just a representation of the thing, not the actual thing; you get what I mean.
Shannon had an intuition, found a mathematical structure that behaved consistently with that intuition, proved it was unique given some reasonable axioms, and shipped it.
So the formula is just a representation, a model, not the real real thing.
The same pattern shows up across fields: physics formulas are split between Newton and Einstein, and formulas that work in their own field often cannot be generalized or carried into other fields.
I think I will be less intimidated by the formulas knowing they are not always accurate. Even if my understanding of a formula is not one-to-one, it does not mean I am very far off from the actual thing.
The same goes for machine learning.
We have training data, and it is the torchbearer of truth.
We use it to get to the error, but even that data was most likely derived using some formula or recorded using some instrument.
Both are models of what the real thing is, with a lot of omissions that were smartly made.
It’s like we don’t actually have any truth or something of that sort.
We just have approximations of reality, and when your training data comes from a field like that, machine learning is an approximation of that approximation.
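To make the "approximation of an approximation" idea concrete, here is a toy sketch. Everything in it is made up for illustration: true_value stands in for reality, instrument_reading is a hypothetical instrument with a calibration offset of 0.5, and the "model" is just a straight line fitted by least squares.

```python
def true_value(x: float) -> float:
    """The 'real thing' (hypothetical). In practice we never get to call this."""
    return 2.0 * x + 1.0

def instrument_reading(x: float) -> float:
    """What gets recorded: the real value plus a made-up calibration offset of 0.5."""
    return true_value(x) + 0.5

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
training_data = [instrument_reading(x) for x in xs]  # the "torchbearer of truth"
slope, intercept = fit_line(xs, training_data)

def mean_abs_error(targets):
    """Average gap between the fitted line and the given targets."""
    return sum(abs(slope * x + intercept - t) for x, t in zip(xs, targets)) / len(xs)

error_vs_data = mean_abs_error(training_data)
error_vs_reality = mean_abs_error([true_value(x) for x in xs])
print(f"error against training data: {error_vs_data:.2f}")
print(f"error against reality:       {error_vs_reality:.2f}")
```

The fit looks perfect when scored against the recorded data, yet it carries the instrument's offset with it. The error we can measure is the distance to the recording, not to the real thing.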
And Claude just told me this:
The philosophy of science term for this is instrumentalism — the idea that theories and models aren't true or false, they're just more or less useful. And ML is almost pure instrumentalism by design. Nobody building a transformer is claiming it represents how cognition actually works. It just produces outputs that are useful enough often enough.
Essentially, we have constraints or definitions, and people come up with formulas to model them.
Of course you cannot use any random formula to deal with information theory. It is not like Shannon is giving you a 1:1; it is an abstraction at the end of the day.
When learning, it is fine to start by understanding the shape of the curve and its behavior, and get a general sense of it. You can flesh it out later.
Unless you are dealing with pure math or logic, understand that you are facing reality.
And in the face of reality, a formula is a comparison of intuitions, not a source of divine truth.
Formulas are no different from sentences that communicate with you.
It’s just language.
Be more practical and less intimidated. Just look at a formula, understand the behavior, and move on.
A better thing to walk away from this article with: most of the time when we encounter math, it is used to establish truth, and those truths carry high stakes for us. When we see math being used just to approximate or model, the same emotions carry over. But math is often used to model or approximate something simply because the idea was too complex to express in words. If you burn too many brain cells treating it as some sort of truth with very high stakes for you, you will struggle and may not want to touch the material at all.
This finding is liberating: you can look at a piece of math and understand that it is there to give you the general behavior, not the absolute truth. It can encourage you to pick up more books.
No doubt these formulas are optimal, but optimal for what? For representing the constraints.
And where did those constraints come from? The task at hand.
So by nature most of them are not universal.
That’s very relevant for the previous anxiety or intimidation that I discussed.
We are treating things that are not universal with the same emotions we had for things that were universal.
You can come up with a formula for demand elasticity but it is not universal. It works under the constraint for some goods, not all, and even within the goods it applies to, that context can change.
It’s a local tool with a jurisdiction. And even Newton’s laws turned out to have a jurisdiction.
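One way to see a jurisdiction in numbers is to compare Newton's formula with Einstein's on the same question. The sketch below uses kinetic energy as the example, which is my choice and not something from the reading. Speeds are written as a fraction of the speed of light (beta), and energies are in units of m*c^2 so that no physical constants are needed.

```python
import math

def newtonian_ke(beta: float) -> float:
    """Kinetic energy in units of m*c^2, using Newton's (1/2) m v^2 with v = beta * c."""
    return 0.5 * beta ** 2

def relativistic_ke(beta: float) -> float:
    """Kinetic energy in units of m*c^2, using special relativity's (gamma - 1) m c^2."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    return gamma - 1.0

# A ratio near 1 means Newton's formula is still inside its jurisdiction.
for beta in (0.001, 0.01, 0.1, 0.5, 0.9):
    ratio = newtonian_ke(beta) / relativistic_ke(beta)
    print(f"v = {beta:>5} c   Newton / Einstein = {ratio:.6f}")
```

At everyday speeds the two formulas agree almost exactly. Near the speed of light, Newton's formula accounts for less than half of the energy. Same formula, different jurisdiction.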
Before you feel intimidated by a formula, ask what jurisdiction it’s claiming.
Is this thing asserting something universal, or is it a model built for a specific class of problems under specific conditions?
Most of the time it’s the latter, and the appropriate response is to understand its behavior within that jurisdiction and move on.