"My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'"

In Spain it used to be as low as 13 a few decades ago, but that law was obviously written before the rural exodus from inner Spain to the cities (from the '60s to almost the '80s), when children from early puberty worked or helped on the farm, in the fields, or at home, and by age 14 had far more duties and responsibilities than today. And yes, that yielded more maturity.

Thus, the law had to be fixed for more urban/civilized times, raising the age to 16. Although, when the ages were close enough (such as 15 and 19, as happened in a recent case), the young adult had their charges dropped entirely.

He was really brilliant, made contributions all over the place in the math/physics/tech field, and had a sort of wild and quirky personality that people love telling stories about.

A funny quote about him from Edward “a guy with multiple equations named after him” Teller:

> Edward Teller observed "von Neumann would carry on a conversation with my 3-year-old son, and the two of them would talk as equals, and I sometimes wondered if he used the same principle when he talked to the rest of us."

Are there many von-Neumann-like multidisciplinarians nowadays? It feels like unless one is razor-sharp and fully committed to one field, one is not taken seriously by those who made their careers in it (and who have the last word on it).

IMO they do exist; the issue is the popular attitude that it's not possible anymore, not a lack of genius. If everyone has a built-in assumption that it can't happen anymore, then we will naturally prune away the social pathways that enable it.

I think there are none. The world has gotten too complicated for that. It was early days in quantum physics, information theory, and computer science. I don’t think it is early days in anything that consequential anymore.

Centuries ago, the limitation of most knowledge was the difficulty of discovery; once known, it was accessible to most scholars. Take Calculus, which is taught in every high school in America. The problem is that we're getting to a point where new fields are built on such extreme prerequisites that even the known knowledge is extremely hard for talented university students to learn, let alone what is required to discover and advance the field. Until we are able to augment human intelligence, the days of the polymath advancing multiple fields are mostly over. I would also argue that the standards for peer-reviewed papers and PhDs have significantly dropped (due to the incentive structure that rewards spamming as many papers as possible), which only hurts the advancement of knowledge.

Sounds like the increased difficulty could be addressed with new models and the right abstraction layers. E.g., there’s incredible complexity in modern computing, but you don’t need to know assembly in order to build a Web app, to reason about architecture, or to work with functional paradigms. However, this doesn’t seem to happen in the natural sciences. I wonder if adopting better models runs into the gatekeepers protecting their status, tenures, and the status quo.

Neither does a Web app developer need to know how to use a CNC machine or make a transistor. Your example is about different levels of abstraction than what I meant.

I was replying to “even the known knowledge is extremely hard for talented university students to learn”. If complexity of the known knowledge one must learn to substantially contribute is the reason becoming an accomplished multidisciplinary is impossible nowadays, then it sounds like we could use some better models and levels of abstraction.

More than that, as professionals' career paths in fields develop, the organisations they work for specialize, becoming less amenable to the generalist. ('Why should we hire this mathematician who is also an expert in legal research? Their attention is probably divided, and meanwhile we have a 100% mathematician in the candidate pool fresh from an expensive dedicated PhD program with a growing family to feed.')

I'm obviously using the archetype of Leibniz here as an example but pick your favorite polymath.

Is it fair to say that the number of publicly accomplished multidisciplinarians alive at a particular moment is not rising as might be expected, proportionally to the total number of suitably educated people?

My favorite Von Neumann anecdote/quote is this one:

John Von Neumann once said to Felix Smith:
"Young man, in mathematics you don't understand things. You just get used to them."
This was a response to Smith's fear about the method of characteristics.

It took me a while to fully grasp what he meant, but after diving into Mathematics and Physics for a while, I now hold it as one of the capital T truths of learning.

I felt like I finally understood Shannon entropy when I realized that it's a subjective quantity -- a property of the observer, not the observed.

The entropy of a variable X is the amount of information required to drive the observer's uncertainty about the value of X to zero. As a corollary, your uncertainty and mine about the value of the same variable X could be different. This is trivially true, as we could each have received different information about X. H(X) should really be H_{observer}(X), or even better, H_{observer, time}(X).

As clear as Shannon's work is in other respects, he glosses over this.

What's often lost in the discussions about whether entropy is subjective or objective is that, if you dig a little deeper, information theory gives you powerful tools for relating the objective and the subjective.

Consider cross entropy of two distributions H[p, q] = -Σ p_i log q_i. For example maybe p is the real frequency distribution over outcomes from rolling some dice, and q is your belief distribution. You can see the p_i as representing the objective probabilities (sampled by actually rolling the dice) and the q_i as your subjective probabilities. The cross entropy is measuring something like how surprised you are on average when you observe an outcome.

The interesting thing is that H[p, p] <= H[p, q], which means that if your belief distribution is wrong, your cross entropy will be higher than it would be if you had the right beliefs, q=p. This is guaranteed by the concavity of the logarithm. This gives you a way to compare beliefs: whichever q gets the lowest H[p,q] is closer to the truth.

You can even break cross entropy into two parts, corresponding to two kinds of uncertainty: H[p, q] = H[p] + D[p||q]. The first term is the entropy of p, and it is the aleatoric uncertainty: the inherent randomness in the phenomenon you are trying to model. The second term is the KL divergence, and it tells you how much additional uncertainty you have as a result of having wrong beliefs, which you could call epistemic uncertainty.
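These identities are easy to check numerically. A minimal sketch in Python (the loaded-die distribution is made up for illustration):

```python
import math

def cross_entropy(p, q):
    """H[p, q] = -sum_i p_i * log2(q_i), in bits."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    # H[p] is just the cross entropy of p with itself
    return cross_entropy(p, p)

def kl(p, q):
    # D[p||q] = H[p, q] - H[p]
    return cross_entropy(p, q) - entropy(p)

# p: "objective" frequencies of a loaded die; q: a uniform belief
p = [0.25, 0.25, 0.125, 0.125, 0.125, 0.125]
q = [1 / 6] * 6

# Gibbs' inequality: wrong beliefs can only increase average surprise
assert cross_entropy(p, q) >= entropy(p)
# The decomposition H[p, q] = H[p] + D[p||q]
assert abs(cross_entropy(p, q) - entropy(p) - kl(p, q)) < 1e-12
print(entropy(p), cross_entropy(p, q), kl(p, q))
```

Here H[p] comes out to exactly 2.5 bits, while believing the die is fair costs about 0.085 extra bits of surprise per roll.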

Thanks, that's an interesting perspective. It also highlights one of the weak points in the concept, I think, which is that this is only a tool for updating beliefs to the extent that the underlying probability space ("ontology" in this analogy) can actually "model" the phenomenon correctly!

It doesn't seem to shed much light on when or how you could update the underlying probability space itself (or when to change your ontology in the belief setting).

This kind of thinking will lead you to ideas like algorithmic probability, where distributions are defined using universal Turing machines that could model anything.

I think what you're getting at is the construction of the sample space - the space of outcomes over which we define the probability measure (e.g. {H,T} for a coin, or {1,2,3,4,5,6} for a die).

Let's consider two possibilities:

1. Our sample space is "incomplete"

2. Our sample space is too "coarse"

Let's discuss 1 first. Imagine I have a special die with a hidden binary state which I can control, which forces the die to come up either even or odd. If your sample space is only which side faces up, and I randomize the hidden state appropriately, it appears like a normal die. If your sample space is enlarged to include the hidden state, the entropy of each roll is reduced by one bit. You will not be able to distinguish between a truly random die and a die with a hidden state if your sample space is incomplete. Is this the point you were making?
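The one-bit reduction is just the logarithm of the halved outcome set; a quick sanity check:

```python
import math

# Entropy of one roll of a fair six-sided die, in bits,
# when the sample space is only "which face is up"
h_visible = math.log2(6)

# Enlarge the sample space with the hidden even/odd control bit:
# conditioned on that bit, only 3 faces remain possible
h_given_hidden = math.log2(3)

print(h_visible - h_given_hidden)  # exactly 1 bit
```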

On 2: Now let's imagine I can only observe whether the die comes up even or odd. This is a coarse-graining of the sample space (we get strictly less information - or, we only get some "macro" information). Of course, a coarse-grained sample space is necessarily an incomplete one! We can imagine comparing the outcomes from a normal die, to one which with equal probability rolls an even or odd number, except it cycles through the microstates deterministically e.g. equal chance of {odd, even}, but given that outcome, always goes to next in sequence {(1->3->5), (2->4->6)}.

Incomplete or coarse sample spaces can indeed prevent us from inferring the underlying dynamics. Many processes can have the same apparent entropy on our sample space from radically different underlying processes.

Right, this is exactly what I'm getting at - learning a distribution over a fixed sample space can be done with Bayesian methods, or entropy-based methods like the OP suggested, but I'm wondering if there are methods that can automatically adjust the sample space as well.

For well-defined mathematical problems like dice rolling and fixed classical mechanics scenarios and such, you don't need this I guess, but for any real-world problem I imagine half the problem is figuring out a good sample space to begin with. This kind of thing must have been studied already, I just don't know what to look for!

There are some analogies to algorithms like NEAT, which automatically evolves a neural network architecture while training. But that's obviously a very different context.

We could discuss completeness of the sample space, and we can also discuss completeness of the hypothesis space.

In Solomonoff Induction, which purports to be a theory of universal inductive inference, the "complete hypothesis space" consists of all computable programs (note that all current physical theories are computable, so this hypothesis space is very general). Then induction is performed by keeping all programs consistent with the observations, weighted by two terms: the program's prior likelihood, and the probability that program assigns to the observations (the programs can be deterministic and assign probability 1).

The "prior likelihood" in Solomonoff Induction is based on the program's complexity (well, it is 2^(-Complexity), where the complexity is the length of the shortest representation of that program).

Altogether, the procedure looks like: maintain a belief which is a mixture of all programs consistent with the observations, weighted by their complexity and the likelihood they assign to the data. Of course, this procedure is still limited by the sample/observation space!

That's our best formal theory of induction in a nutshell.
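To make the flavor of it concrete, here's a deliberately tiny toy sketch, not real Solomonoff induction: the "programs" are just repeating string patterns, the hand-picked hypothesis list stands in for the space of all programs, and complexity is taken to be pattern length:

```python
from fractions import Fraction

# Toy hypothesis space: each "program" emits its pattern repeated forever.
programs = ["0", "1", "01", "10", "001", "011", "0110"]

def prior(prog):
    # 2^(-complexity), with complexity = length of the pattern
    return Fraction(1, 2 ** len(prog))

def generates(prog, obs):
    # Does this program's output start with the observed string?
    stream = (prog * (len(obs) // len(prog) + 1))[:len(obs)]
    return stream == obs

def predict_next(obs):
    """Posterior probability that the next symbol is '1':
    a complexity-weighted mixture of all consistent programs."""
    consistent = [p for p in programs if generates(p, obs)]
    total = sum(prior(p) for p in consistent)
    mass_one = sum(prior(p) for p in consistent
                   if p[len(obs) % len(p)] == "1")
    return mass_one / total

print(predict_next("00"))  # -> 1/5
```

After observing "00", both "0" (complexity 1) and "001" (complexity 3) remain consistent, but the simpler program gets 4x the weight, so the mixture leans heavily toward predicting another "0".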

Someone else pointed me to Solomonoff induction too, which looks really cool as an "idealised" theory of induction and it definitely solves my question in abstract. But there are obvious difficulties with that in practice, like the fact that it's probably uncomputable, right?

I mean I think even the "Complexity" coefficient should be uncomputable in general, since you could probably use a program which computes it to upper bound "Complexity", and if there was such an upper bound you could use it to solve the halting problem etc. Haven't worked out the details though!

Would be interesting if there are practical algorithms for this. Either direct approximations to SI or maybe something else entirely that approaches SI in the limit, like a recursive neural-net training scheme? I'll do some digging, thanks!

Correct anything that's wrong here. Cross entropy is the comparison of two distributions, right? Is the objectivity sussed out in relation to the overlap cross-section? And is the subjectivity sussed out not on average but as deviations from the average? Just trying to understand it in my framework, which might be wholly off the mark.

Cross entropy lets you compare two probability distributions. One way you can apply it is to let the distribution p represent "reality" (from which you can draw many samples, but whose numerical value you might not know) and to let q represent "beliefs" (whose numerical value is given by a model). Then by finding q to minimize cross-entropy H[p, q] you can move q closer to reality.

I'm not sure what you mean by objectivity and subjectivity in this case.

With the example of beliefs, you can think of cross entropy as the negative expected value of the log probability you assigned to an outcome, weighted by the true probability of each outcome. If you assign larger log probabilities to more likely outcomes, the cross entropy will be lower.
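That expectation is exactly what lets you estimate H[p, q] purely from samples of p, without ever knowing p's numerical values. A sketch with made-up numbers:

```python
import math
import random

random.seed(0)

p = [0.5, 0.3, 0.2]  # "reality": true outcome frequencies
q = [0.4, 0.4, 0.2]  # "beliefs": our model's probabilities

# Exact cross entropy: expected surprise under the true distribution
exact = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))

# Monte Carlo estimate: average surprise over draws from reality --
# this only needs samples from p, not p's numbers themselves
samples = random.choices(range(3), weights=p, k=100_000)
estimate = sum(-math.log2(q[s]) for s in samples) / len(samples)

print(exact, estimate)  # the two agree to a couple of decimal places
```

This is also why cross-entropy loss in machine learning is computed as an average of negative log probabilities over a dataset: the dataset plays the role of samples from p.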

This doesn't really make entropy itself observer dependent. (Shannon) entropy is a property of a distribution. It's just that when you're measuring different observers' beliefs, you're looking at different distributions (which can have different entropies the same way they can have different means, variances, etc).

Entropy is a property of a distribution, but since math does sometimes get applied, we also attach distributions to things (eg. the entropy of a random number generator, the entropy of a gas...). Then when we talk about the entropy of those things, those entropies are indeed subjective, because different subjects will attach different probability distributions to that system depending on their information about that system.

Some probability distributions are objective. The probability that my random number generator gives me a certain number is given by a certain formula. Describing it with another distribution would be wrong.

Another example, if you have an electron in a superposition of half spin-up and half spin-down, then the probability to measure up is objectively 50%.

Another example, GPT-2 is a probability distribution on sequences of integers. You can download this probability distribution. It doesn't represent anyone's beliefs. The distribution has a certain entropy. That entropy is an objective property of the distribution.

Of those, the quantum superposition is the only one that has a chance at being considered objective, and it's still only "objective" in the sense that (as far as we know) your description provided as much information as anyone can possibly have about it, so nobody can have a more-informed opinion and all subjects agree.

The others are both partial-information problems which are very sensitive to knowing certain hidden-state information. Your random number generator gives you a number that you didn't expect, and for which a formula describes your best guess based on available incomplete information, but the computer program that generated it knew which one to choose and would not have picked any other. Anyone who knew the hidden state of the RNG would also have assigned a different probability to that number being chosen.

You might have some probability distribution in your head for what will come out of GPT-2 on your machine at a certain time, based on your knowledge of the random seed. But that is not the GPT-2 probability distribution, which is objectively defined by model weights that you can download, and which does not correspond to anyone’s beliefs.

I'm of the view that strictly speaking, even a fair die doesn't have a probability distribution until you throw it. It just so happens that, unless you know almost every detail about the throw, the best you can usually do is uniform.

So I would say the same of GPT-2. It's not a random variable unless you query it. But unless you know unreasonably many details, the best you can do to predict the query is the distribution that you would call "objective."

I think this gets into unanswerable metaphysical questions about when we can say mathematical objects, propositions, etc. really exist.

But I think if we take the view that it's not a random variable until we query it, that makes it awkward to talk about how GPT-2 (and similar models) is trained. No one ever draws samples from the model during training, but the whole justification for the cross-entropy-minimizing training procedure is based on thinking about the model as a random variable.

A more plausible way to argue for objectiveness is to say that some probability distributions are objectively more rational than others given the same information. E.g. when seeing a symmetrical die it would be irrational to give 5 a higher probability than the others. Or it seems irrational to believe that the sun will explode tomorrow.

The probability distribution is subjective for both parts -- because it, once again, depends on the observer observing the events in order to build a probability distribution.

E.g. your random number generator generates 1, 5, 7, 8, 3 when you run it. It generates 4, 8, 8, 2, 5 when I run it. I.e. we have received different information about the random number generator to build our subjective probability distributions. The level of entropy of our probability distributions is high because we have so little information to be certain about the representativeness of our distribution sample.

If we continue running our random number generator for a while, we will gather more information, thus reducing entropy, and our probability distributions will both start converging towards an objective "truth." If we ran our random number generators for a theoretically infinite amount of time, we will have reduced entropy to 0 and have a perfect and objective probability distribution.

Would you say that all claims about the world are subjective, because they have to be based on someone’s observations?

For example my cat weighs 13 pounds. That seems objective, in the sense that if two people disagree, only one can be right. But the claim is based on my observations. I think your logic leads us to deny that anything is objective.

I do believe in objective reality, but probabilities are subjective. Your cat weighs 13 pounds, and now that you've told me, I know it too. If you asked me to draw a probability distribution for the weight of your cat, I'd draw a tight gaussian distribution around that, representing the accuracy of your scale. My cat weighs a different amount, but I won't tell you how much, so if we both draw a probability distribution, they'll be different. And the key thing is that neither of us has an objectively correct probability distribution, not even me. My cat's weight has an objectively correct value which even I don't know, because my scale isn't good enough.

All right now, here's the big question: how do you know that the evidence your sensory apparatus reveals to you is correct? What I'm getting at is this: the only experience that is directly available to you is your sensory data. And this sensory data is merely a stream of electrical impulses which stimulate your computing center. In other words, all that I really know about the outside universe is relayed to me through my electrical connections.

Why, that would mean that... I really don't know what the outside universe is like at all, for certain.

Sorry, this is a major misinterpretation, or at least a completely different one. I don't know how to put it in a more productive way; I think your comment is very confused. You don't need to run a random number generator "for a while" in order to build up a probability distribution.

This might be a frequentist vs bayesian thing, and I am bayesian. So maybe other people would have a different view.

I don't think you need to have any information to have a probability distribution; your distribution already represents your degree of ignorance about an outcome. So without even sampling it once, you already should have a uniform probability distribution for a random number generator or a coin flip. If you do personally have additional information to help you predict the outcome -- you're skilled at coin-flipping, or you wrote the RNG and know an exploit -- then you can compress that distribution to a lower-entropy one.

But you don't need to sample the distribution to do this. You can have that information before the first coin toss. Sampling can be one way to get information but it won't necessarily even help. If samples are independent, then each sample really teaches you barely anything about the next. RNGs eventually do repeat so if you sample it enough you might be able to find the pattern and reduce the entropy to zero, but in that case you're not learning the statistical distribution, you're deducing the exact internal state of the RNG and predicting the exact next outcome, because the samples are not actually independent. If you do enough coin flips you might eventually find that there's a slight bias to the coin, but that really takes an extreme number of tosses and only reduces the entropy a tiny tiny bit; not at all if the coin-tossing procedure had no bias to begin with.
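That "tiny tiny bit" is easy to quantify with the binary entropy function:

```python
import math

def h_bernoulli(p):
    """Binary (Shannon) entropy of a coin with heads probability p, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(h_bernoulli(0.5))   # 1.0 bit for a fair coin
print(h_bernoulli(0.51))  # ~0.9997 bits: a 1% bias barely dents the entropy
```

Detecting that 51/49 bias in the first place would also take on the order of ten thousand tosses, which is the "extreme number" referred to above.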

However the objective truth is just that the next toss will land heads. That's the only truth that experiment can objectively determine. Any other doubt that it might-have-counterfactually-landed-tails is subjective, due to a subjective lack of sufficient information to predict the outcome. We can formalize a correct procedure to convert prior information into a corresponding probability distribution, we can get a unanimous consensus by giving everybody the same information, but the probability distribution is still subjective because it is a function of that prior information.

The best introduction that I can recommend is this type-written PDF from E.T. Jaynes, called "probability theory with applications in science and engineering": https://bayes.wustl.edu/etj/science.pdf.html

It requires a lot of attention to read and follow the math, but it's worthwhile. Jaynes is a pretty passionate writer, and in his writing he's clearly battling against some enemies (who might be ghosts), but on the other hand this also makes for more entertaining reading and I find that's usually a benefit when it comes to a textbook.

"Entropy is a property of matter that measures the degree of randomization or disorder at the microscopic level", at least when considering the second law.

Right, but the very interesting thing is it turns out that what's random to me might not be random to you! And the reason that "microscopic" is included is because that's a shorthand for "information you probably don't have about a system, because your eyes aren't that good, or even if they are, your brain ignored the fine details anyway."

Entropy in physics is usually the Shannon entropy of the probability distribution over system microstates given known temperature and pressure. If the system is in equilibrium then this is objective.

That's not a problem, as the GP's post is trying to state a mathematical relation, not a historical attribution. Often newer concepts shed light on older ones. As Baez's article says, Gibbs entropy is Shannon's entropy of an associated distribution (multiplied by the constant k).

It is a problem because all three come with baggage. Almost none of the things discussed in this thread are valid when discussing actual physical entropy, even though the equations are superficially similar. And then there are lots of people being confidently wrong because they assume that it’s just one concept. It really is not.

Don't see how the connection is superficial. Even the classical macroscopic definition of entropy as ΔS = ∫ dQ/T can be derived from the information theory perspective, as Baez shows in the article (using entropy-maximizing distributions and Lagrange multipliers). If you have a more specific critique, it would be good to discuss.

In classical physics there is no real objective randomness. Particles have a defined position and momentum and those evolve deterministically. If you somehow learned these then the shannon entropy is zero. If entropy is zero then all kinds of things break down.

So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness, even though temperature does not really seem to be a quantum thing.

> If entropy is zero then all kinds of things break down.

Entropy is a macroscopic variable and if you allow microscopic information, strange things can happen! One can move from a high entropy macrostate to a low entropy macrostate if you choose the initial microstate carefully. But this is not a reliable process which you can reproduce experimentally, ie. it is not a thermodynamic process.

A thermodynamic process P is something which takes a macrostate A to a macrostate B, independent of which microstate a0, a1, a2, ... in A you started with. If the process depended on the microstate, it wouldn't be something we would recognize, as we are looking from the macro perspective.

Which we don’t know precisely. Entropy is about not knowing.

> If you somehow learned these then the shannon entropy is zero.

Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space. (You need an appropriate extension of Shannon’s entropy to continuous distributions.)

> So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness

> Which we don’t know precisely. Entropy is about not knowing.

No, it is not about not knowing. This is an instance of the intuition from Shannon’s entropy does not translate to statistical Physics.

It is about the number of possible microstates, which is completely different. In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

> Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space.

No, 0. In this case, there is a single state with p = 1, and S = -k Σ p ln(p) = 0.

This is the same if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).

The probability p of a microstate is always between 0 and 1, therefore p ln(p) is never positive and S is always non-negative.

You get the same using Boltzmann’s approach, in which case Ω = 1 and S = k ln(Ω) is also 0.

> (You need an appropriate extension of Shannon’s entropy to continuous distributions.)

>>> Particles have a defined position and momentum [...] If you somehow learned these then the shannon entropy is zero.

>> Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space [and diverges to minus infinity if you define precisely the position and momentum of the particles and the volume in phase space goes to zero]

> [It's zero also] if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).

> The probability p of an microstate is always between 0 and 1, therefore p ln(p) is always negative and S is always positive.

The points in the phase space are not "microstates" with probability between 0 and 1. It's a continuous distribution and if it collapses to a point (i.e. you somehow learned the exact positions and momentums) the density at that point is unbounded. The entropy is also unbounded and goes to minus infinity as the volume in phase space collapses to zero.

You can avoid the divergence by dividing the continuous phase space into discrete "microstates" but having a well-defined "microstate" corresponding to some finite volume in phase space is not the same as what was written above about "particles having a defined position and momentum" that is "somehow learned". The microstates do not have precisely defined positions and momentums. The phase space is not reduced to a single point in that case.

If the phase space is reduced to a single point I'd like to see your proof that S(ρ) = −k ∫ ρ(x) log ρ(x) dx = 0

I hadn't realized that "differential" entropy and shannon entropy are actually different and incompatible, huh.

So the case I mentioned, where you know all the positions and momentums has 0 shannon entropy and -Inf differential entropy. And a typical distribution will instead have Inf shannon entropy and finite differential entropy.

Wikipedia has some pretty interesting discussion about differential entropy vs the limiting density of discrete points, but I can't claim to understand it or whether it could bridge the gap here.
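The divergence is concrete in the Gaussian case, where differential entropy has a closed form (h = ½ log₂(2πeσ²) bits):

```python
import math

def gaussian_diff_entropy(sigma):
    """Differential entropy of N(mu, sigma^2) in bits: (1/2) log2(2*pi*e*sigma^2)."""
    return 0.5 * math.log2(2 * math.pi * math.e * sigma ** 2)

for sigma in (1.0, 0.1, 1e-6, 1e-12):
    print(sigma, gaussian_diff_entropy(sigma))
# Unlike Shannon entropy, this goes negative for narrow distributions
# and diverges to -inf as sigma -> 0 (the "point in phase space" limit).
```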

Quantum mechanics solves the issue of the continuity of the state space. However, as you probably know, in quantum mechanics all the positions and momentums cannot simultaneously have definite values.

> In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

Enthalpy is also dependent on your choice of state variables, which is in turn dictated by which observables you want to make predictions about: whether two microstates are distinguishable, and thus whether they are part of the same macrostate, depends on the tools you have for distinguishing them.

A calorimeter does not care about anyone’s choice of state variables. Entropy is not only something that exists in abstract theoretical constructs, it is something we can get experimentally.

If information-theoretical and statistical mechanics entropies are NOT the same (or at least, deeply connected) then what stops us from having a little guy[0] sort all the particles in a gas to extract more energy from them?

Sounds like a non-sequitur to me; what are you implying about the Maxwell's demon thought experiment vs the comparison between Shannon and stat-mech entropy?

Yeah but distributions are just the accounting tools to keep track of your entropy. If you are missing one bit of information about a system, your understanding of the system is some distribution with one bit of entropy. Like the original comment said, the entropy is the number of bits needed to fill in the unknowns and bring the uncertainty down to zero. Your coin flips may be unknown in advance to you, and thus you model it as a 50/50 distribution, but in a deterministic universe the bits were present all along.

It's an objective quantity, but you have to be very precise in stating what the quantity describes.

Unbroken egg? Low entropy. There's only one way the egg can exist in an unbroken state, and that's it. You could represent the state of the egg with a single bit.

Broken egg? High entropy. There are an arbitrarily-large number of ways that the pieces of a broken egg could land.

A list of the locations and orientations of each piece of the broken egg, sorted by latitude, longitude, and compass bearing? Low entropy again; for any given instance of a broken egg, there's only one way that list can be written.

Zip up the list you made? High entropy again; the data in the .zip file is effectively random, and cannot be compressed significantly further. Until you unzip it again...

Likewise, if you had to transmit the (uncompressed) list over a bandwidth-limited channel. The person receiving the data can make no assumptions about its contents, so it might as well be random even though it has structure. Its entropy is effectively high again.
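You can watch this happen by measuring byte frequencies before and after compression; a sketch, where the CSV-ish "list" is invented for illustration:

```python
import math
import zlib
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-frequency distribution, in bits per byte."""
    counts = Counter(data)
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

# A highly structured "list": the same record header repeated many times
text = b"piece_id,lat,lon,bearing\n" * 1000
packed = zlib.compress(text)

print(len(text), byte_entropy(text))      # low: very predictable bytes
print(len(packed), byte_entropy(packed))  # higher: structure squeezed out
```

The compressed output is a tiny fraction of the original size, and its per-byte entropy is higher because the compressor has already exploited the structure a receiver could otherwise have assumed.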

Entropy is calculated as a function of a probability distribution over possible messages or symbols. The sender might have a distribution P over possible symbols, and the receiver might have another distribution Q over possible symbols. Then the "true" distribution over possible symbols might be another distribution yet, call it R. The mismatch between these is what leads to various inefficiencies in coding, decoding, etc [1]. But both P and Q are beliefs about R -- that is, they are properties of observers.

the subjectivity doesn't stem from the definition of the channel but from the model of the information source. what's the prior probability that you intended to say 'weave', for example? that depends on which model of your mind we are using. frequentists argue that there is an objectively correct model of your mind we should always use, and bayesians argue that it depends on our prior knowledge about your mind

(i mean, your information about what the channel does is also potentially incomplete, so the same divergence in definitions could arise there too, but the subjectivity doesn't just stem from the definition of the channel; and shannon entropy is a property that can be imputed to a source independent of any channel)

I really liked the approach my stat mech teacher used. In nearly all situations, entropy just ends up being the log of the number of ways a system can be arranged (https://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula) although I found it easiest to think in terms of pairs of dice rolls.
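The dice version is easy to play with in code; here's a minimal sketch (mine, in Python) where each ordered pair of faces is a microstate and the sum is the macrostate:

```python
import math
from collections import Counter

# Each ordered pair of faces (a, b) is a microstate; the sum is the macrostate.
microstates = Counter(a + b for a in range(1, 7) for b in range(1, 7))

for total, ways in sorted(microstates.items()):
    # Boltzmann-style entropy (taking k_B = 1): S = log(number of arrangements)
    print(f"sum={total}: {ways} ways, S={math.log(ways):.3f}")
```

A sum of 7 has six microstates and hence the highest entropy; 2 and 12 have one microstate each, so S = log(1) = 0.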

And this is what I prefer too, although with the clarification that it's the number of ways that a system can be arranged without changing its macroscopic properties.

It's, unfortunately, not very compatible with Shannon's usage in any but the shallowest sense, which is why it stays firmly in the land of physics.

> not very compatible with Shannon's usage in any but the shallowest sense

The connection is not so shallow, there are entire books based on it.

“The concept of information, intimately connected with that of probability, gives indeed insight on questions of statistical mechanics such as the meaning of irreversibility. This concept was introduced in statistical physics by Brillouin (1956) and Jaynes (1957) soon after its discovery by Shannon in 1948 (Shannon and Weaver, 1949). An immense literature has since then been published, ranging from research articles to textbooks. The variety of topics that belong to this field of science makes it impossible to give here a bibliography, and special searches are necessary for deepening the understanding of one or another aspect. For tutorial introductions, somewhat more detailed than the present one, see R. Balian (1991-92; 2004).”

I don't dispute that the math is compatible. The problem is the interpretation thereof. When I say "shallowest", I mean the implications of each are very different.

As far as I'm aware, there is no information-theoretic equivalent of the 2nd or 3rd laws of thermodynamics, so the intuition a student works up from physics about how and why entropy matters just doesn't transfer. Likewise, even if an information science student is well versed in the concept of configuration entropy, that's 15 minutes of one lecture in statistical thermodynamics. There's still the rest of the course to consider.

Assuming each of the N microstates for a given macrostate is equally probable, with p = 1/N, the Shannon entropy is -Σ p·log(p) = -N·p·log(p) = -1·log(1/N) = log(N), which is the physics interpretation.
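A quick numerical sanity check of that identity (a sketch of mine, assuming N equiprobable microstates):

```python
import math

N = 8
p = [1 / N] * N  # N equally likely microstates

# Shannon entropy of the uniform distribution...
shannon = -sum(pi * math.log(pi) for pi in p)

# ...equals Boltzmann's log(N)
assert abs(shannon - math.log(N)) < 1e-12
```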

In the continuous version, you would get log(V) where V is the volume in phase space occupied by the microstates for a given macrostate.

Liouville's theorem, which says that volume is conserved in phase space, implies that a macroscopic process can move all the microstates of a macrostate A into a macrostate B only if the volume of B is bigger than the volume of A. This implies that the entropy of B must be bigger than the entropy of A, which is the Second Law.

The second law of thermodynamics is time-asymmetric, but the fundamental physical laws are time-symmetric, so from them you can only predict that the entropy of B should be bigger than the entropy of A irrespective of whether B is in the future or the past of A. You need the additional assumption (Past Hypothesis) that the universe started in a low entropy state in order to get the second law of thermodynamics.

> If our goal is to predict the future, it suffices to choose a distribution that is uniform in the Liouville measure given to us by classical mechanics (or its quantum analogue). If we want to reconstruct the past, in contrast, we need to conditionalize over trajectories that also started in a low-entropy past state — that is the “Past Hypothesis” that is required to get stat mech off the ground in a world governed by time-symmetric fundamental laws.

The second law of thermodynamics is about systems that are well described by a small set of macroscopic variables. The evolution of an initial macrostate prepared by an experimenter who can control only the macrovariables is reproducible. When a thermodynamical system is prepared in such a reproducible way the preparation is happening in the past, by definition.

The second law is about how part of the information that we had about a system - constrained to be in a macrostate - is “lost” when we “forget” the previous state and describe it using just the current macrostate. We know more precisely the past than the future - the previous state is in the past by definition.

The "can be arranged" part is the tricky one. E.g. you might know from context that some states are impossible (where the probability distribution is zero), even though they combinatorially exist. That changes the entropy for you.

That is why information and entropy are different things. Entropy is what you know you do not know. That knowledge of the magnitude of the unknown is what is being quantified.

Also, here is the point where I think the article is wrong (or not precise enough), as its phrasing would include the unknown unknowns, which are not entropy IMO:

> I claim it’s the amount of information we don’t know about a situation

For information theory, I've always thought of entropy as follows:

"If you had a really smart compression algorithm, how many bits would it take to accurately represent this file?"

I.e., highly repetitive inputs compress well because they don't have much entropy per bit. Modern compression algorithms are good enough on most data to be used as a reasonable approximation of the true entropy.
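As a rough illustration (hedged: zlib is nowhere near an optimal compressor, so this only gives an upper bound on the entropy rate):

```python
import os
import zlib

def compressed_bits_per_byte(data: bytes) -> float:
    # Compressed size approximates the source entropy from above:
    # a real compressor can only overshoot the true entropy rate.
    return 8 * len(zlib.compress(data, 9)) / len(data)

repetitive = b"abc" * 10_000     # highly predictable
random_ish = os.urandom(30_000)  # incompressible

print(compressed_bits_per_byte(repetitive))  # well under 1 bit/byte
print(compressed_bits_per_byte(random_ish))  # close to 8 bits/byte
```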

I've always favored this down-to-earth characterization of the entropy of a discrete probability distribution. (I'm a big fan of John Baez's writing, but I was surprised glancing through the PDF to find that he doesn't seem to mention this viewpoint.)

Think of the distribution as a histogram over some bins. Then, the entropy is a measurement of, if I throw many many balls at random into those bins, the probability that the distribution of balls over bins ends up looking like that histogram. What you usually expect to see is a uniform distribution of balls over bins, so the entropy measures the probability of other rare events (in the language of probability theory, "large deviations" from that typical behavior).

More specifically, if P = (P1, ..., Pk) is some distribution, then the probability that throwing N balls (for N very large) gives a histogram looking like P is about 2^(-N * [log(k) - H(P)]), where H(P) is the entropy. When P is the uniform distribution, then H(P) = log(k), the exponent is zero, and the estimate is 1, which says that by far the most likely histogram is the uniform one. That is the largest possible entropy, so any other histogram has probability 2^(-c*N) of appearing for some c > 0, i.e., is very unlikely, and exponentially more so the more balls we throw; the entropy measures just how much. "Less uniform" distributions are less likely, so the entropy also measures a certain notion of uniformity. In large deviations theory this specific claim is called "Sanov's theorem" and the role the entropy plays is that of a "rate function."
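To make the estimate concrete, here's a sketch (my own, with arbitrary example numbers) comparing the exact multinomial probability of a histogram "looking like P" against the Sanov exponent; they agree up to lower-order Stirling corrections of size O(log N):

```python
import math

LOG2 = math.log(2)

def log2_multinomial_prob(counts, k):
    # Exact log2 of the probability that N uniform balls over k bins
    # land on exactly these counts: multinomial(N; counts) * k^(-N).
    N = sum(counts)
    out = math.lgamma(N + 1) / LOG2 - N * math.log2(k)
    for c in counts:
        out -= math.lgamma(c + 1) / LOG2
    return out

k, N = 4, 400
P = [0.4, 0.3, 0.2, 0.1]
counts = [int(N * p) for p in P]          # a histogram "looking like" P
H = -sum(p * math.log2(p) for p in P)     # entropy of P, in bits
sanov_exponent = -N * (math.log2(k) - H)  # log2 of 2^(-N*[log(k) - H(P)])

# The exact exponent sits a few log2(N)'s below the Sanov estimate.
print(log2_multinomial_prob(counts, k), sanov_exponent)
```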

The counting interpretation of entropy that some people are talking about is related, at least at a high level, because the probability in Sanov's theorem is the number of outcomes that "look like P" divided by the total number, so the numerator there is indeed counting the number of configurations (in this case of balls and bins) having a particular property (in this case looking like P).

There are lots of equivalent definitions and they have different virtues, generalizations, etc, but I find this one especially helpful for dispelling the air of mystery around entropy.

Hey did you want to say relative entropy ~ rate function ~ KL divergence. Might be more familiar to ML enthusiasts here, get them to be curious about Sanov or large deviations.

That's right, here log(k) - H(p) is really the relative entropy (or KL divergence) between p and the uniform distribution, and all the same stuff is true for a different "reference distribution" of the probabilities of balls landing in each bin.

For discrete distributions the "absolute entropy" (just sum of -p log(p) as it shows up in Shannon entropy or statistical mechanics) is in this way really a special case of relative entropy. For continuous distributions, say over real numbers, the analogous quantity (integral of -p log(p)) isn't a relative entropy since there's no "uniform distribution over all real numbers". This still plays an important role in various situations and calculations...but, at least to my mind, it's a formally similar but conceptually separate object.
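The identity behind that statement is easy to verify numerically (a small sketch with made-up numbers): the relative entropy from p to the uniform distribution is exactly log(k) - H(p):

```python
import math

def H(p):
    # Shannon entropy in bits
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl(p, q):
    # Relative entropy D(p || q) = sum_i p_i log2(p_i / q_i)
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

k = 4
p = [0.4, 0.3, 0.2, 0.1]
uniform = [1 / k] * k

# D(p || uniform) == log(k) - H(p)
assert abs(kl(p, uniform) - (math.log2(k) - H(p))) < 1e-12
```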

Information entropy is literally the strict lower bound on how efficiently information can be communicated (expected number of transmitted bits) if the probability distribution which generates this information is known, that's it. Even in contexts such as calculating the information entropy of a bit string, or the English language, you're just taking this data and constructing some empirical probability distribution from it using the relative frequencies of zeros and ones or letters or n-grams or whatever, and then calculating the entropy of that distribution.
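The empirical-distribution step can be sketched like this (single-letter frequencies only; a serious estimate would use n-grams or a better model):

```python
import math
from collections import Counter

def empirical_entropy(s: str) -> float:
    # Entropy (bits/symbol) of the empirical symbol distribution of s
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())

print(empirical_entropy("aaaa"))  # one symbol: no uncertainty at all
print(empirical_entropy("abab"))  # two equally frequent symbols: 1 bit/symbol
```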

I can't say I'm overly fond of Baez's definition, but far be it from me to question someone of his stature.

"I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book!"

For those interested I am currently reading "Entropy Demystified" by Arieh Ben-Naim which tackles this side of things from much the same direction.

I sometimes ponder where new entropy/randomness comes from. If we take the earliest state of the universe to be an infinitely dense point particle which expanded, there must have been some randomness, or say variety, which led it to expand in a non-uniform way, which in turn led to the dominance of matter over anti-matter and the creation of galaxies, clusters, etc.
If we take an isolated system in which certain static particles are present, could it be the case that a small subset of the particles acquires motion and thus introduces entropy? Can entropy be induced automatically, at least on a quantum level?
If anyone can help me understand this it would be very helpful, and it might help explain the origin of the universe in a better way.

Symmetry breaking is the general phenomenon that underlies most of that.

The classic example is this:

Imagine you have a perfectly symmetrical sombrero[1], and there's a ball balanced on top of the middle of the hat. There's no preferred direction it should fall in, but it's _unstable_. Any perturbation will make it roll down hill and come to rest in a stable configuration on the brim of the hat. The symmetry of the original configuration is now broken, but it's stable.

He argues that the randomness you are looking for comes from quantum fluctuations, and if this randomness did not exist, the universe would probably never have "happened".

Thanks for the reference, it will take some time before I watch the whole video.
Can you tell me, in short, what those quantum fluctuations are? Are they part of some physical law?

Am I the only one who can't download the PDF, or is the file server down? I can see the blog page, but when I try downloading the ebook it just doesn't work.

If the file server is down, could anyone upload the ebook for download?

Hmmm, I've noticed that the list of things that contribute to entropy omits particles which under "normal circumstances" on Earth exist in bound states; for example, it doesn't mention W bosons or gluons. But in some parts of the universe they're not bound and are in a different state of matter, e.g. quark-gluon plasma. I wonder how, or if, this was taken into account.

I like the formulation of 'the amount of information we don't know about a system that we could in theory learn'. I'm surprised there's no mention of the Copenhagen interpretation's interaction with this definition; under a lot of QM interpretations, 'unavailable information' is different from available information.

>I have largely avoided the second law of thermodynamics ... Thus, the aspects of entropy most beloved by physics popularizers will not be found here.

But personally, this bit is the most exciting to me.

>I have tried to say as little as possible about quantum mechanics, to keep the physics prerequisites low. However, Planck’s constant shows up in the formulas for the entropy of the three classical systems mentioned above. The reason for this is fascinating: Planck’s constant provides a unit of volume in position-momentum space, which is necessary to define the entropy of these systems. Thus, we need a tiny bit of quantum mechanics to get a good approximate formula for the entropy of hydrogen, even if we are trying our best to treat this gas classically.

There's a fundamental nature to entropy, but as usual it's not very enlightening for our poor monkey brains, so to explain it you need to enumerate all of its high-level behavior; but its high-level behavior is accidental and can't be summarized in a concise form.

My definition: Entropy is a measure of the accumulation of non-reversible energy transfers.

Side note: All reversible energy transfers involve an increase in potential energy. All non-reversible energy transfers involve a decrease in potential energy.

That definition doesn't work well because you can have changes in entropy even if no energy is transferred, e.g. by exchanging some other conserved quantity.

The side note is wrong in letter and spirit; turning potential energy into heat is one way for something to be irreversible, but neither of those statements is true.

For example, consider an iron ball being thrown sideways. It hits a pile of sand and stops. The iron ball is not affected structurally, but its kinetic energy is transferred (almost entirely) to heat energy. If the ball is thrown slightly upwards, potential energy increases but the process is still irreversible.

Also, the changes of potential energy in corresponding parts of two Carnot cycles are directionally the same, even if one is ideal (reversible) and one is not (irreversible).

After years of thought I dare to say the 2nd law of thermodynamics is a tautology. "Entropy is increasing" means every system tends toward its most probable state, which means the most probable outcome is the most probable outcome.

If I were to write a book with that title, I would get to the point a bit faster, probably as follows.

Entropy is just a number you can associate with a probability distribution. If the distribution is discrete, so you have a set p_i, i = 1..n, which are each positive and sum to 1, then the definition is:

S = - sum_i p_i log( p_i )

Mathematically we say that entropy is a real-valued function on the space of probability distributions. (Elementary exercises: show that S >= 0 and it is maximized on the uniform distribution.)
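Both exercises are easy to check numerically; a throwaway sketch (mine, not part of any course):

```python
import math
import random

def S(p):
    # S = -sum_i p_i log(p_i); terms with p_i = 0 contribute nothing
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 5
uniform = [1 / n] * n
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    p = [wi / sum(w) for wi in w]          # a random distribution on n points
    assert 0 <= S(p) <= S(uniform) + 1e-9  # S >= 0, maximized at uniform

print(S(uniform), math.log(n))  # the maximum value is log(n)
```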

That is it. I think there is little need for all the mystery.

So the only thing you need to know about entropy is that it's a real-valued number you can associate with a probability distribution? And that's it? I disagree. There are several numbers that can be associated with a probability distribution, and entropy is an especially useful one, but to understand why entropy is useful, or why you'd use that function instead of a different one, you need to know a few more things than just what you've written here.

In particular, the expectation (or variance) of a real-valued random variable can also be seen as "a real-valued number you can associate with a probability distribution".

Thus, GP's statement is basically: "entropy is like expectation, but different".

Exactly, saying that's all there is to know about entropy is like saying all you need to know about chess are the rules and all you need to know about programming is the syntax/semantics.

Knowing the plain definition or the rules is nothing but a superficial understanding of the subject. Knowing how to use the rules to actually do something meaningful, having a strategy, that's where meaningful knowledge lies.

The problem is that this doesn't get at many of the intuitive properties of entropy.

A different explanation (based on macro- and micro-states) makes it intuitively obvious why entropy is non-decreasing with time or, with a little more depth, what entropy has to do with temperature.

That doesn't strike me as a problem. Definitions are often highly abstract and counterintuitive, with much study required to understand at an intuitive level what motivates them. Rigour and intuition are often competing concerns, and I think definitions should favour the former. The definition of compactness in topology, or indeed just the definition of a topological space, are examples of this - at face value, they're bizarre. You have to muck around a fair bit to understand why they cut so brilliantly to the heart of the thing.

The above evidently only suffices as a definition, not as an entire course. My point was just that I don't think any other introduction beats this one, especially for a book with the given title.

In particular it has always been my starting point whenever I introduce (the entropy of) macro- and micro-states in my statistical physics course.

Correct! And it took me just one paragraph, not the 18 pages of meandering (and, I think, confusing) text that it takes the author of the PDF to introduce the same idea.

Haha, you reminded me of that idea in software engineering that "it's easy to make an algorithm faster if you accept that at times it might output the wrong result; in fact, you can make it infinitely fast".

Thanks for defining it rigorously. I think people are getting offended on John Baez's behalf because his book obviously covers a lot more - like why does this particular number seem to be so useful in so many different contexts? How could you have motivated it a priori? Etcetera, although I suspect you know all this already.

But I think you're right that a clear focus on the maths is useful for dispelling misconceptions about entropy.

Misconceptions about entropy are misconceptions about physics. You can't dispel them by focusing on the maths and ignoring the physics entirely - especially if you just write an equation without any conceptual discussion, not even a mathematical one.

I didn't say to only focus on the mathematics. Obviously wherever you apply the concept (and it's applied to much more than physics) there will be other sources of confusion. But just knowing that entropy is a property of a distribution, not a state, already helps clarify your thinking.

For instance, you know that the question "what is the entropy of a broken egg?" is actually meaningless, because you haven't specified a distribution (or a set of micro/macro states in the stat mech formulation).

Ok, I don’t think we disagree. But knowing that entropy is a property of a distribution given by that equation is far from “being it” as a definition of the concept of entropy in physics.

Anyway, it seems that - like many others - I just misunderstood the “little need for all the mystery” remark.

> is far from “being it” as a definition of the concept of entropy in physics.

I simply do not understand why you say this. Entropy in physics is defined using exactly the same equation. The only thing I need to add is the choice of probability distribution (i.e. the choice of ensemble).

I really do not see a better "definition of the concept of entropy in physics".

(For quantum systems one can nitpick a bit about density matrices, but in my view that is merely a technicality on how to extend probability distributions to Hilbert spaces.)

I’d say that the concept of entropy “in physics” is about (even better: starts with) the choice of a probability distribution. Without that you have just a number associated with each probability distribution - distributions without any physical meaning so those numbers won’t have any physical meaning either.

But that’s fine, I accept that you may think that it’s just a little detail.

(Quantum mechanics has no mystery either.

ih/2pi dA/dt = AH - HA

That’s it. The only thing one needs to add is a choice of operators.)

Sarcasm aside, I really do not think you are making much sense.

Obviously one first introduces the relevant probability distributions (at least the micro-canonical ensemble). But once you have those, your comment still does not offer a better way to introduce entropy other than what I wrote. What did you have in mind?

In other words, how did you think I should change this part of my course?

Many students will want to know where the minus sign comes from. I like to write the formula instead as S = sum_i p_i log( 1 / p_i ), where (1 / p_i) is the "surprise" (i.e., the expected number of trials until the first success) associated with a given outcome (or symbol), and we average it over all outcomes (i.e., weight it by the probability of the outcome). We take the log of the "surprise" because entropy is an extensive quantity, so we want it to be additive.
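In code, the rewritten form reads quite naturally (a tiny sketch of mine):

```python
import math

p = [0.5, 0.25, 0.25]
# S = sum_i p_i * log2(1 / p_i): the probability-weighted
# average of the log-"surprise" of each outcome
S = sum(pi * math.log2(1 / pi) for pi in p)
print(S)  # 1.5 bits
```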

As of this moment there are six other top-level comments which each try to define entropy, and frankly they are all wrong, circular, or incomplete. Clearly the very definition of entropy is confusing, and the definition is what my comment provides.

I never said that all the other properties of entropy are now immediately visible. Instead I think it is the only universal starting point of any reasonable discussion or course on the subject.

And lastly I am frankly getting discouraged by all the dismissive responses. So this will be my last comment for the day, and I will leave you in the careful hands of, say, the six other people who are obviously so extremely knowledgeable about this topic. /s

One could also say that it’s just a consequence of the passage of time (as in getting away from a boundary condition). The decay of radioactive atoms is also a measure of the arrow of time - of course we can say that’s the same thing.

CP violation may (or may not) be more relevant regarding the arrow of time.

My first contact with entropy was in chemistry and thermodynamics, and I didn't get it. In fact, I didn't get anything from engineering thermodynamics books such as Çengel's and so on.

This seems like a great resource for referencing the various definitions. I've tried my hand at developing an intuitive understanding: https://spacechimplives.substack.com/p/observers-and-entropy. TLDR - it's an artifact of the model we're using. In the thermodynamic definition, the energy accounted for by the terms of our model is information. The energy that's not is entropic energy. Hence it's not "useable" energy, and the process isn't reversible.

Entropy is the distribution of potential over negative potential.

This could be said "the distribution of what ever may be over the surface area of where it may be."

This is erroneously taught in conventional information theory as "the number of configurations in a system", or the available information that has yet to be retrieved. Entropy includes the unforeseen and the out of scope.

Entropy is merely the predisposition to flow from high to low pressure (potential). That is it. Information is a form of potential.

Philosophically what are entropy's guarantees?

- That there will always be a super-scope, which may interfere in ways unanticipated;

- everything decays the only mystery is when and how.

It sounds like log-probability is the manifold surface area.

Distribution of potential over negative potential. Negative potential is the "surface area", and available potential distributes itself "geometrically". All this is iterative obviously, some periodicity set by universal speed limit.

It really doesn't sound like you disagree with me.

Baez seems to use the definition you call erroneous: "It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn."

But it is possible to account for the unforeseen (or out-of-vocabulary) by, for example, a Good-Turing estimate. This satisfies your demand for a fully defined state space while also being consistent with GP's definition.

You are referring to the conceptual device you believe belongs to you and your equations. Entropy creates attraction and repulsion, even causing working bias. We rely upon it for our system functions.

All definitions of entropy stem from one central, universal definition: entropy is the amount of energy unable to be used for useful work. Or, put better: entropy describes the effect that not all energy consumed can be used for work.

There's a good case to be made that the information-theoretic definition of entropy is the most fundamental one, and the version that shows up in physics is just that concept as applied to physics.

My favorite course I took as part of my physics degree was statistical mechanics. It leaned way closer to information theory than I would have expected going in, but in retrospect should have been obvious.

Unrelated: my favorite bit from any physics book is probably still the introduction of the first chapter of "States of Matter" by David Goodstein: "Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics."

Not really. Information theory applies to anything probability applies to, including many situations that aren't "physics" per se. For instance it has a lot to do with algorithms and data as well. I think of it as being at the level of geometry and calculus.

Yeah, people seemingly misunderstand that the entropy applied to thermodynamics is simply an aggregate statistic that summarizes the complex state of the thermodynamic system as a single real number.

The fact that entropy always rises etc, has nothing to do with the statistical concept of entropy itself. It simply is an easier way to express the physics concept that individual atoms spread out their kinetic energy across a large volume.

I'm not sure that's quite the right perspective. It's not a coincidence that entropy increases over time; the increase in entropy seems to be very fundamental to the way physics goes. I prefer the interpretation "physics doesn't care what direction the arrow of time points, but we perceive it as pointing in the direction of increasing entropy". Although that's not totally satisfying either.

A well known anecdote reported by Shannon:

"My greatest concern was what to call it. I thought of calling it 'information,' but the word was overly used, so I decided to call it 'uncertainty.' When I discussed it with John von Neumann, he had a better idea. Von Neumann told me, 'You should call it entropy, for two reasons. In the first place your uncertainty function has been used in statistical mechanics under that name, so it already has a name. In the second place, and more important, no one really knows what entropy really is, so in a debate you will always have the advantage.'"

See the answers to this MathOverflow SE question (https://mathoverflow.net/questions/403036/john-von-neumanns-...) for references on the discussion whether Shannon's entropy is the same as the one from thermodynamics.

Von Neumann was the king of kings

So much so, he has his own entropy!

https://en.wikipedia.org/wiki/Von_Neumann_entropy

I disagree... Von Neumann went beyond being a King of Kings, the man was a God (or a "Monster Mind" according to Feynman) :)

He's a certified Martian: https://en.wikipedia.org/wiki/The_Martians_(scientists).

I was hoping the Wikipedia article might explain why this might have been.

https://emilkirkegaard.dk/en/2022/11/a-theory-of-ashkenazi-g...

Emil Kirkegaard is a self-described white nationalist eugenicist who thinks the age of consent is too high. I wouldn't trust anything he has to say.

No need for ad hominems. This suffices to place doubt on the article's premises (and therefore any conclusion):

>> This hasn’t been strictly shown mathematically, but I think it is true.

> Emil Kirkegaard is a self-described white nationalist

That's simply a lie.

> who thinks the age of consent is too high

Too high in which country? Such laws vary strongly, even by US state, and he is from Denmark. Anyway, this has nothing to do with the topic at hand.

In Spain it used to be as low as 13 a few decades ago; but that law was obviously written before the rural exodus of inner Spain into the cities (from the 60's to almost the 80's), when children from early puberty on got to work/help in the fields or at home, and by age 14 they had far more duties and accountabilities than today. And yes, that yielded more maturity.

Thus, the law had to be adjusted for more urban/civilized times, up to 16. Although, depending on age/mentality closeness (such as 15-19, as happened in a recent case), the young adult had their charges totally dropped.


It's odd... as someone interested in but not fully immersed in the sciences, I see his name pop up everywhere.

He was really brilliant, made contributions all over the place in the math/physics/tech field, and had a sort of wild and quirky personality that people love telling stories about.

A funny quote about him from Edward "a guy with multiple equations named after him" Teller:

> Edward Teller observed "von Neumann would carry on a conversation with my 3-year-old son, and the two of them would talk as equals, and I sometimes wondered if he used the same principle when he talked to the rest of us."

Are there many von-Neumann-like multidisciplinarians nowadays? It feels like unless one is razor sharp and fully committed to one field, one is not taken seriously by those who made careers in it (and who have the last word on it).

IMO they do exist, but the popular attitude that it's not possible anymore is the issue, not a lack of genius. If everyone has a built-in assumption that it can't happen anymore, then we will naturally prune away the social pathways that enable it.

I think there are none. The world has gotten too complicated for that. It was early days in quantum physics, information theory, and computer science. I don’t think it is early days in anything that consequential anymore.

It’s the early days in a lot of fields, but they tend to be fiendishly difficult like molecular biology or neuroscience.

Centuries ago, the limitation on most knowledge was the difficulty of discovery; once known, it was accessible to most scholars. Take calculus, which is taught in every high school in America. The problem is that we're getting to a point where new fields are built on such extreme prerequisites that even the known knowledge is extremely hard for talented university students to learn, let alone what is required to discover and advance the field. Until we are able to augment human intelligence, the days of the polymath advancing multiple fields are mostly over. I would also argue that the standards for peer-reviewed papers and for obtaining PhDs have significantly dropped (due to the incentive structure of spamming as many papers as possible), which is only hurting the advancement of knowledge.

Sounds like the increased difficulty could be addressed with new models and the right abstraction layers. E.g., there's incredible complexity in modern computing, but you don't need to know assembly in order to build a Web app, to reason about architecture, or to work in functional paradigms. However, this doesn't seem to happen in the natural sciences. I wonder if adopting better models runs into gatekeepers protecting their status, tenure, and the status quo.

Of course it happens in the natural sciences. The neuroscientist doesn't need to do quantum mechanical calculations to do research.

Neither does a Web app developer need to know how to use CNC or make a transistor. Your example is about different levels of abstraction than what I meant.

I was replying to “even the known knowledge is extremely hard for talented university students to learn”. If complexity of the known knowledge one must learn to substantially contribute is the reason becoming an accomplished multidisciplinary is impossible nowadays, then it sounds like we could use some better models and levels of abstraction.

More than that, as professionals' career paths in fields develop, the organisations they work for specialize, becoming less amenable to the generalist. ('Why should we hire this mathematician who is also an expert in legal research? Their attention is probably divided, and meanwhile we have a 100% mathematician in the candidate pool fresh from an expensive dedicated PhD program with a growing family to feed.')

I'm obviously using the archetype of Leibniz here as an example but pick your favorite polymath.

Are they fiendishly difficult or do we just need a von Neumann to come along and do what he did for quantum mechanics to them?

There have been a very small number of thinkers as publicly accomplished as von Neumann, ever. One other who comes to mind is Carl F. Gauss.

Is it fair to say that the number of publicly accomplished multidisciplinaries alive at a particular moment is not rising as might be expected, proportionally to the total number of suitably educated people?

Genius Edward Teller Describes 1950s Genius John Von Neumann

https://youtu.be/Oh31I1F2vds?t=189 describes von Neumann's struggle in his final days, when he could no longer think. Thinking was the activity he loved most.

Euler.

JvN was one of the smartest ever, but Euler was there centuries before and shows up in so many places.

If I had a time machine I'd love to get those two together for a stiff drink and some banter.

My favorite Von Neumann anecdote/quote is this one:

John Von Neumann once said to Felix Smith: "Young man, in mathematics you don't understand things. You just get used to them." This was a response to Smith's fear about the method of characteristics.

It took me a while to fully grasp what he meant, but after diving into Mathematics and Physics for a while, I now hold it as one of the capital T truths of learning.

Even mortals such as ourselves can apply some of Von Neumann's ideas in our everyday lives:

https://en.m.wikipedia.org/wiki/Fair_coin#Fair_results_from_...
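The linked trick (von Neumann's fair-coin extractor) can be sketched in a few lines of Python; the bias value and helper names here are made up for illustration:

```python
import random

def biased_flip(bias, rng):
    """One flip of a coin that lands 'H' with probability `bias`."""
    return 'H' if rng.random() < bias else 'T'

def fair_flip(bias, rng):
    """Von Neumann's extractor: flip the biased coin twice.
    HT and TH each occur with probability bias*(1-bias), so they are
    equally likely; map HT -> 'H', TH -> 'T', and discard HH / TT."""
    while True:
        a, b = biased_flip(bias, rng), biased_flip(bias, rng)
        if a != b:
            return a

rng = random.Random(0)                   # fixed seed for repeatability
flips = [fair_flip(0.7, rng) for _ in range(10_000)]
print(flips.count('H') / len(flips))     # close to 0.5 despite the 0.7 bias
```

The cost of the fairness is that you throw away flips: with bias b, each attempt succeeds with probability 2b(1-b), so a very biased coin wastes most of its tosses.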

I've seen many people arguing he's the most intelligent person that ever lived

Some say Hungarians are actually aliens.

https://slatestarcodex.com/2017/05/26/the-atomic-bomb-consid...

An Introduction here : https://www.youtube.com/watch?v=IPMjVcLiNKc


I felt like I finally understood Shannon entropy when I realized that it's a subjective quantity -- a property of the observer, not the observed.

The entropy of a variable X is the amount of information required to drive the observer's uncertainty about the value of X to zero. As a correlate, your uncertainty and mine about the value of the same variable X could be different. This is trivially true, as we could each have received different information about X. H(X) should be H_{observer}(X), or even better, H_{observer, time}(X).

As clear as Shannon's work is in other respects, he glosses over this.

What's often lost in the discussions about whether entropy is subjective or objective is that, if you dig a little deeper, information theory gives you powerful tools for relating the objective and the subjective.

Consider cross entropy of two distributions H[p, q] = -Σ p_i log q_i. For example maybe p is the real frequency distribution over outcomes from rolling some dice, and q is your belief distribution. You can see the p_i as representing the objective probabilities (sampled by actually rolling the dice) and the q_i as your subjective probabilities. The cross entropy is measuring something like how surprised you are on average when you observe an outcome.

The interesting thing is that H[p, p] <= H[p, q], which means that if your belief distribution is wrong, your cross entropy will be higher than it would be if you had the right beliefs, q=p. This is guaranteed by the concavity of the logarithm. This gives you a way to compare beliefs: whichever q gets the lowest H[p,q] is closer to the truth.

You can even break cross entropy into two parts, corresponding to two kinds of uncertainty: H[p, q] = H[p] + D[p||q]. The first term is the entropy of p and it is the aleatoric uncertainty, the inherent randomness in the phenomenon you are trying to model. The second term is the KL divergence and it tells you how much additional uncertainty you have as a result of having wrong beliefs, which you could call epistemic uncertainty.
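A quick numerical check of the identities in the comment above (the dice distributions are made up for illustration):

```python
import math

def cross_entropy(p, q):
    """H[p, q] = -sum_i p_i log2 q_i: average surprise when outcomes
    follow p but you believe q (in bits)."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def entropy(p):
    """H[p] = H[p, p]: the aleatoric (inherent) uncertainty."""
    return cross_entropy(p, p)

def kl(p, q):
    """D[p||q]: extra bits paid for believing q when the truth is p."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [1/6] * 6                         # "reality": a fair die
q = [1/4, 1/4, 1/8, 1/8, 1/8, 1/8]   # wrong beliefs about it

# Wrong beliefs cost extra surprise: H[p, q] >= H[p, p] ...
assert cross_entropy(p, q) > entropy(p)
# ... and the excess is exactly the KL divergence: H[p, q] = H[p] + D[p||q]
assert abs(cross_entropy(p, q) - (entropy(p) + kl(p, q))) < 1e-9
```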

Thanks, that's an interesting perspective. It also highlights one of the weak points in the concept, I think, which is that this is only a tool for updating beliefs to the extent that the underlying probability space ("ontology" in this analogy) can actually "model" the phenomenon correctly!

It doesn't seem to shed much light on when or how you could update the underlying probability space itself (or when to change your ontology in the belief setting).

This kind of thinking will lead you to ideas like algorithmic probability, where distributions are defined using universal Turing machines that could model anything.

Amazing! I had actually heard about Solomonoff induction before but my brain didn't make the connection. Thanks for the shortcut =)

You can sort of do this over a suitably large (or infinite) family of models all mixed, but from an epistemological POV that’s pretty unsatisfying.

From a practical POV it’s pretty useful and common (if you allow it to describe non- and semi-parametric models too).

Couldn't you just add a control (PID/Kalman filter/etc) to converge on the stability of some local "most" truth?

Could you elaborate? To be honest I have no idea what that means.

I think what you're getting at is the construction of the sample space - the space of outcomes over which we define the probability measure (e.g. {H,T} for a coin, or {1,2,3,4,5,6} for a die).

Let's consider two possibilities:

1. Our sample space is "incomplete"

2. Our sample space is too "coarse"

Let's discuss 1 first. Imagine I have a special die that has a hidden binary state which I can control, which forces the die to come up either even or odd. If your sample space is only which side faces up, and I randomize the hidden state appropriately, it appears like a normal die. If your sample space is enlarged to include the hidden state, the entropy of each roll is reduced by one bit. You will not be able to distinguish between a truly random die and a die with a hidden state if your sample space is incomplete. Is this the point you were making?

On 2: Now let's imagine I can only observe whether the die comes up even or odd. This is a coarse-graining of the sample space (we get strictly less information - or, we only get some "macro" information). Of course, a coarse-grained sample space is necessarily an incomplete one! We can imagine comparing the outcomes from a normal die, to one which with equal probability rolls an even or odd number, except it cycles through the microstates deterministically e.g. equal chance of {odd, even}, but given that outcome, always goes to next in sequence {(1->3->5), (2->4->6)}.

Incomplete or coarse sample spaces can indeed prevent us from inferring the underlying dynamics. Many processes can have the same apparent entropy on our sample space from radically different underlying processes.
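The hidden-state die in case 1 can be sketched directly (the parity mechanism and numbers are illustrative):

```python
import math
import random

rng = random.Random(42)

def roll_with_hidden_state():
    """Hypothetical die with a hidden binary state that forces parity."""
    state = rng.random() < 0.5               # hidden bit, randomized fairly
    faces = (2, 4, 6) if state else (1, 3, 5)
    return state, rng.choice(faces)

rolls = [roll_with_hidden_state() for _ in range(60_000)]
faces = [f for _, f in rolls]

# Seen alone, the face looks like a fair die: ~1/6 per face, log2(6) bits.
# Given the hidden state, only 3 faces are possible: log2(3) bits.
# So learning the state removes exactly one bit of entropy per roll:
assert abs(math.log2(6) - math.log2(3) - 1.0) < 1e-12
```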

Right, this is exactly what I'm getting at - learning a distribution over a fixed sample space can be done with Bayesian methods, or entropy-based methods like the OP suggested, but I'm wondering if there are methods that can automatically adjust the sample space as well.

For well-defined mathematical problems like dice rolling and fixed classical mechanics scenarios and such, you don't need this I guess, but for any real-world problem I imagine half the problem is figuring out a good sample space to begin with. This kind of thing must have been studied already, I just don't know what to look for!

There are some analogies to algorithms like NEAT, which automatically evolves a neural network architecture while training. But that's obviously a very different context.

We could discuss completeness of the sample space, and we can also discuss completeness of the hypothesis space.

In Solomonoff Induction, which purports to be a theory of universal inductive inference, the "complete hypothesis space" consists of all computable programs (note that all current physical theories are computable, so this hypothesis space is very general). Then induction is performed by keeping all programs consistent with the observations, weighted by two terms: the program's prior likelihood, and the probability that the program assigns to the observations (the programs can be deterministic and assign probability 1).

The "prior likelihood" in Solomonoff Induction comes from the program's complexity (well, 2^(-Complexity)), where the complexity is the length of the shortest representation of that program.

Altogether, the procedure looks like: maintain a belief which is a mixture of all programs consistent with the observations, weighted by their complexity and the likelihood they assign to the data. Of course, this procedure is still limited by the sample/observation space!

That's our best formal theory of induction in a nutshell.
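Real Solomonoff induction ranges over all programs and is uncomputable, but the recipe described above can be illustrated with a toy, hand-picked "program" space. Everything here, the three hypotheses and their complexities, is invented for illustration:

```python
# Toy hypothesis space: three hand-written "programs" with invented
# complexities in bits. Each predicts the next bit from the prefix.
hypotheses = {
    "all_zeros":   (2, lambda seq: 0),
    "all_ones":    (2, lambda seq: 1),
    "alternating": (3, lambda seq: (1 - seq[-1]) if seq else 0),
}

def predict(observed):
    """P(next bit = 1): keep programs consistent with every observation,
    weight each by its prior 2^(-complexity), and mix their predictions."""
    weights = {
        name: 2.0 ** -c
        for name, (c, f) in hypotheses.items()
        if all(f(observed[:i]) == bit for i, bit in enumerate(observed))
    }
    total = sum(weights.values())
    return sum(w for name, w in weights.items()
               if hypotheses[name][1](observed) == 1) / total

print(predict([]))         # prior prediction: a mixture of all three programs
print(predict([0, 1, 0]))  # only "alternating" survives, so it predicts 1
```

Here the deterministic programs assign probability 1 to their predictions, as the comment notes, so "keeping consistent programs" is just filtering out falsified ones.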

Someone else pointed me to Solomonoff induction too, which looks really cool as an "idealised" theory of induction and it definitely solves my question in abstract. But there are obvious difficulties with that in practice, like the fact that it's probably uncomputable, right?

I mean I think even the "Complexity" coefficient should be uncomputable in general, since you could probably use a program which computes it to upper bound "Complexity", and if there was such an upper bound you could use it to solve the halting problem etc. Haven't worked out the details though!

Would be interesting if there are practical algorithms for this. Either direct approximations to SI or maybe something else entirely that approaches SI in the limit, like a recursive neural-net training scheme? I'll do some digging, thanks!

Correct anything that's wrong here. Cross entropy is the comparison of two distributions, right? Is the objectivity sussed out in relation to the overlap cross section? And is the subjectivity sussed out not on average but as deviations from the average? Just trying to understand it in my framework, which might be wholly off the mark.

Cross entropy lets you compare two probability distributions. One way you can apply it is to let the distribution p represent "reality" (from which you can draw many samples, but whose numerical value you might not know) and to let q represent "beliefs" (whose numerical value is given by a model). Then by finding q to minimize cross-entropy H[p, q] you can move q closer to reality.

You can apply it other ways. There are lots of interpretations and uses for these concepts. Here's a cool blog post if you want to find out more: https://blog.alexalemi.com/kl-is-all-you-need.html

I'm not sure what you mean by objectivity and subjectivity in this case.

With the example of beliefs, you can think of cross entropy as the negative expected value of the log probability you assigned to an outcome, weighted by the true probability of each outcome. If you assign larger log probabilities to more likely outcomes, the cross entropy will be lower.

This doesn't really make entropy itself observer dependent. (Shannon) entropy is a property of a distribution. It's just that when you're measuring different observers' beliefs, you're looking at different distributions (which can have different entropies the same way they can have different means, variances, etc).

Entropy is a property of a distribution, but since math does sometimes get applied, we also attach distributions to things (e.g. the entropy of a random number generator, the entropy of a gas...). Then when we talk about the entropy of those things, those entropies are indeed subjective, because different subjects will attach different probability distributions to that system depending on their information about that system.

Some probability distributions are objective. The probability that my random number generator gives me a certain number is given by a certain formula. Describing it with another distribution would be wrong.

Another example, if you have an electron in a superposition of half spin-up and half spin-down, then the probability to measure up is objectively 50%.

Another example, GPT-2 is a probability distribution on sequences of integers. You can download this probability distribution. It doesn't represent anyone's beliefs. The distribution has a certain entropy. That entropy is an objective property of the distribution.

Of those, the quantum superposition is the only one that has a chance at being considered objective, and it's still only "objective" in the sense that (as far as we know) your description provided as much information as anyone can possibly have about it, so nobody can have a more-informed opinion and all subjects agree.

The others are both partial-information problems which are very sensitive to knowing certain hidden-state information. Your random number generator gives you a number that you didn't expect, and for which a formula describes your best guess based on available incomplete information, but the computer program that generated it knew which one to choose and would not have picked any other. Anyone who knew the hidden state of the RNG would also have assigned a different probability to that number being chosen.

You might have some probability distribution in your head for what will come out of GPT-2 on your machine at a certain time, based on your knowledge of the random seed. But that is not the GPT-2 probability distribution, which is objectively defined by model weights that you can download, and which does not correspond to anyone’s beliefs.

I'm of the view that strictly speaking, even a fair die doesn't have a probability distribution until you throw it. It just so happens that, unless you know almost every detail about the throw, the best you can usually do is uniform.

So I would say the same of GPT-2. It's not a random variable unless you query it. But unless you know unreasonably many details, the best you can do to predict the query is the distribution that you would call "objective."

I think this gets into unanswerable metaphysical questions about when we can say mathematical objects, propositions, etc. really exist.

But I think if we take the view that it's not a random variable until we query it, that makes it awkward to talk about how GPT-2 (and similar models) is trained. No one ever draws samples from the model during training, but the whole justification for the cross-entropy-minimizing training procedure is based on thinking about the model as a random variable.

A more plausible way to argue for objectiveness is to say that some probability distributions are objectively more rational than others given the same information. E.g. when seeing a symmetrical die it would be irrational to give 5 a higher probability than the others. Or it seems irrational to believe that the sun will explode tomorrow.

The probability distribution is subjective for both parts -- because it, once again, depends on the observer observing the events in order to build a probability distribution.

E.g. your random number generator generates 1, 5, 7, 8, 3 when you run it. It generates 4, 8, 8, 2, 5 when I run it. I.e. we have received different information about the random number generator to build our subjective probability distributions. The level of entropy of our probability distributions is high because we have so little information to be certain about the representativeness of our distribution sample.

If we continue running our random number generator for a while, we will gather more information, thus reducing entropy, and our probability distributions will both start converging towards an objective "truth." If we ran our random number generators for a theoretically infinite amount of time, we would have reduced entropy to 0 and have a perfect and objective probability distribution. But this is impossible.

Would you say that all claims about the world are subjective, because they have to be based on someone’s observations?

For example my cat weighs 13 pounds. That seems objective, in the sense that if two people disagree, only one can be right. But the claim is based on my observations. I think your logic leads us to deny that anything is objective.

I do believe in objective reality, but probabilities are subjective. Your cat weighs 13 pounds, and now that you've told me, I know it too. If you asked me to draw a probability distribution for the weight of your cat, I'd draw a tight gaussian distribution around that, representing the accuracy of your scale. My cat weighs a different amount, but I won't tell you how much, so if we both draw a probability distribution, they'll be different. And the key thing is that neither of us has an objectively correct probability distribution, not even me. My cat's weight has an objectively correct value which even I don't know, because my scale isn't good enough.

All right now, here's the big question: how do you know that the evidence your sensory apparatus reveals to you is correct? What I'm getting at is this: the only experience that is directly available to you is your sensory data. And this sensory data is merely a stream of electrical impulses which stimulate your computing center. In other words, all that I really know about the outside universe is relayed to me through my electrical connections.

Sorry, this is a major misinterpretation, or at least a completely different one. I don't know how to put it in a more productive way; I think your comment is very confused. You don't need to run a random number generator "for a while" in order to build up a probability distribution.

A representative sample then? Please tell me where I went wrong -- I mean this sincerely.

This might be a frequentist vs bayesian thing, and I am bayesian. So maybe other people would have a different view.

I don't think you need to have any information to have a probability distribution; your distribution already represents your degree of ignorance about an outcome. So without even sampling it once, you already should have a uniform probability distribution for a random number generator or a coin flip. If you do personally have additional information to help you predict the outcome -- you're skilled at coin-flipping, or you wrote the RNG and know an exploit -- then you can compress that distribution to a lower-entropy one.

But you don't need to sample the distribution to do this. You can have that information before the first coin toss. Sampling can be one way to get information but it won't necessarily even help. If samples are independent, then each sample really teaches you barely anything about the next. RNGs eventually do repeat, so if you sample one enough you might be able to find the pattern and reduce the entropy to zero, but in that case you're not learning the statistical distribution, you're deducing the exact internal state of the RNG and predicting the exact next outcome, because the samples are not actually independent. If you do enough coin flips you might eventually find that there's a slight bias to the coin, but that really takes an extreme number of tosses and only reduces the entropy a tiny tiny bit; not at all if the coin-tossing procedure had no bias to begin with.

However, the objective truth is just that the next toss will land heads. That's the only truth that experiment can objectively determine. Any other doubt that it might-have-counterfactually-landed-tails is subjective, due to a subjective lack of sufficient information to predict the outcome. We can formalize a correct procedure to convert prior information into a corresponding probability distribution, and we can get a unanimous consensus by giving everybody the same information, but the probability distribution is still subjective because it is a function of that prior information.

I only slightly understand, I'm sorry; I'm not educated enough to understand much of this.

Did you take stats at MIT? I'm going to go through their online material, because I very much am very confused.

I appreciate your curiosity!

The best introduction that I can recommend is this type-written PDF from E.T. Jaynes, called "probability theory with applications in science and engineering": https://bayes.wustl.edu/etj/science.pdf.html

It requires a lot of attention to read and follow the math, but it's worthwhile. Jaynes is a pretty passionate writer, and in his writing he's clearly battling against some enemies (who might be ghosts), but on the other hand this also makes for more entertaining reading and I find that's usually a benefit when it comes to a textbook.

I read through the first "lecture" yesterday. I'll devote some time for (hopefully) the rest today.

Thank you!

"Entropy is a property of matter that measures the degree of randomization or disorder at the microscopic level", at least when considering the second law.

Right, but the very interesting thing is it turns out that what's random to me might not be random to you! And the reason that "microscopic" is included is because that's a shorthand for "information you probably don't have about a system, because your eyes aren't that good, or even if they are, your brain ignored the fine details anyway."

Right but in chemistry class the way it’s taught via Gibbs free energy etc. makes it seem as if it’s an intrinsic property.

Entropy in physics is usually the Shannon entropy of the probability distribution over system microstates given known temperature and pressure. If the system is in equilibrium then this is objective.

Entropy in Physics is usually either the Boltzmann or Gibbs entropy, named after two men who were both dead before Shannon was born.

That's not a problem, as the GP's post is trying to state a mathematical relation, not a historical attribution. Often newer concepts shed light on older ones. As Baez's article says, Gibbs entropy is Shannon's entropy of an associated distribution (multiplied by the constant k).

It is a problem because all three come with baggage. Almost none of the things discussed in this thread are invalid when discussing actual physical entropy even though the equations are superficially similar. And then there are lots of people being confidently wrong because they assume that it’s just one concept. It really is not.

Don't see how the connection is superficial. Even the classical macroscopic definition of entropy as ΔS = ∫ dQ/T can be derived from the information theory perspective, as Baez shows in the article (using entropy-maximizing distributions and Lagrange multipliers). If you have a more specific critique, it would be good to discuss.

In classical physics there is no real objective randomness. Particles have a defined position and momentum and those evolve deterministically. If you somehow learned these then the shannon entropy is zero. If entropy is zero then all kinds of things break down.

So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness, even though temperature does not really seem to be a quantum thing.

> If entropy is zero then all kinds of things break down.

Entropy is a macroscopic variable and if you allow microscopic information, strange things can happen! One can move from a high entropy macrostate to a low entropy macrostate if you choose the initial microstate carefully. But this is not a reliable process which you can reproduce experimentally, ie. it is not a thermodynamic process.

A thermodynamic process P is something which takes a macrostate A to a macrostate B, independent of which microstate a0, a1, a2... in A you started off with. If the process depends on the microstate, then it wouldn't be something we would recognize, as we are looking from the macro perspective.

> Particles have a defined position and momentum

Which we don’t know precisely. Entropy is about not knowing.

> If you somehow learned these then the shannon entropy is zero.

Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space. (You need an appropriate extension of Shannon’s entropy to continuous distributions.)

> So now you are forced to consider e.g. temperature an impossibility without quantum-derived randomness

Or you may study statistical mechanics :-)

> Which we don’t know precisely. Entropy is about not knowing.

No, it is not about not knowing. This is an instance where the intuition from Shannon’s entropy does not translate to statistical physics.

It is about the number of possible microstates, which is completely different. In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

> Minus infinity. Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space.

No, 0. In this case, there is a single state with p = 1, and S = -k Σ p ln(p) = 0.

This is the same if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).

The probability p of a microstate is always between 0 and 1, therefore p ln(p) is never positive and S is never negative.

You get the same using Boltzmann’s approach, in which case Ω = 1 and S = k ln(Ω) is also 0.

> (You need an appropriate extension of Shannon’s entropy to continuous distributions.)

Gibbs’ entropy.

> Or you may study statistical mechanics

Indeed.
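The p = 1 case above is easy to check numerically. A small sketch with Boltzmann's constant set to 1 for a unit-free comparison:

```python
import math

k = 1.0  # Boltzmann's constant, set to 1 for a unit-free check

def gibbs_entropy(probs):
    """S = -k * sum_i p_i ln(p_i), skipping p = 0 terms."""
    return -k * sum(p * math.log(p) for p in probs if p > 0)

# A single microstate with p = 1 gives S = 0, matching Boltzmann's
# S = k ln(Omega) with Omega = 1:
assert gibbs_entropy([1.0]) == 0.0
# A uniform distribution over Omega microstates recovers S = k ln(Omega):
omega = 8
assert abs(gibbs_entropy([1 / omega] * omega) - k * math.log(omega)) < 1e-12
```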

>>> Particles have a defined position and momentum [...] If you somehow learned these then the shannon entropy is zero.

>> Entropy in classical statistical mechanics is proportional to the logarithm of the volume in phase space [and diverges to minus infinity if you define precisely the position and momentum of the particles and the volume in phase space goes to zero]

> [It's zero also] if you consider the phase space because then it is reduced to a single point (you need a bit of distribution theory to prove it rigorously but it is somewhat intuitive).

> The probability p of an microstate is always between 0 and 1, therefore p ln(p) is always negative and S is always positive.

The points in the phase space are not "microstates" with probability between 0 and 1. It's a continuous distribution and if it collapses to a point (i.e. you somehow learned the exact positions and momentums) the density at that point is unbounded. The entropy is also unbounded and goes to minus infinity as the volume in phase space collapses to zero.

You can avoid the divergence by dividing the continuous phase space into discrete "microstates" but having a well-defined "microstate" corresponding to some finite volume in phase space is not the same as what was written above about "particles having a defined position and momentum" that is "somehow learned". The microstates do not have precisely defined positions and momentums. The phase space is not reduced to a single point in that case.

If the phase space is reduced to a single point I'd like to see your proof that S(ρ) = −k ∫ ρ(x) log ρ(x) dx = 0

I hadn't realized that "differential" entropy and Shannon entropy are actually different and incompatible, huh.

So the case I mentioned, where you know all the positions and momentums, has 0 Shannon entropy and -Inf differential entropy. And a typical distribution will instead have Inf Shannon entropy and finite differential entropy.

Wikipedia has some pretty interesting discussion about differential entropy vs the limiting density of discrete points, but I can't claim to understand it or whether it could bridge the gap here.

> So the case I mentioned, where you know all the positions and momentums has 0 shannon entropy

No, Shannon entropy is not applicable in that case.

https://en.wikipedia.org/wiki/Entropy_(statistical_thermodyn...

Quantum mechanics solves the issue of the continuity of the state space. However, as you probably know, in quantum mechanics all the positions and momentums cannot simultaneously have definite values.

> possible microstates

Conditional on the known macrostate. Because we don’t know the precise microstate - only which microstates are possible.

If your reasoning is that « experimental entropy can be measured so it’s not about that » then it’s not about macrostates and microstates either!

> In Physics, entropy is a property of a bit of matter, it is not related to the observer or their knowledge. We can measure the enthalpy change of a material sample and work out its entropy without knowing a thing about its structure.

Enthalpy is also dependent on your choice of state variables, which is in turn dictated by which observables you want to make predictions about: whether two microstates are distinguishable, and thus whether they are part of the same macrostate, depends on the tools you have for distinguishing them.

A calorimeter does not care about anyone’s choice of state variables. Entropy is not only something that exists in abstract theoretical constructs, it is something we can get experimentally.

That's actually the normal view; saying that info and stat-mech entropy are the same is the outlier position, most popularized by Jaynes.

If information-theoretical and statistical mechanics entropies are NOT the same (or at least, deeply connected) then what stops us from having a little guy[0] sort all the particles in a gas to extract more energy from them?

[0] https://en.wikipedia.org/wiki/Maxwell%27s_demon

Sounds like a non-sequitur to me; what are you implying about the Maxwell's demon thought experiment vs the comparison between Shannon and stat-mech entropy?

Yeah but distributions are just the accounting tools to keep track of your entropy. If you are missing one bit of information about a system, your understanding of the system is some distribution with one bit of entropy. Like the original comment said, the entropy is the number of bits needed to fill in the unknowns and bring the uncertainty down to zero. Your coin flips may be unknown in advance to you, and thus you model it as a 50/50 distribution, but in a deterministic universe the bits were present all along.

Trivial example: if you know the seed of a pseudo-random number generator, a sequence generated by it has very low entropy.

But if you don't know the seed, the entropy is very high.
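A toy check of the seed point, using Python's stdlib generator as a stand-in for "the RNG"; the seed range and output size are arbitrary choices:

```python
import math
import random
from collections import Counter

def first_output(seed):
    """First byte from Python's stdlib generator for a given seed
    (a stand-in for 'the RNG' in the comments above)."""
    return random.Random(seed).randrange(256)

n_seeds = 1024
counts = Counter(first_output(s) for s in range(n_seeds))

# Knowing the seed: the output is fully determined, 0 bits of entropy.
# Knowing only that the seed is uniform over n_seeds values: your
# uncertainty is the entropy of the induced output distribution,
# close to the maximum of log2(256) = 8 bits for a decent generator.
H = -sum((c / n_seeds) * math.log2(c / n_seeds) for c in counts.values())
print(f"{H:.2f} bits of uncertainty without the seed")
```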

Theoretically, it's still only the entropy of the seed-space + time-space it could have been running in, right?

To shorten this for you with my own (identical) understanding: "entropy is just the name for the bits you don't have".

Entropy + Information = Total bits in a complete description.

It's an objective quantity, but you have to be very precise in stating what the quantity describes.

Unbroken egg? Low entropy. There's only one way the egg can exist in an unbroken state, and that's it. You could represent the state of the egg with a single bit.

Broken egg? High entropy. There are an arbitrarily-large number of ways that the pieces of a broken egg could land.

A list of the locations and orientations of each piece of the broken egg, sorted by latitude, longitude, and compass bearing? Low entropy again; for any given instance of a broken egg, there's only one way that list can be written.

Zip up the list you made? High entropy again; the data in the .zip file is effectively random, and cannot be compressed significantly further. Until you unzip it again...

Likewise, if you had to transmit the (uncompressed) list over a bandwidth-limited channel. The person receiving the data can make no assumptions about its contents, so it might as well be random even though it has structure. Its entropy is effectively high again.

Baez has a video (a good accompaniment, imho), with slides:

https://m.youtube.com/watch?v=5phJVSWdWg4&t=17m

He illustrates the derivation of Shannon entropy with pictures of trees.

> it's a subjective quantity -- a property of the observer, not the observed

Shannon's entropy is a property of the source-channel-receiver system.

Can you explain this in more detail?

Entropy is calculated as a function of a probability distribution over possible messages or symbols. The sender might have a distribution P over possible symbols, and the receiver might have another distribution Q over possible symbols. Then the "true" distribution over possible symbols might be another distribution yet, call it R. The mismatch between these is what leads to various inefficiencies in coding, decoding, etc [1]. But both P and Q are beliefs about R -- that is, they are properties of observers.

[1] https://en.wikipedia.org/wiki/Kullback–Leibler_divergence#Co...
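A small numeric sketch of that coding inefficiency (Python; P and Q here are made-up distributions): the expected code length under a mistaken belief Q is never less than the entropy under P, and the gap is exactly the KL divergence.

```python
import math

def cross_entropy_bits(p, q):
    # Expected bits/symbol when symbols drawn from p are encoded
    # with a code that is optimal for q (code lengths ~ -log2 q_i).
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]   # "true" distribution of the symbols
q = [1/3, 1/3, 1/3]     # receiver's mistaken belief

h_true = cross_entropy_bits(p, p)   # entropy of p: 1.5 bits/symbol
h_used = cross_entropy_bits(p, q)   # log2(3) ~ 1.585 bits/symbol
overhead = h_used - h_true          # KL(p || q) ~ 0.085 bits/symbol
assert overhead > 0                 # mismatch always costs (Gibbs' inequality)
```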

> he glosses over this

All of information theory is relative to the channel. This bit is well communicated.

What he glosses over is the definition of "channel", since it's obvious for electromagnetic communications.

https://archive.is/9vnVq

Shannon entropy is subjective for Bayesians and objective for frequentists.

The entropy is objective if you completely define the communication channel, and subjective if you weave the definition away.

the subjectivity doesn't stem from the definition of the channel but from the model of the information source. what's the prior probability that you intended to say 'weave', for example? that depends on which model of your mind we are using. frequentists argue that there is an objectively correct model of your mind we should always use, and bayesians argue that it depends on our prior knowledge about your mind

(i mean, your information about what the channel does is also potentially incomplete, so the same divergence in definitions could arise there too, but the subjectivity doesn't just stem from the definition of the channel; and shannon entropy is a property that can be imputed to a source independent of any channel)

I really liked the approach my stat mech teacher used. In nearly all situations, entropy just ends up being the log of the number of ways a system can be arranged (https://en.wikipedia.org/wiki/Boltzmann%27s_entropy_formula), although I found it easiest to think in terms of pairs of dice rolls.

And this is what I prefer too, although with the clarification that it's the number of ways that a system can be arranged without changing its macroscopic properties.

It's, unfortunately, not very compatible with Shannon's usage in any but the shallowest sense, which is why it stays firmly in the land of physics.

> not very compatible with Shannon's usage in any but the shallowest sense

The connection is not so shallow, there are entire books based on it.

“The concept of information, intimately connected with that of probability, gives indeed insight on questions of statistical mechanics such as the meaning of irreversibility. This concept was introduced in statistical physics by Brillouin (1956) and Jaynes (1957) soon after its discovery by Shannon in 1948 (Shannon and Weaver, 1949). An immense literature has since then been published, ranging from research articles to textbooks. The variety of topics that belong to this field of science makes it impossible to give here a bibliography, and special searches are necessary for deepening the understanding of one or another aspect. For tutorial introductions, somewhat more detailed than the present one, see R. Balian (1991-92; 2004).”

https://arxiv.org/pdf/cond-mat/0501322

I don't dispute that the math is compatible. The problem is the interpretation thereof. When I say "shallowest", I mean the implications of each are very different.

Insofar as I'm aware, there is no information-theoretic equivalent to the 2nd or 3rd laws of thermodynamics, so the intuition a student works up from physics about how and why entropy matters just doesn't transfer. Likewise, even if an information science student is well versed in the concept of configuration entropy, that's 15 minutes of one lecture in statistical thermodynamics. There's still the rest of the course to consider.

Assuming each of the N microstates for a given macrostate is equally probable with p = 1/N, the Shannon entropy is -Σ p·log(p) = -N·(1/N)·log(1/N) = log(N), which is the physics interpretation.
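That reduction is easy to check numerically (Python; N = 1024 is an arbitrary example):

```python
import math

N = 1024                                   # number of equally likely microstates
p = [1 / N] * N
S = -sum(pi * math.log2(pi) for pi in p)   # Shannon entropy in bits
assert abs(S - math.log2(N)) < 1e-9        # equals log(N): Boltzmann's formula
```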

In the continuous version, you would get log(V) where V is the volume in phase space occupied by the microstates for a given macrostate.

Liouville's theorem that the volume is conserved in phase space implies that any macroscopic process can only move all the microstates from a macrostate A into a macrostate B only if the volume of B is bigger than the volume of A. This implies that the entropy of B should be bigger than the entropy of A which is the Second Law.

The second law of thermodynamics is time-asymmetric, but the fundamental physical laws are time-symmetric, so from them you can only predict that the entropy of B should be bigger than the entropy of A irrespective of whether B is in the future or the past of A. You need the additional assumption (the Past Hypothesis) that the universe started in a low entropy state in order to get the second law of thermodynamics.

> If our goal is to predict the future, it suffices to choose a distribution that is uniform in the Liouville measure given to us by classical mechanics (or its quantum analogue). If we want to reconstruct the past, in contrast, we need to conditionalize over trajectories that also started in a low-entropy past state — that is the “Past Hypothesis” that is required to get stat mech off the ground in a world governed by time-symmetric fundamental laws.

https://www.preposterousuniverse.com/blog/2013/07/09/cosmolo...

The second law of thermodynamics is about systems that are well described by a small set of macroscopic variables. The evolution of an initial macrostate prepared by an experimenter who can control only the macrovariables is reproducible. When a thermodynamical system is prepared in such a reproducible way the preparation is happening in the past, by definition.

The second law is about how part of the information that we had about a system - constrained to be in a macrostate - is “lost” when we “forget” the previous state and describe it using just the current macrostate. We know more precisely the past than the future - the previous state is in the past by definition.

The "can be arranged" is the tricky part. E.g. you might know from context that some states are impossible (where the probability distribution is zero), even though they combinatorially exist. That changes the entropy to you.

That is why information and entropy are different things. Entropy is what you know you do not know. That knowledge of the magnitude of the unknown is what is being quantified.

Also, this is the point where I think the article is wrong (or not precise enough), as it would include the unknown unknowns, which are not entropy IMO:

> I claim it’s the amount of information we don’t know about a situation

Exactly. If you want to reuse the term "entropy" in information theory, then fine. Just stop trying to make a physical analogy. It's not rigorous.

I spend time just staring at the graph on this page.

https://en.wikipedia.org/wiki/Thermodynamic_beta

Also known as "the number of bits to describe a system". For example, 2^N equally probable states, N bits to describe each state.

For information theory, I've always thought of entropy as follows:

"If you had a really smart compression algorithm, how many bits would it take to accurately represent this file?"

i.e., highly repetitive inputs compress well because they don't have much entropy per bit. Modern compression algorithms are good enough on most data to be used as a reasonable approximation for the true entropy.
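This is easy to see with an off-the-shelf compressor (Python; note DEFLATE only gives a crude upper bound on the true entropy, not a tight estimate):

```python
import os
import zlib

repetitive = b"abc" * 10_000         # ~30 kB of highly structured data
random_ish = os.urandom(30_000)      # ~30 kB from the OS entropy pool

small = len(zlib.compress(repetitive, 9))
large = len(zlib.compress(random_ish, 9))

assert small < 200        # low entropy per byte -> compresses enormously
assert large > 29_000     # near-maximal entropy -> essentially incompressible
```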

The essence of entropy as a measure of information content

I've always favored this down-to-earth characterization of the entropy of a discrete probability distribution. (I'm a big fan of John Baez's writing, but I was surprised glancing through the PDF to find that he doesn't seem to mention this viewpoint.)

Think of the distribution as a histogram over some bins. Then, the entropy is a measurement of, if I throw many many balls at random into those bins, the probability that the distribution of balls over bins ends up looking like that histogram. What you usually expect to see is a uniform distribution of balls over bins, so the entropy measures the probability of other rare events (in the language of probability theory, "large deviations" from that typical behavior).

More specifically, if P = (P1, ..., Pk) is some distribution, then the probability that throwing N balls (for N very large) gives a histogram looking like P is about 2^(-N * [log(k) - H(P)]), where H(P) is the entropy. When P is the uniform distribution, then H(P) = log(k), the exponent is zero, and the estimate is 1, which says that by far the most likely histogram is the uniform one. That is the largest possible entropy, so any other histogram has probability 2^(-c*N) of appearing for some c > 0, i.e., is very unlikely and exponentially moreso the more balls we throw, but the entropy measures just how much. "Less uniform" distributions are less likely, so the entropy also measures a certain notion of uniformity. In large deviations theory this specific claim is called "Sanov's theorem" and the role the entropy plays is that of a "rate function."

The counting interpretation of entropy that some people are talking about is related, at least at a high level, because the probability in Sanov's theorem is the number of outcomes that "look like P" divided by the total number, so the numerator there is indeed counting the number of configurations (in this case of balls and bins) having a particular property (in this case looking like P).

There are lots of equivalent definitions and they have different virtues, generalizations, etc, but I find this one especially helpful for dispelling the air of mystery around entropy.
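The balls-in-bins estimate above can be checked directly (Python; the bin counts are a made-up example). The exact probability of a histogram is multinomial, and the Sanov exponent matches it up to a polynomial-in-N prefactor:

```python
import math

def exact_prob(counts):
    # P(exactly these counts) when N uniform balls land in k bins:
    # multinomial coefficient / k^N.
    N, k = sum(counts), len(counts)
    coef = math.factorial(N)
    for c in counts:
        coef //= math.factorial(c)
    return coef / k**N

def sanov_estimate(counts):
    # 2^(-N * [log2(k) - H(P)]), the large-deviations estimate.
    N, k = sum(counts), len(counts)
    H = -sum((c / N) * math.log2(c / N) for c in counts if c)
    return 2 ** (-N * (math.log2(k) - H))

counts = [50, 30, 20]   # N = 100 balls, k = 3 bins, P = (0.5, 0.3, 0.2)
e, s = exact_prob(counts), sanov_estimate(counts)
assert e < s                       # estimate ignores the polynomial prefactor
assert abs(math.log2(s / e)) < 10  # ...but gets the exponential rate right
```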

Hey did you want to say relative entropy ~ rate function ~ KL divergence? Might be more familiar to ML enthusiasts here, get them to be curious about Sanov or large deviations.

That's right, here log(k) - H(p) is really the relative entropy (or KL divergence) between p and the uniform distribution, and all the same stuff is true for a different "reference distribution" of the probabilities of balls landing in each bin.

For discrete distributions the "absolute entropy" (just sum of -p log(p) as it shows up in Shannon entropy or statistical mechanics) is in this way really a special case of relative entropy. For continuous distributions, say over real numbers, the analogous quantity (integral of -p log(p)) isn't a relative entropy since there's no "uniform distribution over all real numbers". This still plays an important role in various situations and calculations...but, at least to my mind, it's a formally similar but conceptually separate object.

PBS Spacetime‘s entropy playlist: https://youtube.com/playlist?list=PLsPUh22kYmNCzNFNDwxIug8q1...

A bit off-color but classic: https://www.youtube.com/watch?v=wgltMtf1JhY

Ah JCB, how I love your writing, you are always so very generous.

Your This Week's Finds were a hugely enjoyable part of my undergraduate education and beyond.

Thank you again.

Information entropy is literally the strict lower bound on how efficiently information can be communicated (expected number of transmitted bits) if the probability distribution which generates this information is known, that's it. Even in contexts such as calculating the information entropy of a bit string, or the English language, you're just taking this data and constructing some empirical probability distribution from it using the relative frequencies of zeros and ones or letters or n-grams or whatever, and then calculating the entropy of that distribution.
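One concrete way to see the bound: Huffman codes achieve it exactly for dyadic probabilities and come within one bit per symbol otherwise. A from-scratch sketch (Python; the distributions are made-up examples, and this naive list-merging Huffman is for illustration, not efficiency):

```python
import heapq
import math

def huffman_expected_length(probs):
    """Expected bits/symbol of an optimal prefix code for `probs`."""
    count = 0                      # tiebreaker so heap tuples stay comparable
    heap = []
    for i, p in enumerate(probs):
        heapq.heappush(heap, (p, count, [i]))
        count += 1
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        for sym in a + b:
            lengths[sym] += 1      # every merge deepens these leaves by one
        heapq.heappush(heap, (p1 + p2, count, a + b))
        count += 1
    return sum(p * l for p, l in zip(probs, lengths))

p = [0.5, 0.25, 0.125, 0.125]
H = -sum(pi * math.log2(pi) for pi in p)              # 1.75 bits/symbol
# Dyadic probabilities: Huffman meets the entropy bound exactly.
assert abs(huffman_expected_length(p) - H) < 1e-12

q = [0.4, 0.3, 0.3]
Hq = -sum(qi * math.log2(qi) for qi in q)
# Non-dyadic: expected length (1.6) strictly exceeds the entropy (~1.571).
assert Hq < huffman_expected_length(q) < Hq + 1
```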

I can't say I'm overly fond of Baez's definition, but far be it from me to question someone of his stature.

"I have largely avoided the second law of thermodynamics, which says that entropy always increases. While fascinating, this is so problematic that a good explanation would require another book!"

For those interested I am currently reading "Entropy Demystified" by Arieh Ben-Naim which tackles this side of things from much the same direction.

I sometimes ponder where new entropy/randomness comes from, like if we take the earliest state of the universe as an infinitely dense point particle which expanded. There must have been some randomness or variety which led it to expand in a non-uniform way, which led to the dominance of matter over antimatter, or the creation of galaxies, clusters etc. If we take an isolated system in which certain static particles are present, could a small subset of the particles acquire motion and thus introduce entropy? Can entropy be induced automatically, at least on a quantum level? If anyone can help me explain that, it would be very helpful and could help explain the origin of the universe in a better way.

Symmetry breaking is the general phenomenon that underlies most of that.

The classic example is this:

Imagine you have a perfectly symmetrical sombrero[1], and there's a ball balanced on top of the middle of the hat. There's no preferred direction it should fall in, but it's _unstable_. Any perturbation will make it roll down hill and come to rest in a stable configuration on the brim of the hat. The symmetry of the original configuration is now broken, but it's stable.

1: https://m.media-amazon.com/images/I/61M0LFKjI9L.__AC_SX300_S...

Yes, but what will initiate that perturbation?

I saw this video, which explained it for me (it's German; maybe the automatic subtitles will work for you): https://www.youtube.com/watch?v=hrJViSH6Klo

He argues that the randomness you are looking for comes from quantum fluctuations, and if this randomness did not exist, the universe would probably never have "happened".

Thanks for the reference will take some time before I see the whole video. Can you tell me what those quantum fluctuations are in short? Are they part of some physical law?

My goto source for understanding entropy: http://philsci-archive.pitt.edu/8592/1/EntropyPaperFinal.pdf

Am I only one that can't download the pdf, or is the file server down? I can see the blog page but when I try downloading the ebook it just doesn't work..

If the file server is down.. anyone could upload the ebook for download?

Hmmm, that list of things that contribute to entropy omits particles which under "normal circumstances" on Earth exist in bound states; for example, it doesn't mention W bosons or gluons. But in some parts of the universe they're not bound but in a different state of matter, e.g. quark-gluon plasma. I wonder how, or if, this was taken into account.

I like the formulation of 'the amount of information we don't know about a system that we could in theory learn'. I'm surprised there's no mention of the Copenhagen interpretation's interaction with this definition, under a lot of QM theories 'unavailable information' is different from available information.

The book might disappoint some..

>I have largely avoided the second law of thermodynamics ... Thus, the aspects of entropy most beloved by physics popularizers will not be found here.

But personally, this bit is the most exciting to me.

>I have tried to say as little as possible about quantum mechanics, to keep the physics prerequisites low. However, Planck’s constant shows up in the formulas for the entropy of the three classical systems mentioned above. The reason for this is fascinating: Planck’s constant provides a unit of volume in position-momentum space, which is necessary to define the entropy of these systems. Thus, we need a tiny bit of quantum mechanics to get a good approximate formula for the entropy of hydrogen, even if we are trying our best to treat this gas classically.

There's a fundamental nature to entropy, but as usual it's not very enlightening for our poor monkey brains, so to explain it you need to enumerate all its high-level behavior; but its high-level behavior is accidental and can't be summarized in a concise form.

This complexity underscores the richness of the concept

I'd say it underscores its accidental nature.

My definition: Entropy is a measure of the accumulation of non-reversible energy transfers.

Side note: All reversible energy transfers involve an increase in potential energy. All non-reversible energy transfers involve a decrease in potential energy.

That definition doesn't work well because you can have changes in entropy even if no energy is transferred, e.g. by exchanging some other conserved quantity.

The side note is wrong in letter and spirit; turning potential energy into heat is one way for something to be irreversible, but neither of those statements is true.

For example, consider an iron ball being thrown sideways. It hits a pile of sand and stops. The iron ball is not affected structurally, but its kinetic energy is transferred (almost entirely) to heat energy. If the ball is thrown slightly upwards, potential energy increases but the process is still irreversible.

Also, the changes of potential energy in corresponding parts of two Carnot cycles are directionally the same, even if one is ideal (reversible) and one is not (irreversible).

However, while your definition effectively captures a significant aspect of entropy, it might be somewhat limited in scope

Closely related recent discussion on The Second Law of Thermodynamics (2011) (franklambert.net):

https://news.ycombinator.com/item?id=40972589

After years of thought I dare to say the 2nd law of thermodynamics is a tautology. "Entropy is increasing" means every system tends toward its most probable state, which means the most probable is the most probable.

I think that’s right, though it’s non-obvious that more probable systems are disordered. At least as non-obvious as Pascal’s triangle is.

Which is to say, worth saying from a first principles POV, but not all that startling.

Closely related recent discussion: https://news.ycombinator.com/item?id=40972589

If I were to write a book with that title, I would get to the point a bit faster, probably as follows.

Entropy is just a number you can associate with a probability distribution. If the distribution is discrete, so you have a set p_i, i = 1..n, which are each positive and sum to 1, then the definition is:

S = - sum_i p_i log( p_i )

Mathematically we say that entropy is a real-valued function on the space of probability distributions. (Elementary exercises: show that S >= 0 and it is maximized on the uniform distribution.)

That is it. I think there is little need for all the mystery.
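The two elementary exercises can at least be sanity-checked numerically (Python; random distributions as a spot check, not a proof):

```python
import math
import random

def S(p):
    """Entropy of a discrete distribution, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

n = 5
uniform = [1 / n] * n
assert abs(S(uniform) - math.log(n)) < 1e-12   # uniform achieves log(n)

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    total = sum(w)
    p = [wi / total for wi in w]
    # S >= 0, and no distribution beats the uniform one.
    assert -1e-12 <= S(p) <= S(uniform) + 1e-12
```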

So the only thing you need to know about entropy is that it's a real-valued number you can associate with a probability distribution? And that's it? I disagree. There are several numbers that can be associated with a probability distribution, and entropy is an especially useful one, but to understand why entropy is useful, or why you'd use that function instead of a different one, you'd need to know a few more things than just what you've written here.

In particular, the expectation (or variance) of a real-valued random variable can also be seen as "a real-valued number you can associate with a probability distribution".

Thus, GP's statement is basically: "entropy is like expectation, but different".

Exactly, saying that's all there is to know about entropy is like saying all you need to know about chess are the rules and all you need to know about programming is the syntax/semantics.

Knowing the plain definition or the rules is nothing but a superficial understanding of the subject. Knowing how to use the rules to actually do something meaningful, having a strategy, that's where meaningful knowledge lies.

Of course that is not my statement. See all my other replies to identical misinterpretations of my comment.

The problem is that this doesn't get at many of the intuitive properties of entropy.

A different explanation (based on macro- and micro-states) makes it intuitively obvious why entropy is non-decreasing with time or, with a little more depth, what entropy has to do with temperature.

That doesn't strike me as a problem. Definitions are often highly abstract and counterintuitive, with much study required to understand at an intuitive level what motivates them. Rigour and intuition are often competing concerns, and I think definitions should favour the former. The definition of compactness in topology, or indeed just the definition of a topological space, are examples of this - at face value, they're bizarre. You have to muck around a fair bit to understand why they cut so brilliantly to the heart of the thing.

The above evidently only suffices as a definition, not as an entire course. My point was just that I don't think any other introduction beats this one, especially for a book with the given title.

In particular it has always been my starting point whenever I introduce (the entropy of) macro- and micro-states in my statistical physics course.

That definition is on page 18, I agree it could've been reached a bit faster but a lot of the preceding material is motivation, puzzles, and examples.

This definition isn't the end goal, the physics things are.

That covers one and a half of the twelve points he discusses.

Correct! And it took me just one paragraph, not the 18 pages of meandering (and I think confusing) text that it takes the author of the pdf to introduce the same idea.

You didn’t introduce any idea. You said it’s “just a number” and wrote down a formula without any explanation or justification.

I concede that it was much shorter though. Well done!

Haha, you reminded me of that idea in software engineering that "it's easy to make an algorithm faster if you accept that at times it might output the wrong result; in fact you can make it infinitely fast".

Thanks for defining it rigorously. I think people are getting offended on John Baez's behalf because his book obviously covers a lot more - like why does this particular number seem to be so useful in so many different contexts? How could you have motivated it a priori? Etcetera, although I suspect you know all this already.

But I think you're right that a clear focus on the maths is useful for dispelling misconceptions about entropy.

Misconceptions about entropy are misconceptions about physics. You can't dispel them by focusing on the maths and ignoring the physics entirely - especially if you just write an equation without any conceptual discussion, not even mathematical.

I didn't say to only focus on the mathematics. Obviously wherever you apply the concept (and it's applied to much more than physics) there will be other sources of confusion. But just knowing that entropy is a property of a distribution, not a state, already helps clarify your thinking.

For instance, you know that the question "what is the entropy of a broken egg?" is actually meaningless, because you haven't specified a distribution (or a set of micro/macro states in the stat mech formulation).

Ok, I don’t think we disagree. But knowing that entropy is a property of a distribution given by that equation is far from “being it” as a definition of the concept of entropy in physics.

Anyway, it seems that - like many others - I just misunderstood the “little need for all the mystery” remark.

> is far from “being it” as a definition of the concept of entropy in physics.

I simply do not understand why you say this. Entropy in physics is defined using exactly the same equation. The only thing I need to add is the choice of probability distribution (i.e. the choice of ensemble).

I really do not see a better "definition of the concept of entropy in physics".

(For quantum systems one can nitpick a bit about density matrices, but in my view that is merely a technicality on how to extend probability distributions to Hilbert spaces.)

I’d say that the concept of entropy “in physics” is about (even better: starts with) the choice of a probability distribution. Without that you have just a number associated with each probability distribution - distributions without any physical meaning so those numbers won’t have any physical meaning either.

But that’s fine, I accept that you may think that it’s just a little detail.

(Quantum mechanics has no mystery either.

ih/2pi dA/dt = AH - HA

That’s it. The only thing one needs to add is a choice of operators.)

Sarcasm aside, I really do not think you are making much sense.

Obviously one first introduces the relevant probability distributions (at least the micro-canonical ensemble). But once you have those, your comment still does not offer a better way to introduce entropy other than what I wrote. What did you have in mind?

In other words, how did you think I should change this part of my course?

Right, I see what you're saying. I agree that there is a lot of subtlety in the way entropy is actually used in practice.

Many students will want to know where the minus sign comes from. I like to write the formula instead as S = sum_i p_i log( 1 / p_i ), where (1 / p_i) is the "surprise" (i.e., expected number of trials before first success) associated with a given outcome (or symbol), and we average it over all outcomes (i.e., weight it by the probability of the outcome). We take the log of the "surprise" because entropy is an extensive quantity, so we want it to be additive.

Everyone who sees that formula can immediately see that it leads to principle of maximum entropy.

Just like everyone seeing Maxwell's equations can immediately see that you can derive the speed of light classically.

Oh dear. The joy of explaining the little you know.

As of this moment there are six other top-level comments which each try to define entropy, and frankly they are all wrong, circular, or incomplete. Clearly the very definition of entropy is confusing, and the definition is what my comment provides.

I never said that all the other properties of entropy are now immediately visible. Instead I think it is the only universal starting point of any reasonable discussion or course on the subject.

And lastly I am frankly getting discouraged by all the dismissive responses. So this will be my last comment for the day, and I will leave you in the careful hands of, say, the six other people who are obviously so extremely knowledgeable about this topic. /s

The definition by itself without intuition of application is of little use

Don’t forget it’s the only measure of the arrow of time.

One could also say that it’s just a consequence of the passage of time (as in getting away from a boundary condition). The decay of radioactive atoms is also a measure of the arrow of time - of course we can say that’s the same thing.

CP violation may (or may not) be more relevant regarding the arrow of time.

[flagged]

Please don't post comments just to be a dick.

[flagged]

You are completely right of course. I am merely a professor in theoretical physics who has been teaching this stuff for a number of years now.

[flagged]

The way I understand it is with an analogy to probability. To me, events are to microscopic states like random variable is to entropy.

My first contact with entropy was in chemistry and thermodynamics and I didn't get it. Actually I didn't get anything from engineering thermodynamics books such as Çengel and so.

You must go to statistical mechanics or information theory to understand entropy. Or trying these PRICELESS NOTES from Prof. Suo: https://docs.google.com/document/d/1UMwpoDRZLlawWlL2Dz6YEomy...

This seems like a great resource for referencing the various definitions. I've tried my hand at developing an intuitive understanding: https://spacechimplives.substack.com/p/observers-and-entropy. TLDR - it's an artifact of the model we're using. In the thermodynamic definition, the energy accounted for in the terms of our model is information. The energy that's not is entropic energy. Hence why it's not "useable" energy, and the process isn't reversible.

Hawking on the subject

https://youtu.be/wgltMtf1JhY

How do you get to the actual book / tweets? The link just takes me back to the forward...

http://math.ucr.edu/home/baez/what_is_entropy.pdf

MC Hawking already explained this

https://youtu.be/wgltMtf1JhY

ΔS = ΔQ/T

[flagged]

Entropy is the distribution of potential over negative potential.

This could be said "the distribution of what ever may be over the surface area of where it may be."

This is erroneously taught in conventional information theory as "the number of configurations in a system" or the available information that has yet to be retrieved. Entropy includes the unforeseen, and out of scope.

Entropy is merely the predisposition to flow from high to low pressure (potential). That is it. Information is a form of potential.

Philosophically what are entropy's guarantees?

- That there will always be a super-scope, which may interfere in ways unanticipated;

- everything decays the only mystery is when and how.

This answer is as confident as it's wrong and full of gibberish.

Entropy is not a "distribution", it's a functional that maps a probability distribution to a scalar value, i.e. a single number.

It's the negative mean log-probability of a distribution.

It's an elementary statistical concept, independent of physical concepts like “pressure”, “potential”, and so on.

It sounds like log-probability is the manifold surface area.

Distribution of potential over negative potential. Negative potential is the "surface area", and available potential distributes itself "geometrically". All this is iterative obviously, some periodicity set by universal speed limit.

It really doesn't sound like you disagree with me.

Baez seems to use the definition you call erroneous: "It’s easy to wax poetic about entropy, but what is it? I claim it’s the amount of information we don’t know about a situation, which in principle we could learn."

> Entropy includes the unforeseen, and out of scope.

Mmh, no it doesn't. You need to define your state space, otherwise it's an undefined quantity.

But it is possible to account for the unforeseen (or out-of-vocabulary) by, for example, a Good-Turing estimate. This satisfies your demand for a fully defined state space while also being consistent with GP's definition.

You are referring to the conceptual device you believe bongs to you and your equations. Entropy creates attraction and repulsion, even causing working bias. We rely upon it for our system functions.

Undefined is uncertainty is entropic.

Entropy is a measure, it doesn't create anything. This is highly misleading.

> bongs

indeed

All definitions of entropy stem from one central, universal definition: Entropy is the amount of energy unable to be used for useful work. Or better put grammatically: entropy describes the effect that not all energy consumed can be used for work.

There's a good case to be made that the information-theoretic definition of entropy is the most fundamental one, and the version that shows up in physics is just that concept as applied to physics.

My favorite course I took as part of my physics degree was statistical mechanics. It leaned way closer to information theory than I would have expected going in, but in retrospect should have been obvious.

Unrelated: my favorite bit from any physics book is probably still the introduction of the first chapter of "States of Matter" by David Goodstein: "Ludwig Boltzmann, who spent much of his life studying statistical mechanics, died in 1906, by his own hand. Paul Ehrenfest, carrying on the work, died similarly in 1933. Now it is our turn to study statistical mechanics."

That would mean that information theory is not part of physics, right? So information theory and entropy are part of metaphysics?

Well it's part of math, which physics is already based on.

Whereas metaphysics is, imo, "stuff that's made up and doesn't matter". Probably not the most standard take.

I'm wondering, isn't Information Theory as much part of physics as Thermodynamics is?

Would you say that Geometry is as much a part of physics as Optics is?

Not really. Information theory applies to anything probability applies to, including many situations that aren't "physics" per se. For instance it has a lot to do with algorithms and data as well. I think of it as being at the level of geometry and calculus.

Yeah, people seem to misunderstand that entropy as applied to thermodynamics is simply an aggregate statistic that summarizes the complex state of a thermodynamic system as a single real number.

The fact that entropy always rises, etc., has nothing to do with the statistical concept of entropy itself. It is simply an easier way to express the physical fact that individual atoms spread their kinetic energy out across a large volume.

I'm not sure that's quite the right perspective. It's not a coincidence that entropy increases over time; the increase in entropy seems to be very fundamental to the way physics goes. I prefer the interpretation "physics doesn't care what direction the arrow of time points, but we perceive it as pointing in the direction of increasing entropy". Although that's not totally satisfying either.

This definition is far from universal.

I think what you describe is the application of entropy in the thermodynamic setting, which doesn't apply to "all definitions".