One thing I think would be nice for floating point numbers is if there were two separate types - one where NaN and the two infinities are allowed, and one where they are not allowed but instead emit an error. The former would be used by some few mathematicians etc., and the rest of us could use the latter. The upside would be better error handling close to the source of the issue, and better optimizations as the not-normal values throw a wrench into optimizing math.
On the hardware level, your processor will have some specific behavior that it exhibits. This is sometimes configurable, in a platform-specific way, and people working on low-level, high-performance numerical code already take advantage of that where it is.
If you mean more the language level - just having two different ways to represent floating point numbers in your code - it gets a little tricky to reason about, because one of them may align with what the processor does natively and the other won't, so you'd need a software layer converting the platform's behavior to what your language promises, with very high overhead sometimes and almost no overhead at other times. That kind of inconsistency isn't needed very often, and it can become a real headache both for testing and in the wild. It's easier for a language to just say "this is the way we do floats" and let you adapt as needed.
So the more typical balance is to deal with that kind of thing at the library level. If you want numbers that behave a certain way, and it's not the way your language models them, you use a library that gives you the kind of numbers you want with the tangible awareness that they're likely "soft" and less efficient in some way.
These things come straight from overflow and underflow behaviour of floating point bit representations. I'd guess some sort of IEEE flag exists to emit an interrupt if over/underflow happens, but nobody really wants those.
They don’t come solely from over/underflow. They do come from math. Square root of a negative, 1/0, and plenty of other mathematical operations produce these, no matter how you want to store numbers.
>better optimizations as the not-normal values throw a wrench into optimizing math.
I'm not an expert on architecture, but I would've guessed that the need to branch to process the error would be the wrench and that the use of NaN and the infinities allow better optimizations.
IEEE-754 NaN and Infinity have nothing to do with optimization. They come straight from math:
* What is +1/0? It has to be +Infinity -- nothing else will do.
* What is -1/0? It has to be -Infinity -- nothing else will do.
* What is 0/0? There's no way to tell from the information we've got -- it's undefined: Not A Number. (However, should 0/0 come up as a result of taking the quotient of two functions that happen to both reach zero at a point, then sometimes the limit of that quotient is meaningful, and might have a numerical result.)
IEEE-754 chose to signal these things in-band, so we get NaN and Infinity to deal with in our floats and doubles.
I'm pretty sure IEEE 754 covers trapping floating point math. It's not just about in-band signaling. It's unfortunate that the standard is proprietary, so we can't easily reference it to figure out what it says.
Anyway, most CPUs support a trapping mode, after all. Here's an example with glibc:
#define _GNU_SOURCE
#include <fenv.h>

/* Globals are zero-initialized, so y starts at 0.0; volatile keeps the
   compiler from folding the division away at compile time. */
volatile double x = 1.0;
volatile double y;
volatile double quotient;

int
main(void)
{
    feenableexcept(FE_DIVBYZERO);  /* unmask the divide-by-zero trap (glibc extension) */
    quotient = x / y;              /* delivers SIGFPE instead of returning +Inf */
}
As far as I understand it, overall support for trapping math is poor because not much code is trapping-aware. It would be quite annoying if JSON parsing delivered SIGFPE due to an Inexact trap, for example.
It is not about hardware; it is about the optimizations you can enable when you can assume these values never occur. Some mathematical identities only hold under that assumption.
What better error handling can one do without reserved special numbers?
Many (most?) compilers already offer a “fast math” flag that assumes away NaNs and infinities (and typically flushes denormals too). You don’t need to remove them from the format in order to avoid them.
Having a separate type without specials would probably cause unending confusion and increase bugs. It’s already the case that only people who care about the specials handle them, but it sounds like there are more of those people than you think, if you think only mathematicians care about them.
A really interesting review. The idea of relative error makes sense in most cases, but when we need to do subtraction and difference matters, maybe absolute error is actually better.
If you are doing only additive operations, then, yes, absolute error might actually be the best choice. But as soon as multiplications start to show up, they are enough trouble that they tend to dominate the whole error propagation show. Since many real calculations have multiplication in them, you end up having to optimize the whole thing for multiplicative operations, and so we end up just using relative errors everywhere.
You can, of course, do a very specialized optimization for one particular algorithm, but that tends to not be a very good use of time. Usually. (Counterexample: Kahan summation!)
Subtraction is the very case where relative error might matter most, and where error relative to the magnitude of the original, unsubtracted numbers causes the most surprises.
Absolute error has useful applications, without any doubt, and regardless of arithmetic operation, but it probably doesn’t make sense to say it’s “better” without a specific problem in front of us, and without specific goals and priorities. Error tolerance is always up to the user.
So, kind of, you want every NaN to be an sNaN (signaling NaN)? Along those lines, this SO answer is really interesting: https://stackoverflow.com/a/55648118
"Mathematicians" don't need NaN's and Inf's.
NaN and Infinity have nothing to do with optimization only if your mental model of CPUs is extremely simplistic.
It would be quite useful to protect other code from that.
It could easily be a compiler-based protection: any opcode that can generate a special value from normal inputs would have its result checked for normality.
This is similar to how C++ gives few guarantees overall, but "new" and "this" never yield null or unallocated values.
Posits https://posithub.org/docs/Posits4.pdf are an excellent perspective for an alternative to IEEE floats.
Not sure why the article does not reference the following paper, which is a must-read for anyone working with floating point: https://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.h... (original: https://www.itu.dk/~sestoft/bachelor/IEEE754_article.pdf).
I also recently saw this paper on the difficulty of solving the quadratic equation with floating point numbers:
https://cnrs.hal.science/hal-04116310/document
And also Gerald Sussman saying:
> The only thing that scares me in programming is floating point.
https://youtu.be/Tdwr9tweTDE?t=1145
Very nice graphics in this.