Sharing my TypeScript Result type for anyone who’s crazy like me and wishes they had “if let” and doesn’t want “.value” everywhere.
You can do this:
    const user = await fetchUserDTO(123);
    // user is Result<UserDTO, Error>
    if (!isOk(user)) { something(user.err); return; }
    user.name; // user is narrowed to UserDTO
Source:
    export const FailSym: unique symbol = (globalThis as any).____FailSym ?? Symbol('FailSym');
    (globalThis as any).____FailSym = FailSym;

    export type Result<T, E> = T | {
      [FailSym]: true,
      err: E
    }

    export function Ok<T>(val: T): Result<T, never> { return val; }
    export function Fail<E>(err: E): Result<never, E> { return { [FailSym]: true, err }; }
    export function isOk<T, E>(val: Result<T, E>): val is T { return !(typeof val == 'object' && val != null && FailSym in val); }
    export function isFail<T, E>(val: Result<T, E>): val is Result<never, E> { return typeof val == 'object' && val != null && FailSym in val; }

    export function resultify<T, E>(promise: Promise<T>): Promise<Result<T, E>> {
      return promise.then(Ok, err => Fail(err as E));
    }
> In Rust we have Option<T>, which is equivalent to T | null
No, not true!
As the author correctly states earlier in the post, unions are not an exclusive-or relation. Unions are often made between disjoint types, but not always.
This becomes important when T itself is nullable. Let's say T is `U | null`. `Option<Option<U>>` in Rust has one more inhabitant than `U | null | null` in TypeScript - `Some(None)`.
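The extra inhabitant can be made concrete with a hand-rolled tagged `Option` (a sketch for illustration; TypeScript has no built-in Option type):

```typescript
// A minimal tagged Option encoding, mirroring Rust's Option for illustration.
type Option<T> = { tag: "some"; value: T } | { tag: "none" };

const some = <T>(value: T): Option<T> => ({ tag: "some", value });
const none: Option<never> = { tag: "none" };

// With unions, nesting nullability collapses: these two types are the same set.
type U = number;
type Nullable = U | null;              // U | null
type DoublyNullable = Nullable | null; // still just U | null

// With the tagged encoding, the layers stay distinct:
const someNone: Option<Option<U>> = some<Option<U>>(none); // Some(None) - the extra state
const outerNone: Option<Option<U>> = none;                 // None - distinguishable from it

console.log(someNone.tag, outerNone.tag); // "some" "none"
```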
Union types can certainly be very useful, but they are tricky because they don't compose like sum types do. When writing `Option<T>` where T is a generic type, we can treat T as a totally opaque type - T could be anything and the code we write will still be correct. On the other hand, union types pierce this opacity. The meaning of `T | null` and the meaning of code you write with such a type does depend on what T is at runtime.
Rich Hickey has a presentation titled 'Maybe Not' that talks about this exact distinction, but he actually argues the reverse (and he is often criticized quite widely for it, even though both sides are sort of right here, I believe). He says that nullability is better [1], since refactoring a function that accepted T to accept T?, or a function's return type from T? to T, is backwards compatible in both cases, while the sum type variants require code changes.
Your last sentence puts the whole argument in place even better in my head: the meaning depending on the runtime is both a blessing and a curse. It is what lets us change function signatures in a backwards compatible way, while also being what hinders our ability to reason about and encode things statically.
[1] I think part of the misunderstanding here between the "two camps" is that some people work on systems that are a closed world. You know and control everything, so an all-encompassing type system that can evolve with the universe itself makes the most sense. Hickey, on the other hand, worked/works mostly in an area where big systems developed by completely different entities have to communicate with each other with little to no coordination. You can't willy-nilly refactor your code and just trust the compiler to do its job; this is the open sea. Also, I think this area is a bit neglected by more modern/hyped languages, e.g. the dynamism the JVM has is not really reproduced anywhere anymore.
> as refactoring a function that accepted T to T?, or a function's return type from T? to T
I've always found this to be a silly argument. In both cases, the problem can be solved by interface versioning. If you have f : T → Maybe T and you would like to change the type, just create f_v2 : Maybe T → T and define f x = just (f_v2 (just x)). The real shame is that most languages do not support interface versioning as a proper feature. There should really be some way to use a package and declare that you want v1 or v2 of the interface, so that package.f subsequently resolves to either f or f_v2. You would eventually have to change the code that deals with f if you want to stay up to date, but realistically you should remove redundant error checking code anyway, so you aren't much better off.
You could also frame it as a tooling problem. Given that the transformation between the types is trivial, you should be able to write a program that automatically updates code to newer versions. It seems like this is a relatively niche issue in either direction (union types not being disjoint, vs. having to change code that deals with options), but it's a more solvable problem in the second case.
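In TypeScript terms, the versioning trick reads roughly as follows (the names mirror the f / f_v2 notation above and are purely hypothetical):

```typescript
// v1 had the shape f : T → T | null. v2 flips it: f_v2 : (T | null) → T.
function f_v2(x: string | null): string {
  return x ?? ""; // v2 guarantees a non-null result
}

// The v1 interface is kept alive as a thin wrapper around v2,
// so existing callers of f never notice the change:
function f(x: string): string | null {
  return f_v2(x); // "f x = just (f_v2 (just x))" in the comment's notation
}

console.log(f("hello")); // "hello"
```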
Do you know of a language that supports interface versioning? I often wished for one (as I like writing programs by "iterative refining", where I basically copy the good parts into a completely new project), but have never heard of a language that does something like that.
Sadly, I do not. You can emulate it pretty well with just putting numbers on the end of function names though. I'm always a little shocked at how the people who make programming languages seem to consistently ignore some of the most obvious features in favour of weird stuff.
> The meaning of `T | null` and the meaning of code you write with such a type does depend on what T is at runtime.
Could you give an example of this? The only case I can imagine is a union null <=: T, T | null. This would cause unexpected behaviour, i.e. if T = (U | null), the function orElse : T | null → T → T would always return the second argument if the first is null, but this is perfectly consistent with the fact that T itself is a union type. The idea that it should do anything different presumes that the null assigned to T is different from the null assigned to T | null (that union types are disjoint). But this is an invalid assumption. It's still a validly typed program that obeys a consistent set of rules. It's just that there is no tagging in the union type (you don't know if a member comes from the left or right side). This is only an issue if you write code expecting the union to work like a tagged union.
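A concrete sketch of the collapse, with `orElse` written out as plain TypeScript:

```typescript
// orElse : (T | null) → T → T, as in the comment above.
function orElse<T>(first: T | null, second: T): T {
  return first !== null ? first : second;
}

// When T is itself nullable, the inner and outer nulls are the same runtime
// value, so "the value is null" and "there is no value" are indistinguishable:
type Val = string | null;
const result = orElse<Val>(null, "fallback");
console.log(result); // "fallback" - even if the caller meant null as a real value
```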
Yes, technically it is closer to (T | null) & {__tag: K}, but the context where "equivalent" is used is clearly about practical usage. Option<T> is most similar to T | null in code people actually write on a normal basis.
> but they are tricky because they don't compose like sum types do
Just to point out that this part is true only in a very literal sense. They compose perfectly well, but they don't compose in the same way that tagged unions do. Tagged unions compose by function abstraction, untagged ones compose by function dispatching.
Maybe one can argue that untagged unions compose in less useful ways. But I've never seen this argument made.
An honorable mention is the string template literal type. It sits between the string literal type (and unions of them), which allows a finite set of strings, and the string type, which in theory is an infinite set of strings. Template literal types can be infinite as well, but only represent a fraction of all strings. For example, `foo${string}` represents all strings that start with "foo".
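A quick illustration, including a user-defined type predicate as one way to narrow plain strings into the template literal type:

```typescript
type FooString = `foo${string}`; // the infinite set of strings starting with "foo"

const ok: FooString = "foobar";     // compiles
// const bad: FooString = "barfoo"; // error: not assignable to `foo${string}`

// A type predicate narrows `string` to the template literal type at runtime:
function isFooString(s: string): s is FooString {
  return s.startsWith("foo");
}

console.log(isFooString("foobar"), isFooString("barfoo")); // true false
```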
Similar to this, I proposed inequality types for TS. They allow constraining a number range. For example, it is possible to have the type "a number larger than 1". You can combine them using intersection types, forming intervals like "a number between 0 and 1". Because TS has type narrowing via control flow, these types come into play automatically when you write `if (a < 5)`: inside the if block, the variable will have the type `(<5)`.
You can find the proposal here [1]. Personally I think the effort for adding these isn't worth it. But maybe someone likes it or takes it further.
[1]: https://github.com/microsoft/TypeScript/issues/43505
We had something like this in Spad, the extension language for the computer algebra system Axiom. They're not terrible to implement, but utilization was always low. There are amusingly high-effort optimizations with range-dependent integral types deduced from flow control around blocks that are tail calls. I mean... theoretically, yes, we can; but should we?
I have no idea what I’m talking about, but it seems like these restricted cases of inequality flow control type checking are very similar in power to dependent types, but don’t require the same level of complexity. It’s nice being able to write imperative proofs of correctness guided by the compiler.
`never` is better known as `bottom`. `noreturn` in some languages is the same thing
`any`, however, is not `top`, it is `break_the_type_system`. The top type in TS is `unknown`.
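The practical difference, roughly: `unknown` accepts everything but lets you do nothing until you narrow it, while `any` accepts everything and disables checking entirely. A small sketch:

```typescript
const fromWire: unknown = JSON.parse('{"n": 1}');

// fromWire.n; // error: 'fromWire' is of type 'unknown' - we must narrow first
if (typeof fromWire === "object" && fromWire !== null && "n" in fromWire) {
  console.log((fromWire as { n: number }).n); // 1
}

// With `any` the checker accepts any access, whether the property exists or not:
const escaped = fromWire as any;
const sum = escaped.n + escaped.missing; // compiles; 1 + undefined is NaN at runtime
console.log(Number.isNaN(sum)); // true
```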
Adding the unknown type was such a big deal. I love it.
When someone gives you a truly completely unconstrained object, what they hand you is “unknown”.
You don’t even know how to query it to find anything out about it.
But you could pass it to someone else.
When someone asks you for a completely unconstrained object, the type is “any”.
It’s technically the same type from two perspectives.
(Not saying this extreme version of the concepts are how they are implemented. Never had a chance to use such types before.)
I don't think this is right, for two reasons:
1. As a nit-pick "unconstrained object" is not best modeled by `unknown` because that includes non-objects as well, there's better types to use for that
2. Someone asking you for any unconstrained data would also be `unknown`
`any` is not a type at all, it is an annotation to disable the type system
I suspect they were talking more about general terminology than TypeScript's specific usage of `unknown` and `any`.
"This box contains an unknown item" and "I'll accept any item" both sound natural, while "this box contains any item" and "I'll accept an unknown item" both sound weird (to me, anyway).
Hmm, I'm not sure about that interpretation, they said "It’s technically the same type from two perspectives."
It's not the same type at all. I do agree with your example of the usage of those words in spoken English, but I don't think that is what we're going for in a discussion of Sets, Types, and Type Checking.
If we step away from any particular language:
If I give you a reference, address, or some other identifier, for something you know nothing else about, you can't do anything with the reference but pass it on. You have no information about the thing itself. You don't even know how to dereference the reference.
Whatever it is that I gave you a reference for, is unknown to you.
Now if I say, go ahead and give me some thing, with no constraints, you can give me any thing.
Which without further information, will just be a reference to an unknown to me.
Mathematically, "unknown" and "any" both represent something without any constraints or information known about them, but in opposite roles.
But one is when you receive a reference to something you know nothing else about, the other is when you provide something without any requirement to give any information about it.
The two roles necessarily happen simultaneously in that exchange.
--
I expect in most languages that "unknown" and "any" gets used, this gets watered down.
Most languages provide some kind of baseline information with the "things" in them. For instance, objects whose type can always be queried via reflection, or some other baseline accessible information.
Also, languages use words differently. So maybe they are not consistent with the unknown/any symmetry I described at all.
I don’t think it is appropriate to say Rust has ‘union types’. Rust has sum types, implemented as Enums and (unsafe) Union types. There is a distinct difference between sum types and union types from a type theoretic perspective.
disjoint union vs union.
Scala 3 is the only programming language to implement both, AFAIK.
C# has a proposal to add both unions and disjoint unions: https://github.com/dotnet/csharplang/blob/main/proposals/Typ...
OCaml has polymorphic variants which are open disjoint unions.
Kotlin is looking to add union types for errors: https://youtrack.jetbrains.com/issue/KT-68296/Union-Types-fo...
I believe Java's checked exceptions behave somewhat like union types.
What’s the difference between a union type and a disjoint union type? In that C# proposal I couldn’t tell which syntax was which branch of your dichotomy.
A disjoint union is a sum type / enum / algebraic data type. It is defined at the point of declaration, and each case is distinct (hence, disjoint).
A union is what TypeScript has. It is defined at the point of use, and cases need not be distinct.
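A small sketch of both in TypeScript terms (the disjoint case is emulated with a discriminant property, since TS has no built-in sum types):

```typescript
// Untagged union: defined at the point of use; identical members collapse.
type Same = string | string; // just `string` - the cases are not distinct

// Tagged ("disjoint") union, emulated with a discriminant field:
// every case stays distinct even when the payloads are identical.
type Distance =
  | { unit: "miles"; value: number }
  | { unit: "km"; value: number };

function toKm(d: Distance): number {
  // The tag says which case we hold; an untagged `number | number` could not.
  return d.unit === "km" ? d.value : d.value * 1.609344;
}

console.log(toKm({ unit: "miles", value: 10 })); // 16.09344
```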
> Rust has sum types, implemented as Enums
Do you mean implemented with enums? Enums themselves are not a type. They are a mechanism for value generation, providing automatic numbering (hence enumeration) for constants. Indeed they, like all values, are ultimately represented by a type, but that type can range from something simple like an integer to something more complex like a tagged union (typically with the generated value being the tag), with different ecosystems favouring different approaches.
I think they just mean that sum types are defined by the programmer using the `enum` keyword
Like how subroutines are implemented as Functions.
And by Functions you don't mean functions, but rather the letters fn?
That is certainly an interesting way to communicate.
Off topic, but rust really messed up the terminology by using 'enums' for sum types.
I think this post should have made a greater distinction between terms and types. The type 4 and the term 4 are separate entities with the same name. Otherwise string <=: string would imply that a function f : string => int could be called as f(string) on the type of strings (which is actually how Perl 6/Raku handles it). Overall it seemed very TS-centric and not much on type theory.
> Like sets, types can be by description have an infinite number of distinct entries
I think they might have meant "entities" instead of "entries?"
The term "diagonal identity" seems to be non-standard as well?
I usually say items or members, but entries basically means the same thing, and JS Set objects have an entries method, so there is precedent.
Could someone here shed some light on "Why the intersection with any is always any"?
I didn't feel satisfied with the explanation in TFA.
> Free variables and closures
Are these even types? I always mentally filed closures under "implementation detail of nested functions".
Finally, something useful to read