I’ve come to believe that a type should be capable of reflecting any arbitrary rules a programmer knows about the bounds of a value. The only real implementations of this I see are mostly academic with a dependent type system like Idris. We’re many years from that becoming mainstream, if it ever will be.
I regret to say that every type level Gordian knot that I have ever been exposed to came from attempts to do this.
I think a lot of the problem is that many of the constraints and acceptable values for data are not determined until well after first deployment. The "knot" comes in when you are explicitly changing data and code to add a feature that you didn't anticipate at the start.
This is a lot like schemaless databases. The flexibility of not having to specify the full schema up front does not mean that you don't benefit from specifying parts of it. Indeed, if you are indexing things, you have to specify those parts. But it is hard to argue with a straight face that schemaless tools don't have some strong benefits.
>> This is a lot like schemaless databases. The flexibility of not having to specify the full schema up front does not mean that you don't benefit from specifying parts of it. Indeed, if you are indexing things, you have to specify those parts. But it is hard to argue with a straight face that schemaless tools don't have some strong benefits.
Basically, when you use a hash / JavaScript-style object rather than a fixed struct definition for your data (which amounts to a schemaless database in aggregate), you get:
- Easy addition of new properties
- Co-existence of different versions of objects (newer objects will typically have more properties)
- Introspection of properties and validation rules / reflection
The first two properties can be handled by adding optional fields to your struct. The third can be done at compile time rather than runtime if you define a strict class hierarchy. I think the real problem is that many languages make struct definition needlessly verbose, so it feels convenient to just use a map. The price of going schemaless includes runtime performance, as well as the compiler being unable to warn you about data mismatch errors that are easy (for a compiler) to detect and report. I have spent a lot of time using Lua the way I think many people use Clojure. It is quite seductive and does feel very productive at first, but eventually I find that for a sufficiently complex program, I want the compiler to validate my data usage for me.
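A minimal TypeScript sketch of that optional-fields approach (the names here are illustrative):

    interface User {
      name: string;
      // Added after first deployment; older objects simply lack these.
      email?: string;
      nickname?: string;
    }

    function greet(u: User): string {
      // The compiler forces us to handle the absent case.
      return `Hello, ${u.nickname ?? u.name}`;
    }

Old and new versions of the object coexist under one declared type, and the compiler still checks every access.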
Protobufs are an interesting hybrid. They assume a common, linear history of schema versions, where all data generators had access to an arbitrary version in that history. The schemas define field ids and types, but not which combinations of fields can show up.
If you can upgrade all the data then you don’t need to worry about versioning.
Or you can do it like HTTP headers and hope for the best.
Protobufs are also a fun place to look for how much debate people will get into regarding required versus optional fields. Your post is taking an implicit "everything is optional" view. But, it does allow you to be stricter.
Common Lisp Object System also touched on all of these ideas years ago.
Sometimes I’ve wondered if versioned types might be a help in bringing coexistence to a typed setup, and if this is part of where the benefits of a service oriented architecture come from. But given how versioning snarls can play out in dependency management I expect this idea has some tradeoffs in the best case.
Not the most elegant mechanism, but it should be accessible to imagine an implementation via inheritance hierarchies even for a static manifestly typed language. Class "Person4.1" inherits from class "Person4" inherits from class "Person2" inherits from "Person". Probably there's a better way (and one could argue that a design where knowing the version of the class/object matters isn't a well-encapsulated design, which could have something to do with why duck typing often works out better than expected in OO systems).
I think my question is what would you be doing here that doesn't boil down to "has these properties/methods, I'll accept"?
I think this is a lot easier if you exclude the "methods" in my quote there. Since those almost certainly have the same general contract that would spread across things.
> This is a lot like schemaless databases. The flexibility of not having to specify the full schema up front does not mean that you don't benefit from specifying parts of it. Indeed, if you are indexing things, you have to specify those parts. But it is hard to argue with a straight face that schemaless tools don't have some strong benefits.
Let me put on my straightest face.
Schemaless tools have no benefits.*
As soon as you start _doing_ anything with your data, it has a schema. You may not have declared it to your data store, to a specific piece of code, etc, but your application logic almost definitely makes assumptions about the shape of the data--what fields are and aren't present, the types of values in those fields, and what those values mean.
Every piece of your system must either accept undefined behaviour in the face of data which doesn't fit this implicit schema (there are very few non-trivial problems where you can define a meaningful, valid output for all possible inputs), or essentially treat the schemaless parts as hostile and assert its assumptions before every action.
The only thing you've done by going "schemaless" is distribute your schema across your systems and push the schema validation out to each thing interacting with the schemaless components instead of centralizing it.
* Yes, except for literally storing raw, unstructured data. But as soon as you need to _do_ anything with the data besides regurgitate it, you're back to it having some sort of schema.
So you have mainly straw-manned on the word "schemaless" to make an argument. This is akin to attacking dynamic typing by pointing out that most things can be type-inferred from the code.
That is, you are not wrong that most stored data is structured in some way; but you are ignoring a ton of the benefit of the schemaless world. As I pointed out, there are benefits to specifying schema. For some parts, you /have/ to do so, specifically if you want your DB to do any work on the data: indexing/filtering and the like.
Now, you do largely hit on the benefit of the "schemaless" world, which is that it is not undefined how it will store whatever schema you throw at it. You will have restrictions on keys, but otherwise it will store whatever you throw at it.
By and large, this means you can skip a lot of the formality of specifying all schema shapes during development and can take a much more flexible approach in your code than you are forced to take if the data layer can't work with you.
I empathize with this point of view - if not in any other way then in principle.
However, as a counterpoint, I'd suggest that encoding all known invariants as types may be prohibitively cumbersome and time-consuming - to the point where writing the program becomes more of an exercise in proofs rather than producing something useful. Often the invariants of a program are implied or can be easily inferred by the reader, which can actually make the program easier to understand for both the reader and the writer by leaving some things unsaid.
Of course, whether a program is easier to understand when its invariants are written down vs. when they are left to be inferred by the reader is probably a matter of circumstance, so different tools for different jobs and all that.
I've recently been writing a small program in Go - my first foray into the language - and I thoroughly enjoy how it gets out of my way. The lack of rigor is freeing if you don't stop to think about it too much.
The problem is that relying on the reader to infer invariants doesn't scale. For small projects this is reasonable, but for enterprise software engineering it is easy for a constraint that isn't enforced by the type system to be missed by an engineer, leading to a bug. Whereas typed constraints naturally propagate throughout the system.
But then ironically, the problem with relying on programming language type systems is that they don't scale (beyond the boundaries of a single process). As soon as you are crossing a wire you have to go back to runtime validations.
Throwing out type safety simply because you have unverified input is a little silly. It's trivial in most statically typed languages to safely turn input into a concrete type (or an error if it is invalid) at the edge and meanwhile maintain type safety throughout the rest of the code.
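For example, here is a rough TypeScript sketch of validating at the edge; the brand and the regex are illustrative stand-ins, not a serious email validator:

    type Email = string & { readonly __brand: "Email" };

    function parseEmail(input: string): Email | Error {
      // Deliberately simplistic check; it only marks the boundary.
      return /^[^@\s]+@[^@\s]+$/.test(input)
        ? (input as Email)
        : new Error(`not an email: ${input}`);
    }

    // Downstream code only ever sees the concrete type.
    function sendWelcome(to: Email): void { /* ... */ }

Everything past parseEmail can rely on the value having been checked, without re-validating.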
That's where good documentation and training pay off.
Some invariants that are too complex to express purely in code, requiring numerous separate low-level bookkeeping functions throughout the system, can be concisely expressed as a high-level idea in natural language (whether as comments or in an external technical document).
Make sure that your developers are exposed to those documents, throw in a few tests for the most obvious cases to ensure that newcomers are learning the invariants by heart, and if they're competent you should not have problems except in the most obscure cases (which could appear anyway even if you tried to formalize everything in code).
I eagerly await your implementation of a type describing my tax return. And to implement a compiler for the new programming language you'll inevitably have to construct, we're gonna need two types, one expressing all valid source programs, and the other all valid programs for the target platform.
Oh, can we also get updates to that for next year's taxes, new platforms, and other such future business needs? Of course the current requirements have to still remain supported.
There is no world where the added time here for the developers is worth the trade-off, and if you look through my history, you will see that I largely reject “dev time trade off” as a notion a company should ever seriously entertain as a significant burden.
Yeah - in theory they are the same thing. You can have a dependent type that's an array with a length of 5 or you can create a type called ArrayLengthFive, and in both cases the type checker ensures that you don't accidentally use an unbounded array.
But the difference is that with a dependent type, you get more guarantees (e.g. my ArrayLengthFive type could actually allow for arrays of length 20!). And the type checker forces you to prove the bounds (there could be a bug in my code when I create an ArrayLengthFive). And the dependent type allows other parts of the system to use the type info, whereas ArrayLengthFive is just an opaque type.
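To make that concrete, a TypeScript sketch of the opaque-type approach being described (the names are made up); the checker trusts the constructor, so a bug in the length check would go unnoticed, which is exactly the gap a dependent type closes:

    type ArrayLengthFive = number[] & { readonly __len5: true };

    function makeArrayLengthFive(xs: number[]): ArrayLengthFive | undefined {
      // If this check said xs.length <= 20, the type would silently lie;
      // a dependent type system would demand a real proof instead.
      return xs.length === 5 ? (xs as ArrayLengthFive) : undefined;
    }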
I do think there is room for both ways of working. In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself. But there are certainly other cases where it is useful to encode the data constraints into the type so that you can do more operations downstream.
Parsing a string into an email, of course, is already fraught with peril. Is it valid HTML? Or did you mean the email address? Is it a well-formed email, or a validated address that you have received correspondence with? How long ago was it validated? A fun spin on "It's an older code, sir, but it checks out."
I've seen attempts at solving each of those issues using types. I am not even positive they aren't solvable.
Validation is an event, with its own discrete type separate from the address itself. This is no different from a physical address.
123 Somewhere Lane
Somewhereville, NY 12345
is a correctly formatted address but is almost certainly not one that physically exists.
Validation that it exists isn't solvable in the type system because, as I mentioned, it is an event. It is only true for the moment it was verified, and that may change at any point in the future, including immediately after verification.
I think I get your point. I would add reconciliation alongside validation, in that sometimes you cannot validate something without doing it, and are instead storing a possibly out-of-band reconciliation of the result.
I'm curious, though, how this argument does not apply to many other properties people try to encode into types? It is one thing if you are only building envelopes and payloads. And, I agree that that gets you a long way. But start building operations on top of the data, and things get a lot more tedious.
It would help to see examples that were not toy problems. Every example I have seen on this is not exactly leaving a good impression.
There are things that can embed a bit of temporal logic into types, such as a linear type that can statically guarantee a file handle has been closed exactly once; anything else is an error.
For the most part, what I think people really want are branded types. F# has a nice example with its units of measure, where a float can be branded Fahrenheit or Celsius or Kelvin, and functions can take one of those types as parameters.
You then have some function that can parse user input into a float and brand it one of those types. The compiler doesn't make any guarantees about the user input, only that the resulting branded value cannot be used where a different one is expected. Other good examples are degrees and radians, or imperial and metric, etc.
Depending on what you are doing, knowing at a type level that some number can never be negative (weight in pounds) can save you a lot of hassle. If one part of the system doesn't brand a value, and it feeds into another part of the system that is branded, you're stuck with extraneous runtime checks all over the place or a lot of manual unit tests. Instead, the compiler can point out exactly, and in every instance, where you assumed you had a validated value but instead did not.
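A small TypeScript sketch of the branding idea (TypeScript has no units of measure, so the brands are simulated; all names are illustrative):

    type Fahrenheit = number & { readonly __unit: "F" };
    type Celsius = number & { readonly __unit: "C" };

    const fahrenheit = (n: number) => n as Fahrenheit;
    const celsius = (n: number) => n as Celsius;

    function toCelsius(f: Fahrenheit): Celsius {
      return celsius(((f - 32) * 5) / 9);
    }

    const room = fahrenheit(68);
    const c = toCelsius(room); // fine
    // toCelsius(c);           // compile error: Celsius is not Fahrenheit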
Some of these are why I would add reconciliation to the idea. Linear types, specifically. You can encode that you only send a message once. But, without doing reconciliation with the destination, you can not be sure that it was received.
I'm torn, as the examples from small programs make a ton of sense and I largely like them. I just cannot escape that every attempt I've seen at doing these in large programs is often fairly painful. I cannot claim they are more error prone than the ones that skip out on it. I can say they are far more scarce. Such that the main error condition appears to be to stall out on delivery.
> In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself.
There is also the in-between Rust approach. Start with the user input as a byte array. Pass it to a validation function, which returns it encapsulated within a new type.
What is annoying is the boilerplate required. You usually want your new type to behave like a (read-only) version of the original type, and you want some error type with various levels of detail.
You can macro your way out of most of the boilerplate, with the downside that it quickly becomes a domain specific language.
The salient excerpt is: "A parser is just a function that consumes less-structured input and produces more-structured output". In opposition to a validation function that merely raises an error.
With SPARK mode you can check them at compile-time with provers and elide the runtime checks. You can do _very_ sophisticated type predicates this way.
I think there's a sharp rule of diminishing returns the stronger you make type systems. Type safety comes at a cost and that cost has to come with an associated payoff.
This is why ultra strong type systems end up being little more than academic toys - the payoff when you dial type safety up to an extreme doesn't match the associated cost.
Types and tests should meet somewhere in the middle because the same law of diminishing returns affects tests in reverse. They tend to be good at what the other is bad at and if you rely too heavily on one at the expense of the other they will start fraying around the edges.
Indeed. A type means assigning a rigid category to something.
There is not a single taxonomy that is universal; it ends up like the Celestial Emporium of Benevolent Knowledge, which divides animals into:
those belonging to the Emperor,
those that are embalmed,
those that are tame,
pigs,
sirens,
imaginary animals,
wild dogs,
those included in this classification,
those that are crazy-acting,
those that are uncountable,
those painted with the finest brush made of camel hair,
miscellaneous,
those which have just broken a vase, and
those which, from a distance, look like flies.
In Idris you get to choose how strict you want to make your types. E.g. you can choose to use "Vect n a" if you want a vector of a's where the size is tracked by the type system. Or you can use e.g. "List a" if you want a list of a's where the size is not tracked by the type system.
Of course the problem comes when you have to interface some code which expects a "Vect n a" and you have a "List a", but that's solvable by simply checking the size (dynamically) and deciding what to do if your "List a" isn't of size exactly n. (Fail, or whatever.). Going from "Vect n a" to a "List a" is trivial, of course.
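Roughly the same boundary check, sketched in TypeScript with a fixed-length tuple standing in for "Vect 3 a" and a plain array for "List a":

    type Vect3<A> = [A, A, A];

    function toVect3<A>(xs: A[]): Vect3<A> | undefined {
      // The dynamic length check is what discharges the size obligation.
      return xs.length === 3 ? (xs as Vect3<A>) : undefined;
    }

    // The other direction is trivial: every Vect3<A> is already an A[].
    const back: number[] = [1, 2, 3] as Vect3<number>;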
... but really, that's where most of the friction is: Boundaries between APIs where the types don't quite line up. (Which is often the case for other languages, tbf. We tend to just notice less at compile time since most languages are so bad at expressing constraints.)
What, to you, is an ultra strong type system? Both OCaml and Haskell are used in plenty of non academic contexts. Do you mean something like Coq or F*?
I don't see many GAFAM products created in any functional language, even those with primitive type systems. Are you sure it is the type system that is scaring people away?
With dependent types you still end up having to do dynamic checking of invariants for a substantial portion of real-world code to construct the value with the properties you want. Anything coming from disk or network or a database, etc. In practice I feel that it is far easier to just do manual dynamic checking with the constructor hidden from other modules, such as `constructEmail :: String -> Email`, which uses normal value-level code, a common Haskell pattern. Obviously a little less flexible for the common dependent-type examples such as appending two vectors of specific lengths.
The other alternative is what protobuf did to solve the same pain: In proto3, all optionals have a default value. They're effectively required where the default value doesn't need to be encoded in the message.
The exact difference between the two approaches is subtle. Technically you can do fancier things with the framework exposing the optional values as set/not set, but in practice a lot of code is a lot simpler with default values.
This is true when you can’t upgrade all the data and all the data generators. But there are smaller systems where you can actually upgrade everything you have and set some new requirements for everything coming in.
If the requirements change, you can update the definition in the type system once. And then you immediately know the locations where this new bound is violated.
It's true that there are valid programs which don't type check in the same way that for any mathematical system there are truths which can't be proven.
In practice though, almost any program you'd want to write can be type checked, in the same way that few proofs get tied into Gödelian knots.
True, but there is a difference between "can be type checked" and "can be type checked in a reasonable amount of time".
I get reminded of this every time I have to work with a certain large Typescript code-base of mine that makes heavy use of union and template literal types. The Typescript Language Server has a hard time with these types. Frequently, everything slows to a crawl in VSCode, and it takes 10 minutes or more for intellisense to update after every type change, ouch.
Although in this case, I suspect it is Typescript's implementation of union types that is to blame (since other languages seem to handle complex union types with ease), this experience still shows that type checking can become quite expensive.
The benefits of such an approach would be substantial. However, nobody has gotten the costs down below "eye watering", or other similar metaphors. It's not that it can't be done, it's that it is not yet known if it can be done in a way that comes out positive on the cost/benefits engineering analysis.
I am open to the possibility that it can, but at the same time, I'd say if I draw the trendline of progress in this area out, it's not really all that encouraging.
If you want to see why it has remained only academic, spend some time with Idris and take notes on how many problems you encounter are fundamental to the paradigm versus problems that can be solved with a larger investment of effort. You'll want to pick a task that fits into its library ecosystem already, and not something like "hey, I'll write a fully generic HTTP server that can be proved correct" on your first try. I seriously suggest this, it's very educational. But part of that education will probably be to tone down your expectations of this being a miracle cure any time soon.
I don't say this to celebrate the situation; it's a bummer. But if the situation is ever going to be solved, it's going to be solved by people viewing it clearly.
(One of the major problems I see is that when you get to this level of specification, you end up with everything being too rigid. It is true that a lot of programming problems arise from things being made too sloppy and permitting a lot of states and actions that shouldn't exist. But some of that slop is also what makes our real-world systems compose together at all, without the original creator having to anticipate every possible usage. The "richer" the specifications get, the more rigid they get, and the more the creator of the code has to anticipate every possible future usage, which is not a reasonable ask. While I'd like to have the option to lock down many things more tightly, especially certain high-priority things like crypto or user permissions, it is far from obvious to me that the optimum ideal across the entire programming world is to maximize the specification level to this degree. And those high-priority use cases tend to jump to mind, but we also tend to overestimate their size in the real world. If we are going to see this style of programming succeed, we need to figure out how to balance the rigidity of tight specification with the ability to still flexibly use code in unanticipated manners later, in a context where the original specifications may not be quite right, and I think it's pretty unclear how to do that. Tightening down much past Haskell seems pretty challenging, and even Haskell is a bridge or three too far for a lot of people.)
Slightly more optimistically: As we go along, small pieces go from "too expensive to actually use" to "usable on real projects, but nobody does yet", and then to "doesn't everybody do that?"
This is the story of improvements in type systems over the last 70 years. But the progress is very slow. So this is only slightly more optimistic...
This is where I'd really be interested in seeing an AI system help. Don't write an AI system that helps me shovel out vast quantities of code, and then solves the problem of the resulting incomprehensible mass of code being impossible to work with by making it even easier to shovel out yet more masses of code to deal with the even larger masses of AI-generated code. It does not take a genius to see that this ends up with even the AIs being staggered by the piles of code in the world until all intelligences, natural and otherwise, are completely bogged down. (Exponential complexity growth defeats everyone in this physical universe, even hypothetically optimal AIs running on pure computronium.) Write me an AI that can help me drive something closer to Idris and bridge the gap between that overly-rigid world I'm complaining about and the more human/real-world goals I'm really trying to accomplish, have the AI and the programmer come to a meeting-of-the-minds and produce something with exactly the right rigidity in it, and help me reuse the results of others' work.
I mean... we're going to end up with the first one. But a man can dream. And while I will cynically say we're going to end up with the first one, it is at least possible that we will then recognize there's a problem and pursue this second approach. But not until the first approach bites us, hard.
Hell, even with agreeing to a very rigid world view, that'd be a dream. Imagine an AI proof assistant that does two things:
1. writes more proofs, focusing on the area I point at
2. makes small, atomic, incremental, suggestions to make it easier to prove something; once those pass proofs+tests, go back to #1
But hey, that would likely require some sort of artificial mental models and iterative reasoning, and not just slapping more compute+data together into a word generator...
I think the world of datalog and dynamic logic programming (or perhaps Answer Set Programming with Constraints) may be very useful in scenario 2, and people are still pursuing that without waiting for scenario 1 to happen first.
I agree with the philosophy at a high level, but opening up the power of what you can do at compile time is a very deep and nuanced trade off space. I think there’s a more general way to state what you’re getting after as well: we should be able to express arbitrary constraints that are enforced by the compiler. These constraints don’t really have to be about values themselves but could also be about how they are consumed e.g. linear types.
Unfortunately, the more powerful the type system, the harder it is to make inferences. Generality comes at a cost.
In the type-theoretic world, you want to refer to "refinement types" [0]. Some interesting work that I have seen in this direction is Liquid Types [1][2]. There is some work in Haskell [3], but it is surprisingly very useful in Rust as well.
I think the whole point of a type fully capturing the constraints of your data suggests that ideally it would.
Typescript is able to do this with strings when using template literal types:
type Suit = "C" | "D" | "H" | "S";
type Rank = "A" | "2" | "3" | "4";
type Card = `${Suit}${Rank}`;
const club = "C";
const ace = "A";
const aceOfClubs: Card = `${club}${ace}`;
Even though I have not declared the club or ace as a Suit or Rank, it still infers the Card correctly in the last line. This is a stronger form of typing than what you are settling for where 1..100 doesn't know its own range.
The difference you are referring to was already asserted in your first comment. I'm afraid whatever you were trying to add here is being lost. Unless you were, strangely, straight up repeating yourself for no reason?
You'd like what? For the type checker to understand math? We know that already. You said that in your first comment. And the second one. And the third one. And now seemingly the fourth one. But let it be known, whatever new information you are trying to add here is getting lost in translation.
How would you ensure that the values of the types don't invalidate the contract? Easy for basic assignment, but not sure how you would do that when doing arbitrary mutation.
You could prove to the type checker that you're only allowing input values that, when fed to your operation, results in outputs that remain in bounds. For example, if you're doing addition, you first check that each input is between 1 and 49, and you exhaustively handle the cases where the values are out of those bounds. Then the type checker can see that the output is always between 1 and 100.
For operations/mutations where it's more complex to validate the inputs, you could assign the result to an unbounded variable, and then prove to the type checker that you're exhaustively handling the output before you assign it to a bounded variable. For example, multiply two unbounded numbers, store the result in an unbounded variable, then do "if result >= 1 && result <= 100 then assign result to var[1..100] else .... end"
You could prevent arbitrary mutation, pushing computation onto a more flexible type and allowing assignment back to the more constrained type only when first guarded by an appropriate conditional. The whole "parse, don't validate" thing.
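A sketch of that guard-then-assign pattern in TypeScript, with a brand standing in for the 1..100 range, since TypeScript cannot express numeric ranges directly (names are illustrative):

    type OneTo100 = number & { readonly __range: "1..100" };

    function intoRange(n: number): OneTo100 | undefined {
      return n >= 1 && n <= 100 ? (n as OneTo100) : undefined;
    }

    // Do the arithmetic on plain numbers, then re-establish the bound.
    const raw = 37 * 3;               // unbounded result: 111
    const bounded = intoRange(raw);   // undefined here: out of range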
Idris is fascinating. I actually took the book about it on vacation with me. Software reliability is becoming increasingly important as we tie all these systems together because it multiplies the number of possible unanticipated failure modes.
Regarding the ultimate utility of languages initially considered "academic," the languages Lisp, Haskell, Prolog, Scala, Erlang and OCaml would like to have a word ;)
PrimeNumber is definitely possible in pretty much any dependent type language. Indeed, it would be pretty impossible to get any work done in Lean without it.
TotalProgram: possible in some, not in others, I think.
HaltingProgram and MaybeHalting are expressible (or vice versa), if you want to cover all programs with one of two types. If you want specifically HaltingProgram and NotHaltingProgram, then you need a third category of MayOrMayNotHaltingProgram.
I'm not sure what you mean. I don't find Idris restrictive (except for the lack of union types with subtyping). Rather I find any mainstream language very restrictive - or is that what you meant?
That sounds like you're using "type" to mean "representation", albeit with a lot of detail.
I may be odd, but my programming problems are rarely due to the number of bananas or how bananas are represented, but whether I'm working with the correct bananas.
Typescript has a phenomenal type-system indeed. However, its foundations are not very sound. The types are pragmatic and that's what counts mostly, but going forward it would be great to have type-systems that are even more powerful and also theoretically sound, with a simpler foundation (= fewer edge cases and workarounds). Languages like Idris are working on this.
Typescript can't really be this language because it is impeded by having to work with the javascript runtime, which makes this task much, much harder.
TypeScript's type foundations are not sound, and I would argue that this is a major factor why it is so good. I have yet to see a sound type system that is as pleasant to use as TypeScript's—certainly not Idris.
I am not a fan of encoding all sorts of correctness statements into a static type system. I'd much rather use a theorem proving environment for correctness, but one that is not based on type theory.
What do you gain from an unsound type system? Implicit casts still exist and one dependency returning Any means that your specification is meaningless (you can prove anything, including false).
To quote from there:
> Scala 3 has dropped some unsound and useless features to make the language smaller and more regular. It has added some new constructs to increase its expressiveness. Also, it has changed some constructs to remove warts and increase simplicity, consistency, and usability.
Some of the features they dropped because they were unsound were still useful (to me).
Typescript tries neither to make its typesystem (perfectly) sound, nor to make it as elegant as possible. That results in it being very useful/pragmatic for everyday programming tasks.
I don't rely on the type system for my specification. I just rely on it to significantly reduce the number of bugs I inadvertently introduce into the implementation of my specification.
Because the type system is unsound, you could add an error to your implementation and the type system will not catch it. It will happily tell you that everything type checks.
Yes, that can happen. Just as a sound type system will also not prevent all possible bugs. So what? TypeScript's type system helps me to prevent many bugs by performing standard sanity checks, and is a very practical tool.
Just rephrase your question as "How will a pervasive system of sanity checks help me prevent errors?", and I hope you agree that it kind of answers itself.
Yes, I think that's what I said. The pragmatic approach makes it very ergonomic to use in most use cases. That it breaks down for edge cases is usually not very relevant in practice.
> I'd much rather use a theorem proving environment for correctness, but one that is not based on type theory.
Haskell’s type system isn’t sound either, there is unsafePerformIO.
Nobody actually wants a language with a sound type system, unless they’re writing mathematical proofs. Any time you need to do anything with the external environment, such as call a function written in another language, you need an escape hatch. That’s why every language that aspires to real world use has an unsound type system, and the more practical it aspires to be, the more unsound the type system is.
Soundness is only a goal if the consequences of a type error are bad enough: your proof will be wrong, or an airplane falls out of the sky, or every computer in the world boots to a blue screen.
For everybody else the goal should be to balance error rate with developer productivity. Sacrificing double digits of productivity for single digits of error rate is usually not worth it, since the extra errors that a very sound type system will catch will be dominated by the base rate of logic errors that it can’t catch.
I think there is a misunderstanding on your side. You seem to mix up the soundness of the type-system with the property of referential transparency. Those are two orthogonal things.
> For everybody else the goal should be to balance error rate with developer productivity. Sacrificing double digits of productivity for single digits of error rate is usually not worth it, since the extra errors that a very sound type system will catch will be dominated by the base rate of logic errors that it can’t catch.
I think you are missing my point.
You are merely looking at a single point in time. And yes, you are right - the balance you mention matters.
But what also matters is the future. A language needs to be able to evolve. If it does not do that, it will eventually die and be replaced. If the type system is well made, with good foundations, the language will be able to evolve and adapt faster, causing fewer problems for its users.
What does "soundness" mean here? I love typescript's type system, except for the quirks that are inherited from javascript, which I sometimes find infuriating, and I'm wondering if you're talking about the same thing. For example, the fact that typescript- through javascript- only exposes a single `number` type is so annoying. Sometimes that's fine, but other times it would be nice to be able to say "No, this isn't just any damn number, this is an integer! This is an iterable ID for god's sake!" so you at the very least get a static error when you try to pass in `1.001`, or something that could be `1.001`. And that's just a very basic example of how a collection of number types with more granularity would be an improvement. Especially if they were a little more robust than that and were composable. Imagine being able to type `integer | negativeFloat` or other such wacky composed types. Ideally you could compose your types programmatically, and define shit like "the type is a `float` who's floor value has a % 3 of 0".
Informally, a sound type system is one that never lies to you.
Formally, the usual notion of soundness is defined with respect to an evaluation strategy: a term-rewriting rule, and a distinguished set of values. For pure functional programs this is literally just program execution, whereas effects require a more sophisticated notion of equivalence. Either way, we'll refer to it as evaluating the program.
There are two parts:
- Preservation: if a term `a` has type `T` and evaluates to `b`, then `b` has type `T`.
- Progress: A well-typed term can be further evaluated if and only if it is not a value.
// This can go all the way back to the smallest common type:
listenForEvent("mouse", (event: {}) => { });
Typescript's {} is a trap that means "any container of things is fine" (including objects); it's not the empty struct. One of the very ugly oddities of the language.
`Record<string, never>` or something like that is the closest equivalent to an empty struct.
Elixir is taking a similar direction if I'm understanding correctly (they seem to insist their type system is fundamentally different from Typescript's, but to this day I cannot understand how, since both are structural and based on set theory)
The type system is still in development so we haven't seen what the end result is yet, but I believe the system they're implementing is going to be sound.
Foo and Bar are now compatible with FooBar, which can create a lot of issues even in strict mode. Typescript is amazing, but interfaces being structurally compatible rather than nominal structs is a footgun in many places
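A small illustration of that structural compatibility (hypothetical names):

    interface Foo { foo: number }
    interface Bar { bar: string }
    interface FooBar { foo: number; bar: string }

    const fb: FooBar = { foo: 1, bar: "x" };
    const f: Foo = fb; // accepted: fb structurally contains Foo
    const b: Bar = fb; // also accepted, even in strict mode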
I think what you describe is called "extensible records". Elm had them in a prior version. You can implement them in a language with an expressive enough type system (e.g. Haskell or Scala) without special support, but the syntax can be a bit unwieldy.
I'm not claiming these are definitive in any sense, but they are useful guides into the literature if any reader wants to learn more. (If you are not used to reading programming language papers, the introduction, related work, and conclusions are usually the most useful. All the greek in the middle can usually be skipped unless you are trying to implement it.)
This is often referred to as "row typing", and the only language I've ever used that implemented it first-class with polymorphism is Purescript[0]. It's possible to model such things in other languages with sufficiently powerful type systems, though they're normally not as nice to use.
As an aside, Purescript is one of the most fun languages I've ever used for frontend work, and I lament that Elm seems to have overtaken it in the FP community.
The construction of the type 'Map<string, Type>' is entirely standard in languages like Agda and Coq (and I bet Idris too). In these languages, Type is itself a type, and can be treated like any other type (like string or int). Nothing clunky about it. (If you're curious, Type is usually referred to as a "universe type" in type theory circles.)
The Ceylon language allowed you to do that sort of thing with types.
If you "added" two maps together of types `Map<String, int>` and `Map<String, String>`, you would get back a `Map<String, String | int>` (or, equivalently, `Map<String, String> | Map<String, int>`).
But for some slightly complex reasons, most language designers find adhoc union types (which are required for this to work) a bad idea. See the Kotlin work related to that, they explicitly want to keep that out of the language (and I've seen that in other language discussions - notice how tricky it can be to decide if two types are "the same" or whether a type is a subtype of another in the presence of generalized unions) except for the case of errors: https://youtrack.jetbrains.com/issue/KT-68296/Union-Types-fo...
> If you "added" two maps together of types `Map<String, int>` and `Map<String, String>`, you would get back a `Map<String, String | int>` (or, equivalently, `Map<String, String> | Map<String, int>`).
But these are obviously not equivalent: the first type is a map where all values are either strings or ints, and the second one is either a map where all values are strings, or all values are ints.
If that's confusing, consider: {foo: 1, bar: "2"}. It satisfies `Map<String, String | int>` but not `Map<String, String> | Map<String, int>`.
(In fact, the latter is a subtype of the former.)
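Spelled out in TypeScript, using Record in place of Map so the literal can be checked directly:

    type Mixed = Record<string, string | number>;
    type Uniform = Record<string, string> | Record<string, number>;

    const ok: Mixed = { foo: 1, bar: "2" };       // every value is string | number
    // const bad: Uniform = { foo: 1, bar: "2" }; // rejected: fits neither arm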
P.S. You also seem to have misunderstood the toplevel comment about modeling record types as maps, as being about typing maps that exist in the language.
These types are equivalent if you consider a value of both of these types have the exact same read operations. Calling `get(String)` will return `String | int` in both cases. You're right that you could build a value of one of these types that does NOT conform to the union, however. I am not sure what's the technical name for what I am trying to say... are they "covariantly equivalent"???
EDIT: ok, I just wanted to say that one type, `Map<String, String | int>`, is a supertype of `Map<String, int> | Map<String, String>`, so if a function accepts the former, it also accepts the latter. They're not equivalent but you can substitute one for the other (one way only) and still perform the same operations (always assuming read-only types, if you introduce mutation everything becomes horrible).
I was trying to emphasize how having type "combinations" ends up causing the type system to become undecidable, as you end up with infinite possible combinations, but as I haven't really gone too deeply into why, I am having trouble articulating the argument.
OCaml's object system can also achieve this in a quite lightweight way:
    type foo = < foo : int >
    type bar = < bar : int >
    type k = < foo; bar >
    type u = < k; baz : int >
    let f (x : < u; .. >) = x#m (* the type annotation is not needed *)
You can easily do composable tagged (by index type or even comp time strings) tuples in C++. The syntax is not pretty, but there is minimal to non-existent overhead.
On the other hand, thanks to the loose "duck typed" nature of templates, you can write functions that act on a subset of the fields of a type (even type checking it with concepts) while preserving the type of the whole object.
Go structs sorta behave this way if you put a "parent" struct in the first field of a struct definition. They are, of course, not maps though. But a map with statically defined keys/values/types is basically a struct.
This doesn't function as an interface though. You cannot pass a FooBar to a function that expects a Foo, for example, and although you can fairly easily reference the Foo-part of a FooBar instance (`foobar.Foo`), there is no way to pass e.g. a slice of FooBar instances ([]FooBar) to a function that takes []Foo as its argument. That's the problem to be solved.
Totally achievable by using interfaces instead of structs
    type Foo interface {
        Foo() int
    }

    type Bar interface {
        Bar() string
    }

    type FooBar interface {
        Foo
        Bar
    }
Then functions that accept a Foo will also happily take a FooBar. This does not solve the problem of passing a []FooBar to a function that expects []Foo, but that can be solved with generics or a simple function to convert []FooBar to []Foo.
That sounds achievable at compile time in C++ with some type-level map implementation such as in boost.mp11 (sadly the language does not allow general use of types as values, preventing the use of standard map containers).
Is this something professional programmers are having trouble with??
This is the kind of problem you face in your first year working, no? I am honestly curious what others think. Do you have trouble deciding when to use an interface (assuming your language has that), or a type wrapper (I don't think that's the brightest idea), or a function to extract the field to sort by (most languages do that)??
Your confidence that there is an obviously optimal way to model something reads as either brilliance or hubris and brilliance is a lot rarer.
Typically there are subtle trade-offs and compromises which only prove themselves to be useful/detrimental as the software changes over time. You can place bets based on experience but you can only really be cocky about your choices when looking back not looking forward
> Is this something professional programmers are having trouble with??
I would say, yes. So does everybody else. If you don't believe it, I don't think you appreciate the depth of these problems.
I am reminded of Erik Meijer saying that the interface (in the Java sense) part of the contract between types is the least interesting part. What is important are their algebraic properties, which can be modeled using functional types and a bunch of other advanced type-theoretical ideas (like linear types, for instance).
Modelling stuff with types is not easy, it's an art.
> Is this something professional programmers are having trouble with??
Nobody said this.
There is value in thinking about things like this sometimes, because it has long-term consequences for the projects we work on. Even if you're a "professional" programmer (whatever that means), it's valuable to go back to beliefs and knowledge you've established long ago to evaluate whether to change them in the face of new experiences you've made since the last time you've thought about it.
If you think "professional" programmers don't get this sort of thing wrong in some form or another, I have a bridge to sell you.
Why scare quotes though? Professional programmer just means someone who writes software for a living. Software is your profession... there's no doubt what that means, is there??
If you do that, you'll have run into this sort of decision very early in your career, and hopefully will understand the best way to handle it, which IMHO just depends on the language (because certain features lead to different optimum solutions) and the culture around it. But sure, I am happy to discuss fundamental topics like this, that's why I am engaging and asking what others think.
It does not solve the problem. Would you rather have a function that takes a map and puts additional keys in it (e.g. `timestamp`), or a function that takes a map and stores it in a new map alongside the additional keys? In any case, how would you write a generic function that works on those additional keys?
I think TFA is talking about nesting fields in whatever order and also being able to extract the fields without ceremony like `get(get(get...))`—just allow `get(...)`. If you wanted an example more close to the submission you would have a multi-field class inside that class. Then you would perhaps solve the nested field access boilerplate by manually implementing it in the interface methods.
It's not hard at all and it's actually what the second approach listed by the author shows:
    class HasRecipient a where
        get_receiver :: a -> PlayerId
which adjusted to your example would be
    class Timestamped a where
        timestamp :: a -> UTCTime
The problem with this approach is that you'll have duplicated data on all instances. In your example, `Quote` has the fields `timestamp` and `sender` in order to satisfy `Timestamped` and `Msg`. If you had several classes like `Quote` and interfaces like `Timestamped`, then you would end up with a lot of duplicated code.
The challenge is composition. Adding Timestamped to something in a static, type-checked way, but not modifying the source code of that something.
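One way to sketch that composition in TypeScript, via an intersection and a generic wrapper (the names are illustrative):

    type Timestamped<T> = T & { timestamp: Date };

    function withTimestamp<T extends object>(value: T): Timestamped<T> {
      return { ...value, timestamp: new Date() };
    }

    // Generic code can demand just the timestamp part:
    function ageMs(x: { timestamp: Date }): number {
      return Date.now() - x.timestamp.getTime();
    }

    const quote = withTimestamp({ text: "...", sender: "alice" });
    ageMs(quote); // type-checks, and quote keeps its own fields

The original type's source is untouched; the wrapper adds the field and the type in one step.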
> It doesn't have interface polymorphism?
This is beside the point, but its interface polymorphism is static, not dynamic. If you have a List which is IMappable, and you do some IMappable operations on it, in most OOP languages you get back an IMappable, but in Haskell you get back a List.
In the philosophical position of mereological nihilism, only "simples" exist when discussing objects; simples are akin to byte types for sure, and primitive types maybe.
Nothing complex, like objects, higher-level types, etc., exists.
Composition of simples in time and space is what determines everything other than simples. Anything that isn't a simple is just emergent from the space/time arrangement of the simples.
So if the implication is that nothing is real outside of time/space/simples, then calling something higher-level a "type" is to label something as discrete and real when it isn't. Now you are using language and models that are wrong to reason about things, which means your model will have frustrating drift from reality that you can't rectify.
That's where schemas come in: we are just labeling common arrangements of simples for semantic reasons, but they don't really exist, so they shouldn't be elevated to the position of a type, as that is a wrong abstraction and causes divergence and mess.
The problem with known types, or interfaces, as values is that the conditions for a value's format or validity change based on context. So you might have a price that is valid in one context but not in another (like a negative value). The problem then occurs on input, where you might let in a value that is valid at a higher level but not at a deeper level, and by the time you detect it, it might be too late (you have committed a transaction with invalid data).
> interfaces describe behavior, types describe shape and structure
Shape and structure are behavior. There's more to behavior than "abstractly produces X result". Behavior is "produces X result in Y form given Z in W form".
They shouldn't be. Conflating two things that can be separated is just compounding complexity. You don't need to know how a database lays out memory or the structure behind a web server.
> You don't need to know how a database lays out memory
You do, however, need to know what logical columns are in a table and the types of those columns to be able to effectively query against the table. And you need to know what the types of the inputs to a query wrapper function are to be able to call it properly.
Memory layout has nothing to do with type, because physical memory layout is completely separate from the semantic logical layout and form represented by the bits. You now appear to be the one conflating two unrelated things.
That something is treated as an integer fundamentally matters to its use. That the thing comprises some number of adjacent bits in big/little-endian arrangement is a very unrelated implementation detail.
> So float, int, unint64, int8 and a vector/array don't have specific memory layouts?
The memory layouts don't need to be known to the user. Different hardware architectures can have the concept of floats and ints and code can be written using them the same way while the underlying representations are distinct. I promise that IEEE754 is not the only way to represent floating point values handed down from god to man on a golden scroll.
> > And you need to know what the types of the inputs to a query wrapper function are to be able to call it properly.
> That's the interface.
Funny, I recall you not very long ago saying "interfaces describe behavior, types describe shape and structure". Now you acknowledge that the interface necessarily includes the types of the arguments?
Data types can optionally specify machine representation but need not. They specify behavior (i.e. possible values and operations) first and foremost. In programming, nearly every use of data type is in the abstract, entirely separate from its hardware representation. The "Int", the "Bool", the "2", the "UTCTime" being specified in TFA don't care about bit arrangements. They're describing valid values and operations, not hardware.
> I never said that.
You're right. Apologies. It was the other poster at the top of the thread. But it still seems apt to the thread and to your specific portrayal.
Interfaces are usually understood as a subset of types. Interfaces specifically describe method signatures. Nevertheless, the distinction is murky because any structure can be interpreted in terms of getters and setters which can be considered methods.
Depends on the language. In typescript an interface can absolutely encode shape. In C# you can encode shape in interfaces via properties. Interfaces don't even describe behavior; it's just that we typically associate them with an implicit contract. Strictly speaking they are just a shape.
Differences in foundational understanding, and the difficulty of teaching foundational research, inevitably cause a ton of confusion in texts about formal-science ideas outside a known academic context.
This text should not fascinate a programmer but create frustration of two types:
(1) Frustration at one's lack of knowledge of the small pieces
(2) Frustration that NOW you will not have the chance to invent this on your own; your creative process takes damage.
The second type should account for the majority of cases.
This should also cause one to have respect, of the form: "hey, this programmer was probably at least smart enough to figure this out alone."
Perhaps the most polite thing would be for every such text to carry a notice at the beginning: "For programmers who have already thought about what would happen were you to add X to Y, but so that Z, enough to probably not get new ideas in this context."
>> This is a lot like schemaless databases. The flexibility of not having to specify the full schema up front does not mean that you don't benefit from specifying parts of it. Indeed, if you are indexing things, you have to specify those parts. But it is hard to argue with a straight face that schemaless tools don't have some strong benefits.
This is very similar to what Rich Hickey argued in "Simplicity Matters": https://youtu.be/rI8tNMsozo0?si=xTkpsLYTYh0jA5lB
> Protobufs are also a fun place to look for how much debate people will get into regarding required versus optional fields. Your post is taking an implicit "everything is optional" view. But, it does allow you to be stricter.
Everything has been `optional` for about a decade.
> The option to set a field to required is absent in proto3 and strongly discouraged in proto2.
https://protobuf.dev/overview/#syntax
Ah, I dropped off protobuf a while back. I definitely remember a lot of uproar about it, at the time.
> Sometimes I’ve wondered if versioned types might be a help in bringing coexistence to a typed setup, and if this is part of where the benefits of a service oriented architecture come from.
You might be interested in the Unison language then.
I'm curious what you have in mind that doesn't boil down to "duck typing?"
Not the most elegant mechanism, but it should be accessible to imagine an implementation via inheritance hierarchies even for a static manifestly typed language. Class "Person4.1" inherits from class "Person4" inherits from class "Person2" inherits from "Person". Probably there's a better way (and one could argue that a design where knowing the version of the class/object matters isn't a well-encapsulated design, which could have something to do with why duck typing often works out better than expected in OO systems).
I think my question is what would you be doing here that doesn't boil down to "has these properties/methods, I'll accept"?
I think this is a lot easier if you exclude the "methods" in my quote there. Since those almost certainly have the same general contract that would spread across things.
> This is a lot like schemaless databases. The flexibility of not having to fully specify the full schema does not mean that you don't benefit from specifying parts of it? Indeed, if you are indexing things, you have to specify those parts. But it is hard to argue with a straight face that schemaless tools don't have some strong benefits.
Let me put on my straightest face.
Schemaless tools have no benefits.*
As soon as you start _doing_ anything with your data, it has a schema. You may not have declared it to your data store, to a specific piece of code, etc, but your application logic almost definitely makes assumptions about the shape of the data--what fields are and aren't present, the types of values in those fields, and what those values mean.
Every piece of your system must either accept undefined behaviour in the face of data which doesn't fit this implicit schema (there are very few non-trivial problems where you can define a meaningful, valid output for all possible inputs), or essentially treat the schemaless parts as hostile and assert its assumptions before every action.
The only thing you've done by going "schemaless" is distribute your schema across your systems and push the schema validation out to each thing interacting with the schemaless components, instead of centralizing it.
* Yes, literally storing raw, unstructured data. But as soon as you need to _do_ anything with the data besides regurgitate it you're back to it having some sort of schema.
So you have mainly straw-manned the word "schemaless" to make an argument. This is akin to attacking dynamic typing by pointing out that most types can be inferred from the code.
That is, you are not wrong in that most data stored is structured in some way; but you are ignoring a ton of the benefit of the schemaless world. As I pointed out, there are benefits to specifying schema. In parts, you /have/ to do so. Specifically if you want your DB to do any work on the data. Indexing/filtering and the like.
Now, you do largely hit on the benefit of the "schemaless" world: its behavior is well defined for whatever schema you throw at it. You will have restrictions on keys, but otherwise it will store whatever you throw at it.
At large, this means you can skip out on a lot of the formality of specifying all schema shapes during development and can take a much more flexible approach in your code than you are forced to if the data layer can't work with you.
I empathize with this point of view - if not in any other way then in principle.
However, as a counterpoint, I'd suggest that encoding all known invariants as types may be prohibitively cumbersome and time-consuming - to the point where writing the program becomes more of an exercise in proofs rather than producing something useful. Often the invariants of a program are implied or can be easily inferred by the reader, which can actually make the program easier to understand for both the reader and the writer by leaving some things unsaid.
Of course, whether a program is easier to understand when its invariants are written down vs. when they are left to be inferred by the reader is probably a matter of circumstance, so different tools for different jobs and all that.
I've recently been writing a small program in Go - my first foray into the language - and I thoroughly enjoy how it gets out of my way. The lack of rigor is freeing if you don't stop to think about it too much.
The problem is that reader inference doesn't scale. For small projects this is reasonable, but for enterprise software engineering it is easy for a constraint that isn't enforced by the type system to be missed by an engineer, leading to a bug. Typed constraints, by contrast, naturally propagate throughout the system.
But then ironically, the problem with relying on programming language type systems is that they don't scale (beyond the boundaries of a single process). As soon as you are crossing a wire you have to go back to runtime validations.
Throwing out type safety simply because you have unverified input is a little silly. It's trivial in most statically typed languages to safely turn input into a concrete type (or an error if it is invalid) at the edge and meanwhile maintain type safety throughout the rest of the code.
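A minimal TypeScript sketch of that edge-validation pattern (the Order shape and parseOrder helper are made up for illustration):

    interface Order { id: number; quantity: number }

    // Turn untrusted input into a concrete type, or an error, at the edge;
    // everything downstream works with Order, never with raw JSON.
    function parseOrder(input: unknown): Order | Error {
      if (
        typeof input === "object" && input !== null &&
        typeof (input as { id?: unknown }).id === "number" &&
        typeof (input as { quantity?: unknown }).quantity === "number"
      ) {
        const o = input as { id: number; quantity: number };
        return { id: o.id, quantity: o.quantity };
      }
      return new Error("malformed order");
    }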
There's where good documentation and training pays off.
Some invariants that are too complex to express purely in code, requiring numerous separate low-level bookkeeping functions throughout the system, can be concisely expressed as a high-level idea in natural language (whether as comments or in an external technical document).
Make sure that your developers are exposed to those documents, throw in a few tests for the most obvious cases to ensure that newcomers are learning the invariants by heart, and if they're competent you should not have problems but in the most obscure cases (that could appear anyway even if you try to formalize everything in code).
Typed constraints don't scale either.
I eagerly await your implementation of a type describing my tax return. And to implement a compiler for the new programming language you'll inevitably have to construct, we're gonna need two types, one expressing all valid source programs, and the other all valid programs for the target platform.
Oh, can we also get updates to that for next year's taxes, new platforms, and other such future business needs? Of course the current requirements have to still remain supported.
There is no world where the added time here for the developers is worth the trade-off, and if you look through my history, you will see that I largely reject “dev time trade off” as a notion a company should ever seriously entertain as a significant burden.
Yeah - in theory they are the same thing. You can have a dependent type that's an array with a length of 5 or you can create a type called ArrayLengthFive, and in both cases the type checker ensures that you don't accidentally use an unbounded array.
But the difference is that with a dependent type, you get more guarantees (e.g. my ArrayLengthFive type could actually allow for arrays of length 20!). And the type checker forces you to prove the bounds (there could be a bug in my code when I create an ArrayLengthFive). And the dependent type allows other parts of the system to use the type info, whereas ArrayLengthFive is just an opaque type.
I do think there is room for both ways of working. In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself. But there are certainly other cases where it is useful to encode the data constraints into the type so that you can do more operations downstream.
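To make the ArrayLengthFive contrast above concrete, here is a minimal TypeScript sketch (the branding encoding is one possible choice, not the only one):

    // A structural fixed-length type: the checker itself knows the length.
    type ArrayOfFive<T> = [T, T, T, T, T];

    // An opaque wrapper: the brand *promises* length 5, but nothing stops
    // a buggy constructor from breaking that promise (the "length 20" bug).
    type ArrayLengthFive<T> = T[] & { readonly __brand: "length-5" };

    function makeFive<T>(xs: T[]): ArrayLengthFive<T> | undefined {
      // The entire guarantee lives in this one runtime check.
      return xs.length === 5 ? (xs as ArrayLengthFive<T>) : undefined;
    }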
Parsing a string into an email, of course, is already fraught with peril. Is it valid HTML? Or did you mean the email address? Is it a well-formed email, or a validated address that you have received correspondence with? How long ago was it validated? Fun spin on the "It's an older code, sir, but it checks out."
I've seen attempts at solving each of those issues using types. I am not even positive they aren't solvable.
Validation is an event, with its own discrete type separate from the address itself. This is no different than a physical address.
An address can be correctly formatted but almost certainly not one that physically exists. Validation that it exists isn't solvable in the type system because, as I mentioned, it is an event. It is only true for the moment it was verified, and that may change at any point in the future, including immediately after verification.
I think I get your point. I would add reconciliation to validation. In that sometimes you cannot validate something without doing it, and are instead storing a possibly out of band reconciliation of result.
I'm curious, though, in how this argument does not apply to many other properties people try and encode into the types? It is one thing if you are only building envelopes and payloads. And, I agree that that gets you a long way. But start building operations on top of the data, and things get a lot more tedious.
It would help to see examples that were not toy problems. Every example I have seen on this is not exactly leaving a good impression.
There are things that can embed a bit of temporal logic into types, such as a linear type that can statically guarantee a file handle has been closed exactly once, or else it's an error.
For the most part, what I think people really want are branded types. F# has a nice example of unit, where a float can be branded Fahrenheit or Celsius or Kelvin, and functions can take one of those types as parameters.
You then have some function that can parse user input into a float and brand it one of those types. The compiler doesn't make any guarantees about the user input, only that the resulting branded value cannot be used where a different one is expected. Other good examples are degrees and radians, or imperial and metric, etc.
Depending on what you are doing, knowing at a type level that some number can never be negative (weight in pounds) can save you a lot of hassle. If one part of the system doesn't brand a value, and it feeds into another part of the system that is branded, you're stuck with extraneous runtime checks all over the place or a lot of manual unit tests. Instead, the compiler can point out exactly, and in every instance, where you assumed you had a validated value but instead did not.
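A sketch of that idea translated from F#'s units of measure into TypeScript-style branded types (the celsius/fahrenheit helpers are invented for illustration):

    type Celsius = number & { readonly __unit: "celsius" };
    type Fahrenheit = number & { readonly __unit: "fahrenheit" };

    // The only sanctioned ways to mint branded values.
    const celsius = (n: number) => n as Celsius;
    const fahrenheit = (n: number) => n as Fahrenheit;

    function toFahrenheit(c: Celsius): Fahrenheit {
      return fahrenheit((c * 9) / 5 + 32);
    }

    const t = celsius(21);
    toFahrenheit(t);      // ok
    // toFahrenheit(21);  // compile error: a bare number is not Celsius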
Some of these are why I would add reconciliation to the idea. Linear types, specifically. You can encode that you only send a message once. But, without doing reconciliation with the destination, you can not be sure that it was received.
I'm torn, as the examples from small programs make a ton of sense and I largely like them. I just cannot escape that every attempt I've seen at doing these in large programs is often fairly painful. I cannot claim they are more error prone than the ones that skip out on it. I can say they are far more scarce. Such that the main failure mode appears to be stalling out on delivery.
> In the string -> Email example, it's probably enough to parse your string and just call it an email. You don't need to try to encode all the rules about an email into the type itself.
There is also the in-between Rust approach. Start with the user input as a byte array. Pass it to a validation function, which returns it encapsulated within a new type.
What is annoying is the boilerplate required. You usually want your new type to behave like a (read-only) version of the original type, and you want some error type with various levels of detail. You can macro your way out of most of the boilerplate, with the downside that it quickly becomes a domain-specific language.
AKA the "parse, don't validate" approach [1].
1: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
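A minimal TypeScript rendering of the pattern (the Email brand and parseEmail names are invented; the regex is deliberately naive):

    // The brand makes Email a distinct type even though it is just a string.
    type Email = string & { readonly __brand: "Email" };

    // A parser: less-structured input in, more-structured output (or error) out.
    function parseEmail(input: string): Email | Error {
      // Deliberately simple check; real email validation is more involved.
      return /^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input)
        ? (input as Email)
        : new Error(`not an email address: ${input}`);
    }

    // Downstream code asks for Email and never re-checks the string.
    declare function sendTo(addr: Email): void;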
The salient excerpt is: "A parser is just a function that consumes less-structured input and produces more-structured output". In opposition to a validation function that merely raises an error.
Of course the type signature is what seals the deal at the end of the day. But I am going to follow this naming convention from now on.
> arbitrary rules a programmer knows about the bounds of a value
Ada's generalized type contracts using subtype predicates work pretty well for this: https://learn.adacore.com/courses/Ada_For_The_CPP_Java_Devel...
You can use it for something as simple as expressing ranges, or to represent types with arbitrary constraints, including types with discontinuities.
But these are only promised to be checked at runtime, even though the compiler can sometimes hint at things earlier.
With SPARK mode you can check them at compile-time with provers and elide the runtime checks. You can do _very_ sophisticated type predicates this way.
I think there's a sharp rule of diminishing returns the stronger you make type systems. Type safety comes at a cost and that cost has to come with an associated payoff.
This is why ultra strong type systems end up being little more than academic toys - the payoff when you dial type safety up to an extreme doesn't match the associated cost.
Types and tests should meet somewhere in the middle because the same law of diminishing returns affects tests in reverse. They tend to be good at what the other is bad at and if you rely too heavily on one at the expense of the other they will start fraying around the edges.
Indeed. A type means assigning a rigid category to something.
There is not a single taxonomy that is universal; it ends up like the Celestial Emporium of Benevolent Knowledge, which divides animals into arbitrary, overlapping categories. Then you have a new use case.
In Idris you get to choose how strict you want to make your types. E.g. you can choose to use "Vect n a" if you want a vector of a's where the size is tracked by the type system. Or you can use e.g. "List a" if you want a list of a's where the size is not tracked by the type system.
Of course the problem comes when you have to interface some code which expects a "Vect n a" and you have a "List a", but that's solvable by simply checking the size (dynamically) and deciding what to do if your "List a" isn't of size exactly n. (Fail, or whatever.). Going from "Vect n a" to a "List a" is trivial, of course.
... but really, that's where most of the friction is: Boundaries between APIs where the types don't quite line up. (Which is often the case for other languages, tbf. We tend to just notice less at compile time since most languages are so bad at expressing constraints.)
What, to you, is an ultra strong type system? Both OCaml and Haskell are used in plenty of non academic contexts. Do you mean something like Coq or F*?
"plenty" is relative.
You don't see many GAFAM products created in either, and that's because of the trade off OP talks about.
I don't see many GAFAM products created in any functional language, even those with primitive type systems. Are you sure it is the type system that is scaring people away?
I was thinking of Haskell and F#. Neither one is completely unused outside of academic contexts but it is rare.
Rust is an example where a stronger type system has an associated payoff and it's being used all over.
With dependent types you still end up having to do dynamic checking of invariants for a substantial portion of real-world code to construct a value with the properties you want. Anything coming from disk or network or a database, etc. In practice I feel it is far easier to just do manual dynamic checking with the constructor hidden from other modules, such as `constructEmail :: String -> Email`, which uses normal value-level code, a common Haskell pattern. Obviously a little less flexible for the common dependent-type examples such as appending two vectors of specific lengths.
You should be parsing/validating any value coming from disk or the network
You never know when the file will just suddenly be all zeros.
Requirements change, things evolve. Having bounds in the type system is too restrictive.
Even the history of "required", a rather simple restriction on a value, shows that putting that logic into the type system is too much of a burden.
https://capnproto.org/faq.html#how-do-i-make-a-field-require...
The other alternative is what protobuf did to solve the same pain: In proto3, all optionals have a default value. They're effectively required where the default value doesn't need to be encoded in the message.
The exact difference between the two approaches is subtle. Technically you can do fancier things with the framework exposing the optional values as set/not set, but in practice a lot of code is a lot simpler with default values.
https://protobuf.dev/programming-guides/proto3/#default
This is true when you can’t upgrade all the data and all the data generators. But there are smaller systems where you can actually upgrade everything you have and set some new requirements for everything coming in.
If the requirements change, you can update the definition in the type system once. And then you immediately know the locations where this new bound is violated.
To check these types would then require possibly infinite time, and the checking may never halt, per the halting problem.
Useful types are a compromise between expressiveness and practicality.
It's true that there are valid programs which don't type check in the same way that for any mathematical system there are truths which can't be proven.
In practice though, almost any program you'd want to write can be type checked, in the same way that few proofs get tied into Gödelian knots.
True, but there is a difference between "can be type checked" and "can be type checked in a reasonable amount of time".
I get reminded of this every time I have to work with a certain large Typescript code-base of mine that makes heavy use of union and template literal types. The Typescript Language Server has a hard time with these types. Frequently, everything slows to a crawl in VSCode, and it takes 10 minutes or more for intellisense to update after every type change, ouch.
Although in this case, I suspect it is Typescript's implementation of union types that is to blame (since other languages seem to handle complex union types with ease), this experience still shows that type checking can become quite expensive.
The benefits of such an approach would be substantial. However, nobody has gotten the costs down below "eye watering", or similar other metaphors. It's not that it can't be done, it's that it is not yet known if it can be done in a way that comes out positive on the cost/benefits engineering analysis.
I am open to the possibility that it can, but at the same time, I'd say if I draw the trendline of progress in this area out, it's not really all that encouraging.
If you want to see why it has remained only academic, spend some time with Idris and take notes on how many of the problems you encounter are fundamental to the paradigm versus problems that can be solved with a larger investment of effort. You'll want to pick a task that fits into its library ecosystem already, and not something like "hey, I'll write a fully generic HTTP server that can be proved correct" on your first try. I seriously suggest this, it's very educational. But part of that education will probably be to tone down your expectations of this being a miracle cure any time soon.
I don't say this to celebrate the situation; it's a bummer. But if the situation is ever going to be solved, it's going to be solved by people viewing it clearly.
(One of the major problems I see is that when you get to this level of specification, you end up with everything being too rigid. It is true that a lot of programming problems arise from things being made too sloppy and permitting a lot of states and actions that shouldn't exist. But some of that slop is also what makes our real-world systems compose together at all, without the original creator having to anticipate every possible usage. The "richer" the specifications get, the more rigid they get, and the more the creator of the code has to anticipate every possible future usage, which is not a reasonable ask. While I'd like to have the option to lock down many things more tightly, especially certain high-priority things like crypto or user permissions, it is far from obvious to me that the optimum ideal across the entire programming world is to maximize the specification level to this degree. And those high-priority use cases tend to jump to mind, but we also tend to overestimate their size in the real world. If we are going to see this style of programming succeed, we need to figure out how to balance the rigidity of tight specification with the ability to still flexibly use code in unanticipated manners later, in a context where the original specifications may not be quite right, and I think it's pretty unclear how to do that. Tightening down much past Haskell seems pretty challenging, and even Haskell is a bridge or three too far for a lot of people.)
Slightly more optimistically: As we go along, small pieces go from "too expensive to actually use" to "usable on real projects, but nobody does yet", and then to "doesn't everybody do that?"
This is the story of improvements in type systems over the last 70 years. But the progress is very slow. So this is only slightly more optimistic...
This is where I'd really be interested in seeing an AI system help. Don't write an AI system that helps me shovel out vast quantities of code, and then solves the problem of the resulting incomprehensible mass of code being impossible to work with by making it even easier to shovel out yet more masses of code to deal with the even larger masses of AI-generated code. It does not take a genius to see that this ends up with even the AIs being staggered by the piles of code in the world until all intelligences, natural and otherwise, are completely bogged down. (Exponential complexity growth defeats everyone in this physical universe, even hypothetically optimal AIs running on pure computronium.) Write me an AI that can help me drive something closer to Idris and bridge the gap between that overly-rigid world I'm complaining about and the more human/real-world goals I'm really trying to accomplish, have the AI and the programmer come to a meeting-of-the-minds and produce something with exactly the right rigidity in it, and help me reuse the results of other's work.
I mean... we're going to end up with the first one. But a man can dream. And while I will cynically say we're going to end up with the first one, it is at least possible that we will then recognize there's a problem and pursue this second approach. But not until the first approach bites us, hard.
Hell, even with agreeing to a very rigid world view, that'd be a dream. Imagine an AI proof assistant that does two things:
1. writes more proofs, focusing on the area I point at
2. makes small, atomic, incremental, suggestions to make it easier to prove something; once those pass proofs+tests, go back to #1
But hey, that would likely require some sort of artificial mental models and iterative reasoning, and not just slapping more compute+data together into a word generator...
I think the world of datalog and dynamic logic programming (or perhaps Answer Set Programming with Constraints) may be very useful in scenario 2, and people are still pursuing that without waiting for scenario 1 to happen first.
Yeah, scenario 1 will also probably still happen.
Haskell has real world examples, for instance:
https://docs.servant.dev/en/stable/tutorial/ApiType.html
I agree with the philosophy at a high level, but opening up the power of what you can do at compile time is a very deep and nuanced trade off space. I think there’s a more general way to state what you’re getting after as well: we should be able to express arbitrary constraints that are enforced by the compiler. These constraints don’t really have to be about values themselves but could also be about how they are consumed e.g. linear types.
Unfortunately, the more powerful the type system, the harder it is to make inferences. Generality comes at a cost.
I'm surprised that most popular languages don't even do simple type constraints, like an integer restricted to a range such as 1..100, or a string constrained to match a template.
I feel that there are lots of low-hanging fruit when it comes to typing which would improve both program readability and correctness.
In the type-theoretic world, you want to refer to "refinement types" [0]. Some interesting work that I have seen in this direction is Liquid Types [1][2]. There is some work in Haskell [3], but it is surprisingly very useful in Rust as well [4].
[0] https://en.wikipedia.org/wiki/Refinement_type
[1] https://goto.ucsd.edu/~ucsdpl-blog/liquidtypes/2015/09/19/li...
[2] https://goto.ucsd.edu/~rjhala/liquid/liquid_types.pdf
[3] https://ucsd-progsys.github.io/liquidhaskell/
[4] https://github.com/flux-rs/flux
TypeScript does the latter.
The former is difficult to track through arithmetic operations.
Typescript also does the former, albeit with less friendly syntax.
Kinda but not really. It actually handles type inference on string concatenation, but doesn't understand math.
Typescript (the version running on my box) considers this an error.
There is nothing to suggest that 1..100 understands math.
I think the whole point of a type fully understanding the constraints of your data suggests it would ideally.
Typescript is able to do this with strings when using templates.
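Presumably something along these lines (a reconstruction from the surrounding description, so the details are a guess):

    type Suit = "clubs" | "diamonds" | "hearts" | "spades";
    type Rank = "ace" | "two" | "king";  // abbreviated for the sketch
    type Card = `${Rank} of ${Suit}`;

    const rank = "ace";    // inferred as the literal type "ace"
    const suit = "clubs";  // inferred as the literal type "clubs"
    const card: Card = `${rank} of ${suit}`;  // accepted as "ace of clubs"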
Even though I have not declared the club or ace as a Suit or Rank, it still infers the Card correctly in the last line. This is a stronger form of typing than what you are settling for, where 1..100 doesn't know its own range. This is the difference I'm referring to.
The difference you are referring to was already asserted in your first comment. I'm afraid whatever you were trying to add here is being lost. Unless you were, strangely, straight up repeating yourself for no reason?
let me say in fewer words:
>There is nothing to suggest that 1..100 understands math.
That's like, your opinion man. I'd like it.
You'd like what? For the type checker to understand math? We know that already. You said that in your first comment. And the second one. And the third one. And now seemingly the fourth one. But let it be known, whatever new information you are trying to add here is getting lost in translation.
Ada code resembles this.
How would you ensure that the values of the types don't invalidate the contract? Easy for basic assignment, but I'm not sure how you would do that when doing arbitrary mutation.
You could prove to the type checker that you're only allowing input values that, when fed to your operation, results in outputs that remain in bounds. For example, if you're doing addition, you first check that each input is between 1 and 49, and you exhaustively handle the cases where the values are out of those bounds. Then the type checker can see that the output is always between 1 and 100.
For operations/mutations where it's more complex to validate the inputs, you could assign the result to an unbounded variable, and then prove to the type checker that you're exhaustively handling the output before you assign it to a bounded variable. For example, multiply two unbounded numbers, store the result in an unbounded variable, then do "if result >= 1 && result <= 100 then assign result to var[1..100] else .... end"
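In TypeScript terms, a loose sketch of that guard-then-assign discipline (note that, unlike Ada/SPARK, the compiler here only tracks the brand; a human still has to verify the guard):

    type OneToHundred = number & { readonly __range: "1..100" };

    // The single guarded doorway back into the bounded type.
    function toBounded(result: number): OneToHundred | undefined {
      return result >= 1 && result <= 100 ? (result as OneToHundred) : undefined;
    }

    // Do arithmetic on plain numbers; re-establish the bound afterwards.
    function multiply(a: OneToHundred, b: OneToHundred): OneToHundred | undefined {
      return toBounded(a * b);  // out-of-range products are handled, not ignored
    }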
You could prevent arbitrary mutation, pushing computation onto a more flexible type and allowing assignment back to the more constrained type only when first guarded by an appropriate conditional. The whole "parse, don't validate" thing.
The reference (an excellent read!): "Parse, don't validate" https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
Idris is fascinating. I actually took the book about it on vacation with me. Software reliability is becoming increasingly important as we tie all these systems together because it multiplies the number of possible unanticipated failure modes.
Regarding the ultimate utility of languages initially considered "academic," the languages Lisp, Haskell, Prolog, Scala, Erlang and OCaml would like to have a word ;)
But most of those don't have a real type system though? And definitely not a strict one.
Scala, OCaml, and Haskell all have very powerful type systems. OCaml and Haskell even have good ones.
I wish it were possible, but I have a sneaking suspicion that types like PrimeNumber or TotalProgram are beyond the realm of possibility.
PrimeNumber is definitely possible in pretty much any dependent type language. Indeed, it would be pretty impossible to get any work done in Lean without it.
TotalProgram: possible in some, not in others, I think.
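For instance, a sketch in Lean 4 with Mathlib (the PrimeNumber name is illustrative):

    import Mathlib

    -- A prime number is a natural number bundled with a proof of primality.
    def PrimeNumber := { n : ℕ // n.Prime }

    -- Constructing one obligates you to provide the proof.
    def seven : PrimeNumber := ⟨7, by norm_num⟩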
> PrimeNumber is definitely possible in pretty much any dependent type language.
That's awesome! Thank you. I didn't know type systems were capable of expressing that level of sophistication.
Would HaltingProgram and NonHaltingProgram be expressible? I hope it's clear what I'm trying to convey, but the technical wording escapes me today.
HaltingProgram and MaybeHalting are expressible (or vice versa), if you want to cover all programs with one of two types. If you want specifically HaltingProgram and NotHaltingProgram then you need a third category of MayOrMayNotHaltingProgram.
There is a fascinating result about types that they have a one to one correspondence with proofs. Dependent type systems leverage that to the hilt.
see the 'total' keyword in Idris
https://docs.idris-lang.org/en/latest/proofs/patterns.html
Sadly this is true and I 100% agree. After reading and working through type-level programming in Idris, this is the conclusion I came to.
What made it seem too restrictive to you? Or what were your experiences trying to apply it to real-world scenarios?
I'm not sure what you mean. I don't find Idris restrictive (except for the lack of union types with subtyping). Rather I find any mainstream language very restrictive - or is that what you meant?
ah, I think I misunderstood. Still getting through the Idris book :)
That sounds like you're using "type" to mean "representation", albeit with a lot of detail.
I may be odd, but my programming problems are rarely due to the number of bananas or how bananas are represented, but whether I'm working with the correct bananas.
> a type should be capable of reflecting any arbitrary rules a programmer knows about the bounds of a value
That's a programming language. In other words, now your type system needs a type system.
I'm optimistic that improved proof search using modern AI can make working with dependent type systems much more productive and practical.
A future programmer may be spending more time formally describing the invariants of the system.
Indeed, TypeScript can do exactly this and more, without much ceremony:
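Presumably something like this intersection-type sketch (a reconstruction; the Timestamped/Msg names are borrowed from the article's example):

    type Timestamped = { timestamp: Date };
    type Msg = { sender: string; body: string };

    // Composition by intersection, checked at compile time.
    type TimestampedMsg = Msg & Timestamped;

    const m: TimestampedMsg = {
      sender: "someone",
      body: "hello",
      timestamp: new Date(),
    };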
I doubt you will find a language where it is less clunky.
Edit: Oh, I typed this on mobile, this was supposed to be a comment on another comment by posix_monad.
Typescript has a phenomenal type-system indeed. However, its foundations are not very sound. The types are pragmatic and that's what counts mostly, but going forward it would be great to have type-systems that are even more powerful and also theoretically sound with a simpler foundation (= less edge cases and workarounds). Languages like Idris are working on this.
Typescript can't really be this language because it is impeded by having to work with the javascript runtime, which makes this task much much harder to do.
TypeScript's type foundations are not sound, and I would argue that this is a major factor why it is so good. I have yet to see a sound type system that is as pleasant to use as TypeScript's—certainly not Idris.
I am not a fan of encoding all sorts of correctness statements into a static type system. I'd much rather use a theorem proving environment for correctness, but one that is not based on type theory.
What do you gain from an unsound type system? Implicit casts still exist and one dependency returning Any means that your specification is meaningless (you can prove anything, including false).
The point is that making a typesystem sound is really hard and requires trade-offs. I think Scala is a wonderful example. Check: https://www.baeldung.com/scala/dotty-scala-3
To quote from there: > Scala 3 has dropped some unsound and useless features to make the language smaller and more regular. It has added some new constructs to increase its expressiveness. Also, it has changed some constructs to remove warts and increase simplicity, consistency, and usability.
Some of the features they dropped because they were unsound were still useful (to me).
Typescript tries neither to make their typesystem (perfectly) sound, nor to make it as elegant as possible. That results in it to be very useful/pragmatic for everyday-programming tasks.
> What do you gain from an unsound type system?
The ability to type most idiomatic javascript circa 2014. It's definitely a Faustian bargain.
I don't rely on the type system for my specification. I just rely on it to significantly reduce the number of bugs I inadvertently introduce into the implementation of my specification.
How are you sure you haven’t introduced bugs if the specification/type has an error it doesn’t tell you about?
I have trouble parsing your sentence, to be honest. What are you asking?
Are you asking how am I sure that/if my specification is correct?
Are you asking how do I make sure I have no bugs without a proof?
Maybe you are asking something else entirely?
Because the type system is unsound, you could add an error to your implementation and the type system will not catch it. It will happily tell you that everything type checks.
How do you use that to prevent errors?
Yes, that can happen. Just as a sound type system will also not prevent all possible bugs. So what? TypeScript's type system helps me to prevent many bugs by performing standard sanity checks, and is a very practical tool.
Just rephrase your question as "How will a pervasive system of sanity checks help me prevent errors?", and I hope you agree that it kind of answers itself.
Yes, I think that's what I said. The pragmatic approach makes it very ergonomic to use in most use cases. That it breaks down on edge-cases is not very relevant in practice usually.
> I'd much rather use a theorem proving environment for correctness, but one that is not based on type theory.
Based on what then?
I like Abstraction Logic: http://abstractionlogic.com
Haskell’s type system isn’t sound either, there is unsafePerformIO.
Nobody actually wants a language with a sound type system, unless they’re writing mathematical proofs. Any time you need to do anything with the external environment, such as call a function written in another language, you need an escape hatch. That’s why every language that aspires to real world use has an unsound type system, and the more practical it aspires to be, the more unsound the type system is.
Soundness is only a goal if the consequences of a type error are bad enough: your proof will be wrong, or an airplane falls out of the sky, or every computer in the world boots to a blue screen.
For everybody else the goal should be to balance error rate with developer productivity. Sacrificing double digits of productivity for single digits of error rate is usually not worth it, since the extra errors that a very sound type system will catch will be dominated by the base rate of logic errors that it can’t catch.
I think there is a misunderstanding on your side. You seem to mix up the soundness of the type-system with the property of referential transparency. Those are two orthogonal things.
Oh and to your argument:
> For everybody else the goal should be to balance error rate with developer productivity. Sacrificing double digits of productivity for single digits of error rate is usually not worth it, since the extra errors that a very sound type system will catch will be dominated by the base rate of logic errors that it can’t catch.
I think you are missing my point.
You are merely looking at a single point in time. And yes, you are right - the balance you mention matters. But what also matters is the future. A language needs to be able to evolve. If it does not do that, it will eventually die and be replaced. If the type system is well made, with good foundations, the language will be able to evolve and adapt faster, causing fewer problems for its users.
What does "soundness" mean here? I love typescript's type system, except for the quirks that are inherited from javascript, which I sometimes find infuriating, and I'm wondering if you're talking about the same thing. For example, the fact that typescript- through javascript- only exposes a single `number` type is so annoying. Sometimes that's fine, but other times it would be nice to be able to say "No, this isn't just any damn number, this is an integer! This is an iterable ID for god's sake!" so you at the very least get a static error when you try to pass in `1.001`, or something that could be `1.001`. And that's just a very basic example of how a collection of number types with more granularity would be an improvement. Especially if they were a little more robust than that and were composable. Imagine being able to type `integer | negativeFloat` or other such wacky composed types. Ideally you could compose your types programmatically, and define shit like "the type is a `float` who's floor value has a % 3 of 0".
Informally, a sound type system is one that never lies to you.
Formally, the usual notion of soundness is defined with respect to an evaluation strategy: a term-rewriting rule, and a distinguished set of values. For pure functional programs this is literally just program execution, whereas effects require a more sophisticated notion of equivalence. Either way, we'll refer to it as evaluating the program.
There are two parts:
- Preservation: if a term `a` has type `T` and evaluates to `b`, then `b` has type `T`.
- Progress: A well-typed term can be further evaluated if and only if it is not a value.
https://www.typescriptlang.org/docs/handbook/type-compatibil...
https://www.typescriptlang.org/play/?strictFunctionTypes=fal...
https://github.com/typescript-eslint/typescript-eslint/issue...
Elixir is taking a similar direction if I'm understanding correctly (they seem to insist their type system is fundamentally different from Typescript's, but to this day I cannot understand how; both are structural and based on set theory).
Is elixir's type system sound?
The type system is still in development so we haven't seen what the end result is yet, but I believe the system they're implementing is going to be sound.
Foo and Bar are now compatible with FooBar, which can create a lot of issues even in strict mode. Typescript is amazing, but interfaces not being structs is a footgun in many places.
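The definitions were presumably something along these lines (a guess, and one possible reading of the compatibility claim):

    interface Foo { a: string }
    interface Bar { b: number }
    interface FooBar extends Foo, Bar {}

    const fb: FooBar = { a: "x", b: 1 };
    const foo: Foo = fb;  // accepted: structural typing ignores the extra member
    const bar: Bar = fb;  // likewise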
What do you mean by "compatible"? You couldn't assign a `Foo` to a `FooBar` variable unless all `Bar` properties are optional
What issues?
MLs require a lot of ceremony modelling simple record types.
What we want to express here is an object with a map of properties (name to type), and also a way to compose those maps, at compile time of course. Have any languages achieved this? I know TypeScript can do some of these things, but it's clunky.
I think what you describe is called "extensible records". Elm had them in a prior version. You can implement them in a language with an expressive enough type system (e.g. Haskell or Scala) without special support, but the syntax can be a bit unwieldy.
Here's three recent papers on extensible records:
* https://arxiv.org/pdf/2108.06296
* https://arxiv.org/pdf/2404.00338
* https://dl.acm.org/doi/pdf/10.1145/3571224
I'm not claiming these are definitive in any sense, but they are useful guides into the literature if any reader wants to learn more. (If you are not used to reading programming language papers, the introduction, related work, and conclusions are usually the most useful. All the greek in the middle can usually be skipped unless you are trying to implement it.)
> I think what you describe is called "extensible records".
From what I read, gp only asks for a way to express the type, not a way to change/extend the type of something during runtime.
Elm's current implementation does this just fine.
This is often referred to as "row typing", and the only language I've ever used that implemented it first-class with polymorphism is Purescript[0]. It's possible to model such things in other languages with sufficiently powerful type systems, though they're normally not as nice to use.
As an aside, Purescript is one of the most fun languages I've ever used for frontend work, and I lament that Elm seems to have overtaken it in the FP community.
[0]: https://hgiasac.github.io/posts/2018-11-18-Record-Row-Type-a...
Even if Elm has had no new version in 4 years while purescript has?
You made me want to try purescript :)
The construction of the type 'Map<string, Type>' is entirely standard in languages like Agda and Coq (and I bet Idris too). In these languages, Type is itself a type, and can be treated like any other type (like string or int). Nothing clunky about it. (If you're curious, Type is usually referred to as a "universe type" in type theory circles.)
Haskell does it just fine with the HasField typeclass, but like everything Haskell, it's not always easy to find the information.
Ok, but why?
I like the beauty of an expressive type system as much as the next guy, but is there any scenario where merging property maps at the type level is better for solving real-world problems than just writing out the combined record, or whatever?
With mergeMaps, you don't have to write out all of the properties.
The Ceylon language allowed you to do that sort of thing with types. If you "added" two maps together of types `Map<String, int>` and `Map<String, String>`, you would get back a `Map<String, String | int>` (or, equivalently, `Map<String, String> | Map<String, int>`).
But for some slightly complex reasons, most language designers find adhoc union types (which are required for this to work) a bad idea. See the Kotlin work related to that, they explicitly want to keep that out of the language (and I've seen that in other language discussions - notice how tricky it can be to decide if two types are "the same" or whether a type is a subtype of another in the presence of generalized unions) except for the case of errors: https://youtrack.jetbrains.com/issue/KT-68296/Union-Types-fo...
> If you "added" two maps together of types `Map<String, int>` and `Map<String, String>`, you would get back a `Map<String, String | int>` (or, equivalently, `Map<String, String> | Map<String, int>`).
But these are obviously not equivalent: the first type is a map where all values are either strings or ints, and the second one is either a map where all values are strings, or all values are ints.
If that's confusing, consider: {foo: 1, bar: "2"}. It satisfies `Map<String, String | int>` but not `Map<String, String> | Map<String, int>`.
(In fact, the latter is a subtype of the former.)
P.S. You also seem to have misunderstood the toplevel comment about modeling record types as maps, as being about typing maps that exist in the language.
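A TypeScript sketch of the same distinction (using ReadonlyMap to keep variance simple; the names are invented):

    type PerKey = ReadonlyMap<string, string | number>;  // values mixed per key
    type OneKind = ReadonlyMap<string, string> | ReadonlyMap<string, number>;

    declare const homogeneous: OneKind;
    const asMixed: PerKey = homogeneous;  // ok: OneKind is a subtype of PerKey

    const mixed: PerKey = new Map<string, string | number>([["foo", 1], ["bar", "2"]]);
    // const bad: OneKind = mixed;  // error: neither all strings nor all numbers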
These types are equivalent if you consider that values of both of these types have the exact same read operations. Calling `get(String)` will return `String | int` in both cases. You're right that you could build a value of one of these types that does NOT conform to the union, however. I am not sure what's the technical name for what I am trying to say... are they "covariantly equivalent"???
EDIT: ok, I just wanted to say that one type, `Map<String, String | int>`, is a supertype of `Map<String, int> | Map<String, String>`, so if a function accepts the former, it also accepts the latter. They're not equivalent but you can substitute one for the other (one way only) and still perform the same operations (always assuming read-only types, if you introduce mutation everything becomes horrible).
I was trying to emphasize how having type "combinations" ends up causing the type system to become undecidable, as you end up with infinite combinations possible, but as I haven't really gone too deeply into why, I am having trouble articulating the argument.
> you would get back a `Map<String, String | int>` (or, equivalently, `Map<String, String> | Map<String, int>`)
How are these equivalent? Wouldn't the latter result in {foo:1, foo:"two"}, where the former wouldn't?
OCaml's first-class modules allow you to do this: https://ocaml.org/play#code=bW9kdWxlIHR5cGUgRk9PID0gc2lnCiAg...
Ocaml object system can also achieve this in a quite lightweight way
The most consistent solution with the least ceremony. Now it is a module, not a type though.
You can create first-class values of this module type, and since values have types, "it" is a type. Specifically, my_foobar has type (module FOOBAR).
Actually getting values out of such a module-typed structure does involve some ceremony, however.
Hence, Objective Caml! This can be modeled in OCaml as an object type.
For a family of types matching any object with those methods, I think you can write something similar.
You can easily do composable tagged (by index type or even compile-time strings) tuples in C++. The syntax is not pretty, but there is minimal to non-existent overhead.
On the other hand, thanks to the loose "duck typed" nature of templates, you can write functions that act on a subset of the fields of a type (even type checking it with concepts) while preserving the type of the whole object.
Scala 3 can do it with some macro tricks. But the performance is not as good as with nominally typed structures.
Go structs sorta behave this way if you put a "parent" struct in the first field of a struct definition. They are, of course, not maps though. But a map with statically defined keys/values/types is basically a struct.
I am cheating a bit by naming a non-general purpose language, but Nickel[1] does that, for example through composing contracts.
[1]: https://nickel-lang.org/
Not sure whether this is what you intended, but Go structs can embed other structs
This doesn't function as an interface though. You cannot pass a FooBar to a function that expects a Foo, for example, and although you can fairly easily reference the Foo-part of a FooBar instance (`foobar.Foo`), there is no way to pass e.g. a slice of FooBar instances to a function that takes a `[]Foo` as its argument. That's the problem to be solved.
Ok, Go generics is pretty limited but you can achieve this in C++ via concepts.
Totally achievable by using interfaces instead of structs
Then functions that accept a Foo will also happily take a FooBar. It does not solve the problem of passing a `[]FooBar` to a function that expects `[]Foo`, but that can be solved with generics or a simple function to convert `[]FooBar` to `[]Foo`.
That sounds achievable at compile-time in C++ with some type-level map implementation such as in boost.mp11 (sadly the language does not allow general use of types as values, preventing the use of standard map containers).
I raise the bar and say the relational model has it and can be made to work at type level.
Something like this in TypeScript:
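Roughly, perhaps something like this (a speculative sketch of relations as type-level property maps; all names invented):

    // A "relation" as a type-level map of column names to column types.
    type Person = { id: number; name: string };
    type Address = { personId: number; street: string };

    // Type-level projection (SELECT name):
    type NameOnly = Pick<Person, "name">;

    // The column set of a natural-join result:
    type PersonWithAddress = Person & Omit<Address, "personId">;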
> Have any languages achieved this?
“The MLs” solved it decades ago. Standard ML has very nice record types.
Is this something professional programmers are having trouble with??
This is the kind of problem you face in your first year working, no? I am honestly curious what others think. Do you have trouble deciding when to use an interface (assuming your language has that), or a type wrapper (I don't think that's the brightest idea), or a function to extract the field to sort by (most languages do that)??
Your confidence that there is an obviously optimal way to model something reads as either brilliance or hubris and brilliance is a lot rarer.
Typically there are subtle trade-offs and compromises which only prove themselves to be useful/detrimental as the software changes over time. You can place bets based on experience but you can only really be cocky about your choices when looking back not looking forward
> Your confidence that there is an obviously optimal way to model something
You're imagining things (uncharitably); that's not in the comment you replied to.
You might be right you know. I can't edit anymore but please mentally strike out that first paragraph.
I did read a lot into the '??' punctuation which might not have been intended.
> Is this something professional programmers are having trouble with??
I would say, yes. So does everybody else. If you don't believe it, I don't think you appreciate the depth of these problems.
I am reminded of Erik Meijer saying that the interface (in the Java sense) part of the contract between types is the least interesting part. What is important are their algebraic properties, which can be modeled using functional types and a bunch of other advanced type-theoretical ideas (like linear types, for instance).
Modelling stuff with types is not easy, it's an art.
Most professional programmers don't worry much about elegance and just write code that gets the job done.
> Is this something professional programmers are having trouble with??
Nobody said this.
There is value in thinking about things like this sometimes, because it has long-term consequences for the projects we work on. Even if you're a "professional" programmer (whatever that means), it's valuable to go back to beliefs and knowledge you've established long ago to evaluate whether to change them in the face of new experiences you've made since the last time you've thought about it.
If you think "professional" programmers don't get this sort of thing wrong in some form or another, I have a bridge to sell you.
Why scare quotes though? Professional programmer just means someone who writes software for a living. Software is your profession... there's no doubt about what that means, is there??
If you do that, you'll have run into this sort of decision very early in your career, and hopefully will understand the best way to handle it, which IMHO just depends on the language (because certain features lead to different optimum solutions) and the culture around it. But sure, I am happy to discuss fundamental topics like this, that's why I am engaging and asking what others think.
Sometimes I do, not everyone is a big brain coder.
It’s useful to examine fundamentals now and then.
Some of the ideas in the blog must be speculative, given that it fell through a wormhole, being published September 17, 2024.
published…2024-09-17?
I wonder what else kqr can tell us about the future.
Sounds like someone needed a type that expresses "date no later than now".
Nice catch!
Rich Hickey would say "just use maps" and avoid all this navel-gazing
It does not solve the problem. Would you rather have a function that takes a map and puts additional keys in it (ex. `timestamp`) or a function that takes a map and stores it in a new map alongside the additional keys? In any case, how would you write a generic function that works on those additional keys?
This is a lot of complication for what most OOP languages with interfaces would handle with something like:
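Presumably something along these lines (a reconstruction in TypeScript-flavoured OOP; the Quote/Timestamped/Msg names come from the reply further down):

    interface Timestamped { timestamp: Date }
    interface Msg { sender: string; body: string }

    // One concrete type satisfying both interfaces.
    class Quote implements Timestamped, Msg {
      constructor(
        public timestamp: Date,
        public sender: string,
        public body: string,
      ) {}
    }

    // Generic code needs only the interface it cares about.
    function newest<T extends Timestamped>(xs: T[]): T | undefined {
      return xs.slice().sort(
        (a, b) => b.timestamp.getTime() - a.timestamp.getTime(),
      )[0];
    }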
Why is this so hard in Haskell? It doesn't have interface polymorphism?
I think TFA is talking about nesting fields in whatever order and also being able to extract the fields without ceremony like `get(get(get ...))`; just allow `get(...)`. If you wanted an example closer to the submission you would have a multi-field class inside that class. Then you would perhaps solve the nested field access boilerplate by manually implementing it in the interface methods.
I guess I'm missing why you would model these interfaces as nested, when obviously nesting has the ordering issue.
Because you don't want to have to enumerate every possible combination up front. Mixins are probably the closest OO concept.
It's not hard at all and it's actually what the second approach listed by the author shows:
which adjusted to your example would be The problem with this approach is that you'll have duplicated data on all instances. In your example, `Quote` has the fields `timestamp` and `sender` in order to satisfy with `Timestamped` and `Msg`. If you had several classes like `Quote` and interfaces like `Timestamped` then you would end up with a lot of duplicated code.Doesn't PHP solve this with traits?
IIRC you can have traits automatically implement this sort of behavior with a centralized implementation.
> Why is this so hard in Haskell?
The challenge is composition. Adding Timestamped to something in a static, type-checked way, but not modifying the source code of that something.
> It doesn't have interface polymorphism?
This is beside the point, but its interface polymorphism is static, not dynamic. If you have a List which is IMappable, and you do some IMappable operations on it, in most OOP languages you get back an IMappable, but in Haskell you get back a List.
I would love to just throw a field Optional[Datetime.UTC] on the Msg type and call it a day :). Cool article, got me thinking at 6:00 AM.
> People say that types-as-interfaces are not composable
Who says that and what does it even suppose to mean?
Primitive types and schemas.
Complex types and objects don't exist.
Embrace Mereological Nihilism.
It's fun at meetups to tell everyone your programming paradigm is Nihilism.
What is fundamental, types of bytes?
Yeah, very roughly speaking.
In the philosophic position of mereological nihilism only "simples" exist when discussing objects, etc, simples are akin to byte types for sure and primitive types maybe.
Nothing complex, like objects, higher level types, etc. exist.
Composition of simples in time and space is what determines everything other than simples. Anything that isn't a simple is just emergent from the space/time arrangement of the simples.
So if the implications are that nothing is real outside of time/space/simples, then calling something higher level a "type" is to label something as discrete and real when it isn't, so now you are using language and models that are wrong to reason about things, which means your model will have frustrating drift from reality that you can't rectify.
That's where schemas come in, we are just labeling common arrangements of simples for semantic reasons but they don't really exist so they shouldn't be elevated to the position of a type as that is a wrong abstraction and causes divergence and mess.
The C “type system”, circa 1985.
Oh, I made a typo. Meant "types" OR "bytes".
The problem with known types, or interfaces, as values is that the conditions for a value's format or validation change based on context. So you might have a price that is valid in one context but not in another (like a negative value). The problem then occurs on input, where you might let in a value that is valid at a higher level but not at a deeper level, and by the time you detect it, it might be too late (you have already committed a transaction with invalid data).
Types are not interfaces.
Interfaces describe behavior, types describe shape and structure. The difference is subtle but important.
> interfaces describe behavior, types describe shape and structure
Shape and structure are behavior. There's more to behavior than "abstractly produces X result". Behavior is "produces X result in Y form given Z in W form".
> Shape and structure are behavior.
They shouldn't be. Conflating two things that can be separated is just compounding complexity. You don't need to know how a database lays out memory or the structure behind a web server.
> You don't need to know how a database lays out memory
You do, however, need to know what logical columns are in a table and the types of those columns to be able to effectively query against the table. And you need to know what the types of the inputs to a query wrapper function are to be able to call it properly.
Memory layout has nothing to do with type, because physical memory layout is completely separate from the semantic logical layout and form represented by the bits. You now appear to be the one conflating two unrelated things.
That something is treated as an integer fundamentally matters to its use. That the thing comprises some number of adjacent bits in big/little-endian arrangement is a very unrelated implementation detail.
> And you need to know what the types of the inputs to a query wrapper function are to be able to call it properly.
That's the interface.
> Memory layout has nothing to do with type,
So float, int, uint64, int8 and a vector/array don't have specific memory layouts?
> the semantic logical layout and form represented by the bits
This is nonsense and doesn't mean anything.
> So float, int, unint64, int8 and a vector/array don't have specific memory layouts?
The memory layouts don't need to be known to the user. Different hardware architectures can have the concept of floats and ints and code can be written using them the same way while the underlying representations are distinct. I promise that IEEE754 is not the only way to represent floating point values handed down from god to man on a golden scroll.
> > And you need to know what the types of the inputs to a query wrapper function are to be able to call it properly.
> That's the interface.
Funny, I recall you not very long ago saying "interfaces describe behavior, types describe shape and structure". Now you acknowledge that the interface necessarily includes the types of the arguments?
> The memory layouts don't need to be known to the user.
This isn't relevant one way or another.
> Different hardware architectures can have the concept of floats and ints
This also isn't relevant. Data types aren't expected to be cross platform unless they are specifically made for that, like file formats.
> Funny, I recall you not very long ago saying "interfaces describe behavior, types describe shape and structure"
I never said that.
Also just because two things work together doesn't mean they are the same thing.
> > The memory layouts don't need to be known to the user.
> This isn't relevant one way or another.
What needs to be known to the user of an interface is the most relevant aspect of describing that interface.
> Data types aren't expected to be cross platform unless they are specifically made for that, like file formats.
It's going to be difficult to reconcile that position with https://en.wikipedia.org/wiki/Abstract_data_type
Data types can optionally specify machine representation but need not. They specify behavior (i.e. possible values and operations) first and foremost. In programming, nearly every use of a data type is in the abstract, entirely separate from its hardware representation. The "Int", the "Bool", the "2", the "UTCTime" being specified in TFA don't care about bit arrangements. They're describing valid values and operations, not hardware.
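A small TypeScript sketch of that abstract-data-type view: the type is its values and operations, and two different in-memory representations satisfy it interchangeably. All the names here are invented for the example:

```typescript
interface Stack<T> {
  push(item: T): void;
  pop(): T | undefined;
}

class ArrayStack<T> implements Stack<T> {
  private items: T[] = []; // contiguous representation
  push(item: T): void { this.items.push(item); }
  pop(): T | undefined { return this.items.pop(); }
}

type Node<T> = { value: T; next: Node<T> | undefined };

class ListStack<T> implements Stack<T> {
  private head: Node<T> | undefined; // linked representation
  push(item: T): void { this.head = { value: item, next: this.head }; }
  pop(): T | undefined {
    const v = this.head?.value;
    this.head = this.head?.next;
    return v;
  }
}
// Callers observe identical behavior from either; layout never leaks.
```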
> I never said that.
You're right. Apologies. It was the other poster at the top of the thread. But it still seems apt to the thread and to your specific portrayal.
> don't care about bit arrangements.
Not every scenario cares about bit arrangements, but they are still there.
You seem to just be shifting around and coming up with new, more abstract arguments that drift further from whatever point you originally had.
You said "Shape and structure are behavior."
One is data, the other is execution. These are two different things.
The more they are conflated, the more problems people have with their programs.
Not separating them and letting them mix is a huge part of bad software architecture and leads to lots of unnecessary complexity.
Interfaces are usually understood as a subset of types. Interfaces specifically describe method signatures. Nevertheless, the distinction is murky because any structure can be interpreted in terms of getters and setters, which can be considered methods.
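A quick TypeScript sketch of that murkiness: the same structure expressed once as plain fields and once as getter methods. All names are invented for the example:

```typescript
type PersonData = {
  name: string;
  age: number;
};

interface PersonAccessors {
  getName(): string;
  getAge(): number;
}

// Any structure can be wrapped to satisfy the method-based view:
function asAccessors(p: PersonData): PersonAccessors {
  return {
    getName: () => p.name,
    getAge: () => p.age,
  };
}
```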
In Haskell, behaviour is type-directed (type classes), and types describe both.
Depends on the language. In TypeScript an interface can absolutely encode shape. In C# you can encode shape in interfaces via properties. Interfaces don't even describe behavior; it's just that we typically associate them with an implicit contract. Strictly speaking they are just a shape.
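For instance, a TypeScript interface with only properties encodes pure shape, with no behavior implied. `Rectangle` here is a hypothetical example:

```typescript
interface Rectangle {
  width: number;
  height: number;
}

// Structural typing: any value with a matching shape satisfies it,
// whether or not it has any methods at all.
const r: Rectangle = { width: 3, height: 4 };
```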
Fair, maybe I should have said "protocols define behavior" instead of "interfaces".
Meh. Or you could use mixins.
Differences in foundational understanding, and the unteachability of foundational research, inevitably cause a ton of confusion in texts on formal-science ideas outside a known academic context.
This text should not fascinate a programmer but create frustration of two types: (1) frustration at one's lack of small pieces of knowledge; (2) frustration that NOW you will not have the chance to invent this on your own; your creative process takes damage.
Of those, the second type should make up the majority of cases.
ALSO, this should cause one to have respect of the form: "hey, this programmer was probably at least smart enough to figure this out alone."
Perhaps it would be most polite, for every such text, to have a notice at the beginning: "For programmers who have already thought about what would happen were you to add X to Y, but so that Z, enough to probably not get new ideas in this context."