- if you are creating a file, to ensure full synchronisation you also need to fsync the parent directory, otherwise the file can be fsynced but the update to the directory lost
- if sync fails, you can not assume anything about the file, whether on-disk or in memory, critically one understanding which got dubbed "fsyncgate" and lead to many RDBMS having to be updated is that you can not portably retry fsync after failure: the earlier error may have invalidated the IO buffers but IO errors may not be sticky, so a later fsync will have nothing to write and report success
> otherwise the file can be fsynced but the update to the directory lost
It also goes the other way - the update to the directory can be fsynced but the file lost. This can break the "create temp file, write, close, rename to current" scenario (when the intention is to replace file contents atomically).
POSIX doesn't guarantee the order in which data hits the disk, so the above scenario can become "create temp file, write [contents still in memory only], rename to current [written to disk], power failure".
I believe there was a bug where this scenario had worked for a long time then one filesystem (ext4?) pushed closer to what is admissible under POSIX (non-obvious reorders of physical writes) and people started getting random data corruption in programs which used write & rename.
> It also goes the other way - the update to the directory can be fsynced but the file lost. This can break the "create temp file, write, close, rename to current" scenario (when the intention is to replace file contents atomically).
> POSIX doesn't guarantee the order in which data hits the disk, so the above scenario can become "create temp file, write [contents still in memory only], rename to current [written to disk], power failure".
Wait.. is there a way to do this correctly? At which points is fsync warranted, and how many fsyncs do we need for the whole "write file then mv it on top of current" to not lose data on power failure?
Write, fsync, rename then fsync directory if you need an ordering guarantee that the rename is a transaction barrier.
Of course, the fun part is that the filesystem can’t really guarantee fsync behavior if drives lie about it which many consumer drives do for benchmark reasons. Fun, no?
It would also be a fun experiment to write, fsync, cut power. Rinse and repeat. On differential drives. It would quickly show which drives are lying about syncing.
Well generally OP is correct that SQLite will more likely have written fsync code correctly and be fairly robust against power failures due to the WAL. Additionally, it’s not doing any renaming which you would need with plain files.
So while it doesn’t do any other magic, it’s more likely to handle power failures correctly.
But the authors of SQLite have studied and correctly implemented the working fsync logic, so you don't have to. It ·may* be a correct approach if it's your implementation detail, and.nobody else expects that file you're replacing.
I think the idea was to store data in the db, instead of scattered over multiple files and directories. Then you only have to worry about one file to fsync (Two if using a wal).
`fsync()` is not necessarily enough if your on-disk format is complex because you still need to write recovery code. Of course, SQLite3's on-disk format is very complex, and so it requires correspondingly more complex power failure recovery code. But SQLite3 has an excellent test suite, and in particular they have an excellent power failure recovery test suite.
Using SQLite3 to avoid power failure issues is pretty good advice. I don't see why GP is getting downvoted.
I have found a blog post from 2009 about this issue[1]. I think the "recent Ubuntu bug [that] has gotten slashdotted" is [2], so I think the most technical discussion about this would be [2].
I also found an interesting "scar tissue" from that bug in the current ext4 docs[0]:
"If auto_da_alloc is enabled, ext4 will detect the replace-via-rename and replace-via-truncate patterns and force that any delayed allocation blocks are allocated such that at the next journal commit, in the default data=ordered mode, the data blocks of the new file are forced to disk before the rename() operation is committed. This provides roughly the same level of guarantees as ext3, and avoids the “zero-length” problem that can happen when a system crashes before the delayed allocation blocks are forced to disk."
So if I do fopen/fwrite/fsync/fclose, that is not enough? That is crazy, I think 90% of apps don't fsync the parent directory. Also, how many levels of parents do you need to fsync?
That should be only for creating files, and maybe updating their metadata (not sure about that one).
The confusion stems from people thinking that files and directories are more different than they are. Both are inodes, and both are basically containers for data. File inodes are containers for actual data, while directory inodes are containers for other inodes.
All inodes need to be fsynced when you write to them. For files this is obviously when you write data to them. For directories, this is any time you change the inodes they contain, since you’re effectively writing data to them.
You only need to sync the direct parent because the containers aren’t transitive; the grandparent directory only stores a reference to the parent directory, not to the files within the parent directory. It’s basically a graph.
> rename() doesn't cause a fsync of the parent directory?
It does not.
At best it will schedule a journal commit asynchronously (I recall that ext4 maintainer complained about adding this "workaround for buggy user code" on lkml). If you want to receive an IO error when renaming fails, make sure to call fsync() yourself.
I’m not well versed enough on the subject to know if there’s important nuance here that I’m missing… but that doesn’t sound like a contradiction to me. That’s effectively a map with O(n) access, right? I think I would more generally refer to that as a “collection”, but it certainly contains inodes as you describe it.
IMO it's better to think of a directory as "containing" only the named references to inodes. Other directories on the same filesystem may contain other named references to the same inodes ("hard links").
The inodes themselves are more like free floating anonymous objects independent of any particular directory, and might not have a named reference at all (O_TMPFILE; or the named reference was deleted but a file descriptor is still open; or orphaned inodes due to filesystem corruption or someone forgetting to fsync the directory after creating a file as masklinn pointed out - e2fsck will link these in /lost+found). This is also why chmod appears to affect other hard links in different directories: Because it actually modifies the inode (file permissions are recorded in that), not the named reference.
This definitely seems like a more meaningful distinction to me. It’s also closer to the mental model I had coming into the discussion, FWIW.
And I think it makes the collection/container terminology distinction sharper too. Depending on context, I think it’s usually reasonable (if imprecise) to describe a bucket of references or pointers to things as a collection of those things. But I don’t think it makes as much sense to call it a container of those things, except in a really abstract sense.
do you mean it's a basically a tree? Because if it were just a graph, you could still have edges from the grandparent to the grandchild in addition to the one from the former to the child
This ... depends. In normal POSIX land, hard links break the tree structure, for one, so you get a DAG but not a tree. I think some file systems do enforce tree structure, though - hard links are not supported everywhere.
It used to be possible ages ago to hard link to directories, which meant that you could have actual cycles and a recursive tree-walking algorithm would never terminated. (As far as I know you can still do this by editing the disk manually, although I think fsck will make a fuss if it detects this.)
You can still, with the right syscalls and drivers, do something like hard links on NTFS (I think they're technically called mount points but it's not the same thing as POSIX ones). I'm not sure if you can still make directory cycles, and you're probably a bad person if you do.
Crazy is the right term. File system APIs in general have too many sharp edges and need a ground-up rethink.
Consider S3-like protocols: these recognise that 99% of the time applications just want “create file with given contents” or “read back what they’ve previously written.”
The edge cases should be off the beaten path, not in your way tripping you up to when you want the simple scenario.
Aren't the edge cases features? An abstraction (or different API, sure) is in order to prevent footguns. However, this abstraction should not force fsyncs for example, due to the performance impact mentioned. It puts the choice of guaranteed writes vs performance to the developer.
A better abstraction, designed from the ground up, wouldn’t force fsyncs to work.
For example, write groups or barriers (like memory barriers) would be wonderful. Or a transaction api, or io completion ports like on windows.
In a database (and any other software designed for resiliency), you want the file contents to transition cleanly from state A to B to C, with no chance to end up in some intermediate state in the case of power loss. And you want to be notified when the data has durably written. It’s unnecessarily difficult to write code that does that on top of POSIX in an efficient way. Most code that interacts with files is either slow, wrong or both. All because the api is bad.
> It puts the choice of guaranteed writes vs performance to the developer.
Yes, and it's a completely false choice. This entire point of this thread is that fsync is an incredibly difficult API to use in a way that gets you the guarantees you need ("don't lose the writes to this file"). And that the consistency guarantees of specific filesystems, VFS, POSIX, and their interactions are not easy to understand even for the experienced -- and it can be catastrophic to get wrong.
It isn't actually a choice between "Speed vs correctness". That's a nice fairy tale where people get to pretend they know what they're up against and everyone has full information. Most programmers aren't going to have the attention to get this right, even good ones. So then it's just "99.9% chance you fucked it up and it's wrong" and then your users are recovering data from backups.
The filesystem is very easy to use for simple things by comparison to a DB, and it's more accessible from the shell. But you're right, the filesystem is very difficult to use in a power failure safe way. SQLite3 has great power failure recovery testing, so the advice to use SQLite3 for any but the simplest things is pretty good.
It'd be very nice to get some sort of async filesystem write barrier API. Something like `int fbarrier(int fd)` such that all writes anywhere in the filesystem will be sync'ed later when you `fsync()` that fd.
It would also be very nice to have an async `sync()`/`fsync()`. That may sound oxymoronic, but it's not. An async `sync()`/`fsync()` would schedule and even start the sync and then provide a completion notice so the application can do other work while the sync happens in the background. One can do sync operations in worker threads and then report completion, but it'd be nice to have this be a first class operation. Really, every system call that does "I/O" or is or can be slow should be / have been designed to be async-capable.
> So if I do fopen/fwrite/fsync/fclose, that is not enough?
That is my understanding.
> Also, how many levels of parents do you need to fsync?
Only one, at least if you didn't create the parent directory (if you did then you might have to fsync its parent, recursively). The fsync on the parent directory ensures the dir entry for your new file is flushed to disk.
I've never heard about this in my years of programming. I just tried to read through the Win32 documentation, as I've done several times over the years, and it mentions a lot of edge cases but not this that I could see.
Is this some Linux/Unix specific thing? Am I blind?
As far as I know, it's specific to the combination of certain POSIX-ish OS and file systems, like linux/Ext3. I have no clue what BSD does here, or whether ReiserFS is different.
Windows/NTFS is a different world, there are still edge cases that can go wrong but I don't think this particular one is a problem because FAT/NTFS is not inode-based.
I imagine if you looked at the SQLite source code you'd see different edge-case-handling code for different OSes.
The thing about Windows is that because the file open operation (`CreateFile*()`) by default prevents renames of the file, Windows apps have come to not depend so much on file renaming, which makes one of the biggest filesystem power failure hazards less of an issue on Windows. But not being able to rename files over others as easily as in POSIX really sucks. And this doesn't completely absolve Windows app devs of having to think about power failure recovery! "POSIX semantics" is often short-hand for shortcomings of POSIX that happen to also be there in the WIN32 APIs, such as the lack of a filesystem write barrier, file stat-like system calls that mix different kinds of metadata (which sucks for distributed filesystems protocols), and so on. And yes, you can open files in Windows such that rename-over is allowed, so you still have this problem.
Ya NTFS file naming and long path is a wreck. Unzipping files from mac/linux is a easy way to end up with missing data. Applications quite often break on long file paths especially Microsofts own stuff like powershell.
More specifically, these things are for trying to improve the behavior on unclean system shutdown (e.g. power loss) which is inherently chaotic and unless all parts (most critically the disk and its controller) are well behaved you don't have any real guarantees anyway.
Windows also doesn't guarantee that data is written to disk by the time WriteFile/CloseHandle returns, the Windows version of fsync is FlushFileBuffers.
One of the advantages of using a database for files is that it's relatively more likely that these platform-dependent considerations were indeed considered and you don't have to moonlight as an DBMS engineer while writing your application.
One reason Iike use sqlite so much in my personal projects, sometimes just using it to store various config data bits within row entries of the most generic table ever, works well with small projects, probably not so much with large scaled ones.
Great another case of simple thing everyone knows is simple but turns out to be horrifyingly complicated and heavily system dependent, but only on occasion so you can get 80% of the way through your career before encountering the gaps in your knowledge.
I guess I'll add it to the list.
Of course on the other hand, I was already thinking that I should just use SQLite for all my file handling needs. This little nugget makes me think that this was the correct intuition. [Queue horrifying revelations w.r.t. SQLite here.]
w.r.t SQLite, the only horrifying revelation I’ve had is that it allows NULLs in composite primary keys, which I’ve seen lead to some nasty bugs in practice.
You don't need even to bother with fsync unless you are developing a database or use the file system like a database.
There's a reason Apple made fsync useless and introduced F_FULLSYNC. They know developers have an incentive to overestimate the importance of their own files at the detriment to system responsiveness and power draw.
The problem is, Apple's fsync doesn't even introduce any useful semantics. All it says is "write this eventually".
What a lot of people really need is just a "provide ordering, don't leave me with inconsistent data" operation. If you lose power "after" a write but before the fsync, for any non-networked application that's no different than losing power before the write, as long as the filesystem doesn't introduce chaos.
That seems significantly harder to implement, though, given that dirty page cache entries don't have an order entry as far as I know, and so retroactively figuring out which writes not to reorder with others is anything but trivial.
"if you are creating a file, to ensure full synchronisation you also need to fsync the parent directory, otherwise the file can be fsynced but the update to the directory lost."
And if you need this in Java you still have resort to ugly hacks.
The point is that there is no official way to fsync a directory in Java and that everyone is relying on an unintentional side effect of an unrelated function to accomplish it. The link I supplied is about the fact that the side effect briefly disappeared in Java 9 until enough people complained.
We're still living in xkcd 1172 land with this, have been for a decade or who knows how long.
The article is only (initially) talking about the first, and that is valid.
The second is just second-guessing the OS and hardware environment, and is invalid.
Here's the rule to figure out if you need fsync() or not: "If you think you might need fsync(), you don't." ;)
There are almost no cases where you should worry about the underlying layers. Basically if you aren't writing the filesystem itself, then you shouldn't be calling fsync().
You have to check what open()/close() etc said, but if close() said it worked, then it worked. You're done. The fact that lightning might have struck the drive just exactly then is not your problem and not something you should try to do anything about.
Unreliable networks, busses, batteries, etc none of that changes this. Those things all already have their own layers with their own responsibilities to be doing all the necessary testing and verifying and retrying before they return a success code to you.
There is open(...,O_SYNC) and mount -o sync (udev rules) for the case of a camera or thumb drive connect by usb etc.
It's not merely that you don't have to, it's that it's actively wrong to. fsync() is just joggling someone else's elbow while they're trying to do their job, and if it seems to solve some problem, that actually just exposes that you have some logic or order of operations problem and you aren't doing your own job.
"trust the other layer" or "trust the api contract" is unrelated to and does not conflict with "Be forgiving in your inputs and strict in your outputs.".
It just means:
Do: Check the returned error from, say, malloc().
Don't: Get a success from malloc() and then go try to do things to prove that malloc() actually did work.
That would be insane and impossible because it would have to apply equally to everything, including every single keyword or function you would use as part of the verification. How do you know when you so much as set a value to variable that it actually got set? If you printf the variable to prove it, how do you know printf didn't lie?
The logic for trusting close() is no different from the logic for trusting malloc().
Our responsibility is just to stay within the bounds of defined behavior and not make any assumptions about anything that isn't promised.
> The fact that lightning might have struck the drive just exactly then is not your problem [...] If you think you might need fsync(), you don't. [...] Those things all already have their own layers with their own responsibilities to be doing all the necessary testing [...]
You're mixing up the concepts of durability and consistency in pretty significant ways here, and are implying that everybody that is fine with a lack of the former will also be fine with a lack of the latter.
This is absolutely not true, and can cause extremely painful and hard-to-track-down bugs.
For better or worse, there is no way to directly tell the OS "whatever you do, make sure you don't reorder these writes I just did with those other writes I'm about to do".
The next best (portable) thing we have to achieve that outcome is fsync. It's a bit heavy handed, in that it gives you durability even if you only want consistency. That absolutely doesn't mean it's redundant, though.
> Basically if you aren't writing the filesystem itself, then you shouldn't be calling fsync().
Given that fsync is a syscall, but file systems are generally implemented in the kernel, this is a pretty nonsensical statement by itself.
File systems usually have (and need, for performance) a much lower level view of the underlying block storage, and fine-grained control over it.
Just as one example, Linux has the concept of write barriers. Not using these correctly (in the filesystem driver) can cause data leaks across files owned by different users and processes.
"there is no way to directly tell the OS "whatever you do, make sure you don't reorder these writes I just did with those other writes I'm about to do"
Perhaps because there is no reason for such a thing to exist.
Text editors: I want my file content to either be what it was before I pressed "save", or what it was afterwards; I don't want my file to be mangled halfway through. (In other words, in this case I need consistency, but not necessarily durability.) How would you do that without fsync?
Databases: People generally don't like unrecoverable consistency errors just because their computer crashed during a write. Not generally possible with reordered writes.
Sometimes people also need durability on top of consistency, e.g. for everything where you want to make at most one request to some server; you can do that by e.g. writing "I did the thing" to a log file, fsync'ing it, and then making your request.
// CheckClose is a utility function used to check the return from
// Close in a defer statement.
func CheckClose(c io.Closer, err *error) {
cerr := c.Close()
if *err == nil {
*err = cerr
}
}
I caution against this approach, as you are not really dealing with the error when it occurs. If the work you do after the defer has other side effects, you may have just gotten your application into an inconsistent state and it's very hard to see in code why this might be.
`defer` is really not well-suited for error handling, its benefit is mainly in resource cleanup where failure is impossible or doesn't matter. (This makes it fine for `Close` on read-only file I/O operations, and not so great for writes.)
> If the work you do after the defer has other side effects, you may have just gotten your application into an inconsistent state and it's very hard to see in code why this might be.
Can you give an example case of how this could happen?
This is a contrived example, but imagine a situation where I have a file I want to write on disk and then have a reference to it in a database. If I have a flow like:
This function might have updated the user in the database with a new file despite the fact that `CheckClose` (defined up-thread) does check to see if the `Close` failed and returned an error. The calling code won't have known this has happened.
The core problem is that the error checking is not done soon enough, either because Go programmers are conditioned to `defer f.Close()` from nearly all example code -- most of it demonstrating reads, not writes -- or because they are handling the error, but only in a deferred function, not earlier.
`Sync()` flushes the data to disk, and `Close()` gives a "last-chance" opportunity to return an error. The `defer f.Close()` exists as a way to ensure resource cleanup if an error occurs before the explicit `f.Close()` toward the end of the function. As I mentioned in an update to the post, double `Close()` is fine.
I think a better solution is to write smaller, single-purpose functions. To refer to your example downthread, you should have one function that only writes the file, and another that does the "whole" operation -- calling the function to write the file, checking for errors, and then updating the database.
Then you can use defer in the file-writing function if you so please, and not bother to close at the end explicitly, without issue. A more robust example might be to even include the sync call in the deferred function (and even clean up the file itself on error). To re-use your example from your blog post:.
func helloNotes() (err error) {
var f *os.File
f, err = os.Create("/home/joeshaw/notes.txt")
if err != nil {
return
}
defer func() {
if err == nil {
err = f.Sync()
}
cerr := f.Close()
if err == nil {
err = cerr
}
if err != nil {
os.Remove("/home/joeshaw/notes.txt")
}
}()
err = io.WriteString(f, "hello world")
return
}
I would probably move that out into a helper, though, so I could do something like
defer SafeClose(f, &err)
instead, and be able to use it elsewhere. Hell, even without defer, it's nice to have a helper that will sync and close for you so you can avoid the boilerplate, if you have lots of different bits of code that writes files.
FWIW, I'm not sure why you are so negative on named return values, but I'm at best a novice Go programmer, so perhaps I don't fully understand why they aren't great (I guess it does look weird to me to have bare `return` statements that do actually return a value even though it doesn't look like it). Your argument about the return value possibly being modified after the core function finishes being unintuitive doesn't really strike me as a big deal either.
> If the work you do after the defer has other side effects
Defer is by definition the last work you do in a function, there won't be more work except by the caller who will get the error returned to them.
If you are structuring a function that writes a file, and then does something with it, defer isn't appropriate, since you should close it before you do any more work.
It's possible to have multiple defers in a function though (so you have multiple "last work in a function"; nowhere is it dictated that a function should only have one operation that needs to clean something up at the end. Think for example copying one file to another.
I use a similar pattern, cribbed from one of the Go databases that aspired never to ignore errors. The difference is that errs.Capture preserves existing errors and formats the error with a message--important if generalized to handle any error function.
package errs
// Capture runs errFunc and assigns the error, if any, to *errPtr. Preserves the
// original error by wrapping with errors.Join if the errFunc err is non-nil.
func Capture(errPtr *error, errFunc func() error, msg string) {
err := errFunc()
if err == nil {
return
}
*errPtr = errors.Join(*errPtr, fmt.Errorf("%s: %w", msg, err))
}
I conventionally use mErr to distinguish from err.
func doThing() (_ string, mErr error) {
f, err := os.Open("foo")
if err != nil {
return "", fmt.Errorf("open file: %w", err)
}
// Use the file...
defer errs.Capture(&mErr, f.Close, "close file")
return "", nil
}
I still question why defer doesn't support doing exactly that.
After all it's like the go language provide us with a cleanup function that in 99% of the time shouldn't be used unless we manually wrap what it's calling to properly handle error.
> I still question why defer doesn't support doing exactly that.
When would it ever be useful? You'd soon start to hate life if you actually tried using the above function in anything beyond a toy application.
> 99% of the time shouldn't be used
1. 99% of the time it is fine to use without further consideration. Even if there are errors, they don't matter. The example from the parent comment is a perfect case in point. Who cares if Close fails? It doesn't affect you in any way.
2. 0.999% of the time if you have a function that combines an operation that might fail in a manner you need to deal with along with cleanup it will be designed to allow being called more than once, allowing you, the caller, to separate the operation and cleanup phases in your code.
3. 0.001% you might have to be careful about its use if a package has an ill-conceived API. If you can, fix the API. The chances of you encountering this is slim, though, especially if you don't randomly import packages written by a high school student writing code for the first time ever.
I think from a Go point of view, the lesson to be drawn from that is "don't defer a function call if you need to check its error value", rather than "defer needs to support checking of function return values".
In the example at hand, it really makes more sense to call Close() as soon as possible after the file is written. It's more of an issue with the underlying OS file API making error checking difficult.
In 99% of cases, the solution to this problem will be to use a WriteFile function that opens, writes and closes the file and does all the error handling for you.
> the lesson to be drawn from that is "don't defer a function call if you need to check its error value"
Isn't the lesson here: If you must have a Close method that might fail in your API, ensure it can safely be called multiple times?
As long as that is true, you can approach it like you would any other API that has resources that might need to be cleaned up.
f, _ := os.Create(...)
defer f.Close()
// perform writes and whatever else
if err := f.Close(); err != nil {
// recover from failure
}
(os.File supports this, expectedly)
> the solution to this problem will be to use a WriteFile function
If it were the solution you'd already be using os.WriteFile. It has a time and place, but often it is not suitable. Notably because it requires the entire file contents to be first stored in memory, which can become problematic.
Certainly you could write a custom WriteFile function that is tuned to your specific requirements, but now you're back to needing to be familiar with the intricacies of a lower-level API in order to facilitate that.
Sure, that's an alternative, although it means there will be some code paths where the error returned by f.Close() becomes the error returned by the entire function and others where it is ignored (though you could easily log it). That might be fine, but you also might want to handle all the cases explicitly and return a combined error in a case where, say, a non-file-related operation fails and then the file also fails to close.
> becomes the error returned by the entire function
If you find the error returned by f.Close to be significant, are you sure returning again it is the right course of action? Most likely you want to do something more meaningful with that state, like retrying the write with an alternate storage device.
Returning the error is giving up, and giving up just because a file didn't close does not make for a very robust system. Not all programs need to be robust, necessarily, but Go is definitely geared towards building systems that are intended to be robust.
You seem confused. The article is about writing a file where it does matter, but the comment example, which is what we're talking about, only reads a file. If close fails after read, who gives a shit? What difference is it going to make? All your read operations are complete already. Close isn't going to trigger a time machine that goes back and time and undos the reads you've performed. It is entirely inconsequential.
You ignore errors on close, and one morning you wake up with your app in CrashLoopBackoff with the final log message "too many files". How do you start debugging this?
Compare the process to the case where you do log errors, and your log is full of "close /mnt/some-terrible-fuse-filesystem/scratch.txt: input/output error". Still baffling of course, but you have some idea where to go next.
To start, you need to figure out why Kubernetes isn't retaining your stack trace/related metadata when the app crashes. That is the most pressing bug. Which is probably best left to the k9s team. You outsourced that aspect of the business of good reason, no doubt.
After they've fixed what they need to fix you need to use the information now being retained to narrow down why your app is crashing at all. Failing to open a file is expected behaviour. It should not be crashing.
Then maybe you can get around to looking at the close issue. But it's the least of your concerns. You've got way bigger problems to tackle first.
A file not able to opened is expected, always! accept is no exception here. Your application should not be crashing because of it.
If I recall, Kubernetes performs health checks over HTTP, so presumably your application is using the standard library's http server to provide that? If so, accept is full abstracted away. So, if that's crashing, that's a bug in Go.
Is that for you to debug, or is it best passed on to the Go team?
There isn't a bug, it's resource exhaustion. You open a bunch of files and they fail to close. You don't log errors on the close, so you have no idea it's happening. Now your app is failing to open new file descriptors to accept HTTP connections. You get a fixed number of fds per app; ulimit -n. If you don't close files you've read, the descriptor is gone.
The bug in this case is in the filesystem that hangs on close. It happens on network filesystems. You can't return the fd to the kernel if your filesystem doesn't let you.
The bug of which we speak is in that your app is crashing. Exhausting open file handles is expected behaviour! Expected behaviour should not lead to a crash. Crashing is only for exceptional behaviour.
The filesystem hanging is unlikely to be a bug. The filesystems you'd realistically use in conjunction with Kubernetes are pretty heavily tested. More likely it is supposed to hang under whatever conditions has lead that to happen.
And, sure, maybe you'll eventually want to determine why the filesystem has moved into that failure state, but most pressing is that your app is crashing. All that work you put into gracefully handling the failing situation going to waste.
Kubernetes is really here nor there. It's the crashing of the app that is our focus. An app should not be crashing on expected behaviour.
That's clearly a bug, and the bug you need to fix first so that you can have your failsafes start working again. You asked where to start and that's the answer, unquestionably.
The app doesn't crash, it's deadlocked. It can't do any more work because to do future work it needs to accept TCP connections. It can't do that because it has hit a resource limit. It hit the resource limit because it didn't correctly close files. It can't close files because of a bug in the filesystem. You don't know this because you didn't log the errors.
I really don't know how I can make my explanation simpler.
> I really don't know how I can make my explanation simpler.
Not making up some elaborate story that you now are trying to say didn't even happen would be a good start. What you are actually trying to communicate is not complicated at all. It didn't need a story. Not sure what you were thinking when you decided fiction writing was a good idea, but I certainly had fun making fun of you for it! So, at least it was not all for not.
Only if you can safely assume the OS, file system, or std lib cleans up any open file handles that failed to close; I'm 99% sure this is the case in 99% of cases, but there may be edge cases (very specific filesystems or hardware?) where it does matter? I don't know.
You can't safely assume that, but what are you going to do about it when it does fail? There is nothing you can do. There isn't usually a CloseClose function to use when Close fails. If Close fails, that's it. You're otherwise out of luck.
Certainly, in the write failure case where there is write failure you'd want to try writing to something else (ideally), notify someone that an operation didn't happen (as a last resort), or something to that effect in order to recover.
But in this case there is no need to try again and nobody really cares. Everything you needed the resources for is already successfully completed. If there is failure when releasing those resources, so what? There is nothing you can do about it.
Exactly that; for critical operations like e.g. a database, if a write fails you've got corrupted data and you have a Major Issue.
That said, I'm not sure how they would handle a file close failure, wouldn't the file be corrupted anyway because some of the bits may have been written? Then again, at least you can raise the alarms if Close fails, because silent failures are worse than failures.
We're talking about a read-only case. os.Open returns a read-only file handle. If you try writing to it, you'll get an error already at that point. If close fails, who cares?
> I'm not sure how they would handle a file close failure
Ideally there is some kind of failover you can resort to, but if there is no other option at very least you will want to notify a human that what they thought was written isn't actually. But when only reading, you don't need to fall back to anything – all the reads were successful – and what is it to the human? What they thought was supposed to happen did!
Let me introduce you to the question mark, seen on your screen as the '?' character. It indicates that a question is asked. This is important as...
Never has "who cares" been confidently said. It has always been asked "who cares?". And not asked in a vacuum either, but specifically asked alongside the question of what is to be gained from the knowledge of the error.
We now know that you allegedly care, which is a promising start. But you purposefully ignored the other question, which questions the credibly of your care. You can't meaningfully care about something if you don't know why you care about it, and if you knew you'd have told us about it already as it nonsensical to answer to the "who cares?" question alone, so...
I care about it because it's unexpected. I seek to wrap my mind as fully as possible around the behavior of my programs. Unexpected behavior like this indicates there is something I do not understand. I cannot rely on my programs if I do not understand them. I cannot state they are robust and stable if I do not understand these sorts of details.
func Open(name string) (*File, error)
Open opens the named file for reading. If successful, methods on the returned file can be used for reading; the associated file descriptor has mode O_RDONLY. If there is an error, it will be of type *PathError.
Which of course isn't really a "solution", as it results in ambiguous error semantics; this is especially the case with Go's defer, as it pushes the code to the end of the entire function, not merely some intermediate relevant scope... the result is that a file used near the beginning of a function might fail to close but that error will not be realized until after something else in the function fails, after the point of no return on the close.
Really, this is kind of a fundamental limitation of automatic allocation semantics, which is effectively a monad that is being stacked with the error propagation monad in a confusing manner that means you "should" only defer operations which don't have errors.
Even in languages with exceptions, such as C++ and Java, they had to wrangle with this problem and failed to solve it: in C++ 11 or 17 or whatever, deconstructors are now by default nothrow in an attempt to prevent this kind of mistake.
...but then, what does one do with close?! In some sense, the entire concept of close must not fail, and yet it exists in a world where we don't really believe in anything that can't fail, as we like moving around failure semantics.
FWIW, Linus has suggested that the kernel should largely accept that application developers don't ever check the return value of close... but has also stated that developers should sync the file first if they care and also check close (at least, to be maximally correct).
But like, what does one do then? How do you ever recover from this? I actually think you can't, if you are in a cleanup operation... not without breaking your error regime. I think--and this is also where the article eventually goes in the updates (though without noting you can add your own boolean)--you should therefore both explicitly close (and/or maybe flush/sync) the file after writing to it and have a "if I didn't close this, close it" in your cleanup handler; critically, the (explicit) former checks for errors, while the (implicit) latter doesn't.
I think some of those issues with errors could be worked around. Wrapping errors provides a way to return a "cleanup failed" error that also includes the root cause of being a failed write. Likewise, there are packages for handling multi-errors.
I think the reality is that most of us push anything where the return status of Close would be important into the database, specifically because it handles semantics like this and simultaneous writes for us. It's like half the selling point of SQLite; you could write JSON documents and handle all the edge cases yourself, or just jam it in SQLite and quite worrying about Close and simultaneous writes and all that junk.
I mean, we can cause the same kind of problem there, as people might try to write a scope exit handler to commit a transaction. You'd then run into the same issue, and so "commit transaction" isn't a thing which should ever be in such a construct. Of course, deleting the objects for the transaction / closing the database connection / etc. would be fine to ignore errors from (and hopefully wouldn't/shouldn't fail anyway) and so those can and should be automated: you defer close but manually call and check a commit.
Once you accept this reality, the file case is the same: putting the sync and/or first critical close inline is equivalent work. The issue is that you simply can't -- no matter what the mechanism is -- slip back and forth between your scope maintenance and your error handling monads, resulting in needing cleanup operations where failure is not an option; and, so, you either must not care about the error in the context of the call or must do something even more drastic like terminate the entire program for violating semantics.
FWIW, I do appreciate that people are less likely to make that kind of mistake when working with a database, as people largely get that you should even try to commit a transaction if the code in it had failed somehow. Additionally, I appreciate that if you have a very tight scope -- which Go makes hard, but can still be pulled off -- the "close and throw an error if and only if we don't have an error right now" strategy is not at all horrible... it just isn't a "solution" to the underlying issue without an understanding of why.
Put elsewise, I think it is useful to appreciate that there is more of a universal theoretical / math reason why this is awkward and why it kind of needs to be built in a specific way, and that this issue transcends the syntax or even the implementation details of how you are trying to manage errors: at the end of the end of the day, all of these techniques people discuss are in some sense equivalent, and, at best, most of these workarounds at offer are ways to incorrectly model the problem due to some systems giving you too much rope.
I don't think that holds for database transactions because of the semantics of rollbacks. Trying to do a rollback after committing is effectively a no-op (I believe it returns an error, but doesn't actually change the DB), so you can defer a rollback and only call commit on the happy path.
I don't believe people generally care about the error context on the rollback, which makes it safe to defer into a context that can't interact with the error handling monads. Rollbacks shouldn't generally fail, even if they do there's basically nothing you can do about it, and there are few differences between a successful and failed rollback beyond resources on the DB server until the connection is closed.
The Commit is the portion that contains the context people care about in their errors, and that is still safely in a context where it can interact with error handling.
I believe files can get similar atomicity, but it requires doing IO in strange ways. E.g. updating a file isn't atomic, but mv'ing one is. So you can copy the file you want to update into /tmp, update the copy, and then mv the copy to the original file (commit is mv'ing it, rollback is rm'ing it or just ignoring it).
Database transactions aren't atomic and do have the same issue if they reference external resources, though. E.g. if you have a database that stores an index of S3 files, transactions won't save you from writing a file to S3 but then failing to write a record for it into the database. That does muddle the error handling again.
> FWIW, Linus has suggested that the kernel should largely accept that application developers don't ever check the return value of close... but has also stated that developers should sync the file first if they care and also check close (at least, to be maximally correct).
Close just behaves very differently depending on the actual filesystem, too. Usually it’s very fast because it doesn’t do much of anything, but e.g. on NFS close will actually wait for writeback to the server to complete (due to close-to-open semantics).
The solution is that there are multiple solutions that are suitable in different situations.
If you need "strict" requirements then it is likely impossible but you can get close enough if you use sqlite. Or more lightweight atomic/thread-safe/fault-tolerant file library. You could rollout your own: it easy to start, and continue until the error rate is tolerable for your application (though it may take more dev time).
If you don't need db-like strict guarantees. Just write your app knowing that it may fail (data may be lost, corrupted). It may be ok in a lot of cases.
I wouldn't mind the language adding some syntax for this, based on the java try-with-resource but using error values rather than exceptions. Something like
f, err = with(os.Open(thing)) {
// do stuff
}
// at this point err has the same semantics as your CheckClose
I believe your example doesn't introduce any new syntax though. Go is highly resistant to adding syntax (and that's a good thing, it keeps the tooling fast and barrier to entry low).
Rust has the same problem. Files are closed in `Drop` when the value goes out of scope, but all errors are silently ignored. To solve this there's `sync_all`[0].
Generally, relying on defer in Go or Drop in Rust for anything that can fail seems like an anti-pattern to me.
I assume a big issue is that this is full of edge cases up the ass, and the value is somewhat limited in the sense that if you know you want durable writes you'll sync() and know you're fucked if you get an error, but close() does not guarantee a sync to disk, as the linux man page indicates:
> A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes.
So you'd need a "close", and a "close_sync", and possibly also a "close_datasync" (if you're ok with discarding metadata). And one could argue at that point `close` has essentially no value beyond hopefully getting rid of the fd / handle, and drop already does a fine job of that.
IIRC the Rust book talks about files automatically being closed when dropped, and how that's better than having a close method. That's probably why it's not a separate method, even though it suppresses errors.
This is also why, in Rust, relying on drop to close a file (which ironically is the poster child for RAII) is a bad pattern. Closing a file can raise errors but you can't reasonably treat errors on drop.
What we really need is a way to handle effects in drop; one way to achieve that is to have the option to return Result in a drop, and if you do this then you need to handle errors at every point you drop such a variable, or the code won't compile. (This also solves the async drop issue: you would be forced to await the drop handling, or the code wouldn't compile)
> This is also why, in Rust, relying on drop to close a file (which ironically is the poster child for RAII) is a bad pattern.
Is it though? It ensures the fd is closed which is what you want, and if you have some form of unwinding in the language you can't really ask for more. And aborts are, if anything, worse.
It also works perfectly well for reading, there's no value to close errors then.
> Closing a file can raise errors but you can't reasonably treat errors on drop.
It's mostly useless anyway, since close does not guarantee that the data has been durably saved. If you want to know that, you need to sync the file, and in that case errors on close are mostly a waste of time:
- if you've opened the file for reading you don't care (errors are not actionable, since you can't retry closing on error)
- if you've flushed a write, you don't care (for the same reason as above)
The one case where it matters is if you care but missed it, in which case we'd need a bunch of different things:
- a version of must_use for implicit drops
- a consuming flush-and-close method on writeable files
- a separate type for readable and writeable files in order to hook both, and a suite of functions to convert back and forth because even if rust had the subtyping for you don't want to move from write to read without either flushing or explicitly opting out of it as you're moving into a "implicit drop is normal" regime
> one way to achieve that is to have the option to return Result in a drop, and if you do this then you need to handle errors at every point you drop such a variable
That is nonsensical, the entire point of drop is that it's a hook into default / implicit behaviour. How do you "handle errors" when a drop is called during a panic? It also doesn't make sense from the simple consideration that you can get drops in completely drop-unaware code.
Boggles my mind that after more than 60 years of computer science, we still design tools (programming languages) where the simplest tasks are full of gotchas and footguns. This is a great example.
The tasks seem simple from 30,000 feet up in the air. Once you get down into the dirt you realize there's absolutely nothing simple about what you're proposing.
A filesystem is a giant shared data structure with several contractual requirements and zero guarantees. That people think a programming language could "solve" this is what is boggling to me.
It's hardly a footgun. Close may be able to report some additional errors with getting the data on persistent storage but it won't report all of them anyway. For most applications, ignoring the return of close is perfectly fine in practice.
Agreed. Just log it and move on. The code _probably_ wrote what it needed to even if it didn't close. If truly cared that you got everything out correctly, you'd need to do more work than a blind `defer Close()` anyway and you'd never have written the code like this.
the close() manpage says that it shouldn't be retried anyway, because one might end up closing a file that meanwhile had been opened with the same handle by a different thread.
The problem is that the filesystem primitives are garbage, so it is impossible to make something safe and reasonably performant. This is not a case of "speed at all costs" where huge footguns are added for marginal performance, this is avoiding 10x and up slowdowns that would be required to be safe due to the anemic primitives. If the filesystem had better primitives/APIs, like barriers and proper asynchronous completion, it would be trivial to design tools that are safe and performant. But, without them it is like trying to build a skyscraper out of mud and toothpicks.
To be honest, you sound like someone who has no experience with designing a filesystem and thinks he can do better because doesn't understand the problems at all.
I think this post overhypes the issue. So many writes we do just aren’t that important (e.g. logs, cli config, blah), Close fails rarely, and it’s pretty standard for casually developed application software to misbehave once the disk is full or breaking.
This is a classic safety / performance trade off that was properly selected in favor of performance.
The defer Close() is still quite useful as a way to avoid fd leaks.
Funny thing is that there is a near footgun with this go: if you defer and set a non named return in a defer, like cErr, that won’t actually set that variable. Not sure what actually happens in that case but godbolt would tell you. In that case, the error would get swallowed
Surely this would all go away if Go had an exception handling mechanism like most mainstream languages do?
You'd just concentrate on the "happy path", you'd close the file, there'd be nothing to forget or write blog posts about because the exception would be propagated, without needing to write any lines of code.
This problem isn't solved with exceptions either. The problem is that finalizers (C++ destructors, Java's `finally` blocks, Go's `defer` etc.) shouldn't fail but `close()` can fail. Therefore, for 100% correctness, `close()` calls should be handled explicitly and not left to finalizers.
Finalizers shouldn't fail because they might be executed while another exception is already in flight. Three languages have three different behaviors when that happens but in my opinion they all do the wrong thing:
In C++, if you throw from a destructor while another exception is in flight, the program will be terminated. In Java, throwing from a `finally` block will "forget" the original exception. In Go (according to this article, I'm not familiar with it), error from `defer` will be ignored. None of these are ideal.
Go also has `errors.Join` since 1.20, and one of its uses is exactly this one: to be used in deferred close that can possible raise errors so the errors can stack each other: https://stackoverflow.com/a/78013962.
> Doing this automatically is also one of the killer arguments for exceptions over error codes, IMO.
Definitely doing this automatically is better than relying on the programmer to do this manually, but I wouldn't say this is the "killer" argument for exceptions over errors codes, because this doesn't add anything new to the argument of exceptions vs explicit error handling.
Yes, with enough syntactic sugar it can become equivalent. Exceptions can be viewed as sum types with special syntactic sugar in conjunction with the regular return types, and can in principle be implemented as such.
When I say "exceptions", I mean the source-level semantics, not how it's implemented behind the scenes.
That's probably the best option I think. I've heard Ada does that (but don't quote me on that). If you can access the original errors from the `MultipleError` object, at least you can tell the user what exactly went wrong.
I don't thing there's one true right thing™ though. That's why explicit handling is necessary: The compiler doesn't have enough context to handle it for you. The programmer needs to decide what's the right way to handle it.
Personally I do not believe the math of these two monads allows for any better solutions (and I do not believe multierror is correct ;P)... I am thereby also very curious what they think the correct thing to do here is.
Python handles this case by raising the new error but including a reference to the original error. By default, the formatted error shows both:
>>> mylist = []
>>> try:
... first = mylist[0]
... finally:
... inverse_length = 1.0 / len(mylist) # imagine this was something more complex
...
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 4, in <module>
ZeroDivisionError: float division by zero
See my reply to the sibling comment about try-with-resources. I'm not familiar with any of these mechanisms but giving access to suppressed exceptions is probably a step in the right direction.
It might fail, but usually doesn't. In C#, to handle an e.g. file open failure, you can just write
try {
using var file = File.OpenWrite("test.txt");
file.Write("Hello, World!"u8);
}
catch (IOException e) {
Console.WriteLine(e.Message);
}
The exception handler is an enclosing scope for both file open and dispose (flush and close) operations. You can also hoist file variable to an outer scope to, for example, decide what to do if dispose throws for some reason. In practical terms, this is as much of an edge case as it gets, but can be expressed in a fairly straightforward way.
I don't think this does the right thing. You are catching exception if Close fails and nothing else. The problem thought is what to do when after file is opened something fails first, and then Close also fails.
I'm catching any exception that occurs within the scope of try block.
This includes an exception when trying to open or write to the file. If I fail to write to a file, it will try to dispose the stream, which flushes it and closes the file handle. If disposing the file handle itself fails, which should never happen, the exception will occur in the finally block, which this exception handler catches too. If you need to disambiguate and handle each case differently, which is rarely needed, you can order try-catch-finally blocks differently with explicit dispose and different nesting. This, again, is not a practical scenario and most user code will just `using file = File.OpenWrite` it and let the exception bubble up.
Yes, and this ignoring of the original exception is the core of the problem discussed. If you are willing to lose supposedly written data, your approach is golden.
As said in the previous comment, you can place a variable in an outside scope and assign to it from within an try-catch block to handle an open file stream in a particular way upon failure. You can simply write more code to disambiguate and specifically handle errors from separate parts of the execution flow. In any case closing a file handle should never fail, but if it does - there are tools to deal with this.
With exceptions you don’t silently ignore error and go on as if nothing happened.
Just simply not explicitly handling every single possible error is the correct choice in many scenarios - in which case it bubbles up to a general error handler, e.g. telling the user that something bad happened here and here.
My Java knowledge is at least 10 years out of date so I'm not familiar with try-with-resources. But from what I can gather from Google, it looks like they now provide a way to access the suppressed exceptions, which is probably a step in the right direction.
Explicit error handling is a choice and implicit error handling through exceptions is not necessarily a feature.
Both have advantages and disadvantages, I’d say the more “modern” approach actually the opposite to what you state here, and is in my opinion the way to Go (pun int intended), though it’s also how Haskell does it. You’ll find the same philosophy in Rust, Zig, Swift and others which all build on the previous decades of throwing exceptions and how terribly that scales in terms of maintainability. Even in the “old world” like with Java you have Kotlin which does both.
I feel like you might be claiming that Haskell does something like Go does, but it actually doesn't: it supports monads, and so uses a monad to hide the error semantics entirely, providing exception-like syntax with automatic propagation.
(I might misunderstand your use of "though", though? It could be that you were just noting in passing how Haskell disagrees with all of these supposedly-"modern" languages, and instead leaned into the sane happy path semantics, thanks to monads.)
(edit: to be clear, though... I do not think exceptions solve this. I wrote a comment elsewhere on this thread about the semantics issue, but a few other people also wrote similar things while I was trying to type my overly-verbose reply ;P.)
It's worth noting that using Either style monadic error handling in IO code in Haskell is arguably an antipattern, as the Haskell runtime has its own built in exception handling (which actually works in a pretty conventional way). Some more info on this under 'Exceptions best practices
' here: https://tech.fpcomplete.com/haskell/tutorial/exceptions/
Go is doing the worst thing possible. It is neither expressive enough for proper sum types, nor does it have expressions (that are analogous to sum types with good defaults and syntactic sugar).
It is literally C’s shitty `errno` with syntactic sugar.
There is a lot of space between the spooky magic quasi-global errno integer and proper sum types. Go's errors are not magic globals nor are they mere integers, even if they aren't sum types either.
How is it not just errno? Especially that POSIX mandates errno to be thread-local, so not even that is a difference. Just because there is some syntactic sugar that converts it to a slightly more descriptive type than an int, doesn’t make it different.
Thread locals are not lexically scoped, they are not stored as part of the function call stack, and mutating them is not expressed anywhere in the function signature. They are global variables with thread-local storage, not local variables.
Go's error returns are not sum types, but they are product types. The return signature (T, error) indicates that two values will be returned essentially as a tuple by the function: one of type T and one of type error. Error-returning functions are pure functions (though they typically perform other, impure operations).
There is no syntactic sugar (both for good and for ill). The type of errors is an ordinary interface, with a single method. Any type can implement that interface, including strings and structs and slices. Errors can have as many contextual details as needed, including nested/wrapped error messages, specific parameters of loop iterations, multiple errors rolled up from multiple operations, etc.
Go's error return is just an ordinary but common use of its multiple-value returns. You can write a function/method that returns three ints and no errors:
func (v Vector3) Splay() (x int, y int, z int)
You can even write a function that returns multiple errors:
> The return signature (T, error) indicates that two values will be returned essentially as a tuple by the function
Technically, yes, practically, it doesn’t tell you anything, as its most common usage is how it would be used as a sum type (either one or the other).
And yeah, I didn’t quite think of multiple return values, but that itself can be just syntactic sugar over an `out` parameter.
But these technical details aside, I am not convinced that it is not “as useless as errno-type error handling”, with the only caveat of it returning an interface that is slightly more informative.
I know the "if err != nil" pattern is spoken up as some sort of cultural idiosyncrasy of Go, similar to the whitespace formatting in python. But so far, I haven't seen any actual data (or even arguments) why it is superior to exceptions, or which inherent problems of exceptions it solves. (The classical example of "it makes control flow more obvious and makes it easier to ensure that mandatory finalizers are not skipped" was just disproven by this very article)
So if there is more substantial criticism against exceptions than just FUD, I'd like to know it.
> But so far, I haven't seen any actual data (or even arguments)
OK, here's an argument.
- In order to write resilient software, programs must handle not only the "happy path" when things succeed, but the path where things might fail.
- Thus it is important for developers to 1) be aware of which operations may fail fail, and b) think about what the program should do in that case.
- Exceptions make it easier for the programmer to forget that something might fail, and to avoid thinking about what to do if it does fail.
- Go's error handling idiom makes it clear that an operation might fail, and prompts programmers to think about what to do in that case. (They may of course choose not to think about it, but at least they made a conscious choice at some level.)
Thus Go's error handling idiom nudges developers towards more resilient software than exception-based workflows.
Or to put it differently: Programming systems which may fail simply is ugly: there are an exponential number of ways a system may fail, and each one must be handled correctly. Exceptions hide this ugliness, but by doing so make it more likely that there will be cases not handled correctly. By exposing this ugliness, Go makes it more likely that most cases will be handled correctly.
> In order to write resilient software, programs must handle not only the "happy path" when things succeed, but the path where things might fail
And exceptions let you handle error conditions without making the actual business logic harder to read, with as little or much specificity as required.
> Thus it is important for developers to 1) be aware of which operations may fail fail, and b) think about what the program should do in that case
Checked exceptions/effect types exist, being explicit or implicit in function signatures is not a fundamental property of exceptions.
And what is clearer in terms of error handling — if err being every third line, with questionable handling logic, e.g. just printing or swallowing stuff (or gestures at the article), and definite human error from repetition —— or a well-defined block with proper scoping, without which the error case does the only reasonable thing — automatically bubbles up, making it possible to handle higher up. There is often no immediate action that can be done in certain exceptional situations, e.g. your ordinary function that writes a file can’t do anything about a full disc. The best it can do is to yell, so that the action that called it somewhere can do some evasive action, e.g. re-trying/notifying the user/etc.
> Exceptions make it easier for the programmer to forget that something might fail, and to avoid thinking about what to do if it does fail.
Disagree. If anything, something not being in a try-catch block says that it will be handled higher up (or checked exceptions making it part of the signature), and when it’s surrounded by it, I know what is the happy path, and unhappy path immediately, without it being crossed over (usually badly), as it would happen with if errs.
> Go's error handling idiom makes it clear that an operation might fail
What about the case when it both returns a value and an error?
> and prompts programmers to think about what to do in that case
Blindly if erring and printing out a random string is not error handling. That’s just noise, and a terrible trap for yourself, having to grep for useless error codes later on.
> If anything, something not being in a try-catch block says that it will be handled higher up
I don't think you get what I'm saying. Some functions will always succeed. Some functions fail in obvious ways. Some functions fail in non-obvious ways. How do you know, as you're scanning a long block of code, which operations may fail, and which will always succeed?
For instance, suppose you have code like the following:
// Decode they key JsonKeyGuids as type []Guid
guids := JsonGetKey[[]Guid](&ru[i].Json, JsonKeyGuids)
Without looking at the function signature: If the key in the structure doesn't exist, what happens -- does it throw an exception, or return an empty value? Is it possible for JsonGetKey to fail to parse?
And while checked exceptions might help, it's not perfect: Suppose your code block calls functions a(), b(), and c(); all of them return ErrParseFail, but while it's pretty obvious that a() or c() might fail that way, it's not at all obvious that b() would.
Secondly, even for operations that are obvious may fail: maybe you, as a senior programmer who has programmed with exceptions for years, are paranoid enough that you're always thinking in the back of your mind "what happens if this fails?" But I very much doubt a junior programmer is going to have that habit. Part of the intent of Go was to have a language for Google which would allow junior programmers could come to a previously unfamiliar bit of code and be reasonably effective very quickly.
> And what is clearer in terms of error handling — if err being every third line, with questionable handling logic, e.g. just printing or swallowing stuff (or gestures at the article), and definite human error from repetition
You'd almost never just print the result of error messages unless it's at the top level, or it's the equivalent of a script. In most cases, you bubble it up, often wrapping it with a message of what you were trying to do; e.g.:
return nil, fmt.Errorf("trying to excluded guid list for %v: %w", ru[i].Id, err)
That way at the top level (or wherever you do log the message), you have a stack not just of the function names and line numbers, but what the program was actually trying to do, potentially with specific values involved.
Not having the equivalent of C's "must_check" is certainly a missing guard-rail in golang
> And what is clearer in terms of error handling...
It comes down to a judgement call. I think Golang's way is better. Yes, it makes the code look cluttered with exit paths, but that's because the code is cluttered with exit paths.
I can see that with experience, an exception-based developer would learn to see the implicit exit paths in most cases. So let me assert to you, that with experience, a check-the-return-values based developer also learns to filter out the explicit error paths to see the "happy path" algorithm clearly. But on the whole, I think the latter is likely to lead to fewer bugs, particularly for less experienced developers, but even for more experienced developers.
At any rate, now you've heard arguments for Go's error handling idiom; and if you don't agree, at least you can understand where the Golang crowd are coming from.
I prefer it because of location. The error handling happens where the error occur. It makes large systems much more maintainable in an enterprise setting.
It’s not like exception handling and throwing around things to have them caught later is inherently bad. It’s just a different philosophy, one that I don’t personally like anymore. It’s down the same alley as things like OOP, SOLID or DRY. Things which have good concepts that way too often leads to code bases which are incredibly annoying to work with. Maybe not for small systems with short life times, but for systems where you’re going to be the 100th person working on something that’s been running for 30 years it’s just nice to not have to play detective. I’d like to put a little disclaimer in here, because that isn’t inherently a consequence of exception handling or any of the other concepts but it’s just what happens when people work on code on a Thursday afternoon after a tough week. The simpler less abstract things are made, the easier it’ll be to unravel, and simple error handling is dealing with the errors exactly where they occur.
As others point out, it’s not without its disadvantages. It’s just that in my experience, those are better disadvantages than the disadvantages of implicit error handling.
Sibling comments have good points but what made it click for me was the book "Exceptional C++" where all the possible sources of exceptions are pointed out -- it's usually twice as many as you would first guess by looking at the code, sometimes made more complicated by type-converting constructors and operator overloading and RAII.
Then consider that if any code you're calling is not exception-safe it makes the task of writing your code to be exception safe that much harder.
Then add on top of that the question of responsibility -- do you handle the exception close to where it gets thrown or farther up the call stack? Handling it too close to the issue may be missing some context and result in a lot of the same exception handling across many areas of the codebase. But handling it too far up misses context, too, and often leaves the program in an uncertain state where it's not clear whether the show must go on or if it's time to shut it down. It's not uncommon to see a mix of exceptions and returned error values to try and find a happy middle ground there.
Java tried to paper over some of these problems with fewer gotchas (more explicit separation between resource allocation and initialization) and compiler checks that exceptions are in the method signature and are always handled _somewhere_ up the call stack but this often results in handling of very vague exception types at a high level. Some developers, uncertain what the right way to handle an exception is, and not wanting to crash the program, will just silently drop the exception instead! Admittedly, you can do this in go-style and C-style error handling, too, but at least then it's not as far removed from the source of an error because it's annoying to have to keep passing an error response through so many function signatures.
I used to use exceptions a lot in C++ and Java. This changed when I started working at a place where a lot of C++ is used but without exceptions. It was ostensibly about runtime costs but when seniors were pressed on the issue it was clear there were a lot of these other reasons stated above, and ultimately about readability of the code (error handling being close to error source) and a philosophy of failing quickly when assertions fail (because keeping calm and carrying on leads to programs running that should have died before they could make more of a mess).
I know it's unpopular but, in light of some real problems with exceptions, I actually prefer the way Go makes you do it. It often encourages doing all the setup in one (or a few) function scopes, and it becomes pretty evident from a function's signature whether you should expect something to go wrong or not. Would I prefer it for a game-dev scripting language? no! But for a system language it is better IMHO
> I haven't seen any actual data (or even arguments) why it is superior to exceptions, or which inherent problems of exceptions it solves
It's faster. Doing all error handling via exception is not viable if you want speed.
Exceptions work well for errors in the sense that these rarely happen. But using them for general "this is the bad outcome" of an operation that can happen in the hot-path is problematic. For example pythons "next()" function which raises an exception if the end of the iterator is reached, if C++ did this it just wouldn't be used.
So in my opinion exceptions are nice if you want ease of use, and explicit error handling (look at rust with their question mark operation which makes it pretty easy to do) is the way to go for performance.
One advantage of explicit error handling like rust does it is: it forces you to handle the error path like other business logic. Again, if you write a short script, this is annoying and doesn't get you much but if the application is very important then doing so is a good thing since it forces you to think about where which errors can happen and how you handle them. With exceptions its very easy to completely forget that a can even happen and thus they get ignored and suddenly the program crashes with a unreadable error message.
This is merely an implementation tradeoff: I could trivially modify a compiler to support exceptions that were as fast as returning error sum types without any syntax changes, but it comes at the cost of making all non-failing code slower--as you have to litter the runtime with checks that exceptions elide--so we don't generally do it... but like, if you make a new language and don't go with the syntax of exceptions because you like the implementation of sum types, I feel you have misunderstood the problem space.
(As for "handling" errors, you should only have a few places in the entire codebase which do that... littering the entire codebase with opportunities to feel like you might could handle an error seems like a mistake as it just encourages more handling.)
Well, let's assume Go did commonly make use of exception handlers for error cases:
defer func() {
if r := recover(); r != nil {
fmt.Println("Failed to write file", r)
}
}()
f := os.Create("file")
defer f.Close()
io.WriteString(f, "Hello, World!")
Cool. You've solved one problem layer, perhaps. But if you look closely you'll notice that code still has a bug!
So clearly exception handlers aren't enough. There might be some good ideas in using exception handling, but if you strictly limit its use to errors you end up half-assing it. Why not go all the way and find a solution that works in all cases?
Ha. I suppose you could argue that is also a bug, but not the one I was thinking of. Let's say this hypothetical Go where error over exception handling is the norm has no such issue.
Think more carefully about what we're actually trying to solve. It is not just about errors.
os.Create, per the topic of discussion, is said to throw an error using the exception handling mechanism rather than return error. The exception handler checks the error.
If you focus on errors, you’re going to never get it. The whole idea here is that the problem is bigger than errors.
You interpret Go using a different pattern for error handling as a criticism of or challenge to exceptions. Consider that languages can take different approaches, without having to get into “best” or “worse.” Every approach to error handling has advantages and shortcomings.
The Go designers have explained and talked about this decision many times. Some people don’t like it and maybe choose a different language. No one pushes FUD regarding exceptions. I get the impression that if you choose a Toyota you would think that your neighbor buying a Honda poses a criticism you have to address.
> Every approach to error handling has advantages and shortcomings
If under ‘every approach’ you explicitly exclude go’s terrible errno syntax sugar, and include exceptions and sum types, then yeah. There is zero advantage to go’s error handling compared to the proper sum typed solutions.
You don't like how Go implements error handling. We get it. Lots of people complain about that. Use a "proper" language more to your liking.
Go's designers have explained their decisions many times, I won't repeat their justifications here. Obviously they had to make choices in line with their overall vision for Go. They feel strongly that programmers should handle or at least acknowledge the possibility of every error, explicitly, right where it gets reported. Exceptions and sum types don't enforce that.
How are sum types not better in literally every possible way? Like, it’s as objective as it gets. Sum types makes it mandatory to unwrap the result - ergo some form of error handling. It is not the case with go.
Go does indeed has an exception handling system like most other languages. Errors and exceptions are very different things, though.
Of course, nothing stops you from building your own file handing package that overloads exception handlers to deal with errors. If it gains traction then it would prove the stdlib should consider a v2 API.
Languages in which try/catch/throw work across coroutine boundaries: Java, Javascript, and it's a standard feature in C++ coroutine libraries (and completely supported by the core C++ coroutine engine). So in my limited personal experience, among the languages that I am familiar with ... all of them except go.
Are there actually ANY languages other than go that have coroutines, and try/catch/throw mechanisms, where you cannot throw across a coroutine boundary?
And why would exception handlers NOT work across coroutine boundaries, other than laziness on the part of implementers?
> Languages in which try/catch/throw work across coroutine boundaries
You wha...? The question was about goroutines, not coroutines.
Besides, you'll notice that exception handlers cross coroutine boundaries in Go just fine. Your random tangent isn't even correct. Where did you dream up this idea to the contrary? I know coroutines are still new to Go, only officially landing in the latest release (experimentally in 1.22), but you'd think that would also mean their behaviour is fresh in your memory.
I'll take your avoidance of the original question to mean that no other language does it either.
I answered as I did because the original question seems strangely broken. The premise upon which the question is asked is flawed.
What are goroutines, other than peculiarly broken coroutines? (Notwithstanding your point that go has a non-broken implementation of coroutines at experimental release stage).
It is true that Javascript has a goroutine-like facility for executing coroutines on a seperate thread. But there are languages (c++, c# at least) where coroutines can execute on separate threads without suffering from the broken-ness of goroutines.
There's a self-selection problem though. People who prefer error handling through exceptions are just not going to use go. Period. Will not use it. Been there done that. No no no.
So it is telling. But I think what it actually tells is that people would have done it just use another language instead.
Most of the other options don't include native code compilation though. I suppose Swift is there but doesn't make much sense outside of the Apple ecosystem.
"You can always make your own" is missing the point. Of course it's useless if a single library implements error handling differently than the rest of the language. The question is why the language does have this kind of manual error handling as a standard in the first kind.
> If it gains traction then it would prove the stdlib should consider a v2 API.
Some library that behaves completely differently from the rest of the language and breaks all interop with the rest of the ecosystem will have a hard time gaining traction, no matter if the way the library does it is objectively better or not.
> The question is why the language does have this kind of manual error handling as a standard in the first kind.
Probably for the same reason Rust does, and why it suffers much the same problem:
1. It is what was in vogue in the 2010s.
2. More importantly, the problem isn't limited to errors. What have you gained treating errors as some hyper special case when they aren't any different than any other value?
I think we agree that we can do better, but seeing errors as special doesn't get you there. We need something that understands the all-encompassing problem.
So, failing that understanding, if you're going to do something that sucks, you may as well choose the least-sucky option, surely? Exception handling brings a horrible developer experience. To the point that in languages where errors over exception handling semantics are the norm, you will find that most developers simply give up on error handling entirely because it is so painful.
> Some library that behaves completely differently from the rest of the language and breaks all interop with the rest of the ecosystem will have a hard time gaining traction
I'm not sure history agrees. Ruby was also of the return values over exception handling mind before Rails came along. Rails pushed exception handlers for errors and developers went for it. Provide an API people actually want to use, and they'll use it. What was common before is inconsequential.
I expect what you are really saying is that exception handling wouldn't actually improve this example case even in the best case, and in the worst case developers would end up giving up on error handling leaving such a package to be a net negative to a codebase.
> Surely this would all go away if Go had an exception handling mechanism like most mainstream languages do?
Or monads, but that might be a step too far for the Go world considering the push-back against generics. When you have declarative error handling (a good thing imho) then monads really are the bees knees.
If you've never seen somebody type 'catch (Exception e) { logger.log("should never happen", e);}' then sure. In the real world people will often explicitly ignore the error, even when they are confronted with it.
Exceptions are a terrible error handling mechanism. You have no idea what throws and what doesn’t, it’s impossible to write defensive code that makes sense with exceptions. Errors as value is the only sane way to deal with errors. Granted, Go does it pretty badly but it’s still infinitely better than exceptions.
You do in Java. It's called Checked Exceptions. It's a binding API contract and communicates this well not only when writing the code, but also when reviewing it.
You don't want to just `fsync()`, but also flush whatever is buffered and then `fsync()`.
Another thing is that if you're holding an flock on that file, it's nice that closing it will drop the lock. Generally you want to drop the lock as soon as you're done writing to the file. Deferring the dropping of the lock to the deferred close might cause the lock to be held longer than needed and hold back other processes/threads -- this doesn't really apply in TFA's example since you're returning when done writing, but in other cases it could be a problem.
Do not risk closing twice if the interface does not specifically allow it! Doing so is a real good way to create hard-to-find corruption bugs. In general if you see `EBADF` when closing any fd values other than `-1`, that's a really good clue that there is a serious bug in that code.
The article suggests using a named return value `err` to allow the return value of `Close` to be be propagated - unless doing so would overwrite an earlier error:
Wouldn't it be better to use `errors.Join` in this scenario? Then if both `err` and `cerr` are non-nil, the function will return both errors (and if both are `nil`, it will return `nil`):
OP here. Another commenter pointed out that `errors.Join` didn't exist when I wrote this, but I wouldn't have changed my guidance if it had.
The core issue here is that you want to deal with errors as soon as possible. The async nature of writes to files makes this more challenging. Part of the purpose of this post was to inform people that you don't necessarily always see write errors at write-time, sometimes you see them at close-time and should handle them there. (And sometimes you don't see them at all without fsync.)
Even if you do handle errors in your deferred close, you may be taking other actions that you'd rather not have until the file i/o is completed, and this could leave your program in an inconsistent state. Side effects in Go are common, so this is a practical concern and it is hard to spot and debug.
Exceptions aren't exceptional though; they are too expensive for not-exceptional errors, like failing writes.
That said, a language feature where you can throw lightweight error values without generating a stack trace etc might be a middle ground. But it won't land in Go, given the discussion about alternative error handling some years ago.
Anyway, in practice it's not that bad. A write can go wrong, you as a developer should write code that handles that situation. Exceptions lead a developer to miss errors, or to not handle them in a finegrained manner - e.g. generic "catch all, log error, maybe" style error handling.
Exceptions are slow in some languages based upon how they are implemented. I'm not convinced that is fundamental to exceptions and rather a choice of how they were implemented. In Java exceptions arn't actually that slow, most of the cost is just allocating the exception object (and allocations in Java are fast).
> Exceptions lead a developer to miss errors, or to not handle them in a finegrained manner - e.g. generic "catch all, log error, maybe" style error handling.
I don't see how Go error handling makes people handle things any more explicitly than exceptions. Most people just `if err != nil { return err }`, which to be honest is the _correct_ logic in many cases, and it's pretty easy to forget to check if err and keep on trucking. At least with exceptions if you don't catch it your thread terminates making unhandled exceptions
Exception bubbling means its easier to catch the error at the level that makes sense, and because they are real objects type checking is easier as opposed to the performance of `errors.Is()` which is surprisingly slow.
> which to be honest is the _correct_ logic in many cases
It is almost never the correct logic. The only time it might be appropriate is in a private helper function that has limited scope around another function.
It is most definitely not the correct logic if you are returning that from a public function! For many reasons, but especially because it now binds you to the implementation of the function you called forevermore. That is a horrible place to be.
For example, find out your os.File usage would be better served by SQLite? Too bad. You can't change it now because the users of your function have come to rely on errors from the os file operations when they handle the error you give them. Their code can't deal with the errors coming out of SQLite.
Instead, you need to return errors that are relevant to your function. It may be appropriate to wrap the source error in some circumstances, but your error structures should compel the user to rely on your errors, leaving the wrapped error only for things like logging where a change in the future won't break the callers.
IMO the formatting of the error string returned by errors.Join is atrociously opinionated and not very logging-friendly - it adds a newline between each error message. I know I'm not the only one that has this opinion
Arguably, one should call `flush()` on the file first. Resource deallocation must always succeed; otherwise a lot of invariants break. This is why Zig's close method[0] ignores errors (with the exception of `EBADF`).
Note that the "unreachable" there is equivalent to assert(error != EBADF), so really it's not even an exception, it's just helpful to crash there when debugging if you get that error. Important to understand that EBADF is not a catchable error because the kernel may have already reused that file descriptor for something else, in which case you wouldn't get EBADF, you would close an unrelated file descriptor.
On production systems, it may often be better to completely ignore the problem at this level. On modern hardware if a disk is throwing an io error on write you are having a bad day. I can almost guarantee that while you might happily modify your code so it properly returns the error, there almost certainly aren't test cases for ensuring such situations are handled "correctly", especially since the error will almost certainly not occur in isolation.
I'm arguing that the proper thing to do here is to kill the process along with whatever else is using the block device.
Whether you handle the error or immediately or if you allow the error to occur after a defer, you still are almost certainly not handling it properly and are taking a speed hit for your troubles.
Absolutely not. Imagine a text editor failing when I hit save. Do you really want that to crash the application? No. You want a fallback to ctrl+A ctrl+C, paste into a email and send to yourself. Giving the user a choice to save on a different drive is also a possibility, maybe your thumb drive just got a nudge and lost connection for a second.
This is really all normal use cases under normal conditions.
"Commit pending writes" and "discard file handle" should ideally be separate operations, with the former potentially returning errors and the latter being infallible. "Discard file handle" should be called from defer/destructiors/other language RAII constructs, while "commit pending writes" should be called explicitly in the program's control flow, with its error appropriately handled or passed up the stack.
Whether if it's allowed to write again after a "commit pending writes" operation or not is a separate design decision. If not, then in some languages it can be expressed as an ownership taking operation that still allows for handling the error.
I think anything other than deferring Close() and then calling Close() again explicitly is overengineering. Like anything that requires creating a cleanup function, capturing a named return value, requires contorting in unnatural ways to handle a very common scenario. Just... defer Close() the soonest you can (after checking for errors), then call Close() again at the end. Most sane providers of Close() should handle being called multiple times (I know os.File does, as well as sql.Tx ).
Finally, errcheck can detect errors-ignored-on-defer now, though not when run with golangci-lint for whatever reason. I switched to nogo a while ago and it found all the missed closes in our codebase, which were straightforward to fix. (Yes, I wish the linter auto-ignored cases where files opened for read had their errors ignored. It doesn't, so we just check them.)
With your package, how do you suggest users of your function handle the error upstream?
Like this?
switch {
case strings.Contains(err.Error(), "close thing foo"):
// deal with foo close error
case strings.Contains(err.Error(), "close thing bar"):
// deal with bar close error
}
And if so, what lead you or your organization to prefer "stringly-typed" errors over the idiomatic approach?
Want to parse an address + port? netip.ParseAddrPort only returns strings ('errors.New' errors).
http/http2 is also a minefield of half-exported errors.
The go authors say to use 'errors.Is' and 'errors.As', but the go stdlib also defines an idiom, and the idiom it defines is that somewhere around 30% of all errors should be stringly typed, including many where you may want to have specific handling for them.
> Stringly typed errors are also idiomatic in go, if you follow the stdlib.
Realistically, you can't follow the standard library, except perhaps the newest additions. Idioms emerge and evolve with use. Much of the standard library was written before Go saw much use, being largely in place before the world got to see Go for the first time. Also, thanks to the Go1 guarantee, cannot be changed now.
If the aforementioned package was written in the 2000s, then it might be fair to say that it was in line with the idioms of the time. But it appears to have been written within the last year, and thus is not aligned with idioms of its age.
That's not to say it has to be. Idioms are not requirements. The "stringly-typed" design may be justified even knowing what we know in the 2020s. And, with that, we don't need to speculate about the justifications. We can let the author speak for himself as to why the choice was made.
netip is one of the newer additions to the stdlib (added in ~2022), and follows the venerable stringly typed error idiom.
I always interpreted the preference for stringly typed errors as a way to keep the Go language simpler. Good error handling is complicated and hard to read, and one of Go's values is that programs should be easy to read. As such, if you want good error handling, you should use a different language, like Java or Haskell or C++. This also helps keep people who might demand complicated things like generics away from the language, further keeping it simple.
My understanding was that many of the go idioms are there to scare off programming language theorists, who have a tendency to unnecessarily complicate everything with type theory, and error handling also seems to mostly be in that vein.
> and follows the venerable stringly typed error idiom.
Not exactly. It assumes that all error conditions within the functions provided by netip are of the same nature as it pertains to a single unit of work. In other words, there is only one type (not referring to the language's type system). Error type reuse where different failure points produce the same type of error does not violate current idioms. I cannot immediately think of any reason for why their assumption is wrong, so unless you have other ideas?
That is not the same situation as the deferred close wrapper, though. It is assuming that closing multiple file handles is the same operation, but clearly that's not true. If you were, say, writing a copy function the error handling of the read handle failure is unlikely to be the same as the handling of the write handle failure. The former doesn't tell you much, the latter is quite actionable. The failure points are distinct, and thus of different types (again, not referring to the type system).
The author's answer was basically that he is the only caller so if that problem arises in his code he'll simply modify the function to return idiomatic types. Which is fair for the lone wolf developer. When working alone anything goes! But it is not good API design generally speaking. It certainly wouldn't fly in something like the standard library or anywhere you have other developers.
> Error type reuse where different failure points produce the same type of error does not violate current idioms. I cannot immediately think of any reason for why their assumption is wrong, so unless you have other ideas?
I have a program that takes user input and parses it, and then displays an error. My program is for a language other than english so having it display a pop up with the message "invalid ip:port, square brackets can only be used with ipv6 addresses" in english is bad.
Therefore I want to switch on the error message to display translated errors, but of course Go does not think that parsing errors are something that is important.
If parsing user input isn't a place to expose clear non-stringly-typed-errors, I don't know what is.
Note, it also gives bad errors in that some of them include details and the user input, and some don't, so displaying them to users will stutter or require parsing.
For example:
_, err = netip.ParseAddrPort("foo:bar")
// invalid port "bar" parsing "foo:bar"
_, err = netip.ParseAddrPort("1.2.3.4:")
// no port
So in one case it has included the original user string, to make it clear what failed, and in another case it doesn't, so I'll have to always add in the context of the input (i.e. `fmt.Errorf("error parsing %q: %w", input, err)`) anyway in order to know what failed, but it'll stutter in every case where they do include the input.
I know the answer to all my issues is the usual go thing of "a little copying is better than depending on the go stdlib" of course. At least for netip, forking it is fine, having to maintain a fork of the go net/http stack just to get halfway decent errors is a real pain.
I'm not sure the desire to perform a transformation on a value implies that there are multiple types. You speak to a real problem, of course, but perhaps at the wrong layer of abstraction. It seems the deeper seeded issue is that Go strings assume one language, which is not true. I wonder what native internationalization support might look like?
The caller of this function will now want to distinguish between "Did we fail to parse input, or did we fail to do networking".
The way to do that is to have `netip.ParseAddrPort` failures return a different error type than other methods, but the "idiomatic" go code above doesn't wrap the error with an additional type, so the caller can't distinguish between a network error (many of which are also just 'errors.New'), and a netip parse error (all of which are 'errors.New').
Pushing that responsibility onto the callers seems silly, and like a footgun, especially when several other packages do have typed errors that mean the caller can successfully identify the error without having to do verbose explicit wrapping.
> The way to do that is to have `netip.ParseAddrPort` failures return a different error type than other methods
No. Absolutely not. If you leave the callers of this hypothetical function to rely on the errors of the functions it calls, even if we assume those functions were written by an infallible deity, you've created a coupling that now binds you to the implementation forevermore. That's just plain reckless behaviour.
Return your own errors. I know it takes a tiny amount of extra thinking to figure out what types are relevant to your function, but is unquestionably worthwhile and would still be worthwhile even if all the functions you call were designed by an infallible entity.
> but the "idiomatic" go code
You're really stretching the use of idiomatic here. Not ever has that been considered idiomatic Go. The Go community has always been clear that you should never, ever write code like that. It is so painfully horrid for so many reasons that it could never be considered idiomatic.
> errors.Is is true for whatever x.Close() returned.
That may get you through if there is only one x.Close, but now you have to offer the guarantee to the callers that there will only ever be one Close error returned. Furthermore, any callers of your function have to ensure that they don't end up introducing additional Close error returns in the same vein.
Without such guarantees, you, the caller, need to resort to string matching to protect against undocumented functionality and/or future modifications when you handle the error. Sounds like a rough situation...
> If I'm going to do something upstream based on the error, I use a type.
Implying that you are a lone wolf developer? I think you make a good point that if you exist in your own world without other developers just about anything goes.
That said, the language used around the previously linked repository implies that it welcomes other developers using and working on the code, so it is not clear how you are "doing the upstream" in all cases.
> That may get you through if there is only one x.Close, but now you have to offer the guarantee to the callers that there will only ever be one Close error returned.
No you don't. It's a multierror, a feature of the standard library.
> It's a multierror, a feature of the standard library.
Understood (I read the code), but that doesn't help, and really has nothing to do with the topic at hand. Is the problem here that you don't have an understanding of what we're talking about?
> You know that's not true from the history of the repository.
Which is why the claim was identified as being strange and in need of clarification. How can "If I'm going to do something upstream based on the error, I use a type." be true? If there are many developers, you don't get to control the upstream.
I dislike the multiple close pattern - I was bitten by this behavior years ago, when the second close() ended up closing another file which had been opened between the first and second close ( I think they were actually sockets ). It was a bona fide bug on my side , but it made for unpleasant memories, and a general distrust of such idioms on my side unless there's a language wide guarantee somewhere in the picture.
Older versions of go (1.0 for example ) were much less safe.
I had a look at the code, and it closes the file directly, and marks it as unusable. However, if you do concurrent operations, you can race and close twice the underlying fd - which I think was my bug ( I shouldn’t have been closing things twice anyway !)
AFAIK Python/C# use a similar approach to Go - instead of `defer`, they have `using`/`with` statements. Go's `defer` seems more flexible though - it can execute custom code each time it's used, whereas `using`/`with` always call `__exit__`/`Dispose`.
How does the Python/C# approach compare to Go's in this situation? How are errors in `__exit__`/`Dispose` handled?
From the application point of view, exceptions are more natural to catch, so if the close inside the exit-function throws, the nearest catch block will not be far up the stack. Compared to go, where you are much more unlikely to have a recover block, because it is considered such a rare case.
Don't call close() twice if you're not using pidfds, this is racy. The fd could be reused in between the two calls. You'll risk closing random things a frustratingly small fraction of the time, creating very hard bugs for yourself.
Didn't I see this thread the other day including comments? Investigating, Algolia search shows this thread as being posted 2 days ago, and the memorable comments too.
The posting timestamp is temporarily adjusted, such that people don't complain about a week day old submission being on the front page.
The comment timestamps are temporarily adjusted to be consistent with the adjusted posting times, such that readers aren't confused by the comments predating the apparent posting time.
The timestamps will revert back to their original values in a few days.
What I don't understand is why this went into the second chance pool if the original submission made it to the front page and got >100 points.
Can someone tell me what’s going on here? This very post including the comments appeared on HN two days as well. I thought I was getting crazy but Google confirms.
If a post doesn't get much attention, there's a small chance of it getting a boost a few days later. If this happens, the timestamps are all adjusted to make it look like it was just posted.
Here is a related question that has been on my mind for a while, but I have yet to find a good answer for:
If I write to file on a reasonably recent Linux and a sane file system like ext or zfs, at which point do I have the guarantee that when I read the same file back, that it is consistent and complete?
Do I really need to fsync or is Linux smart enough to give me back the buffer cache? Does it make a difference if reader and and writer are in the same thread or same process?
Best practice IMO would be to wrap the closable thing in a wrapper object that handles the case where Close is called multiple times and then defer close that one as well as closing at the end of the function. Another idea would be to define a small lambda that either returns the input value or returns error if Close returns an error if you want to use early returns.
I think this is likely a code style difference due to working with a linter that alarms on discarded error returns, but I’m not sure. Both options have the same behavior, unless you reassign f (defer f.Close() will use the original value, defer func() ... will use the current value).
I can't decide because in this pattern you ignore once and check once and I like the lint for the check. Ideally the linter could recognize this pattern. Even better would be if the linter could catch when you never close, I've made that mistake a few times.
I’m a bit new to Golang, but not good programming practices. Isn’t ignoring the error value returned by a function a very bad practice in general? Regardless of if the function call returning it is used in defer? Not just for file write operations?
Ignoring errors is generally a poor practice. You don't have to stop your program on errors, but you should at least log some percentage of them so that when the failure cascades, you have some idea where the failure started.
Sometimes there is nothing you can do when there is an error, in that case there is no point in adding several layers of error forwarding until you ignore it somewhere higher up.
Is… NOT ignoring errors just not an option? I don’t get it. If you propagate errors up but not all the way to being handled, haven’t you failed in a very simple, easy to fix way? Should you have a linter catching these things?
In this case the issue is that defer is a very good way to ensure you don’t forget to close the file in any branches, but a bad way to return values (you have to set the value of a named return variable, which is one of Go’s odder features).
> Should you have a linter catching these things?
JetBrains’ GoLand will in fact warn you of this. If the error truly is immaterial you can instead do
defer func() { _ = f.Close() }()
which is verbose but explicit in its intent to ignore the error.
> […] you have to set the value of a named return variable
Ahhhh okay I see it now. I definitely prefer to not use that feature as well, and I’m surprised it’s even there given how well the rest of the language adheres to “only one way to do things”. Doubly agree that it’s a strange “hack” for forwarding the deferred return value… oof
> JetBrains’ GoLand will in fact warn you of this
Heh yeah that’s what prompted me to ask, as I noticed (and very much appreciated) these hints. 100% agree with the verbose-but-explicit example you gave, and do that myself.
Isn’t what the article suggests (with defer and then WriteString) technically a race condition? Is there no way that the closer can get called before WriteString executes?
Thanks! Not a Go programmer, so I saw the parentheses after the cloaure def and assume it translated to “execute this on a goroutine in the background immediately”
The snippet in the first update is very wrong. The manpage for Linux’s implementation of close() explicitly says that it should not be called again if it fails. Apparently, this is the same under FreeBSD.
“Close will return an error if it has already been called.”
An os.File is a data structure containing a file descriptor and some other fields. It is safe to call Close() multiple times, because it will only call the underlying syscall close() once.
Fails if (say) OS can't write out pending cache and confirm data written to device.
Causes include memory failure, drive cable melted, network cable pulled, etc.
What to do?
How important is the data being written? Is the only copy of just aquired data from a $10 million day geophysical survey? How much time and resources can you spend on work arounds, multiple copies, alternative storage paths, etc.
In aquisition you flush often, worst case lose a minute rather than a day.
In, say, seismic quisition, you might aquire audio data from microphone array and multi track raw audio to SEGY tape banks AND split raw data to thermal plotter AND processing WHERE RAW DATA -> (digitally to DAT AND hard drives) and through processing WHERE COOKED DATA -> digital storage.
In processing pipelines a failed write() or close() isn't so bad, you flag that it happened and you can try to repipe the raw data to get a savable second result.
Ultimately you want human operator control on what and when to do something - it's a hardware problem or resource starvation at the root.
Custom hardware, custom real time kernel (acquisition|processing DSP boards) + loadable RT firmware, custom kernel + comms + window manager on main terminal.
Other than the recording redundancies described (raw analog logged, raw digital logged, raw paper chart created, cooked data logged, cooked paper chart, (raw | cooked each on tape, disk, paper) what are these "redundant systems" that you speak of?
Keeping in mind, of course, that the client has raw data, etc. on the contract as deliverables.
Do you imagine two full ships pulling two full microphone arrays to offset a rare (but happens) recoring failure? Now you've doubled the per dium costs and halved the area that can be covered in a typically short season.
Do you imagine one ship pulling two arrays that magically don't tangle? It doesn't work that way.
The goal here, of course, is to do all that as feasibly possible upfront in order to minimise aquisition time on the water and to ensure that all pings | booms | etc and their returns to multiple mic's recorded so the ship doesn't have to do a repass.
Expand on your non cargo culting non insane design ideas for 1970-1990s offshore seismic exploration by all means as what you intend isn't clear in your terse comment.
Keep in mind your design will need to be moved on and off arbitrary ships and will operate in places like the North Sea, Spratly Islands, etc. and will have to survive the pitch and toss of stormy weather (eg: attention to card fit in bus backbone).
On Linux, the close() system call rarely fails unless you provide an invalid file descriptor. L.Torvalds has stated that on Linux, close() immediately removes the file descriptor from the process, regardless of the underlying implementation's success or failure. Any errors related to the actual closing of the file are handled within the kernel and won't affect user-space programs. I know that go Close is not posix/linux close, but in majority of cases it'll boil down to it.
To quote:
Retrying the close() after a failure return is the wrong thing to
do, since this may cause a reused file descriptor from another
thread to be closed. This can occur because the Linux kernel
always releases the file descriptor early in the close operation,
freeing it for reuse; the steps that may return an error, such as
flushing data to the filesystem or device, occur only later in
the close operation.
Many other implementations similarly always close the file
descriptor (except in the case of EBADF, meaning that the file
descriptor was invalid) even if they subsequently report an error
on return from close(). POSIX.1 is currently silent on this
point, but there are plans to mandate this behavior in the next
major release of the standard.
There are more wrinkles with this:
- if you are creating a file, to ensure full synchronisation you also need to fsync the parent directory, otherwise the file can be fsynced but the update to the directory lost
- if sync fails, you can not assume anything about the file, whether on-disk or in memory, critically one understanding which got dubbed "fsyncgate" and lead to many RDBMS having to be updated is that you can not portably retry fsync after failure: the earlier error may have invalidated the IO buffers but IO errors may not be sticky, so a later fsync will have nothing to write and report success
> otherwise the file can be fsynced but the update to the directory lost
It also goes the other way - the update to the directory can be fsynced but the file lost. This can break the "create temp file, write, close, rename to current" scenario (when the intention is to replace file contents atomically).
POSIX doesn't guarantee the order in which data hits the disk, so the above scenario can become "create temp file, write [contents still in memory only], rename to current [written to disk], power failure".
I believe there was a bug where this scenario had worked for a long time then one filesystem (ext4?) pushed closer to what is admissible under POSIX (non-obvious reorders of physical writes) and people started getting random data corruption in programs which used write & rename.
> It also goes the other way - the update to the directory can be fsynced but the file lost. This can break the "create temp file, write, close, rename to current" scenario (when the intention is to replace file contents atomically).
> POSIX doesn't guarantee the order in which data hits the disk, so the above scenario can become "create temp file, write [contents still in memory only], rename to current [written to disk], power failure".
Wait.. is there a way to do this correctly? At which points is fsync warranted, and how many fsyncs do we need for the whole "write file then mv it on top of current" to not lose data on power failure?
Write, fsync, rename then fsync directory if you need an ordering guarantee that the rename is a transaction barrier.
Of course, the fun part is that the filesystem can’t really guarantee fsync behavior if drives lie about it which many consumer drives do for benchmark reasons. Fun, no?
It would also be a fun experiment to write, fsync, cut power. Rinse and repeat. On differential drives. It would quickly show which drives are lying about syncing.
Sqlite is much better than raw files to keep data intact on power failures, and be sure to study the options carefully
If you truly need to use files you can take other steps such as mv the old file to .bck before mv the new file, but I really think you want sqlite
SQLite doesn't do any magic other than fsync. Using it to deal with power failures is nonsense.
Well generally OP is correct that SQLite will more likely have written fsync code correctly and be fairly robust against power failures due to the WAL. Additionally, it’s not doing any renaming which you would need with plain files.
So while it doesn’t do any other magic, it’s more likely to handle power failures correctly.
> SQLite doesn't do any magic other than fsync.
True.
> Using it to deal with power failures is nonsense.
Using a very well designed library for your use case is not nonsense.
> Using a very well designed library for your use case is not nonsense.
Could you recommend your personal favorite(s) of such libraries? Enquiring minds want to know! Thx.
When it comes to libraries that handle the messy details of putting data on disk, sqlite is the only one I can name off hand so that's the favorite.
But the authors of SQLite have studied and correctly implemented the working fsync logic, so you don't have to. It ·may* be a correct approach if it's your implementation detail, and.nobody else expects that file you're replacing.
I think the idea was to store data in the db, instead of scattered over multiple files and directories. Then you only have to worry about one file to fsync (Two if using a wal).
`fsync()` is not necessarily enough if your on-disk format is complex because you still need to write recovery code. Of course, SQLite3's on-disk format is very complex, and so it requires correspondingly more complex power failure recovery code. But SQLite3 has an excellent test suite, and in particular they have an excellent power failure recovery test suite.
Using SQLite3 to avoid power failure issues is pretty good advice. I don't see why GP is getting downvoted.
Can anyone provide links to learn more about these misconceptions and the bugs they caused?
Edit: talking about filesystem misconceptions, not fsyncgate.
I have found a blog post from 2009 about this issue[1]. I think the "recent Ubuntu bug [that] has gotten slashdotted" is [2], so I think the most technical discussion about this would be [2].
I also found an interesting "scar tissue" from that bug in the current ext4 docs[0]:
"If auto_da_alloc is enabled, ext4 will detect the replace-via-rename and replace-via-truncate patterns and force that any delayed allocation blocks are allocated such that at the next journal commit, in the default data=ordered mode, the data blocks of the new file are forced to disk before the rename() operation is committed. This provides roughly the same level of guarantees as ext3, and avoids the “zero-length” problem that can happen when a system crashes before the delayed allocation blocks are forced to disk."
[0]https://docs.kernel.org/admin-guide/ext4.html
[1] https://thunk.org/tytso/blog/2009/03/12/delayed-allocation-a...
[2] https://bugs.launchpad.net/ubuntu/+source/linux/+bug/317781
EDIT: realized you were asking about the ext4 bugs, not the parent comment of that one. Oh well, keeping this up anyway for others.
Postgres has a whole wiki page [0] about it, it's quite a read. They also link a [1] MySQL commit to fix the same issue.
[0]: https://wiki.postgresql.org/wiki/Fsync_Errors
[1]: https://github.com/mysql/mysql-server/commit/8590c8e12a3374e...
So if I do fopen/fwrite/fsync/fclose, that is not enough? That is crazy, I think 90% of apps don't fsync the parent directory. Also, how many levels of parents do you need to fsync?
That should be only for creating files, and maybe updating their metadata (not sure about that one).
The confusion stems from people thinking that files and directories are more different than they are. Both are inodes, and both are basically containers for data. File inodes are containers for actual data, while directory inodes are containers for other inodes.
All inodes need to be fsynced when you write to them. For files this is obviously when you write data to them. For directories, this is any time you change the inodes they contain, since you’re effectively writing data to them.
You only need to sync the direct parent because the containers aren’t transitive; the grandparent directory only stores a reference to the parent directory, not to the files within the parent directory. It’s basically a graph.
In the case of atomic file writing, the rename() doesn't cause a fsync of the parent directory?
> rename() doesn't cause a fsync of the parent directory?
It does not.
At best it will schedule a journal commit asynchronously (I recall that ext4 maintainer complained about adding this "workaround for buggy user code" on lkml). If you want to receive an IO error when renaming fails, make sure to call fsync() yourself.
Depends on the file system. In NFS, yes. In others, maybe not.
> while directory inodes are containers for other inodes.
Uh? No.
You can imagine them as a list of name,inode tuples.
I’m not well versed enough on the subject to know if there’s important nuance here that I’m missing… but that doesn’t sound like a contradiction to me. That’s effectively a map with O(n) access, right? I think I would more generally refer to that as a “collection”, but it certainly contains inodes as you describe it.
IMO it's better to think of a directory as "containing" only the named references to inodes. Other directories on the same filesystem may contain other named references to the same inodes ("hard links").
The inodes themselves are more like free floating anonymous objects independent of any particular directory, and might not have a named reference at all (O_TMPFILE; or the named reference was deleted but a file descriptor is still open; or orphaned inodes due to filesystem corruption or someone forgetting to fsync the directory after creating a file as masklinn pointed out - e2fsck will link these in /lost+found). This is also why chmod appears to affect other hard links in different directories: Because it actually modifies the inode (file permissions are recorded in that), not the named reference.
This definitely seems like a more meaningful distinction to me. It’s also closer to the mental model I had coming into the discussion, FWIW.
And I think it makes the collection/container terminology distinction sharper too. Depending on context, I think it’s usually reasonable (if imprecise) to describe a bucket of references or pointers to things as a collection of those things. But I don’t think it makes as much sense to call it a container of those things, except in a really abstract sense.
> It’s basically a graph.
do you mean it's a basically a tree? Because if it were just a graph, you could still have edges from the grandparent to the grandchild in addition to the one from the former to the child
This ... depends. In normal POSIX land, hard links break the tree structure, for one, so you get a DAG but not a tree. I think some file systems do enforce tree structure, though - hard links are not supported everywhere.
It used to be possible ages ago to hard link to directories, which meant that you could have actual cycles and a recursive tree-walking algorithm would never terminated. (As far as I know you can still do this by editing the disk manually, although I think fsck will make a fuss if it detects this.)
You can still, with the right syscalls and drivers, do something like hard links on NTFS (I think they're technically called mount points but it's not the same thing as POSIX ones). I'm not sure if you can still make directory cycles, and you're probably a bad person if you do.
> , hard links break the tree structure, for one, so you get a DAG but not a tree.
More than DAGs, but instead pretty arbitrary graphs since you can express cycles with hard links.
> since you can express cycles with hard links
You can't if you only allow hard links to files.
Logically, ".." is a hard link to the parent directory. This may or may not actually be the case in the filesystem on the disk.
A tree is a specific kind of graph.
Crazy is the right term. File system APIs in general have too many sharp edges and need a ground-up rethink.
Consider S3-like protocols: these recognise that 99% of the time applications just want “create file with given contents” or “read back what they’ve previously written.”
The edge cases should be off the beaten path, not in your way tripping you up to when you want the simple scenario.
Aren't the edge cases features? An abstraction (or different API, sure) is in order to prevent footguns. However, this abstraction should not force fsyncs for example, due to the performance impact mentioned. It puts the choice of guaranteed writes vs performance to the developer.
A better abstraction, designed from the ground up, wouldn’t force fsyncs to work.
For example, write groups or barriers (like memory barriers) would be wonderful. Or a transaction api, or io completion ports like on windows.
In a database (and any other software designed for resiliency), you want the file contents to transition cleanly from state A to B to C, with no chance to end up in some intermediate state in the case of power loss. And you want to be notified when the data has durably written. It’s unnecessarily difficult to write code that does that on top of POSIX in an efficient way. Most code that interacts with files is either slow, wrong or both. All because the api is bad.
> Aren't the edge cases features?
What features do you have in mind?
> It puts the choice of guaranteed writes vs performance to the developer.
Yes, and it's a completely false choice. This entire point of this thread is that fsync is an incredibly difficult API to use in a way that gets you the guarantees you need ("don't lose the writes to this file"). And that the consistency guarantees of specific filesystems, VFS, POSIX, and their interactions are not easy to understand even for the experienced -- and it can be catastrophic to get wrong.
It isn't actually a choice between "Speed vs correctness". That's a nice fairy tale where people get to pretend they know what they're up against and everyone has full information. Most programmers aren't going to have the attention to get this right, even good ones. So then it's just "99.9% chance you fucked it up and it's wrong" and then your users are recovering data from backups.
It sounds more like you are asking for an abstraction
The file system is already an abstraction. I think they are asking if it's the right one.
Is the filesystem the correct abstraction? For most applications, a database-like API is more appropriate, hence SQLite.
The filesystem is very easy to use for simple things by comparison to a DB, and it's more accessible from the shell. But you're right, the filesystem is very difficult to use in a power failure safe way. SQLite3 has great power failure recovery testing, so the advice to use SQLite3 for any but the simplest things is pretty good.
It'd be very nice to get some sort of async filesystem write barrier API. Something like `int fbarrier(int fd)` such that all writes anywhere in the filesystem will be sync'ed later when you `fsync()` that fd.
It would also be very nice to have an async `sync()`/`fsync()`. That may sound oxymoronic, but it's not. An async `sync()`/`fsync()` would schedule and even start the sync and then provide a completion notice so the application can do other work while the sync happens in the background. One can do sync operations in worker threads and then report completion, but it'd be nice to have this be a first class operation. Really, every system call that does "I/O" or is or can be slow should be / have been designed to be async-capable.
> So if I do fopen/fwrite/fsync/fclose, that is not enough?
That is my understanding.
> Also, how many levels of parents do you need to fsync?
Only one, at least if you didn't create the parent directory (if you did then you might have to fsync its parent, recursively). The fsync on the parent directory ensures the dir entry for your new file is flushed to disk.
I've never heard about this in my years of programming. I just tried to read through the Win32 documentation, as I've done several times over the years, and it mentions a lot of edge cases but not this that I could see.
Is this some Linux/Unix specific thing? Am I blind?
As far as I know, it's specific to the combination of certain POSIX-ish OS and file systems, like linux/Ext3. I have no clue what BSD does here, or whether ReiserFS is different.
Windows/NTFS is a different world, there are still edge cases that can go wrong but I don't think this particular one is a problem because FAT/NTFS is not inode-based.
I imagine if you looked at the SQLite source code you'd see different edge-case-handling code for different OSes.
NTFS has inodes.
The thing about Windows is that because the file open operation (`CreateFile*()`) by default prevents renames of the file, Windows apps have come to not depend so much on file renaming, which makes one of the biggest filesystem power failure hazards less of an issue on Windows. But not being able to rename files over others as easily as in POSIX really sucks. And this doesn't completely absolve Windows app devs of having to think about power failure recovery! "POSIX semantics" is often short-hand for shortcomings of POSIX that happen to also be there in the WIN32 APIs, such as the lack of a filesystem write barrier, file stat-like system calls that mix different kinds of metadata (which sucks for distributed filesystems protocols), and so on. And yes, you can open files in Windows such that rename-over is allowed, so you still have this problem.
Ya NTFS file naming and long path is a wreck. Unzipping files from mac/linux is a easy way to end up with missing data. Applications quite often break on long file paths especially Microsofts own stuff like powershell.
I am talking about posix semantics yes, I have no idea how things work on windows.
There's the old saying that on Windows*, files have names; in POSIX, names have files. I think that's what makes the difference here.
* technically it's the filesystem as much as the OS that is relevant here.
* a very limited set of names in relation to what's possible on other operating systems.
If your processing files from other systems on NTFS you'll very likely have rename said files in an application and store an index of the names.
Phew. I've primarily used Windows. Not that any of the posix programs I've been exposed to have done the dir sync though.
For cross-platform stuff I've mainly used Boost, which I assumed handled such details.
I'm sure it does not.
Also these things are needed very very rarely (which is why few even know about the issue) and are not good for performance and battery life.
More specifically, these things are for trying to improve the behavior on unclean system shutdown (e.g. power loss) which is inherently chaotic and unless all parts (most critically the disk and its controller) are well behaved you don't have any real guarantees anyway.
Windows also doesn't guarantee that data is written to disk by the time WriteFile/CloseHandle returns, the Windows version of fsync is FlushFileBuffers.
One of the advantages of using a database for files is that it's relatively more likely that these platform-dependent considerations were indeed considered and you don't have to moonlight as an DBMS engineer while writing your application.
One reason Iike use sqlite so much in my personal projects, sometimes just using it to store various config data bits within row entries of the most generic table ever, works well with small projects, probably not so much with large scaled ones.
Hence the saying, "SQLite competes with fopen()".
Yes. File system implementations never really thought this through, and hence here we are.
Is anything ever enough aside from powering down the PC and then reading it back again to make sure its there?
Great another case of simple thing everyone knows is simple but turns out to be horrifyingly complicated and heavily system dependent, but only on occasion so you can get 80% of the way through your career before encountering the gaps in your knowledge.
I guess I'll add it to the list.
Of course on the other hand, I was already thinking that I should just use SQLite for all my file handling needs. This little nugget makes me think that this was the correct intuition. [Queue horrifying revelations w.r.t. SQLite here.]
w.r.t SQLite, the only horrifying revelation I’ve had is that it allows NULLs in composite primary keys, which I’ve seen lead to some nasty bugs in practice.
Yes, you have to declare PRIMARY KEY columns as NOT NULL. There's lots of little caveats like this about SQLite3. So what.
You don't need even to bother with fsync unless you are developing a database or use the file system like a database.
There's a reason Apple made fsync useless and introduced F_FULLSYNC. They know developers have an incentive to overestimate the importance of their own files at the detriment to system responsiveness and power draw.
The problem is, Apple's fsync doesn't even introduce any useful semantics. All it says is "write this eventually".
What a lot of people really need is just a "provide ordering, don't leave me with inconsistent data" operation. If you lose power "after" a write but before the fsync, for any non-networked application that's no different than losing power before the write, as long as the filesystem doesn't introduce chaos.
That seems significantly harder to implement, though, given that dirty page cache entries don't have an order entry as far as I know, and so retroactively figuring out which writes not to reorder with others is anything but trivial.
Agreed, things like Firefox constantly fsyncing its profile database is absurd.
You need to `fflush()` too, before the `fsync()`.
"if you are creating a file, to ensure full synchronisation you also need to fsync the parent directory, otherwise the file can be fsynced but the update to the directory lost."
And if you need this in Java you still have resort to ugly hacks.
https://github.com/apache/lucene/issues/7231
This says the bug is fixed? https://bugs.openjdk.org/browse/JDK-8066915
The point is that there is no official way to fsync a directory in Java and that everyone is relying on an unintentional side effect of an unrelated function to accomplish it. The link I supplied is about the fact that the side effect briefly disappeared in Java 9 until enough people complained.
We're still living in xkcd 1172 land with this, have been for a decade or who knows how long.
Java does "abstract" the operating system away from you and in systems programming with java you can end up with leaky abstractions.
More on fsyncgate: https://lwn.net/Articles/752063/
"listen to what close() says"
and
"don't believe what close() says"
are two different things.
The article is only (initially) talking about the first, and that is valid.
The second is just second-guessing the OS and hardware environment, and is invalid.
Here's the rule to figure out if you need fsync() or not: "If you think you might need fsync(), you don't." ;)
There are almost no cases where you should worry about the underlying layers. Basically if you aren't writing the filesystem itself, then you shouldn't be calling fsync().
You have to check what open()/close() etc said, but if close() said it worked, then it worked. You're done. The fact that lightning might have struck the drive just exactly then is not your problem and not something you should try to do anything about.
Unreliable networks, busses, batteries, etc none of that changes this. Those things all already have their own layers with their own responsibilities to be doing all the necessary testing and verifying and retrying before they return a success code to you.
There is open(...,O_SYNC) and mount -o sync (udev rules) for the case of a camera or thumb drive connect by usb etc.
It's not merely that you don't have to, it's that it's actively wrong to. fsync() is just joggling someone else's elbow while they're trying to do their job, and if it seems to solve some problem, that actually just exposes that you have some logic or order of operations problem and you aren't doing your own job.
"trust the other layer" or "trust the api contract" is unrelated to and does not conflict with "Be forgiving in your inputs and strict in your outputs.".
It just means:
Do: Check the returned error from, say, malloc().
Don't: Get a success from malloc() and then go try to do things to prove that malloc() actually did work.
That would be insane and impossible because it would have to apply equally to everything, including every single keyword or function you would use as part of the verification. How do you know when you so much as set a value to variable that it actually got set? If you printf the variable to prove it, how do you know printf didn't lie?
The logic for trusting close() is no different from the logic for trusting malloc().
Our responsibility is just to stay within the bounds of defined behavior and not make any assumptions about anything that isn't promised.
> The fact that lightning might have struck the drive just exactly then is not your problem [...] If you think you might need fsync(), you don't. [...] Those things all already have their own layers with their own responsibilities to be doing all the necessary testing [...]
You're mixing up the concepts of durability and consistency in pretty significant ways here, and are implying that everybody that is fine with a lack of the former will also be fine with a lack of the latter.
This is absolutely not true, and can cause extremely painful and hard-to-track-down bugs.
For better or worse, there is no way to directly tell the OS "whatever you do, make sure you don't reorder these writes I just did with those other writes I'm about to do".
The next best (portable) thing we have to achieve that outcome is fsync. It's a bit heavy handed, in that it gives you durability even if you only want consistency. That absolutely doesn't mean it's redundant, though.
> Basically if you aren't writing the filesystem itself, then you shouldn't be calling fsync().
Given that fsync is a syscall, but file systems are generally implemented in the kernel, this is a pretty nonsensical statement by itself.
File systems usually have (and need, for performance) a much lower level view of the underlying block storage, and fine-grained control over it.
Just as one example, Linux has the concept of write barriers. Not using these correctly (in the filesystem driver) can cause data leaks across files owned by different users and processes.
"there is no way to directly tell the OS "whatever you do, make sure you don't reorder these writes I just did with those other writes I'm about to do"
Perhaps because there is no reason for such a thing to exist.
Tell me an example.
Text editors: I want my file content to either be what it was before I pressed "save", or what it was afterwards; I don't want my file to be mangled halfway through. (In other words, in this case I need consistency, but not necessarily durability.) How would you do that without fsync?
Databases: People generally don't like unrecoverable consistency errors just because their computer crashed during a write. Not generally possible with reordered writes.
Sometimes people also need durability on top of consistency, e.g. for everything where you want to make at most one request to some server; you can do that by e.g. writing "I did the thing" to a log file, fsync'ing it, and then making your request.
Here is my favorite solution to this problem
Use like this - you must name the error return This closes the file and if there wasn't an existing error, writes the error from f.Close() in there.I caution against this approach, as you are not really dealing with the error when it occurs. If the work you do after the defer has other side effects, you may have just gotten your application into an inconsistent state and it's very hard to see in code why this might be.
`defer` is really not well-suited for error handling, its benefit is mainly in resource cleanup where failure is impossible or doesn't matter. (This makes it fine for `Close` on read-only file I/O operations, and not so great for writes.)
> If the work you do after the defer has other side effects, you may have just gotten your application into an inconsistent state and it's very hard to see in code why this might be.
Can you give an example case of how this could happen?
This is a contrived example, but imagine a situation where I have a file I want to write on disk and then have a reference to it in a database. If I have a flow like:
This function might have updated the user in the database with a new file despite the fact that `CheckClose` (defined up-thread) does check to see if the `Close` failed and returned an error. The calling code won't have known this has happened.The core problem is that the error checking is not done soon enough, either because Go programmers are conditioned to `defer f.Close()` from nearly all example code -- most of it demonstrating reads, not writes -- or because they are handling the error, but only in a deferred function, not earlier.
A more correct way to do this would be:
`Sync()` flushes the data to disk, and `Close()` gives a "last-chance" opportunity to return an error. The `defer f.Close()` exists as a way to ensure resource cleanup if an error occurs before the explicit `f.Close()` toward the end of the function. As I mentioned in an update to the post, double `Close()` is fine.I think a better solution is to write smaller, single-purpose functions. To refer to your example downthread, you should have one function that only writes the file, and another that does the "whole" operation -- calling the function to write the file, checking for errors, and then updating the database.
Then you can use defer in the file-writing function if you so please, and not bother to close at the end explicitly, without issue. A more robust example might be to even include the sync call in the deferred function (and even clean up the file itself on error). To re-use your example from your blog post:.
I would probably move that out into a helper, though, so I could do something like instead, and be able to use it elsewhere. Hell, even without defer, it's nice to have a helper that will sync and close for you so you can avoid the boilerplate, if you have lots of different bits of code that writes files.FWIW, I'm not sure why you are so negative on named return values, but I'm at best a novice Go programmer, so perhaps I don't fully understand why they aren't great (I guess it does look weird to me to have bare `return` statements that do actually return a value even though it doesn't look like it). Your argument about the return value possibly being modified after the core function finishes being unintuitive doesn't really strike me as a big deal either.
> If the work you do after the defer has other side effects
Defer is by definition the last work you do in a function, there won't be more work except by the caller who will get the error returned to them.
If you are structuring a function that writes a file, and then does something with it, defer isn't appropriate, since you should close it before you do any more work.
It's possible to have multiple defers in a function though (so you have multiple "last work in a function"; nowhere is it dictated that a function should only have one operation that needs to clean something up at the end. Think for example copying one file to another.
I use a similar pattern, cribbed from one of the Go databases that aspired never to ignore errors. The difference is that errs.Capture preserves existing errors and formats the error with a message--important if generalized to handle any error function.
I conventionally use mErr to distinguish from err.I still question why defer doesn't support doing exactly that.
After all it's like the go language provide us with a cleanup function that in 99% of the time shouldn't be used unless we manually wrap what it's calling to properly handle error.
In the end, what's the point of defer ?
> I still question why defer doesn't support doing exactly that.
When would it ever be useful? You'd soon start to hate life if you actually tried using the above function in anything beyond a toy application.
> 99% of the time shouldn't be used
1. 99% of the time it is fine to use without further consideration. Even if there are errors, they don't matter. The example from the parent comment is a perfect case in point. Who cares if Close fails? It doesn't affect you in any way.
2. 0.999% of the time if you have a function that combines an operation that might fail in a manner you need to deal with along with cleanup it will be designed to allow being called more than once, allowing you, the caller, to separate the operation and cleanup phases in your code.
3. 0.001% you might have to be careful about its use if a package has an ill-conceived API. If you can, fix the API. The chances of you encountering this is slim, though, especially if you don't randomly import packages written by a high school student writing code for the first time ever.
The whole point of this post is that an error returned from file.Close DOES matter
I think from a Go point of view, the lesson to be drawn from that is "don't defer a function call if you need to check its error value", rather than "defer needs to support checking of function return values".
In the example at hand, it really makes more sense to call Close() as soon as possible after the file is written. It's more of an issue with the underlying OS file API making error checking difficult.
In 99% of cases, the solution to this problem will be to use a WriteFile function that opens, writes and closes the file and does all the error handling for you.
> the lesson to be drawn from that is "don't defer a function call if you need to check its error value"
Isn't the lesson here: If you must have a Close method that might fail in your API, ensure it can safely be called multiple times?
As long as that is true, you can approach it like you would any other API that has resources that might need to be cleaned up.
(os.File supports this, expectedly)> the solution to this problem will be to use a WriteFile function
If it were the solution you'd already be using os.WriteFile. It has a time and place, but often it is not suitable. Notably because it requires the entire file contents to be first stored in memory, which can become problematic.
Certainly you could write a custom WriteFile function that is tuned to your specific requirements, but now you're back to needing to be familiar with the intricacies of a lower-level API in order to facilitate that.
Sure, that's an alternative, although it means there will be some code paths where the error returned by f.Close() becomes the error returned by the entire function and others where it is ignored (though you could easily log it). That might be fine, but you also might want to handle all the cases explicitly and return a combined error in a case where, say, a non-file-related operation fails and then the file also fails to close.
> becomes the error returned by the entire function
If you find the error returned by f.Close to be significant, are you sure returning again it is the right course of action? Most likely you want to do something more meaningful with that state, like retrying the write with an alternate storage device.
Returning the error is giving up, and giving up just because a file didn't close does not make for a very robust system. Not all programs need to be robust, necessarily, but Go is definitely geared towards building systems that are intended to be robust.
It obviously depends on the context. There's no general right or wrong answer to that question.
You seem confused. The article is about writing a file where it does matter, but the comment example, which is what we're talking about, only reads a file. If close fails after read, who gives a shit? What difference is it going to make? All your read operations are complete already. Close isn't going to trigger a time machine that goes back and time and undos the reads you've performed. It is entirely inconsequential.
> If close fails after read, who gives a shit?
ulimit -n
You ignore errors on close, and one morning you wake up with your app in CrashLoopBackoff with the final log message "too many files". How do you start debugging this?
Compare the process to the case where you do log errors, and your log is full of "close /mnt/some-terrible-fuse-filesystem/scratch.txt: input/output error". Still baffling of course, but you have some idea where to go next.
To start, you need to figure out why Kubernetes isn't retaining your stack trace/related metadata when the app crashes. That is the most pressing bug. Which is probably best left to the k9s team. You outsourced that aspect of the business of good reason, no doubt.
After they've fixed what they need to fix you need to use the information now being retained to narrow down why your app is crashing at all. Failing to open a file is expected behaviour. It should not be crashing.
Then maybe you can get around to looking at the close issue. But it's the least of your concerns. You've got way bigger problems to tackle first.
The app crashes because "too many files" includes the fd accept(2) wants to allocate so your app can respond to the health check.
A file not able to opened is expected, always! accept is no exception here. Your application should not be crashing because of it.
If I recall, Kubernetes performs health checks over HTTP, so presumably your application is using the standard library's http server to provide that? If so, accept is full abstracted away. So, if that's crashing, that's a bug in Go.
Is that for you to debug, or is it best passed on to the Go team?
There isn't a bug, it's resource exhaustion. You open a bunch of files and they fail to close. You don't log errors on the close, so you have no idea it's happening. Now your app is failing to open new file descriptors to accept HTTP connections. You get a fixed number of fds per app; ulimit -n. If you don't close files you've read, the descriptor is gone.
The bug in this case is in the filesystem that hangs on close. It happens on network filesystems. You can't return the fd to the kernel if your filesystem doesn't let you.
The bug of which we speak is in that your app is crashing. Exhausting open file handles is expected behaviour! Expected behaviour should not lead to a crash. Crashing is only for exceptional behaviour.
The filesystem hanging is unlikely to be a bug. The filesystems you'd realistically use in conjunction with Kubernetes are pretty heavily tested. More likely it is supposed to hang under whatever conditions has lead that to happen.
And, sure, maybe you'll eventually want to determine why the filesystem has moved into that failure state, but most pressing is that your app is crashing. All that work you put into gracefully handling the failing situation going to waste.
You're really hung up on Kubernetes but it was an incidental comment in a hypothetical story.
"You wake up and find out that Heroku's staff is anxiously awaiting your departure from your apartment to tell you that your app is down."
Kubernetes is really here nor there. It's the crashing of the app that is our focus. An app should not be crashing on expected behaviour.
That's clearly a bug, and the bug you need to fix first so that you can have your failsafes start working again. You asked where to start and that's the answer, unquestionably.
The app doesn't crash, it's deadlocked. It can't do any more work because to do future work it needs to accept TCP connections. It can't do that because it has hit a resource limit. It hit the resource limit because it didn't correctly close files. It can't close files because of a bug in the filesystem. You don't know this because you didn't log the errors.
I really don't know how I can make my explanation simpler.
> The app doesn't crash
You literally said that it crashes:
https://news.ycombinator.com/item?id=41505892> I really don't know how I can make my explanation simpler.
Not making up some elaborate story that you now are trying to say didn't even happen would be a good start. What you are actually trying to communicate is not complicated at all. It didn't need a story. Not sure what you were thinking when you decided fiction writing was a good idea, but I certainly had fun making fun of you for it! So, at least it was not all for not.
Only if you can safely assume the OS, file system, or std lib cleans up any open file handles that failed to close; I'm 99% sure this is the case in 99% of cases, but there may be edge cases (very specific filesystems or hardware?) where it does matter? I don't know.
You can't safely assume that, but what are you going to do about it when it does fail? There is nothing you can do. There isn't usually a CloseClose function to use when Close fails. If Close fails, that's it. You're otherwise out of luck.
Certainly, in the write failure case where there is write failure you'd want to try writing to something else (ideally), notify someone that an operation didn't happen (as a last resort), or something to that effect in order to recover.
But in this case there is no need to try again and nobody really cares. Everything you needed the resources for is already successfully completed. If there is failure when releasing those resources, so what? There is nothing you can do about it.
>>>
What knowledge do you hope to gain of f.Close fails here?If I understand the OP correctly, if Close() fails then you can't trust that the data was written, even if the previous Write() succeeded.
Exactly that; for critical operations like e.g. a database, if a write fails you've got corrupted data and you have a Major Issue.
That said, I'm not sure how they would handle a file close failure, wouldn't the file be corrupted anyway because some of the bits may have been written? Then again, at least you can raise the alarms if Close fails, because silent failures are worse than failures.
> if a write fails.
We're talking about a read-only case. os.Open returns a read-only file handle. If you try writing to it, you'll get an error already at that point. If close fails, who cares?
> I'm not sure how they would handle a file close failure
Ideally there is some kind of failover you can resort to, but if there is no other option at very least you will want to notify a human that what they thought was written isn't actually. But when only reading, you don't need to fall back to anything – all the reads were successful – and what is it to the human? What they thought was supposed to happen did!
I don't know how you can confidently say "if close fails, who cares" all over this thread.
If close fails, I wanna know and I wanna know why.
Let me introduce you to the question mark, seen on your screen as the '?' character. It indicates that a question is asked. This is important as...
Never has "who cares" been confidently said. It has always been asked "who cares?". And not asked in a vacuum either, but specifically asked alongside the question of what is to be gained from the knowledge of the error.
We now know that you allegedly care, which is a promising start. But you purposefully ignored the other question, which questions the credibly of your care. You can't meaningfully care about something if you don't know why you care about it, and if you knew you'd have told us about it already as it nonsensical to answer to the "who cares?" question alone, so...
I care about it because it's unexpected. I seek to wrap my mind as fully as possible around the behavior of my programs. Unexpected behavior like this indicates there is something I do not understand. I cannot rely on my programs if I do not understand them. I cannot state they are robust and stable if I do not understand these sorts of details.
That's not it. You can't write to a file handle returned by os.Open.
https://pkg.go.dev/os#Open
Which of course isn't really a "solution", as it results in ambiguous error semantics; this is especially the case with Go's defer, as it pushes the code to the end of the entire function, not merely some intermediate relevant scope... the result is that a file used near the beginning of a function might fail to close but that error will not be realized until after something else in the function fails, after the point of no return on the close.
Really, this is kind of a fundamental limitation of automatic allocation semantics, which is effectively a monad that is being stacked with the error propagation monad in a confusing manner that means you "should" only defer operations which don't have errors.
Even in languages with exceptions, such as C++ and Java, they had to wrangle with this problem and failed to solve it: in C++ 11 or 17 or whatever, deconstructors are now by default nothrow in an attempt to prevent this kind of mistake.
...but then, what does one do with close?! In some sense, the entire concept of close must not fail, and yet it exists in a world where we don't really believe in anything that can't fail, as we like moving around failure semantics.
FWIW, Linus has suggested that the kernel should largely accept that application developers don't ever check the return value of close... but has also stated that developers should sync the file first if they care and also check close (at least, to be maximally correct).
But like, what does one do then? How do you ever recover from this? I actually think you can't, if you are in a cleanup operation... not without breaking your error regime. I think--and this is also where the article eventually goes in the updates (though without noting you can add your own boolean)--you should therefore both explicitly close (and/or maybe flush/sync) the file after writing to it and have a "if I didn't close this, close it" in your cleanup handler; critically, the (explicit) former checks for errors, while the (implicit) latter doesn't.
I think some of those issues with errors could be worked around. Wrapping errors provides a way to return a "cleanup failed" error that also includes the root cause of being a failed write. Likewise, there are packages for handling multi-errors.
I think the reality is that most of us push anything where the return status of Close would be important into the database, specifically because it handles semantics like this and simultaneous writes for us. It's like half the selling point of SQLite; you could write JSON documents and handle all the edge cases yourself, or just jam it in SQLite and quite worrying about Close and simultaneous writes and all that junk.
I mean, we can cause the same kind of problem there, as people might try to write a scope exit handler to commit a transaction. You'd then run into the same issue, and so "commit transaction" isn't a thing which should ever be in such a construct. Of course, deleting the objects for the transaction / closing the database connection / etc. would be fine to ignore errors from (and hopefully wouldn't/shouldn't fail anyway) and so those can and should be automated: you defer close but manually call and check a commit.
Once you accept this reality, the file case is the same: putting the sync and/or first critical close inline is equivalent work. The issue is that you simply can't -- no matter what the mechanism is -- slip back and forth between your scope maintenance and your error handling monads, resulting in needing cleanup operations where failure is not an option; and, so, you either must not care about the error in the context of the call or must do something even more drastic like terminate the entire program for violating semantics.
FWIW, I do appreciate that people are less likely to make that kind of mistake when working with a database, as people largely get that you should even try to commit a transaction if the code in it had failed somehow. Additionally, I appreciate that if you have a very tight scope -- which Go makes hard, but can still be pulled off -- the "close and throw an error if and only if we don't have an error right now" strategy is not at all horrible... it just isn't a "solution" to the underlying issue without an understanding of why.
Put elsewise, I think it is useful to appreciate that there is more of a universal theoretical / math reason why this is awkward and why it kind of needs to be built in a specific way, and that this issue transcends the syntax or even the implementation details of how you are trying to manage errors: at the end of the end of the day, all of these techniques people discuss are in some sense equivalent, and, at best, most of these workarounds at offer are ways to incorrectly model the problem due to some systems giving you too much rope.
I don't think that holds for database transactions because of the semantics of rollbacks. Trying to do a rollback after committing is effectively a no-op (I believe it returns an error, but doesn't actually change the DB), so you can defer a rollback and only call commit on the happy path.
I don't believe people generally care about the error context on the rollback, which makes it safe to defer into a context that can't interact with the error handling monads. Rollbacks shouldn't generally fail, even if they do there's basically nothing you can do about it, and there are few differences between a successful and failed rollback beyond resources on the DB server until the connection is closed.
The Commit is the portion that contains the context people care about in their errors, and that is still safely in a context where it can interact with error handling.
I believe files can get similar atomicity, but it requires doing IO in strange ways. E.g. updating a file isn't atomic, but mv'ing one is. So you can copy the file you want to update into /tmp, update the copy, and then mv the copy to the original file (commit is mv'ing it, rollback is rm'ing it or just ignoring it).
Database transactions aren't atomic and do have the same issue if they reference external resources, though. E.g. if you have a database that stores an index of S3 files, transactions won't save you from writing a file to S3 but then failing to write a record for it into the database. That does muddle the error handling again.
> FWIW, Linus has suggested that the kernel should largely accept that application developers don't ever check the return value of close... but has also stated that developers should sync the file first if they care and also check close (at least, to be maximally correct).
Close just behaves very differently depending on the actual filesystem, too. Usually it’s very fast because it doesn’t do much of anything, but e.g. on NFS close will actually wait for writeback to the server to complete (due to close-to-open semantics).
The solution is that there are multiple solutions that are suitable in different situations.
If you need "strict" requirements then it is likely impossible but you can get close enough if you use sqlite. Or more lightweight atomic/thread-safe/fault-tolerant file library. You could rollout your own: it easy to start, and continue until the error rate is tolerable for your application (though it may take more dev time).
If you don't need db-like strict guarantees. Just write your app knowing that it may fail (data may be lost, corrupted). It may be ok in a lot of cases.
I wouldn't mind the language adding some syntax for this, based on the java try-with-resource but using error values rather than exceptions. Something like
I believe your example doesn't introduce any new syntax though. Go is highly resistant to adding syntax (and that's a good thing, it keeps the tooling fast and barrier to entry low).
Rust has the same problem. Files are closed in `Drop` when the value goes out of scope, but all errors are silently ignored. To solve this there's `sync_all`[0].
Generally, relying on defer in Go or Drop in Rust for anything that can fail seems like an anti-pattern to me.
0: https://doc.rust-lang.org/std/fs/struct.File.html#method.syn...
An ownership consuming close(self) would make sense, but has not been added, there must be some good reason for that?
That nobody has gone through the effort of collating its requirements and writing an RFC after https://github.com/rust-lang/rfcs/pull/770 was closed (back in 2015).
I assume a big issue is that this is full of edge cases up the ass, and the value is somewhat limited in the sense that if you know you want durable writes you'll sync() and know you're fucked if you get an error, but close() does not guarantee a sync to disk, as the linux man page indicates:
> A successful close does not guarantee that the data has been successfully saved to disk, as the kernel uses the buffer cache to defer writes.
So you'd need a "close", and a "close_sync", and possibly also a "close_datasync" (if you're ok with discarding metadata). And one could argue at that point `close` has essentially no value beyond hopefully getting rid of the fd / handle, and drop already does a fine job of that.
IIRC the Rust book talks about files automatically being closed when dropped, and how that's better than having a close method. That's probably why it's not a separate method, even though it suppresses errors.
This is also why, in Rust, relying on drop to close a file (which ironically is the poster child for RAII) is a bad pattern. Closing a file can raise errors but you can't reasonably treat errors on drop.
What we really need is a way to handle effects in drop; one way to achieve that is to have the option to return Result in a drop, and if you do this then you need to handle errors at every point you drop such a variable, or the code won't compile. (This also solves the async drop issue: you would be forced to await the drop handling, or the code wouldn't compile)
> This is also why, in Rust, relying on drop to close a file (which ironically is the poster child for RAII) is a bad pattern.
Is it though? It ensures the fd is closed which is what you want, and if you have some form of unwinding in the language you can't really ask for more. And aborts are, if anything, worse.
It also works perfectly well for reading, there's no value to close errors then.
> Closing a file can raise errors but you can't reasonably treat errors on drop.
It's mostly useless anyway, since close does not guarantee that the data has been durably saved. If you want to know that, you need to sync the file, and in that case errors on close are mostly a waste of time:
- if you've opened the file for reading you don't care (errors are not actionable, since you can't retry closing on error)
- if you've flushed a write, you don't care (for the same reason as above)
The one case where it matters is if you care but missed it, in which case we'd need a bunch of different things:
- a version of must_use for implicit drops
- a consuming flush-and-close method on writeable files
- a separate type for readable and writeable files in order to hook both, and a suite of functions to convert back and forth because even if rust had the subtyping for you don't want to move from write to read without either flushing or explicitly opting out of it as you're moving into a "implicit drop is normal" regime
> one way to achieve that is to have the option to return Result in a drop, and if you do this then you need to handle errors at every point you drop such a variable
That is nonsensical, the entire point of drop is that it's a hook into default / implicit behaviour. How do you "handle errors" when a drop is called during a panic? It also doesn't make sense from the simple consideration that you can get drops in completely drop-unaware code.
Consuming methods is what you're looking for.
> close a file (which ironically is the poster child for RAII)
Yes, I call this "RAII is a lie" (T-shirt pending).
Closing file descriptors is univerally used to showcase RAII, but it should never be used for that.
C++ has the same problem:
https://github.com/isocpp/CppCoreGuidelines/issues/2203
In there, it is acknowledged that a manual Close() should always be provided, and used if you want guarantees.
> is a bad pattern
Good that Rust at least figured it out early that it's a bad pattern!
Never use RAII in situations where the cleanup can fail!
You can kind of achieve this at runtime like so
A method which takes foo and turns it into another type is slightly better. You can then unconditionally panic in the drop impl.
Neat, I didn't realize that would skip drop()!
I actually really like this idea. Is there somewhere I can read more about it?
Boggles my mind that after more than 60 years of computer science, we still design tools (programming languages) where the simplest tasks are full of gotchas and footguns. This is a great example.
> the simplest tasks
The tasks seem simple from 30,000 feet up in the air. Once you get down into the dirt you realize there's absolutely nothing simple about what you're proposing.
A filesystem is a giant shared data structure with several contractual requirements and zero guarantees. That people think a programming language could "solve" this is what is boggling to me.
It's hardly a footgun. Close may be able to report some additional errors with getting the data on persistent storage but it won't report all of them anyway. For most applications, ignoring the return of close is perfectly fine in practice.
Agreed. Just log it and move on. The code _probably_ wrote what it needed to even if it didn't close. If truly cared that you got everything out correctly, you'd need to do more work than a blind `defer Close()` anyway and you'd never have written the code like this.
the close() manpage says that it shouldn't be retried anyway, because one might end up closing a file that meanwhile had been opened with the same handle by a different thread.
The problem is that the filesystem primitives are garbage, so it is impossible to make something safe and reasonably performant. This is not a case of "speed at all costs" where huge footguns are added for marginal performance, this is avoiding 10x and up slowdowns that would be required to be safe due to the anemic primitives. If the filesystem had better primitives/APIs, like barriers and proper asynchronous completion, it would be trivial to design tools that are safe and performant. But, without them it is like trying to build a skyscraper out of mud and toothpicks.
To be honest, you sound like someone who has no experience with designing a filesystem and thinks he can do better because doesn't understand the problems at all.
I think this post overhypes the issue. So many writes we do just aren’t that important (e.g. logs, cli config, blah), Close fails rarely, and it’s pretty standard for casually developed application software to misbehave once the disk is full or breaking.
This is a classic safety / performance trade off that was properly selected in favor of performance.
The defer Close() is still quite useful as a way to avoid fd leaks.
Funny thing is that there is a near footgun with this go: if you defer and set a non named return in a defer, like cErr, that won’t actually set that variable. Not sure what actually happens in that case but godbolt would tell you. In that case, the error would get swallowed
Gotta keep the profession interesting somehow.
Windows doesn't have this gotcha, FWIW.
Surely this would all go away if Go had an exception handling mechanism like most mainstream languages do?
You'd just concentrate on the "happy path", you'd close the file, there'd be nothing to forget or write blog posts about because the exception would be propagated, without needing to write any lines of code.
This problem isn't solved with exceptions either. The problem is that finalizers (C++ destructors, Java's `finally` blocks, Go's `defer` etc.) shouldn't fail but `close()` can fail. Therefore, for 100% correctness, `close()` calls should be handled explicitly and not left to finalizers.
Finalizers shouldn't fail because they might be executed while another exception is already in flight. Three languages have three different behaviors when that happens but in my opinion they all do the wrong thing:
In C++, if you throw from a destructor while another exception is in flight, the program will be terminated. In Java, throwing from a `finally` block will "forget" the original exception. In Go (according to this article, I'm not familiar with it), error from `defer` will be ignored. None of these are ideal.
Java with try-with-resources does the correct thing: It attaches the new exception as a secondary exception to the currently in-flight exception.
Since function calls form a tree, exceptions must form a tree as well.
Doing this automatically is also one of the killer arguments for exceptions over error codes, IMO.
Go also has `errors.Join` since 1.20, and one of its uses is exactly this one: to be used in deferred close that can possible raise errors so the errors can stack each other: https://stackoverflow.com/a/78013962.
> Doing this automatically is also one of the killer arguments for exceptions over error codes, IMO.
Definitely doing this automatically is better than relying on the programmer to do this manually, but I wouldn't say this is the "killer" argument for exceptions over errors codes, because this doesn't add anything new to the argument of exceptions vs explicit error handling.
How automatic it is depends on the language, but error "objects" (rather than codes) can do this too. It is pretty great.
Yes, with enough syntactic sugar it can become equivalent. Exceptions can be viewed as sum types with special syntactic sugar in conjunction with the regular return types, and can in principle be implemented as such.
When I say "exceptions", I mean the source-level semantics, not how it's implemented behind the scenes.
> in my opinion they all do the wrong thing
What would be the right thing? Combining the original exception and the error from `close` into some kind of `MultipleError`?
That's probably the best option I think. I've heard Ada does that (but don't quote me on that). If you can access the original errors from the `MultipleError` object, at least you can tell the user what exactly went wrong.
I don't thing there's one true right thing™ though. That's why explicit handling is necessary: The compiler doesn't have enough context to handle it for you. The programmer needs to decide what's the right way to handle it.
Personally I do not believe the math of these two monads allows for any better solutions (and I do not believe multierror is correct ;P)... I am thereby also very curious what they think the correct thing to do here is.
Python handles this case by raising the new error but including a reference to the original error. By default, the formatted error shows both:
No. Your example is not similar to what we are discussing here
There is no way to catch that exception. It will just print a warning with the exception but it can't be handled.Java also does this.
The problem is handled with exceptions plus Java's try-with-resources or C#'s using statements or Python's context managers though, right?
Furthermore in Java since version 7 you can actually see both exceptions with the suppressed exceptions pattern.
See my reply to the sibling comment about try-with-resources. I'm not familiar with any of these mechanisms but giving access to suppressed exceptions is probably a step in the right direction.
It might fail, but usually doesn't. In C#, to handle an e.g. file open failure, you can just write
The exception handler is an enclosing scope for both file open and dispose (flush and close) operations. You can also hoist file variable to an outer scope to, for example, decide what to do if dispose throws for some reason. In practical terms, this is as much of an edge case as it gets, but can be expressed in a fairly straightforward way.I don't think this does the right thing. You are catching exception if Close fails and nothing else. The problem thought is what to do when after file is opened something fails first, and then Close also fails.
I'm catching any exception that occurs within the scope of try block.
This includes an exception when trying to open or write to the file. If I fail to write to a file, it will try to dispose the stream, which flushes it and closes the file handle. If disposing the file handle itself fails, which should never happen, the exception will occur in the finally block, which this exception handler catches too. If you need to disambiguate and handle each case differently, which is rarely needed, you can order try-catch-finally blocks differently with explicit dispose and different nesting. This, again, is not a practical scenario and most user code will just `using file = File.OpenWrite` it and let the exception bubble up.
Yes, and this ignoring of the original exception is the core of the problem discussed. If you are willing to lose supposedly written data, your approach is golden.
As said in the previous comment, you can place a variable in an outside scope and assign to it from within an try-catch block to handle an open file stream in a particular way upon failure. You can simply write more code to disambiguate and specifically handle errors from separate parts of the execution flow. In any case closing a file handle should never fail, but if it does - there are tools to deal with this.
Well the point it: the problem is not as simple in C# as your initial snippet suggests.
With exceptions you don’t silently ignore error and go on as if nothing happened.
Just simply not explicitly handling every single possible error is the correct choice in many scenarios - in which case it bubbles up to a general error handler, e.g. telling the user that something bad happened here and here.
What about Java's try-with-resources?
My Java knowledge is at least 10 years out of date so I'm not familiar with try-with-resources. But from what I can gather from Google, it looks like they now provide a way to access the suppressed exceptions, which is probably a step in the right direction.
Explicit error handling is a choice and implicit error handling through exceptions is not necessarily a feature.
Both have advantages and disadvantages, I’d say the more “modern” approach actually the opposite to what you state here, and is in my opinion the way to Go (pun int intended), though it’s also how Haskell does it. You’ll find the same philosophy in Rust, Zig, Swift and others which all build on the previous decades of throwing exceptions and how terribly that scales in terms of maintainability. Even in the “old world” like with Java you have Kotlin which does both.
I feel like you might be claiming that Haskell does something like Go does, but it actually doesn't: it supports monads, and so uses a monad to hide the error semantics entirely, providing exception-like syntax with automatic propagation.
(I might misunderstand your use of "though", though? It could be that you were just noting in passing how Haskell disagrees with all of these supposedly-"modern" languages, and instead leaned into the sane happy path semantics, thanks to monads.)
(edit: to be clear, though... I do not think exceptions solve this. I wrote a comment elsewhere on this thread about the semantics issue, but a few other people also wrote similar things while I was trying to type my overly-verbose reply ;P.)
It's worth noting that using Either style monadic error handling in IO code in Haskell is arguably an antipattern, as the Haskell runtime has its own built in exception handling (which actually works in a pretty conventional way). Some more info on this under 'Exceptions best practices ' here: https://tech.fpcomplete.com/haskell/tutorial/exceptions/
Go is doing the worst thing possible. It is neither expressive enough for proper sum types, nor does it have expressions (that are analogous to sum types with good defaults and syntactic sugar).
It is literally C’s shitty `errno` with syntactic sugar.
There is a lot of space between the spooky magic quasi-global errno integer and proper sum types. Go's errors are not magic globals nor are they mere integers, even if they aren't sum types either.
How is it not just errno? Especially that POSIX mandates errno to be thread-local, so not even that is a difference. Just because there is some syntactic sugar that converts it to a slightly more descriptive type than an int, doesn’t make it different.
Thread locals are not lexically scoped, they are not stored as part of the function call stack, and mutating them is not expressed anywhere in the function signature. They are global variables with thread-local storage, not local variables.
Go's error returns are not sum types, but they are product types. The return signature (T, error) indicates that two values will be returned essentially as a tuple by the function: one of type T and one of type error. Error-returning functions are pure functions (though they typically perform other, impure operations).
There is no syntactic sugar (both for good and for ill). The type of errors is an ordinary interface, with a single method. Any type can implement that interface, including strings and structs and slices. Errors can have as many contextual details as needed, including nested/wrapped error messages, specific parameters of loop iterations, multiple errors rolled up from multiple operations, etc.
Go's error return is just an ordinary but common use of its multiple-value returns. You can write a function/method that returns three ints and no errors:
You can even write a function that returns multiple errors: Try that with errno!> The return signature (T, error) indicates that two values will be returned essentially as a tuple by the function
Technically, yes, practically, it doesn’t tell you anything, as its most common usage is how it would be used as a sum type (either one or the other).
And yeah, I didn’t quite think of multiple return values, but that itself can be just syntactic sugar over an `out` parameter.
But these technical details aside, I am not convinced that it is not “as useless as errno-type error handling”, with the only caveat of it returning an interface that is slightly more informative.
> Explicit error handling is a choice
Yeah, it is - a bad choice IMO.
I know the "if err != nil" pattern is spoken up as some sort of cultural idiosyncrasy of Go, similar to the whitespace formatting in python. But so far, I haven't seen any actual data (or even arguments) why it is superior to exceptions, or which inherent problems of exceptions it solves. (The classical example of "it makes control flow more obvious and makes it easier to ensure that mandatory finalizers are not skipped" was just disproven by this very article)
So if there is more substantial criticism against exceptions than just FUD, I'd like to know it.
> But so far, I haven't seen any actual data (or even arguments)
OK, here's an argument.
- In order to write resilient software, programs must handle not only the "happy path" when things succeed, but the path where things might fail.
- Thus it is important for developers to 1) be aware of which operations may fail fail, and b) think about what the program should do in that case.
- Exceptions make it easier for the programmer to forget that something might fail, and to avoid thinking about what to do if it does fail.
- Go's error handling idiom makes it clear that an operation might fail, and prompts programmers to think about what to do in that case. (They may of course choose not to think about it, but at least they made a conscious choice at some level.)
Thus Go's error handling idiom nudges developers towards more resilient software than exception-based workflows.
Or to put it differently: Programming systems which may fail simply is ugly: there are an exponential number of ways a system may fail, and each one must be handled correctly. Exceptions hide this ugliness, but by doing so make it more likely that there will be cases not handled correctly. By exposing this ugliness, Go makes it more likely that most cases will be handled correctly.
> In order to write resilient software, programs must handle not only the "happy path" when things succeed, but the path where things might fail
And exceptions let you handle error conditions without making the actual business logic harder to read, with as little or much specificity as required.
> Thus it is important for developers to 1) be aware of which operations may fail fail, and b) think about what the program should do in that case
Checked exceptions/effect types exist, being explicit or implicit in function signatures is not a fundamental property of exceptions.
And what is clearer in terms of error handling — if err being every third line, with questionable handling logic, e.g. just printing or swallowing stuff (or gestures at the article), and definite human error from repetition —— or a well-defined block with proper scoping, without which the error case does the only reasonable thing — automatically bubbles up, making it possible to handle higher up. There is often no immediate action that can be done in certain exceptional situations, e.g. your ordinary function that writes a file can’t do anything about a full disc. The best it can do is to yell, so that the action that called it somewhere can do some evasive action, e.g. re-trying/notifying the user/etc.
> Exceptions make it easier for the programmer to forget that something might fail, and to avoid thinking about what to do if it does fail.
Disagree. If anything, something not being in a try-catch block says that it will be handled higher up (or checked exceptions making it part of the signature), and when it’s surrounded by it, I know what is the happy path, and unhappy path immediately, without it being crossed over (usually badly), as it would happen with if errs.
> Go's error handling idiom makes it clear that an operation might fail
What about the case when it both returns a value and an error?
> and prompts programmers to think about what to do in that case
Blindly if erring and printing out a random string is not error handling. That’s just noise, and a terrible trap for yourself, having to grep for useless error codes later on.
> If anything, something not being in a try-catch block says that it will be handled higher up
I don't think you get what I'm saying. Some functions will always succeed. Some functions fail in obvious ways. Some functions fail in non-obvious ways. How do you know, as you're scanning a long block of code, which operations may fail, and which will always succeed?
For instance, suppose you have code like the following:
Without looking at the function signature: If the key in the structure doesn't exist, what happens -- does it throw an exception, or return an empty value? Is it possible for JsonGetKey to fail to parse?And while checked exceptions might help, it's not perfect: Suppose your code block calls functions a(), b(), and c(); all of them return ErrParseFail, but while it's pretty obvious that a() or c() might fail that way, it's not at all obvious that b() would.
Secondly, even for operations that are obvious may fail: maybe you, as a senior programmer who has programmed with exceptions for years, are paranoid enough that you're always thinking in the back of your mind "what happens if this fails?" But I very much doubt a junior programmer is going to have that habit. Part of the intent of Go was to have a language for Google which would allow junior programmers could come to a previously unfamiliar bit of code and be reasonably effective very quickly.
> And what is clearer in terms of error handling — if err being every third line, with questionable handling logic, e.g. just printing or swallowing stuff (or gestures at the article), and definite human error from repetition
You'd almost never just print the result of error messages unless it's at the top level, or it's the equivalent of a script. In most cases, you bubble it up, often wrapping it with a message of what you were trying to do; e.g.:
That way at the top level (or wherever you do log the message), you have a stack not just of the function names and line numbers, but what the program was actually trying to do, potentially with specific values involved.Not having the equivalent of C's "must_check" is certainly a missing guard-rail in golang
> And what is clearer in terms of error handling...
It comes down to a judgement call. I think Golang's way is better. Yes, it makes the code look cluttered with exit paths, but that's because the code is cluttered with exit paths.
I can see that with experience, an exception-based developer would learn to see the implicit exit paths in most cases. So let me assert to you, that with experience, a check-the-return-values based developer also learns to filter out the explicit error paths to see the "happy path" algorithm clearly. But on the whole, I think the latter is likely to lead to fewer bugs, particularly for less experienced developers, but even for more experienced developers.
At any rate, now you've heard arguments for Go's error handling idiom; and if you don't agree, at least you can understand where the Golang crowd are coming from.
Because nobody has ever ignored errors in go?
I prefer it because of location. The error handling happens where the error occur. It makes large systems much more maintainable in an enterprise setting.
It’s not like exception handling and throwing around things to have them caught later is inherently bad. It’s just a different philosophy, one that I don’t personally like anymore. It’s down the same alley as things like OOP, SOLID or DRY. Things which have good concepts that way too often leads to code bases which are incredibly annoying to work with. Maybe not for small systems with short life times, but for systems where you’re going to be the 100th person working on something that’s been running for 30 years it’s just nice to not have to play detective. I’d like to put a little disclaimer in here, because that isn’t inherently a consequence of exception handling or any of the other concepts but it’s just what happens when people work on code on a Thursday afternoon after a tough week. The simpler less abstract things are made, the easier it’ll be to unravel, and simple error handling is dealing with the errors exactly where they occur.
As others point out, it’s not without its disadvantages. It’s just that in my experience, those are better disadvantages than the disadvantages of implicit error handling.
Sibling comments have good points but what made it click for me was the book "Exceptional C++" where all the possible sources of exceptions are pointed out -- it's usually twice as many as you would first guess by looking at the code, sometimes made more complicated by type-converting constructors and operator overloading and RAII.
Then consider that if any code you're calling is not exception-safe it makes the task of writing your code to be exception safe that much harder.
Then add on top of that the question of responsibility -- do you handle the exception close to where it gets thrown or farther up the call stack? Handling it too close to the issue may be missing some context and result in a lot of the same exception handling across many areas of the codebase. But handling it too far up misses context, too, and often leaves the program in an uncertain state where it's not clear whether the show must go on or if it's time to shut it down. It's not uncommon to see a mix of exceptions and returned error values to try and find a happy middle ground there.
Java tried to paper over some of these problems with fewer gotchas (more explicit separation between resource allocation and initialization) and compiler checks that exceptions are in the method signature and are always handled _somewhere_ up the call stack but this often results in handling of very vague exception types at a high level. Some developers, uncertain what the right way to handle an exception is, and not wanting to crash the program, will just silently drop the exception instead! Admittedly, you can do this in go-style and C-style error handling, too, but at least then it's not as far removed from the source of an error because it's annoying to have to keep passing an error response through so many function signatures.
I used to use exceptions a lot in C++ and Java. This changed when I started working at a place where a lot of C++ is used but without exceptions. It was ostensibly about runtime costs but when seniors were pressed on the issue it was clear there were a lot of these other reasons stated above, and ultimately about readability of the code (error handling being close to error source) and a philosophy of failing quickly when assertions fail (because keeping calm and carrying on leads to programs running that should have died before they could make more of a mess).
I know it's unpopular but, in light of some real problems with exceptions, I actually prefer the way Go makes you do it. It often encourages doing all the setup in one (or a few) function scopes, and it becomes pretty evident from a function's signature whether you should expect something to go wrong or not. Would I prefer it for a game-dev scripting language? no! But for a system language it is better IMHO
> I haven't seen any actual data (or even arguments) why it is superior to exceptions, or which inherent problems of exceptions it solves
It's faster. Doing all error handling via exception is not viable if you want speed.
Exceptions work well for errors in the sense that these rarely happen. But using them for general "this is the bad outcome" of an operation that can happen in the hot-path is problematic. For example pythons "next()" function which raises an exception if the end of the iterator is reached, if C++ did this it just wouldn't be used.
So in my opinion exceptions are nice if you want ease of use, and explicit error handling (look at rust with their question mark operation which makes it pretty easy to do) is the way to go for performance.
One advantage of explicit error handling like rust does it is: it forces you to handle the error path like other business logic. Again, if you write a short script, this is annoying and doesn't get you much but if the application is very important then doing so is a good thing since it forces you to think about where which errors can happen and how you handle them. With exceptions its very easy to completely forget that a can even happen and thus they get ignored and suddenly the program crashes with a unreadable error message.
This is merely an implementation tradeoff: I could trivially modify a compiler to support exceptions that were as fast as returning error sum types without any syntax changes, but it comes at the cost of making all non-failing code slower--as you have to litter the runtime with checks that exceptions elide--so we don't generally do it... but like, if you make a new language and don't go with the syntax of exceptions because you like the implementation of sum types, I feel you have misunderstood the problem space.
(As for "handling" errors, you should only have a few places in the entire codebase which do that... littering the entire codebase with opportunities to feel like you might could handle an error seems like a mistake as it just encourages more handling.)
Well, let's assume Go did commonly make use of exception handlers for error cases:
Cool. You've solved one problem layer, perhaps. But if you look closely you'll notice that code still has a bug!So clearly exception handlers aren't enough. There might be some good ideas in using exception handling, but if you strictly limit its use to errors you end up half-assing it. Why not go all the way and find a solution that works in all cases?
Maybe I'm too dumb too see the bug here.
Do you mean the issue that if both WriteString() and Close() throw an exception, the one from WriteString() will be swallowed?
That used to be a problem, but has been solved in more modern implementations using suppressed exception tracking.
Ha. I suppose you could argue that is also a bug, but not the one I was thinking of. Let's say this hypothetical Go where error over exception handling is the norm has no such issue.
Think more carefully about what we're actually trying to solve. It is not just about errors.
Is it that the defer() that recovers is called before the defer() that closes (and potentially fails)?
No. No rearranging of the code would fix the problem.
Hmm. Checked the docs, looks like os.Create returns *File and err, but the sample doesn’t check that error.
os.Create, per the topic of discussion, is said to throw an error using the exception handling mechanism rather than return error. The exception handler checks the error.
If you focus on errors, you’re going to never get it. The whole idea here is that the problem is bigger than errors.
You interpret Go using a different pattern for error handling as a criticism of or challenge to exceptions. Consider that languages can take different approaches, without having to get into “best” or “worse.” Every approach to error handling has advantages and shortcomings.
The Go designers have explained and talked about this decision many times. Some people don’t like it and maybe choose a different language. No one pushes FUD regarding exceptions. I get the impression that if you choose a Toyota you would think that your neighbor buying a Honda poses a criticism you have to address.
Discussed before many times on HN, such as:
https://news.ycombinator.com/item?id=4159641
Original article:
https://commandcenter.blogspot.com/2012/06/less-is-exponenti...
> Every approach to error handling has advantages and shortcomings
If under ‘every approach’ you explicitly exclude go’s terrible errno syntax sugar, and include exceptions and sum types, then yeah. There is zero advantage to go’s error handling compared to the proper sum typed solutions.
You don't like how Go implements error handling. We get it. Lots of people complain about that. Use a "proper" language more to your liking.
Go's designers have explained their decisions many times, I won't repeat their justifications here. Obviously they had to make choices in line with their overall vision for Go. They feel strongly that programmers should handle or at least acknowledge the possibility of every error, explicitly, right where it gets reported. Exceptions and sum types don't enforce that.
How are sum types not better in literally every possible way? Like, it’s as objective as it gets. Sum types makes it mandatory to unwrap the result - ergo some form of error handling. It is not the case with go.
The modern approach of Rust and Swift is basically checked exceptions done with a different painting.
You're not wrong, but that ship sailed away a dozen years ago.
Go does indeed has an exception handling system like most other languages. Errors and exceptions are very different things, though.
Of course, nothing stops you from building your own file handing package that overloads exception handlers to deal with errors. If it gains traction then it would prove the stdlib should consider a v2 API.
But that you already haven’t done so is telling…
You mean panics? They don't work across go routines so they're limited and hardly exception handling like most other languages
Which languages see exception handlers work across go routines?
Anyway, most don't. There is no difference from exception handlers in most other languages.
The syntax is a little different. Is that where you've become confused?
Languages in which try/catch/throw work across coroutine boundaries: Java, Javascript, and it's a standard feature in C++ coroutine libraries (and completely supported by the core C++ coroutine engine). So in my limited personal experience, among the languages that I am familiar with ... all of them except go.
Are there actually ANY languages other than go that have coroutines, and try/catch/throw mechanisms, where you cannot throw across a coroutine boundary?
And why would exception handlers NOT work across coroutine boundaries, other than laziness on the part of implementers?
> Languages in which try/catch/throw work across coroutine boundaries
You wha...? The question was about goroutines, not coroutines.
Besides, you'll notice that exception handlers cross coroutine boundaries in Go just fine. Your random tangent isn't even correct. Where did you dream up this idea to the contrary? I know coroutines are still new to Go, only officially landing in the latest release (experimentally in 1.22), but you'd think that would also mean their behaviour is fresh in your memory.
I'll take your avoidance of the original question to mean that no other language does it either.
I answered as I did because the original question seems strangely broken. The premise upon which the question is asked is flawed.
What are goroutines, other than peculiarly broken coroutines? (Notwithstanding your point that go has a non-broken implementation of coroutines at experimental release stage).
It is true that Javascript has a goroutine-like facility for executing coroutines on a seperate thread. But there are languages (c++, c# at least) where coroutines can execute on separate threads without suffering from the broken-ness of goroutines.
There's a self-selection problem though. People who prefer error handling through exceptions are just not going to use go. Period. Will not use it. Been there done that. No no no.
So it is telling. But I think what it actually tells is that people would have done it just use another language instead.
Most of the other options don't include native code compilation though. I suppose Swift is there but doesn't make much sense outside of the Apple ecosystem.
As they should. Scripting tasks are, indeed, best performed in scripting languages.
"You can always make your own" is missing the point. Of course it's useless if a single library implements error handling differently than the rest of the language. The question is why the language does have this kind of manual error handling as a standard in the first kind.
> If it gains traction then it would prove the stdlib should consider a v2 API.
Some library that behaves completely differently from the rest of the language and breaks all interop with the rest of the ecosystem will have a hard time gaining traction, no matter if the way the library does it is objectively better or not.
> The question is why the language does have this kind of manual error handling as a standard in the first kind.
Probably for the same reason Rust does, and why it suffers much the same problem:
1. It is what was in vogue in the 2010s.
2. More importantly, the problem isn't limited to errors. What have you gained treating errors as some hyper special case when they aren't any different than any other value?
I think we agree that we can do better, but seeing errors as special doesn't get you there. We need something that understands the all-encompassing problem.
So, failing that understanding, if you're going to do something that sucks, you may as well choose the least-sucky option, surely? Exception handling brings a horrible developer experience. To the point that in languages where errors over exception handling semantics are the norm, you will find that most developers simply give up on error handling entirely because it is so painful.
> Some library that behaves completely differently from the rest of the language and breaks all interop with the rest of the ecosystem will have a hard time gaining traction
I'm not sure history agrees. Ruby was also of the return values over exception handling mind before Rails came along. Rails pushed exception handlers for errors and developers went for it. Provide an API people actually want to use, and they'll use it. What was common before is inconsequential.
I expect what you are really saying is that exception handling wouldn't actually improve this example case even in the best case, and in the worst case developers would end up giving up on error handling leaving such a package to be a net negative to a codebase.
> Surely this would all go away if Go had an exception handling mechanism like most mainstream languages do?
Or monads, but that might be a step too far for the Go world considering the push-back against generics. When you have declarative error handling (a good thing imho) then monads really are the bees knees.
If you've never seen somebody type 'catch (Exception e) { logger.log("should never happen", e);}' then sure. In the real world people will often explicitly ignore the error, even when they are confronted with it.
Exceptions are a terrible error handling mechanism. You have no idea what throws and what doesn’t, it’s impossible to write defensive code that makes sense with exceptions. Errors as value is the only sane way to deal with errors. Granted, Go does it pretty badly but it’s still infinitely better than exceptions.
> You have no idea what throws and what doesn’t
You do in Java. It's called Checked Exceptions. It's a binding API contract and communicates this well not only when writing the code, but also when reviewing it.
> You have no idea what throws and what doesn’t
The answer here is that everything throws.
Any code can have a Null/Nil dereference error, any code can use an array and generate an out-of-bounds exception, etc.
When there's a null pointer dereference, that's a bug in the program; when closing a file fails, that's something external the program must handle.
[dead]
You don't want to just `fsync()`, but also flush whatever is buffered and then `fsync()`.
Another thing is that if you're holding an flock on that file, it's nice that closing it will drop the lock. Generally you want to drop the lock as soon as you're done writing to the file. Deferring the dropping of the lock to the deferred close might cause the lock to be held longer than needed and hold back other processes/threads -- this doesn't really apply in TFA's example since you're returning when done writing, but in other cases it could be a problem.
Do not risk closing twice if the interface does not specifically allow it! Doing so is a real good way to create hard-to-find corruption bugs. In general if you see `EBADF` when closing any fd values other than `-1`, that's a really good clue that there is a serious bug in that code.
The article suggests using a named return value `err` to allow the return value of `Close` to be be propagated - unless doing so would overwrite an earlier error:
Wouldn't it be better to use `errors.Join` in this scenario? Then if both `err` and `cerr` are non-nil, the function will return both errors (and if both are `nil`, it will return `nil`):OP here. Another commenter pointed out that `errors.Join` didn't exist when I wrote this, but I wouldn't have changed my guidance if it had.
The core issue here is that you want to deal with errors as soon as possible. The async nature of writes to files makes this more challenging. Part of the purpose of this post was to inform people that you don't necessarily always see write errors at write-time, sometimes you see them at close-time and should handle them there. (And sometimes you don't see them at all without fsync.)
Even if you do handle errors in your deferred close, you may be taking other actions that you'd rather not have until the file i/o is completed, and this could leave your program in an inconsistent state. Side effects in Go are common, so this is a practical concern and it is hard to spot and debug.
This is from 2017, errors.Join did not exist at the time. But yes, today you'd do it differently.
Go error handling is brutal.
I miss exceptions
Exceptions aren't exceptional though; they are too expensive for not-exceptional errors, like failing writes.
That said, a language feature where you can throw lightweight error values without generating a stack trace etc might be a middle ground. But it won't land in Go, given the discussion about alternative error handling some years ago.
Anyway, in practice it's not that bad. A write can go wrong, you as a developer should write code that handles that situation. Exceptions lead a developer to miss errors, or to not handle them in a finegrained manner - e.g. generic "catch all, log error, maybe" style error handling.
Exceptions are slow in some languages based upon how they are implemented. I'm not convinced that is fundamental to exceptions and rather a choice of how they were implemented. In Java exceptions arn't actually that slow, most of the cost is just allocating the exception object (and allocations in Java are fast).
> Exceptions lead a developer to miss errors, or to not handle them in a finegrained manner - e.g. generic "catch all, log error, maybe" style error handling.
I don't see how Go error handling makes people handle things any more explicitly than exceptions. Most people just `if err != nil { return err }`, which to be honest is the _correct_ logic in many cases, and it's pretty easy to forget to check if err and keep on trucking. At least with exceptions if you don't catch it your thread terminates making unhandled exceptions
Exception bubbling means its easier to catch the error at the level that makes sense, and because they are real objects type checking is easier as opposed to the performance of `errors.Is()` which is surprisingly slow.
> which to be honest is the _correct_ logic in many cases
It is almost never the correct logic. The only time it might be appropriate is in a private helper function that has limited scope around another function.
It is most definitely not the correct logic if you are returning that from a public function! For many reasons, but especially because it now binds you to the implementation of the function you called forevermore. That is a horrible place to be.
For example, find out your os.File usage would be better served by SQLite? Too bad. You can't change it now because the users of your function have come to rely on errors from the os file operations when they handle the error you give them. Their code can't deal with the errors coming out of SQLite.
Instead, you need to return errors that are relevant to your function. It may be appropriate to wrap the source error in some circumstances, but your error structures should compel the user to rely on your errors, leaving the wrapped error only for things like logging where a change in the future won't break the callers.
There was [1].
1 - https://go.googlesource.com/proposal/+/master/design/go2draf...
IMO the formatting of the error string returned by errors.Join is atrociously opinionated and not very logging-friendly - it adds a newline between each error message. I know I'm not the only one that has this opinion
It's a trivial function: https://cs.opensource.google/go/go/+/refs/tags/go1.23.1:src/...
You could write your own errorsJoin() and change Error() method to suit your needs.
But really in this particular scenario you would be better served by something like:
And then do: err = errorsConcat(err, f.Close())In the scenario described in this article, errors.Join() would most often reduce to that (in terms of what Error() string would produce).
Wouldn't you use slog if you want logging friendly?
Arguably, one should call `flush()` on the file first. Resource deallocation must always succeed; otherwise a lot of invariants break. This is why Zig's close method[0] ignores errors (with the exception of `EBADF`).
[0]: https://github.com/ziglang/zig/blob/fb0028a0d7b43a2a5dd05f07...
Note that the "unreachable" there is equivalent to assert(error != EBADF), so really it's not even an exception, it's just helpful to crash there when debugging if you get that error. Important to understand that EBADF is not a catchable error because the kernel may have already reused that file descriptor for something else, in which case you wouldn't get EBADF, you would close an unrelated file descriptor.
On production systems, it may often be better to completely ignore the problem at this level. On modern hardware if a disk is throwing an io error on write you are having a bad day. I can almost guarantee that while you might happily modify your code so it properly returns the error, there almost certainly aren't test cases for ensuring such situations are handled "correctly", especially since the error will almost certainly not occur in isolation.
It may often be better to handle the issue as a system failure with fanotify. https://docs.kernel.org/admin-guide/filesystem-monitoring.ht...
Please no. Just handle errors from `close()` when you're writing files.
> On modern hardware if a disk is throwing an io error on write you are having a bad day.
And how would you know you're having a bad day if apps ignore those errors?
I'm arguing that the proper thing to do here is to kill the process along with whatever else is using the block device.
Whether you handle the error or immediately or if you allow the error to occur after a defer, you still are almost certainly not handling it properly and are taking a speed hit for your troubles.
Absolutely not. Imagine a text editor failing when I hit save. Do you really want that to crash the application? No. You want a fallback to ctrl+A ctrl+C, paste into a email and send to yourself. Giving the user a choice to save on a different drive is also a possibility, maybe your thumb drive just got a nudge and lost connection for a second.
This is really all normal use cases under normal conditions.
Using signals for this really sucks. Anyways, you don't get to do this now because of backwards-compatibility. Just handle the error.
A write failing is totally normal. Unplug an external hard disk.
"Commit pending writes" and "discard file handle" should ideally be separate operations, with the former potentially returning errors and the latter being infallible. "Discard file handle" should be called from defer/destructiors/other language RAII constructs, while "commit pending writes" should be called explicitly in the program's control flow, with its error appropriately handled or passed up the stack.
Whether if it's allowed to write again after a "commit pending writes" operation or not is a separate design decision. If not, then in some languages it can be expressed as an ownership taking operation that still allows for handling the error.
More generally, don't ignore errors that you need to handle. This problem isn't exclusive to this very specific case on Go.
I think anything other than deferring Close() and then calling Close() again explicitly is overengineering. Like anything that requires creating a cleanup function, capturing a named return value, requires contorting in unnatural ways to handle a very common scenario. Just... defer Close() the soonest you can (after checking for errors), then call Close() again at the end. Most sane providers of Close() should handle being called multiple times (I know os.File does, as well as sql.Tx ).
I defer these AND check the errors. https://pkg.go.dev/go.uber.org/multierr#hdr-Deferred_Functio... has a nice API.
I wrote my own version for $WORK package (as our policy is that all errors must be wrapped with a message), used like:
You can steal the code from here: https://github.com/pachyderm/pachyderm/blob/master/src/inter...Finally, errcheck can detect errors-ignored-on-defer now, though not when run with golangci-lint for whatever reason. I switched to nogo a while ago and it found all the missed closes in our codebase, which were straightforward to fix. (Yes, I wish the linter auto-ignored cases where files opened for read had their errors ignored. It doesn't, so we just check them.)
Multi-errors are nice.
With your package, how do you suggest users of your function handle the error upstream?
Like this?
And if so, what lead you or your organization to prefer "stringly-typed" errors over the idiomatic approach?Stringly typed errors are also idiomatic in go, if you follow the stdlib.
Like, are you doing anything with TLS? String matching: https://github.com/golang/go/issues/35234
Using the stdlib ssh stuff? String matching: https://github.com/golang/go/issues/45207 / https://github.com/golang/go/issues/39259
Want to parse an address + port? netip.ParseAddrPort only returns strings ('errors.New' errors).
http/http2 is also a minefield of half-exported errors.
The go authors say to use 'errors.Is' and 'errors.As', but the go stdlib also defines an idiom, and the idiom it defines is that somewhere around 30% of all errors should be stringly typed, including many where you may want to have specific handling for them.
> Stringly typed errors are also idiomatic in go, if you follow the stdlib.
Realistically, you can't follow the standard library, except perhaps the newest additions. Idioms emerge and evolve with use. Much of the standard library was written before Go saw much use, being largely in place before the world got to see Go for the first time. Also, thanks to the Go1 guarantee, cannot be changed now.
If the aforementioned package was written in the 2000s, then it might be fair to say that it was in line with the idioms of the time. But it appears to have been written within the last year, and thus is not aligned with idioms of its age.
That's not to say it has to be. Idioms are not requirements. The "stringly-typed" design may be justified even knowing what we know in the 2020s. And, with that, we don't need to speculate about the justifications. We can let the author speak for himself as to why the choice was made.
netip is one of the newer additions to the stdlib (added in ~2022), and follows the venerable stringly typed error idiom.
I always interpreted the preference for stringly typed errors as a way to keep the Go language simpler. Good error handling is complicated and hard to read, and one of Go's values is that programs should be easy to read. As such, if you want good error handling, you should use a different language, like Java or Haskell or C++. This also helps keep people who might demand complicated things like generics away from the language, further keeping it simple.
My understanding was that many of the go idioms are there to scare off programming language theorists, who have a tendency to unnecessarily complicate everything with type theory, and error handling also seems to mostly be in that vein.
> and follows the venerable stringly typed error idiom.
Not exactly. It assumes that all error conditions within the functions provided by netip are of the same nature as it pertains to a single unit of work. In other words, there is only one type (not referring to the language's type system). Error type reuse where different failure points produce the same type of error does not violate current idioms. I cannot immediately think of any reason for why their assumption is wrong, so unless you have other ideas?
That is not the same situation as the deferred close wrapper, though. It is assuming that closing multiple file handles is the same operation, but clearly that's not true. If you were, say, writing a copy function the error handling of the read handle failure is unlikely to be the same as the handling of the write handle failure. The former doesn't tell you much, the latter is quite actionable. The failure points are distinct, and thus of different types (again, not referring to the type system).
The author's answer was basically that he is the only caller so if that problem arises in his code he'll simply modify the function to return idiomatic types. Which is fair for the lone wolf developer. When working alone anything goes! But it is not good API design generally speaking. It certainly wouldn't fly in something like the standard library or anywhere you have other developers.
> Error type reuse where different failure points produce the same type of error does not violate current idioms. I cannot immediately think of any reason for why their assumption is wrong, so unless you have other ideas?
I have a program that takes user input and parses it, and then displays an error. My program is for a language other than english so having it display a pop up with the message "invalid ip:port, square brackets can only be used with ipv6 addresses" in english is bad. Therefore I want to switch on the error message to display translated errors, but of course Go does not think that parsing errors are something that is important.
If parsing user input isn't a place to expose clear non-stringly-typed-errors, I don't know what is.
Note, it also gives bad errors in that some of them include details and the user input, and some don't, so displaying them to users will stutter or require parsing.
For example:
So in one case it has included the original user string, to make it clear what failed, and in another case it doesn't, so I'll have to always add in the context of the input (i.e. `fmt.Errorf("error parsing %q: %w", input, err)`) anyway in order to know what failed, but it'll stutter in every case where they do include the input.I know the answer to all my issues is the usual go thing of "a little copying is better than depending on the go stdlib" of course. At least for netip, forking it is fine, having to maintain a fork of the go net/http stack just to get halfway decent errors is a real pain.
I'm not sure the desire to perform a transformation on a value implies that there are multiple types. You speak to a real problem, of course, but perhaps at the wrong layer of abstraction. It seems the deeper seeded issue is that Go strings assume one language, which is not true. I wonder what native internationalization support might look like?
On multiple types, there's another reason there's multiple types.
Imagine a function that looks like:
The caller of this function will now want to distinguish between "Did we fail to parse input, or did we fail to do networking".The way to do that is to have `netip.ParseAddrPort` failures return a different error type than other methods, but the "idiomatic" go code above doesn't wrap the error with an additional type, so the caller can't distinguish between a network error (many of which are also just 'errors.New'), and a netip parse error (all of which are 'errors.New').
Pushing that responsibility onto the callers seems silly, and like a footgun, especially when several other packages do have typed errors that mean the caller can successfully identify the error without having to do verbose explicit wrapping.
> The way to do that is to have `netip.ParseAddrPort` failures return a different error type than other methods
No. Absolutely not. If you leave the callers of this hypothetical function to rely on the errors of the functions it calls, even if we assume those functions were written by an infallible deity, you've created a coupling that now binds you to the implementation forevermore. That's just plain reckless behaviour.
Return your own errors. I know it takes a tiny amount of extra thinking to figure out what types are relevant to your function, but is unquestionably worthwhile and would still be worthwhile even if all the functions you call were designed by an infallible entity.
> but the "idiomatic" go code
You're really stretching the use of idiomatic here. Not ever has that been considered idiomatic Go. The Go community has always been clear that you should never, ever write code like that. It is so painfully horrid for so many reasons that it could never be considered idiomatic.
errors.Is is true for whatever x.Close() returned.
If I'm going to do something upstream based on the error, I use a type. Most of the time, it's just log.Error text though.
> errors.Is is true for whatever x.Close() returned.
That may get you through if there is only one x.Close, but now you have to offer the guarantee to the callers that there will only ever be one Close error returned. Furthermore, any callers of your function have to ensure that they don't end up introducing additional Close error returns in the same vein.
Without such guarantees, you, the caller, need to resort to string matching to protect against undocumented functionality and/or future modifications when you handle the error. Sounds like a rough situation...
> If I'm going to do something upstream based on the error, I use a type.
Implying that you are a lone wolf developer? I think you make a good point that if you exist in your own world without other developers just about anything goes.
That said, the language used around the previously linked repository implies that it welcomes other developers using and working on the code, so it is not clear how you are "doing the upstream" in all cases.
> That may get you through if there is only one x.Close, but now you have to offer the guarantee to the callers that there will only ever be one Close error returned.
No you don't. It's a multierror, a feature of the standard library.
is true. This is a feature in the standard library.> Implying that you are a lone wolf developer?
You know that's not true from the history of the repository. We do have coding standards that reviewers enforce, this is one of them.
> It's a multierror, a feature of the standard library.
Understood (I read the code), but that doesn't help, and really has nothing to do with the topic at hand. Is the problem here that you don't have an understanding of what we're talking about?
> You know that's not true from the history of the repository.
Which is why the claim was identified as being strange and in need of clarification. How can "If I'm going to do something upstream based on the error, I use a type." be true? If there are many developers, you don't get to control the upstream.
I dislike the multiple close pattern - I was bitten by this behavior years ago, when the second close() ended up closing another file which had been opened between the first and second close ( I think they were actually sockets ). It was a bona fide bug on my side , but it made for unpleasant memories, and a general distrust of such idioms on my side unless there's a language wide guarantee somewhere in the picture.
I don't understand how that could happen, since the original file handle would have been invalidated.
Which operating system did you experience this under and was it the operating system, your Libc or what else in the stack which caused this?
The scenario is that after the original file handle is closed, a new open occurs and the file is assigned the same handle value.
Then code acting on the stale handle of the first file closes it, and accidentally closes the new file instead.
In the stdlib Go code for file.Close, it doesn't actually do the syscall after the first call to Close, so there's your language guarantee.
That is a scary sounding error though.
Older versions of go (1.0 for example ) were much less safe. I had a look at the code, and it closes the file directly, and marks it as unusable. However, if you do concurrent operations, you can race and close twice the underlying fd - which I think was my bug ( I shouldn’t have been closing things twice anyway !)
AFAIK Python/C# use a similar approach to Go - instead of `defer`, they have `using`/`with` statements. Go's `defer` seems more flexible though - it can execute custom code each time it's used, whereas `using`/`with` always call `__exit__`/`Dispose`.
How does the Python/C# approach compare to Go's in this situation? How are errors in `__exit__`/`Dispose` handled?
> Exceptions that occur during execution of this method will replace any exception that occurred in the body of the with statement
https://docs.python.org/3/library/stdtypes.html#contextmanag...
Thanks!
From the application point of view, exceptions are more natural to catch, so if the close inside the exit-function throws, the nearest catch block will not be far up the stack. Compared to go, where you are much more unlikely to have a recover block, because it is considered such a rare case.
Don't call close() twice if you're not using pidfds, this is racy. The fd could be reused in between the two calls. You'll risk closing random things a frustratingly small fraction of the time, creating very hard bugs for yourself.
Surely File.close will clear the fd on success...
You're right, looks like it does!
Yeah but is it safe for concurrent use?
If you’re performing concurrent unprotected closing of an fd, you’re deep into the wild lands of nonsense already.
(pretty much) only channels are safe for concurrent use in go
Didn't I see this thread the other day including comments? Investigating, Algolia search shows this thread as being posted 2 days ago, and the memorable comments too.
I noticed this before: https://news.ycombinator.com/item?id=40049170
@dang did the timestamps get messed up?
https://news.ycombinator.com/item?id=26998308
Why would the comments have changed timestamps though?
The posting timestamp is temporarily adjusted, such that people don't complain about a week day old submission being on the front page.
The comment timestamps are temporarily adjusted to be consistent with the adjusted posting times, such that readers aren't confused by the comments predating the apparent posting time.
The timestamps will revert back to their original values in a few days.
What I don't understand is why this went into the second chance pool if the original submission made it to the front page and got >100 points.
Can someone tell me what’s going on here? This very post including the comments appeared on HN two days as well. I thought I was getting crazy but Google confirms.
If a post doesn't get much attention, there's a small chance of it getting a boost a few days later. If this happens, the timestamps are all adjusted to make it look like it was just posted.
More details: https://news.ycombinator.com/item?id=26998308
Here is a related question that has been on my mind for a while, but I have yet to find a good answer for:
If I write to file on a reasonably recent Linux and a sane file system like ext or zfs, at which point do I have the guarantee that when I read the same file back, that it is consistent and complete?
Do I really need to fsync or is Linux smart enough to give me back the buffer cache? Does it make a difference if reader and and writer are in the same thread or same process?
Normally, immediately when write(2) returns.
It’s more complicated if the computer shut down in between, depending on how clean the shutdown was.
OP here. I appreciate the comments I've read here, and it might inspire a new blog post: "Defer is for resource cleanup, not error handling."
Confused why this is displayed as being 9 hours old when I remember seeing it on the front page a couple days ago. Search also seems to confirm this: https://hn.algolia.com/?dateRange=pastWeek&page=0&prefix=fal...
Best practice IMO would be to wrap the closable thing in a wrapper object that handles the case where Close is called multiple times and then defer close that one as well as closing at the end of the function. Another idea would be to define a small lambda that either returns the input value or returns error if Close returns an error if you want to use early returns.
The end of the article has my favorite solution.
This is also how I'd solve it in Rust, except the defer would be implicit.
Yes, note you’ll also see:
Instead of, I think this is likely a code style difference due to working with a linter that alarms on discarded error returns, but I’m not sure. Both options have the same behavior, unless you reassign f (defer f.Close() will use the original value, defer func() ... will use the current value).Yup, I wrote it that way because I have a linter for unused error returns with a handful of exceptions (like closing http request bodies).
I think I would add an exception for this too, because it crops up a lot.
I can't decide because in this pattern you ignore once and check once and I like the lint for the check. Ideally the linter could recognize this pattern. Even better would be if the linter could catch when you never close, I've made that mistake a few times.
Java's Closeable.close() has declared multiple invocations as safe for 20 years, when it first became an interface.
I’m a bit new to Golang, but not good programming practices. Isn’t ignoring the error value returned by a function a very bad practice in general? Regardless of if the function call returning it is used in defer? Not just for file write operations?
Ignoring errors is generally a poor practice. You don't have to stop your program on errors, but you should at least log some percentage of them so that when the failure cascades, you have some idea where the failure started.
Sometimes there is nothing you can do when there is an error, in that case there is no point in adding several layers of error forwarding until you ignore it somewhere higher up.
Is… NOT ignoring errors just not an option? I don’t get it. If you propagate errors up but not all the way to being handled, haven’t you failed in a very simple, easy to fix way? Should you have a linter catching these things?
In this case the issue is that defer is a very good way to ensure you don’t forget to close the file in any branches, but a bad way to return values (you have to set the value of a named return variable, which is one of Go’s odder features).
> Should you have a linter catching these things?
JetBrains’ GoLand will in fact warn you of this. If the error truly is immaterial you can instead do
defer func() { _ = f.Close() }()
which is verbose but explicit in its intent to ignore the error.
> […] you have to set the value of a named return variable
Ahhhh okay I see it now. I definitely prefer to not use that feature as well, and I’m surprised it’s even there given how well the rest of the language adheres to “only one way to do things”. Doubly agree that it’s a strange “hack” for forwarding the deferred return value… oof
> JetBrains’ GoLand will in fact warn you of this
Heh yeah that’s what prompted me to ask, as I noticed (and very much appreciated) these hints. 100% agree with the verbose-but-explicit example you gave, and do that myself.
Isn’t what the article suggests (with defer and then WriteString) technically a race condition? Is there no way that the closer can get called before WriteString executes?
This is the way.
EDIT: fixed the bugdefer gets executed at the end of the current function
Thanks! Not a Go programmer, so I saw the parentheses after the cloaure def and assume it translated to “execute this on a goroutine in the background immediately”
Never thought that `Close` could fail, as `CloseHandle` on Windows never fails. I wonder how .NET behaves in the same scenario on Linux.
The following from the first update:
if err := f.Close(); err != nil { return err } return nil
Is equivalent to
return f.Close()
The snippet in the first update is very wrong. The manpage for Linux’s implementation of close() explicitly says that it should not be called again if it fails. Apparently, this is the same under FreeBSD.
The implementation of `os.File.Close()` in Go clears the file descriptor.
https://pkg.go.dev/os#File.Close
“Close will return an error if it has already been called.”
An os.File is a data structure containing a file descriptor and some other fields. It is safe to call Close() multiple times, because it will only call the underlying syscall close() once.
how, when and why can close() fail? and what can you do about it if it does?
Fails if (say) OS can't write out pending cache and confirm data written to device.
Causes include memory failure, drive cable melted, network cable pulled, etc.
What to do?
How important is the data being written? Is the only copy of just aquired data from a $10 million day geophysical survey? How much time and resources can you spend on work arounds, multiple copies, alternative storage paths, etc.
In aquisition you flush often, worst case lose a minute rather than a day.
In, say, seismic quisition, you might aquire audio data from microphone array and multi track raw audio to SEGY tape banks AND split raw data to thermal plotter AND processing WHERE RAW DATA -> (digitally to DAT AND hard drives) and through processing WHERE COOKED DATA -> digital storage.
In processing pipelines a failed write() or close() isn't so bad, you flag that it happened and you can try to repipe the raw data to get a savable second result.
Ultimately you want human operator control on what and when to do something - it's a hardware problem or resource starvation at the root.
Handling it this way in a user process is insane and essentially cargo culting. If your data is that valuable, you have redundant systems.
Custom hardware, custom real time kernel (acquisition|processing DSP boards) + loadable RT firmware, custom kernel + comms + window manager on main terminal.
Other than the recording redundancies described (raw analog logged, raw digital logged, raw paper chart created, cooked data logged, cooked paper chart, (raw | cooked each on tape, disk, paper) what are these "redundant systems" that you speak of?
Keeping in mind, of course, that the client has raw data, etc. on the contract as deliverables.
Do you imagine two full ships pulling two full microphone arrays to offset a rare (but happens) recoring failure? Now you've doubled the per dium costs and halved the area that can be covered in a typically short season.
Do you imagine one ship pulling two arrays that magically don't tangle? It doesn't work that way.
The goal here, of course, is to do all that as feasibly possible upfront in order to minimise aquisition time on the water and to ensure that all pings | booms | etc and their returns to multiple mic's recorded so the ship doesn't have to do a repass.
Expand on your non cargo culting non insane design ideas for 1970-1990s offshore seismic exploration by all means as what you intend isn't clear in your terse comment.
Keep in mind your design will need to be moved on and off arbitrary ships and will operate in places like the North Sea, Spratly Islands, etc. and will have to survive the pitch and toss of stormy weather (eg: attention to card fit in bus backbone).
TL;DR - defer Close ignores the error from Close, so don't.
It might be well implementation-dependent.
On Linux, the close() system call rarely fails unless you provide an invalid file descriptor. L.Torvalds has stated that on Linux, close() immediately removes the file descriptor from the process, regardless of the underlying implementation's success or failure. Any errors related to the actual closing of the file are handled within the kernel and won't affect user-space programs. I know that go Close is not posix/linux close, but in majority of cases it'll boil down to it.
To quote:
Eh, the real answer is close twice (as in the article update):
The .Close() method is safe to call multiple times. This behavior is documented on the os.File