Maybe Python's import/module system makes sense to someone, but it's caused me a lot of trouble. I'm not even sure exactly why, but I run into weird issues often.
Like the fact that imports are tied to the working directory things are expected to be run from is just very weird. There should be one project root, and all imports could then use full paths from the root or paths relative to the file's location. Tying imports to the working directory is incredibly confusing: the same imports can work when run from one directory and fail from another.
In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just Python. Rust has some confusing situations too, although Java and Go seem to have more straightforward systems.
The module search path (sys.path) is the thing you need to know about.
With `python -m part2.week2`, by default `.` is added at the start.
With `python part2/week2.py`, `part2` (the script directory) is added at the start (instead of `.`).
So in the former case, `week2.py` is able to do something like `from part2 import ...` but not in the latter case.
But this should work `PYTHONPATH=. python part2/week2.py`.
https://docs.python.org/3/tutorial/modules.html#the-module-s...
https://docs.python.org/3/library/sys_path_init.html#sys-pat...
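To make that concrete, here's a self-contained sketch (the `part2`/`week2` names mirror the example upthread; everything is built in a temporary directory):

```python
# Build a tiny throwaway package, then run the same module both ways
# and compare what Python puts at the front of sys.path.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "part2")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "week2.py"), "w") as f:
        f.write("import sys; print(sys.path[0])\n")

    # `python -m part2.week2`: the working directory leads sys.path.
    as_module = subprocess.run(
        [sys.executable, "-m", "part2.week2"],
        cwd=root, capture_output=True, text=True,
    ).stdout.strip()

    # `python part2/week2.py`: the *script's* directory leads sys.path.
    as_script = subprocess.run(
        [sys.executable, os.path.join("part2", "week2.py")],
        cwd=root, capture_output=True, text=True,
    ).stdout.strip()

print("-m        ->", as_module)
print("as script ->", as_script)
```

The two invocations print different leading `sys.path` entries, which is exactly why `from part2 import ...` works under `-m` but not when the file is run directly.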
>Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just python.
I won't speak about other languages, but Python's system is indeed very complex - you just aren't usually exposed to most of the complexity, because it mainly takes the form of optional hooks. (Thanks for the reminder that I need to do a blog post about this some time - once I've figured out how to structure everything I want to say.)
js2 gave you the how-to; so here's the detailed explanation. You might not want to try to tackle it all at once.
Importantly, Python modules don't necessarily map one-to-one to `.py` files. They can be `import`ed from a precompiled `.pyc` bytecode file, which could be either in the searched folder directly or in a `__pycache__` subdirectory (the modern default). Or they can be loaded from a `.py` file within an archive, where `sys.path` contains path and filename of the archive rather than just a folder path. Or they can come from within the Python interpreter executable itself - and that's not just for the builtins. Or they can be anything else that ends up creating a `module` object, as long as you write the necessary custom importer.
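As a small aside, the standard library will tell you where the cached bytecode for a given source file would land (the `part2/week2.py` path is just the example from upthread):

```python
# Compute where the bytecode cache for a source file would be written;
# the exact filename embeds the interpreter's version tag.
import importlib.util
import os

cache_path = importlib.util.cache_from_source(os.path.join("part2", "week2.py"))
print(cache_path)
```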
Aside from the well-known `sys.path`, there is a `sys.meta_path` which contains objects that represent ways to import modules:
Python 3.12.4 (main, Jun 24 2024, 03:28:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
(IIRC: the `PathFinder` delegates to a separate loader, which may be a `.py` file loader, `.pyc` file loader, `.zip` archive loader, etc. The `FrozenImporter` handles the non-"builtin" modules frozen into the interpreter.)
>the fact that the imports are tied to the working directory things are expected to be run from
It doesn't actually work that way. It is tied to sys.path - which, by default, is initialized to include the working directory or an appropriate equivalent. See https://docs.python.org/3/library/sys_path_init.html for details.
>There should just be one project root
And what about when you want to import things that aren't within the project? How will Python know whether it should look within the project or somewhere else?
>and all imports can then import via full paths from the root or paths relative to the file's location.
Aside from the fact that "path" might not be as meaningful as you expect for every import, this is inconvenient for people who like src layout. The point of the dotted-path notation is that you get to use a symbolic name for the module which reflects the logical rather than physical structure of your project. Plus, you get to use relative imports without being limited by the clunky relative-path syntax. (It's important not to have that limitation, because any given package can be split across different folders in unrelated locations. In fact, that's the precise purpose of the "namespace package" feature discussed in TFA, and in more detail in the PEP (https://peps.python.org/pep-0420/).)
If you really mean that any valid file path should work, then the "one project root" no longer has meaning, and also it sounds like you're now on the hook for knowing the absolute path to the standard library (and to the `site-packages` directory for third-party libraries). As I'm sure you can imagine, that's not great for portability.
Many people seem to think they want an `import` statement that uses file paths instead. Generally they don't understand what they'd be giving up.
If you want it to work both ways, then you have to define the package structure semantics when a path is used, so that modules imported both ways can properly interoperate. It's not a simple task with a clear and unambiguous specification - if it were, someone would have implemented it already.
It is possible to import "dynamically" by directly telling the import system, at runtime, what file to use (you can tell it what loader to use, too). See for example https://stackoverflow.com/questions/67631.
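A minimal sketch of that technique using the stdlib `importlib` machinery (the `helper` module name and `greet` function are invented for the demo):

```python
# Import a module from an explicit file path, bypassing sys.path.
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "helper.py")
    with open(path, "w") as f:
        f.write("def greet():\n    return 'hello'\n")

    spec = importlib.util.spec_from_file_location("helper", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["helper"] = module   # register before executing, per the docs
    spec.loader.exec_module(module)

    greeting = module.greet()

print(greeting)
```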
>In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
There are many details in the system, but the top-level understanding is quite simple. `-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__package__` accordingly, so that relative imports work; its `__name__` still becomes `'__main__'`, so code inside `if __name__ == '__main__':` does run. Since you're specifying a qualified module name (i.e., the package is a namespace), you use the dot notation. Since you're not specifying a path, Python uses its own internal logic to decide where the code is and what form it takes. Since you're "importing" under the hood, Python will leave a bytecode cache behind by default.
Giving a path to a file means to run the code as a script - i.e., Python still does the usual file I/O, bytecode compilation if not cached, and execution of the top-level code; but it sets `__name__` to the special value `'__main__'` (so that "guarded" code does run), and it doesn't import any parent packages (because there aren't any) and thus relative imports can't work. Since you're specifying a path, you're directly telling Python where the code file is; Python only uses its internal logic to determine how to load it. Since this file is a "script" that represents the conceptual top level of whatever you're doing, it's not treated as an import, and so Python doesn't create a bytecode cache.
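The practical consequence can be demonstrated directly: the same file, containing a relative import, succeeds under `-m` and fails as a script (the `pkg`/`mod` names are made up; everything lives in a temporary directory):

```python
# The same file: a relative import succeeds under -m, fails as a script.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "mod.py"), "w") as f:
        f.write("from . import VALUE\nprint(VALUE)\n")

    # Run as a module: the parent package is imported first.
    ok = subprocess.run(
        [sys.executable, "-m", "pkg.mod"],
        cwd=root, capture_output=True, text=True,
    )
    # Run as a script: no parent package, so the relative import fails.
    broken = subprocess.run(
        [sys.executable, os.path.join("pkg", "mod.py")],
        cwd=root, capture_output=True, text=True,
    )

print("via -m:", ok.stdout.strip())
print("as script, failed:", broken.returncode != 0)
```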
> Now, it’s immediately clear that component_a and component_b are packages, and services is just a directory. It also makes clear that “scripts” isn’t a package at all, and my_script.py isn’t something you should be importing. __init__.py files help developers understand the structure of your codebase.
The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
Then later—once I'm familiar enough with the project to know that the developers are using the marker file strictly to indicate modules—I'm already familiar with the directory structure and the marker files aren't doing anything for me.
I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions than have them sprinkle __init__.py and count on readers to understand they were being intentional and not superstitious.
>The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
This wouldn't be a problem, and `__init__.py` files would unambiguously have the intended meaning, if cargo-cult programmers didn't get to overwhelm the explanation of the import system in early Stack Overflow questions on the topic (and create a huge mess of awkwardly-overlapping Q&As that are popular but don't work well as duplicate targets).
Just the title of this article conveys information that many Python programmers seem completely unaware of. The overwhelming majority of recommendations for `sys.path` hacks are completely unnecessary. Major real-world projects like TensorFlow can span hundreds of kloc without using them in the main codebase (you might see one or two uses to help Pytest or Sphinx along).
>I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions
There are many ways to do this, and people like having a standard. Also, as TFA explains, tooling can automatically detect the presence of `__init__.py` files and doesn't have to assign any special meaning to your naming decisions.
I hate init py files the same way I hate Java’s verbosity.
If it looks like a package, it’s a package.
If I want to be explicit, I’ll write something in C.
There's nothing better than verbosity on large projects, especially in rarely-touched areas (build/test/deploy infra).
The perfect anti-pattern example is sbt: it shortened commands to unintelligible codes, even though you might write them maybe once per year.
And by the way, you never write full lines anyway; autocomplete knows what to write from just 2-3 keystrokes, even without any AI. So verbosity doesn't affect writing speed.
The problem is that packages can also look very different. For example, Python can import an entire package hierarchy from a zip file. (This is essential to how `zipapp` works; and under limited circumstances, a wheel file can also be used this way - which plays an essential role in how Pip is bootstrapped.)
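A quick sketch of that, assuming nothing beyond the stdlib (the archive and `zipped_pkg` names are invented):

```python
# Build an archive containing a one-module package, then import from it.
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as d:
    archive = os.path.join(d, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as z:
        z.writestr("zipped_pkg/__init__.py", "MESSAGE = 'loaded from zip'\n")

    sys.path.insert(0, archive)   # the archive itself goes on sys.path
    import zipped_pkg

print(zipped_pkg.MESSAGE)
```

Note that the `sys.path` entry is the path to the archive, not to a directory, which is the case mentioned earlier in the thread.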
> For example, Python can import an entire package hierarchy from a zip file.
As could Python's contemporaries back in the day, JAR extensions notwithstanding. That fact does not justify bad design.
> If I want to be explicit, I’ll write something in C.
Absolutely love those explicit void pointers.
Fwiw, I never use them in Python if I don't need to put anything inside them. They're optional for a reason.
From the zen of python: ‘Explicit is better than implicit.’
duck typing has entered the room
Also treating non-boolean objects with a boolean meaning. Python is fine and all, but the Zen of Python has always been a joke that the language doesn't bother to follow.
It's called "the Zen of Python", and not e.g. "the laws of Python", for a reason. Not everyone is, as Tim Peters put it, Dutch.
(Although I should also say: PEP 20 reflects a 20-year-old understanding of Python's design paradigm and of the lessons learned from developing it, and Tim Peters was highly deferential to Guido van Rossum when publishing it. In fact, GvR was invited to add the supposedly missing 20th principle, but never did - IMHO, this refusal is very much in the Zen spirit. Anyway, the language - and standard library - have changed a lot since then.)
The post is overrated. The files are optional. I use them only if I need to put something inside them. As for imports, they work fine. And as for static analyzers, they work for me, as in I don't work for them.
Fundamentally it's an argument about style, which tries to make some default assumptions about a programmer's needs - so of course some people won't find it very convincing. FWIW, I'm in your camp for the most part, but I do very much appreciate the other view.
Per the example in the article (the services/component_b/child one), are you supposed to put __init__.py in the sub-folders of a package too?
I'm asking because I rarely see this done (nor have I done it).
Yes, it goes in every "regular package" folder (a folder without one creates a "namespace package" - or rather, a "portion" thereof; this allows for a parallel file hierarchy somewhere else that holds another part of the same package).
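Here's a sketch of two such portions combining (the `ns` package name and its module names are invented):

```python
# Two unrelated directories each contribute a "portion" of package `ns`
# (no __init__.py anywhere); both land in the same namespace package.
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    os.makedirs(os.path.join(a, "ns"))
    os.makedirs(os.path.join(b, "ns"))
    with open(os.path.join(a, "ns", "first.py"), "w") as f:
        f.write("WHERE = 'portion A'\n")
    with open(os.path.join(b, "ns", "second.py"), "w") as f:
        f.write("WHERE = 'portion B'\n")

    sys.path[:0] = [a, b]        # both parent directories on the search path
    from ns import first, second
    portions = (first.WHERE, second.WHERE)

print(portions)
```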
Python represents packages as `module` objects - the exact same type, not even a subclass. These are created to represent either the folder or an `__init__.py` file.
The file need not be empty and in fact works like any other module, with the exception of some special meaning given to certain attributes (in particular, `__all__`, which controls the behaviour of star-imports). Notably, you can use this to force modules and subpackages within the package to load when the package is imported, even if the user doesn't request them (`collections.abc` does this); and you can make some other module appear to be within the package by assigning it as an attribute (`os.path` works this way; when `os` is imported, some implementation module such as `ntpath` or `posixpath` is chosen and assigned).
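A toy version of that pattern, built on disk so it stays self-contained (the `toolbox` package and its contents are invented):

```python
# A package whose __init__.py eagerly loads a submodule and re-exports
# a name at the package level.
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "toolbox")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "core.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(
            "from . import core        # force the submodule to load\n"
            "from .core import double  # re-export at package level\n"
            "__all__ = ['double']      # controls `from toolbox import *`\n"
        )

    sys.path.insert(0, root)
    import toolbox
    result = toolbox.double(21)
    eager = "toolbox.core" in sys.modules   # loaded without being requested

print(result, eager)
```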
Yes, it goes in any folder that contains files expected to be imported (or other folders that contain the same).
I have been writing python for decades and had no idea they were made optional. Amazing.
I don't know why PEP 420 didn't specify one file at the root of the package that describes the structure of all the sub-directories, rather than going the controversial, lazy route of removing __init__.py. That way you get the explicitness and avoid the littering of empty marker files.
This is spelled out in the PEP (I’m the author). There isn’t one “portion” (a term defined in the PEP) to own that file. And it’s also not a requirement that all portions be installed in the same directory. I’m sorry you feel this was lazy, it was a lot of work spanning multiple attempts over multiple years.
I appreciate your work on pep 420; I’ve benefited from it personally as a Python user. Thank you for a job well done.
Thank you for the kind words.
JavaScript tooling requires index files for everything, which makes development slow, particularly when you want to iterate fast or create many files with a single output.
I think it makes sense for the compiler or script loader to rely on just the files and their contents. Either way you're already defining everything; why create an additional redundant set of definitions?
> JavaScript tooling requires index files for everything
This just isn't true. I've never encountered tooling that forces you to have these by default. If it's enforced, it's rules defined in your project or some unusual tools
> JavaScript tooling requires index files for everything
You mean barrel files? Those are horrible kludges used by lazy people to pretend they're hiding implementation details, generate arbitrary accidental circular imports, and end up causing absolute hell if you're using any sort of naive transpiling/bundling tooling/plugin/adapter.
What do you mean by index files? It might depend on the bundler, but I haven’t heard of index.js/index.ts files being a hard requirement for a directory to be traversable in most tooling.
They aren't necessarily empty.
Java does this way better.
Agreed.
Fie on "Implicit namespace package". If only because making "implicit" explicit is linguistically pointless in that 3-word phrase.
Either "namespace" or "package" is also pointless linguistically. Noun-noun names ("namespace package") in programming are always a smell. Meh, it's a job career that pays the bills rent.
Maybe "namespace" (no dunder init) vs "package" (dunder init) would have saved countless person-years of confusion? Packages and "implicit namespace packages" are not substitutes for one another (fscking parent relative imports!) so there's no reason they need the same nouns.
then why was it made optional? implicit is better than explicit?
Because sometimes implicit is best, other times explicit is best. Good to have options as different problems require different solutions.