Maybe Python's import/module system makes sense to someone, but it's caused me a lot of trouble. I'm not even sure exactly why, but I run into weird issues often.
Like the fact that imports are tied to the working directory things are expected to be run from is just very weird. There should be one project root, and all imports could then use full paths from the root or paths relative to the file's location. Tying imports to the working directory is incredibly confusing: the same imports can work when run from one directory and fail from another.
In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just Python. Rust has some confusing situations too, although Java and Go seem to have more straightforward systems.
The module search path (sys.path) is the thing you need to know about.
With `python -m part2.week2`, by default `.` is added at the start.
With `python part2/week2.py`, `part2` (the script directory) is added at the start (instead of `.`).
So in the former case, `week2.py` is able to do something like `from part2 import ...` but not in the latter case.
But this should work `PYTHONPATH=. python part2/week2.py`.
https://docs.python.org/3/tutorial/modules.html#the-module-s...
https://docs.python.org/3/library/sys_path_init.html#sys-pat...
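To make that concrete, here's a self-contained sketch (the `part2`/`week2` names mirror the example upthread; everything is built in a temporary directory):

```python
# Build a tiny throwaway package, then run the same module both ways
# and compare what Python puts at the front of sys.path.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "part2")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "week2.py"), "w") as f:
        f.write("import sys; print(sys.path[0])\n")

    # `python -m part2.week2`: the working directory leads sys.path.
    as_module = subprocess.run(
        [sys.executable, "-m", "part2.week2"],
        cwd=root, capture_output=True, text=True,
    ).stdout.strip()

    # `python part2/week2.py`: the *script's* directory leads sys.path.
    as_script = subprocess.run(
        [sys.executable, os.path.join("part2", "week2.py")],
        cwd=root, capture_output=True, text=True,
    ).stdout.strip()

print("-m        ->", as_module)
print("as script ->", as_script)
```

The two invocations print different leading `sys.path` entries, which is exactly why `from part2 import ...` works under `-m` but not when the file is run directly.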
>Part of it might just be some implicit complexity in module systems as a whole that I don't get, not just python.
I won't speak about other languages, but Python's system is indeed very complex - you just aren't usually exposed to most of the complexity, because it mainly takes the form of optional hooks. (Thanks for the reminder that I need to do a blog post about this some time - once I've figured out how to structure everything I want to say.)
js2 gave you the how-to; so here's the detailed explanation. You might not want to try to tackle it all at once.
Importantly, Python modules don't necessarily map one-to-one to `.py` files. They can be `import`ed from a precompiled `.pyc` bytecode file, which could be either in the searched folder directly or in a `__pycache__` subdirectory (the modern default). Or they can be loaded from a `.py` file within an archive, where `sys.path` contains path and filename of the archive rather than just a folder path. Or they can come from within the Python interpreter executable itself - and that's not just for the builtins. Or they can be anything else that ends up creating a `module` object, as long as you write the necessary custom importer.
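As a small aside, the standard library will tell you where the cached bytecode for a given source file would land (the `part2/week2.py` path is just the example from upthread):

```python
# Compute where the bytecode cache for a source file would be written;
# the exact filename embeds the interpreter's version tag.
import importlib.util
import os

cache_path = importlib.util.cache_from_source(os.path.join("part2", "week2.py"))
print(cache_path)
```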
Aside from the well-known `sys.path`, there is a `sys.meta_path` which contains objects that represent ways to import modules:
Python 3.12.4 (main, Jun 24 2024, 03:28:13) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.meta_path
[<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib_external.PathFinder'>]
(IIRC: the `PathFinder` delegates to a separate loader, which may be a `.py` file loader, `.pyc` file loader, `.zip` archive loader, etc. The `FrozenImporter` handles the non-"builtin" modules frozen into the interpreter.)
>the fact that the imports are tied to the working directory things are expected to be run from
It doesn't actually work that way. It is tied to sys.path - which, by default, is initialized to include the working directory or an appropriate equivalent. See https://docs.python.org/3/library/sys_path_init.html for details.
>There should just be one project root
And what about when you want to import things that aren't within the project? How will Python know whether it should look within the project or somewhere else?
>and all imports can then import via full paths from the root or paths relative to the file's location.
Aside from the fact that "path" might not be as meaningful as you expect for every import, this is inconvenient for people who like src layout. The point of the dotted-path notation is that you get to use a symbolic name for the module which reflects the logical rather than physical structure of your project. Plus, you get to use relative imports without being limited by the clunky relative-path syntax. (It's important not to have that limitation, because any given package can be split across different folders in unrelated locations. In fact, that's the precise purpose of the "namespace package" feature discussed in TFA, and in more detail in the PEP (https://peps.python.org/pep-0420/).)
If you really mean that any valid file path should work, then the "one project root" no longer has meaning, and also it sounds like you're now on the hook for knowing the absolute path to the standard library (and to the `site-packages` directory for third-party libraries). As I'm sure you can imagine, that's not great for portability.
Many people seem to think they want an `import` statement that uses file paths instead. Generally they don't understand what they'd be giving up.
If you want it to work both ways, then you have to define the package structure semantics when a path is used, so that modules imported both ways can properly interoperate. It's not a simple task with a clear and unambiguous specification - if it were, someone would have implemented it already.
It is possible to import "dynamically" by directly telling the import system, at runtime, what file to use (you can tell it what loader to use, too). See for example https://stackoverflow.com/questions/67631.
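A minimal sketch of that technique using the stdlib `importlib` machinery (the `helper` module name and `greet` function are invented for the demo):

```python
# Import a module from an explicit file path, bypassing sys.path.
import importlib.util
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "helper.py")
    with open(path, "w") as f:
        f.write("def greet():\n    return 'hello'\n")

    spec = importlib.util.spec_from_file_location("helper", path)
    module = importlib.util.module_from_spec(spec)
    sys.modules["helper"] = module   # register before executing, per the docs
    spec.loader.exec_module(module)

    greeting = module.greet()

print(greeting)
```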
>In a project I have I need to run it like `python3 -m part2.week2` instead of `python3 part2/week2.py` or the imports will fail. I'll probably never understand this system :)
There are many details in the system, but the top-level understanding is quite simple. `-m` means to run the code as a module, i.e., to figure out what package it's in (by importing its ancestors first) and to set its `__package__` accordingly, so that relative imports work; its `__name__` still becomes `'__main__'`, so code inside `if __name__ == '__main__':` does run. Since you're specifying a qualified module name (i.e., the package is a namespace), you use the dot notation. Since you're not specifying a path, Python uses its own internal logic to decide where the code is and what form it takes. Since you're "importing" under the hood, Python will leave a bytecode cache behind by default.
Giving a path to a file means to run the code as a script - i.e., Python still does the usual file I/O, bytecode compilation if not cached, and execution of the top-level code; but it sets `__name__` to the special value `'__main__'` (so that "guarded" code does run), and it doesn't import any parent packages (because there aren't any) and thus relative imports can't work. Since you're specifying a path, you're directly telling Python where the code file is; Python only uses its internal logic to determine how to load it. Since this file is a "script" that represents the conceptual top level of whatever you're doing, it's not treated as an import, and so Python doesn't create a bytecode cache.
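The practical consequence can be demonstrated directly: the same file, containing a relative import, succeeds under `-m` and fails as a script (the `pkg`/`mod` names are made up; everything lives in a temporary directory):

```python
# The same file: a relative import succeeds under -m, fails as a script.
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "pkg")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write("VALUE = 42\n")
    with open(os.path.join(pkg, "mod.py"), "w") as f:
        f.write("from . import VALUE\nprint(VALUE)\n")

    # Run as a module: the parent package is imported first.
    ok = subprocess.run(
        [sys.executable, "-m", "pkg.mod"],
        cwd=root, capture_output=True, text=True,
    )
    # Run as a script: no parent package, so the relative import fails.
    broken = subprocess.run(
        [sys.executable, os.path.join("pkg", "mod.py")],
        cwd=root, capture_output=True, text=True,
    )

print("via -m:", ok.stdout.strip())
print("as script, failed:", broken.returncode != 0)
```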
> Now, it’s immediately clear that component_a and component_b are packages, and services is just a directory. It also makes clear that “scripts” isn’t a package at all, and my_script.py isn’t something you should be importing. __init__.py files help developers understand the structure of your codebase.
The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
Then later—once I'm familiar enough with the project to know that the developers are using the marker file strictly to indicate modules—I'm already familiar with the directory structure and the marker files aren't doing anything for me.
I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions than have them sprinkle __init__.py and count on readers to understand they were being intentional and not superstitious.
>The problem with this theory is that the presence of an __init__.py in a new project I've never seen before doesn't necessarily mean anything—it could just have been sprinkled superstitiously because no one really understands what exactly these files are for.
This wouldn't be a problem, and `__init__.py` files would unambiguously have the intended meaning, if cargo-cult programmers didn't get to overwhelm the explanation of the import system in early Stack Overflow questions on the topic (and create a huge mess of awkwardly-overlapping Q&As that are popular but don't work well as duplicate targets).
Just the title of this article conveys information that many Python programmers seem completely unaware of. The overwhelming majority of recommendations for `sys.path` hacks are completely unnecessary. Major real-world projects like TensorFlow can span hundreds of kloc without using them in the main codebase (you might see one or two uses to help Pytest or Sphinx along).
>I'd rather see people in the situation above make the module structure clear in the actual folder hierarchy and naming decisions
There are many ways to do this, and people like having a standard. Also, as TFA explains, tooling can automatically detect the presence of `__init__.py` files and doesn't have to assign any special meaning to your naming decisions.
I hate init py files the same way I hate Java’s verbosity.
If it looks like a package, it’s a package.
If I want to be explicit, I’ll write something in C.
There's nothing better than verbosity on large projects, especially in rarely-touched areas (build/test/deploy infra).
The perfect anti-pattern example is sbt: it shortened commands to unintelligible codes, even though you might write them maybe once per year.
And by the way, you never write full lines anyway; autocomplete knows what to write from just 2-3 keystrokes, even without any AI. So verbosity doesn't affect writing speed.
The problem is that packages can also look very different. For example, Python can import an entire package hierarchy from a zip file. (This is essential to how `zipapp` works; and under limited circumstances, a wheel file can also be used this way - which plays an essential role in how Pip is bootstrapped.)
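A quick sketch of that, assuming nothing beyond the stdlib (the archive and `zipped_pkg` names are invented):

```python
# Build an archive containing a one-module package, then import from it.
import os
import sys
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as d:
    archive = os.path.join(d, "bundle.zip")
    with zipfile.ZipFile(archive, "w") as z:
        z.writestr("zipped_pkg/__init__.py", "MESSAGE = 'loaded from zip'\n")

    sys.path.insert(0, archive)   # the archive itself goes on sys.path
    import zipped_pkg

print(zipped_pkg.MESSAGE)
```

Note that the `sys.path` entry is the path to the archive, not to a directory, which is the case mentioned earlier in the thread.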
> For example, Python can import an entire package hierarchy from a zip file.
As could Python's contemporaries back in the day, JAR extensions notwithstanding. That fact does not justify bad design.
> If I want to be explicit, I’ll write something in C.
Absolutely love those explicit void pointers.
Fwiw, I never use them in Python if I don't need to put anything inside them. They're optional for a reason.
From the zen of python: ‘Explicit is better than implicit.’
duck typing has entered the room
Also treating non-boolean objects with a boolean meaning. Python is fine and all, but the Zen of Python has always been a joke that the language doesn't bother to follow.
It's called "the Zen of Python", and not e.g. "the laws of Python", for a reason. Not everyone is, as Tim Peters put it, Dutch.
(Although I should also say: PEP 20 reflects a 20-year-old understanding of Python's design paradigm and of the lessons learned from developing it, and Tim Peters was highly deferential to Guido van Rossum when publishing it. In fact, GvR was invited to add the supposedly missing 20th principle, but never did - IMHO, this refusal is very much in the Zen spirit. Anyway, the language - and standard library - have changed a lot since then.)
The post is overrated. The files are optional. I use them only if I need to put something inside them. As for imports, they work fine. And as for static analyzers, they work for me, as in I don't work for them.
Fundamentally it's an argument about style, which tries to make some default assumptions about a programmer's needs - so of course some people won't find it very convincing. FWIW, I'm in your camp for the most part, but I do very much appreciate the other view.
Per the example in the article (the services/component_b/child one), are you supposed to put __init__.py in the sub-folders of a package too?
I'm asking because I rarely see this done (nor have I done it).
Yes, it goes in every "regular package" folder (a folder without one creates a "namespace package" - or rather, a "portion" thereof; this allows for a parallel file hierarchy somewhere else that holds another part of the same package).
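Here's a sketch of two such portions combining (the `ns` package name and its module names are invented):

```python
# Two unrelated directories each contribute a "portion" of package `ns`
# (no __init__.py anywhere); both land in the same namespace package.
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    os.makedirs(os.path.join(a, "ns"))
    os.makedirs(os.path.join(b, "ns"))
    with open(os.path.join(a, "ns", "first.py"), "w") as f:
        f.write("WHERE = 'portion A'\n")
    with open(os.path.join(b, "ns", "second.py"), "w") as f:
        f.write("WHERE = 'portion B'\n")

    sys.path[:0] = [a, b]        # both parent directories on the search path
    from ns import first, second
    portions = (first.WHERE, second.WHERE)

print(portions)
```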
Python represents packages as `module` objects - the exact same type, not even a subclass. These are created to represent either the folder or an `__init__.py` file.
The file need not be empty and in fact works like any other module, with the exception of some special meaning given to certain attributes (in particular, `__all__`, which controls the behaviour of star-imports). Notably, you can use this to force modules and subpackages within the package to load when the package is imported, even if the user doesn't request them (`collections.abc` does this); and you can make some other module appear to be within the package by assigning it as an attribute (`os.path` works this way; when `os` is imported, some implementation module such as `ntpath` or `posixpath` is chosen and assigned).
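A toy version of that pattern, built on disk so it stays self-contained (the `toolbox` package and its contents are invented):

```python
# A package whose __init__.py eagerly loads a submodule and re-exports
# a name at the package level.
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as root:
    pkg = os.path.join(root, "toolbox")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "core.py"), "w") as f:
        f.write("def double(x):\n    return 2 * x\n")
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(
            "from . import core        # force the submodule to load\n"
            "from .core import double  # re-export at package level\n"
            "__all__ = ['double']      # controls `from toolbox import *`\n"
        )

    sys.path.insert(0, root)
    import toolbox
    result = toolbox.double(21)
    eager = "toolbox.core" in sys.modules   # loaded without being requested

print(result, eager)
```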
Yes, it goes in any folder that contains files expected to be imported (or other folders that contain the same).
I have been writing python for decades and had no idea they were made optional. Amazing.
I don't know why PEP 420 didn't specify one file at the root of the package that describes the structure of all the sub-directories, rather than going the controversial, lazy route of removing __init__.py. That way you get the explicitness and avoid the littering of empty marker files.
This is spelled out in the PEP (I’m the author). There isn’t one “portion” (a term defined in the PEP) to own that file. And it’s also not a requirement that all portions be installed in the same directory. I’m sorry you feel this was lazy, it was a lot of work spanning multiple attempts over multiple years.
I appreciate your work on pep 420; I’ve benefited from it personally as a Python user. Thank you for a job well done.
Thank you for the kind words.
JavaScript tooling requires index files for everything, which makes development slow, particularly when you want to iterate fast or create many files with a single output.
I think it makes sense for the compiler or script loader to rely on just the files and their contents. Either way you're already defining everything; why create an additional redundant set of definitions?
> JavaScript tooling requires index files for everything
This just isn't true. I've never encountered tooling that forces you to have these by default. If it's enforced, it's rules defined in your project or some unusual tools
> JavaScript tooling requires index files for everything
You mean barrel files? Those are horrible kludges used by lazy people to pretend they're hiding implementation details, generate arbitrary accidental circular imports, and end up causing absolute hell if you're using any sort of naive transpiling/bundling tooling/plugin/adapter.
What do you mean by index files? It might depend on the bundler, but I haven’t heard of index.js/index.ts files being a hard requirement for a directory to be traversable in most tooling.
They aren't necessarily empty.
Java does this way better.
Agreed.
Fie on "Implicit namespace package". If only because making "implicit" explicit is linguistically pointless in that 3-word phrase.
Either "namespace" or "package" is also pointless linguistically. Noun-noun names ("namespace package") in programming are always a smell. Meh, it's a job career that pays the bills rent.
Maybe "namespace" (no dunder init) vs "package" (dunder init) would have saved countless person-years of confusion? Packages and "implicit namespace packages" are not substitutes for one another (fscking parent relative imports!) so there's no reason they need the same nouns.
then why was it made optional? implicit is better than explicit?
Because sometimes implicit is best, other times explicit is best. Good to have options as different problems require different solutions.