Empty files are perfectly fine:
The __init__.py files are required to make Python treat the directories as containing packages; this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
Depending on what you plan to do, it's a good place to import public names from the modules in your package, so people can simply use from yourpackage import whatever instead of having to use from yourpackage.somemodule import whatever.
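As a sketch of that idea, here is a throwaway package built in a temporary directory (the names yourpackage, somemodule, and whatever are purely illustrative, matching the sentence above):

```python
import os
import sys
import tempfile

# Build a hypothetical package "yourpackage" in a temp directory.
pkg_root = tempfile.mkdtemp()
pkg_dir = os.path.join(pkg_root, "yourpackage")
os.makedirs(pkg_dir)

# somemodule.py defines the public function.
with open(os.path.join(pkg_dir, "somemodule.py"), "w") as f:
    f.write("def whatever():\n    return 'hello'\n")

# __init__.py re-exports it at the package level.
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from yourpackage.somemodule import whatever\n")

sys.path.insert(0, pkg_root)

# Users no longer need to know which submodule defines whatever.
from yourpackage import whatever
print(whatever())  # hello
```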
You can use the builtin __len__ method:
class rand:
    def __init__(self):
        self.c = []

    def __len__(self):
        return len(self.c)

c = rand()
print(len(c))
Output:
0
The global variable c and its attribute of the same name are two different objects. You want len(c.c), not len(c).
The top-level package suggested by your folder layout is our_project. That means the folder containing it should be in the Python search path.
Then you can do absolute imports from anywhere with from our_project.directory1.file1 import my_function.
From the our_project.directory3.file7 module, you could instead use a relative import, from ..directory1.file1 import my_function.
The relative import might not work though if you're running file7.py as a script (with e.g. python file7.py on the command line). It's generally a bad idea to run scripts that are in a package by filename, as the interpreter won't be able to tell where they're supposed to be put in the package hierarchy. Instead, use python -m our_project.directory3.file7 to run the module by its absolute name.
Here is a link to the Python docs, which include a nice tutorial where all of this is explained.
The __init__.py files are just there to tell Python to treat that directory as a Python package, so I believe how you would import depends on which packages are installed on the system (it searches your PYTHONPATH to find installed packages/modules). If just the our_project package in your example is installed, you would need to import like
from our_project.directory1.file1 import my_function
You could also use a relative import, which would work regardless of what is installed:
from .directory1.file1 import my_function
The documentation on packages is here: https://docs.python.org/3/tutorial/modules.html#packages
Info on the module search path is here: https://docs.python.org/3/tutorial/modules.html#the-module-search-path
The __init__.py has two main functions:
- Marking packages: It marks a directory as a Python package, and the inner .py files as Python modules.
- Initialization code: As its name suggests, any code inside the __init__.py is automatically executed when the package is imported. This is the right place to run initialization code required for a certain package; however, it's perfectly OK to leave the __init__.py file empty.
The subdirectory that you are importing from is a package if it has an __init__.py file in it. You don't need to use packages; you can just add the subdirectory to the sys.path list. However, they are a neat way of keeping related modules together and are generally encouraged.
The __init__.py file has a similar purpose to the __init__ in a class, it initialises the package. This enables attributes to be given to the package, and __all__ is an example (list of exported names for import *).
There is sometimes no initialisation required, so it can be empty. A good place to look for examples is in the standard library subdirectories of the Lib directory. There you will find huge __init__.py files, and others that are empty.
Whether this is mandatory or not depends on the Python version. From Python 3.3 the __init__.py is not mandatory, and such packages are called Namespace Packages, see PEP 420. This means that a package can span directories, but there is a price to pay. There can be no __init__.py initialisation code and you don't get a __file__ attribute for the package. So unless you specifically need to span directories it is probably better to stick with regular packages.
Option 1: Don't declare them at all
It seems that you come from a static language (Java, C#, C++) and you have difficulty adapting to Python. Since Python is dynamic, you can add and delete attributes at any time, both inside and outside the class's methods. If you don't have an appropriate value, don't declare the attribute in __init__; declare it in some other method instead. That's perfectly fine.
Option 2. Set some default value
What exactly do you mean by an "existing but empty" value? What does it mean to be empty? In (some) statically typed languages, member values that you don't initialize explicitly are initialized automatically to their default value (integers initialized to 0, pointers/references to null, etc.). If you know an appropriate default value, just initialize your attribute to that value; if there is no appropriate default, go to option 1.
2a. Assign 'manually'
def __init__(self):
    self.arg1 = 0
    self.arg2 = 'a'
    self.arg3 = None
2b. Through default function arguments
def __init__(self, arg1=0, arg2='a', arg3=None):
    self.arg1 = arg1
    self.arg2 = arg2
    self.arg3 = arg3
2c. Use properties (read more)
class ClassName(object):
    @property
    def arg1(self):
        return self._arg1 if hasattr(self, '_arg1') else 0

    @arg1.setter
    def arg1(self, value):
        self._arg1 = value
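For clarity, here is how the property variant in 2c behaves in use (the class is restated so the snippet runs on its own):

```python
class ClassName(object):
    @property
    def arg1(self):
        # Fall back to a default of 0 until a value has been assigned.
        return self._arg1 if hasattr(self, '_arg1') else 0

    @arg1.setter
    def arg1(self, value):
        self._arg1 = value

obj = ClassName()
print(obj.arg1)  # 0 (the default; nothing has been assigned yet)
obj.arg1 = 42
print(obj.arg1)  # 42 (the stored value)
```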
The usual way to do this is to provide a default value for that parameter in the constructor:
class ClassName(object):
    def __init__(self, arg=None):
        super(ClassName, self).__init__()
        self.arg = arg
Example:
>>> print(ClassName().arg)
None
>>> print(ClassName("foo").arg)
foo
Of course, you could use any other useful value as default instead of None, but be wary with using mutable data types, like lists.
Consult: "Least Astonishment" and the Mutable Default Argument
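To see the pitfall that link describes, consider this minimal demonstration (append_to is a made-up name for illustration):

```python
def append_to(item, bucket=[]):
    # The default list is created ONCE, when the function is defined,
    # so every call that omits `bucket` shares the same list object.
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)
print(second)           # [1, 2] -- both calls appended to one shared list
print(first is second)  # True  -- it is literally the same object
```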
But to solve your problem do this:
def __init__(self, l=None):
    self.list = l if l else []
Alternatively, as has been suggested by an edit (which should have been a comment), you can use or:
def __init__(self, l=None):
    self.list = l or []
Don't use [] (or list()) as a default parameter; either one is evaluated only once, when the function is defined, so all instances created with the default end up sharing the same list. Use a None sentinel instead:
class Entry:
    def __init__(self, l=None):
        self.list = [] if l is None else l
        ...
With a mutable default, the problem is that the same list is assigned to each Entry instance. So what happens is something like this:
lst = []
a = Entry(lst)
b = Entry(lst)
a.list is b.list  # the very same object -> True
I think you should avoid both solutions. Simply because you should avoid to create uninitialized or partially initialized objects, except in one case I will outline later.
Look at two slightly modified version of your class, with a setter and a getter:
class MyClass1:
    def __init__(self, df):
        self.df = df
        self.results = None

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results
And
class MyClass2:
    def __init__(self, df):
        self.df = df

    def set_results(self, df_results):
        self.results = df_results

    def get_results(self):
        return self.results
The only difference between MyClass1 and MyClass2 is that the first one initializes results in the constructor while the second does it in set_results. Here comes the user of your class (usually you, but not always). Everyone knows you can't trust the user (even if it's you):
MyClass1("df").get_results()
# returns None
Or
MyClass2("df").get_results()
# Traceback (most recent call last):
# ...
# AttributeError: 'MyClass2' object has no attribute 'results'
You might think that the first case is better because it does not fail, but I do not agree. I would like the program to fail fast in this case, rather than forcing a long debugging session to find out what happened. Hence, the first part of my answer is: do not set the uninitialized fields to None, because you lose a fail-fast hint.
But that's not the whole answer. Whichever version you choose, you have an issue: the object was not used and it shouldn't have been, because it was not fully initialized. You can add a docstring to get_results: """Always use set_results **BEFORE** this method""". Unfortunately the user doesn't read docstrings either.
You have two main reasons for uninitialized fields in your object: 1. you don't know (for now) the value of the field; 2. you want to avoid an expensive operation (computation, file access, network, ...), aka "lazy initialization". Both situations are met in the real world, and they collide with the need to use only fully initialized objects.
Happily, there is a well documented solution to this problem: Design Patterns, and more precisely Creational patterns. In your case, the Factory pattern or the Builder pattern might be the answer. E.g.:
class MyClassBuilder:
    def __init__(self, df):
        self._df = df  # df is known immediately
        # GIVE A DEFAULT VALUE TO OTHER FIELDS to avoid the possibility
        # of a partially uninitialized object.
        # The default value should be either:
        # * a value passed as a parameter of the constructor;
        # * a sensible value (e.g. an empty list, 0, etc.)

    def results(self, df_results):
        self._results = df_results
        return self  # for fluent style

    ... # other field initializers

    def build(self):
        return MyClass(self._df, self._results, ...)

class MyClass:
    def __init__(self, df, results, ...):
        self.df = df
        self.results = results
        ...

    def get_results(self):
        return self.results

    ... # other getters
(You can use a Factory too, but I find the Builder more flexible). Let's give a second chance to the user:
>>> b = MyClassBuilder("df").build()
Traceback (most recent call last):
...
AttributeError: 'MyClassBuilder' object has no attribute '_results'
>>> b = MyClassBuilder("df")
>>> b.results("r")
... other fields initialization
>>> x = b.build()
>>> x
<__main__.MyClass object at ...>
>>> x.get_results()
'r'
The advantages are clear:
- It's easier to detect and fix a creation failure than a late use failure;
- You do not release an uninitialized (and thus potentially damaging) version of your object into the wild.
The presence of uninitialized fields in the Builder is not a contradiction: those fields are uninitialized by design, because the Builder's role is to initialize them. (Actually, those fields are a kind of foreign field to the Builder.) This is the case I was talking about in my introduction. They should, in my mind, be set to a default value (if one exists) or left uninitialized to raise an exception if you try to create an incomplete object.
Second part of my answer: use a Creational pattern to ensure the object is correctly initialized.
Side note: I'm very suspicious when I see a class with getters and setters. My rule of thumb is: always try to separate them because when they meet, objects become unstable.
Following considerable research and discussions with experienced programmers, please see below what I believe is the most Pythonic solution to this question. I have included the updated code first, followed by a narrative:
class MyClass:
    def __init__(self, df):
        self.df = df
        self._results = None

    @property
    def results(self):
        if self._results is None:
            raise Exception('results have not been generated')
        return self._results

    def generate_results(self, df_results):
        # Imagine some calculations here or something
        self._results = df_results
Description of what I learnt, changed and why:
- All class attributes should be included in the __init__ (initialiser) method. This is to ensure readability and aid debugging.
- The first issue is that you cannot create private attributes in Python. Everything is public, so any partially initialised attributes (such as results being set to None) can be accessed. The convention to indicate a private attribute is to place a leading underscore at the front, so in this case I changed self.results to self._results. Keep in mind this is only a convention, and self._results can still be directly accessed. However, this is the Pythonic way to handle pseudo-private attributes.
- The second issue is having a partly initialised attribute which is set to None. As this is set to None, as @jferard below explains, we have now lost a fail-fast hint and have added a layer of obfuscation for debugging the code.
- To resolve this we add a getter method. This can be seen above as the function results(), which has the @property decorator. This is a function that, when invoked, checks if self._results is None. If so, it raises an exception (a fail-fast hint); otherwise it returns the object. The @property decorator changes the invocation style from a function to an attribute, so all the user has to use on an instance of MyClass is .results, just like any other attribute.
- (I changed the name of the method that sets the results to generate_results() to avoid confusion and free up .results for the getter method.)
- If you then have other methods within the class that need to use self._results, but only when properly assigned, you can use self.results, and that way the fail-fast hint is baked in as above.
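A short usage sketch of the class above (restated here so it runs standalone; the exception message is illustrative):

```python
class MyClass:
    def __init__(self, df):
        self.df = df
        self._results = None  # pseudo-private, not yet generated

    @property
    def results(self):
        if self._results is None:
            # Fail fast instead of silently handing back None.
            raise Exception('results have not been generated')
        return self._results

    def generate_results(self, df_results):
        self._results = df_results

obj = MyClass("df")
try:
    obj.results                 # accessed too early: fails fast
except Exception as exc:
    print(exc)                  # results have not been generated
obj.generate_results("r")
print(obj.results)              # r
```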
I recommend also reading @jferard's answer to this question. He goes into depth about the problems and some of the solutions. The reason I added my answer is that I think for a lot of cases the above is all you need (and the Pythonic way of doing it).
It used to be a required part of a package (old, pre-3.3 "regular package", not newer 3.3+ "namespace package").
Here's the documentation.
Python defines two types of packages, regular packages and namespace packages. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an __init__.py file. When a regular package is imported, this __init__.py file is implicitly executed, and the objects it defines are bound to names in the package's namespace. The __init__.py file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported.
But just click the link, it contains an example, more information, and an explanation of namespace packages, the kind of packages without __init__.py.
Files named __init__.py are used to mark directories on disk as Python package directories.
If you have the files
mydir/spam/__init__.py
mydir/spam/module.py
and mydir is on your path, you can import the code in module.py as
import spam.module
or
from spam import module
If you remove the __init__.py file, Python will no longer look for submodules inside that directory, so attempts to import the module will fail.
The __init__.py file is usually empty, but can be used to export selected portions of the package under a more convenient name, hold convenience functions, etc.
Given the example above, the contents of the __init__.py module can be accessed as
import spam
This answer is based on this webpage.
The sole purpose of __init__.py is to indicate that the folder containing this file is a package and should be treated as one; that's why it is recommended to leave it empty.
Consider following hierarchy:
foo/
    __init__.py
    bar.py
When you use from foo import bar or import foo.bar, the Python interpreter will look for __init__.py in the foo folder; if it finds one, the bar module is imported, otherwise it isn't. This behavior has changed over time, and Python may now successfully import modules/packages even when __init__.py is missing, but remember the Zen of Python: "Explicit is better than implicit", so it's always safe to have it.
If you need some package-level variables to be defined, you can do that inside the __init__.py file, and all the modules inside the package will be able to use them.
And in fact, if you look at PEP 257, it mentions that __init__.py can also contain documentation for package-level information.
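A sketch of a package-level variable in action, using a throwaway package built in a temporary directory (foo, bar, and API_VERSION are illustrative names):

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "foo"))

# The package-level variable lives in __init__.py.
with open(os.path.join(root, "foo", "__init__.py"), "w") as f:
    f.write("API_VERSION = '1.0'\n")

# A module inside the package can read it through the package.
with open(os.path.join(root, "foo", "bar.py"), "w") as f:
    f.write("import foo\ndef version():\n    return foo.API_VERSION\n")

sys.path.insert(0, root)
import foo.bar

print(foo.API_VERSION)    # 1.0 -- visible to users of the package
print(foo.bar.version())  # 1.0 -- and to modules inside the package
```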
You're taking that statement as more general than it's meant. You're reading a statement from a tutorial, where they walk you through creating a simple example project. That particular example project's __init__.py should be empty, simply because the example doesn't need to do anything in __init__.py.
Most projects' __init__.py files will not be empty. Taking a few examples from popular packages, such as numpy, requests, flask, sortedcontainers, or the stdlib asyncio, none of these example __init__.py files are empty. They may perform package initialization, import things from submodules into the main package namespace, or include metadata like __all__, __version__, or a package docstring. The example project is just simplified to the point where it doesn't have any of that.
Many tutorials seem to imply that the typical __init__.py file in a package is empty. But this never works for me. Typically, I start a project by writing a series of .py modules in a directory. When I decide my project could be useful in other projects I'm working on, I try to turn it into a package by adding an empty __init__.py file. But when I try to import that package, I don't have access to any of the modules I wrote unless I import them in the __init__.py file.
Is this normal? How is a package with an empty __init__.py file structured to give access to its modules when you import it?
def __init__(self, x=[]):
    self._x = x
In this case, whenever you initialize an instance of your class with no argument, self._x will reference the same object instead of creating a new empty list for each instance. In effect, self._x.append is appending to the same list.
The recommended way to do this is
def __init__(self, x=None):
    if x is None:
        # create a new list
        x = []
    self._x = x
This applies to all mutable objects (such as dict and set...).
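The same trap with a dict default, plus the None-sentinel fix (the function names are made up for illustration):

```python
def add_user(name, registry={}):
    # BAD: one dict is created at definition time and shared by all calls.
    registry[name] = True
    return registry

print(add_user("alice"))  # {'alice': True}
print(add_user("bob"))    # {'alice': True, 'bob': True} -- alice leaked in!

def add_user_safe(name, registry=None):
    if registry is None:
        registry = {}  # a fresh dict for every call
    registry[name] = True
    return registry

print(add_user_safe("alice"))  # {'alice': True}
print(add_user_safe("bob"))    # {'bob': True}
```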
In Python, if you use a = b, it doesn't create a new object; instead, both names point to the same memory location. We know that lists are mutable, so any changes made will be reflected in all references to that object. You can use [:] to copy a flat list, or deepcopy() in case there are nested lists.
class Test:
    def __init__(self, x=list()):
        self._x = x[:]  # here is the magic: copy the list
        print(self._x)

    def add_x(self, data):
        self._x.append(data)
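The difference between [:] and copy.deepcopy() mentioned above shows up with nested lists:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]            # copies only the outer list
deep = copy.deepcopy(nested)   # copies the inner lists as well

nested[0].append(99)           # mutate an inner list of the original

print(shallow[0])  # [1, 2, 99] -- inner lists are still shared
print(deep[0])     # [1, 2]     -- a fully independent copy
```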
Overview
@Mike's answer is correct but too imprecise. It is true that Python 3.3+ supports Implicit Namespace Packages, which allow creating a package without an __init__.py file. This is called a namespace package, in contrast to a regular package, which does have an __init__.py file (empty or not).
However, creating a namespace package should ONLY be done if there is a need for it. For most use cases and developers out there, this doesn't apply so you should stick with EMPTY __init__.py files regardless.
Namespace package use case
To demonstrate the difference between the two types of Python packages, let's look at the following example:
google_pubsub/              <- Package 1
    google/                 <- Namespace package (there is no __init__.py)
        cloud/              <- Namespace package (there is no __init__.py)
            pubsub/         <- Regular package (with __init__.py)
                __init__.py <- Required to make the package a regular package
                foo.py
google_storage/             <- Package 2
    google/                 <- Namespace package (there is no __init__.py)
        cloud/              <- Namespace package (there is no __init__.py)
            storage/        <- Regular package (with __init__.py)
                __init__.py <- Required to make the package a regular package
                bar.py
google_pubsub and google_storage are separate packages but they share the same namespace google/cloud. In order to share the same namespace, it is required to make each directory of the common path a namespace package, i.e. google/ and cloud/. This should be the only use case for creating namespace packages, otherwise, there is no need for it.
It's crucial that there are no __init__.py files in the google and google/cloud directories, so that both can be interpreted as namespace packages. In Python 3.3+, any directory on sys.path with a name that matches the package name being looked for will be recognized as contributing modules and subpackages to that package. As a result, when you import from both google_pubsub and google_storage, the Python interpreter will be able to find them.
This is different from regular packages which are self-contained meaning all parts live in the same directory hierarchy. When importing a package and the Python interpreter encounters a subdirectory on the sys.path with an __init__.py file, then it will create a single directory package containing only modules from that directory, rather than finding all appropriately named subdirectories outside that directory. This is perfectly fine for packages that don't want to share a namespace. I highly recommend taking a look at Traps for the Unwary in Python’s Import System to get a better understanding of how Python importing behaves with regular and namespace package and what __init__.py traps to watch out for.
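The layout above can be reproduced end to end. This sketch builds two throwaway distribution roots in temporary directories that share a namespace (mycorp.cloud is a hypothetical namespace standing in for google.cloud, chosen to avoid clashing with anything installed):

```python
import os
import sys
import tempfile

def make_dist(leaf):
    # One distribution root: <tmp>/mycorp/cloud/<leaf>/__init__.py
    # mycorp/ and mycorp/cloud/ deliberately have NO __init__.py,
    # so they become namespace packages; <leaf>/ is a regular package.
    root = tempfile.mkdtemp()
    pkg = os.path.join(root, "mycorp", "cloud", leaf)
    os.makedirs(pkg)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(f"NAME = '{leaf}'\n")
    return root

sys.path[:0] = [make_dist("pubsub"), make_dist("storage")]

# mycorp and mycorp.cloud span BOTH roots, so both imports succeed.
import mycorp.cloud.pubsub
import mycorp.cloud.storage
print(mycorp.cloud.pubsub.NAME, mycorp.cloud.storage.NAME)  # pubsub storage
```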
Summary
- Only skip __init__.py files if you want to create namespace packages. Only create namespace packages if you have different libraries that reside in different locations and you want them each to contribute a subpackage to the parent package, i.e. the namespace package.
- Keep on adding empty __init__.py files to your directories because 99% of the time you just want to create regular packages. Also, Python tools out there such as mypy and pytest require empty __init__.py files to interpret the code structure accordingly. This can lead to weird errors if not done with care.
Resources
My answer only touches the surface of how regular packages and namespace packages work, so take a look at the following resources for further information:
- PEP 420 -- Implicit Namespace Packages
- The import system - Regular packages
- The import system - Namespace packages
- Traps for the Unwary in Python’s Import System
Python 3.3+ has Implicit Namespace Packages that allow creating packages without an __init__.py file.
Allowing implicit namespace packages means that the requirement to provide an __init__.py file can be dropped completely, and affected ... .
The old way with __init__.py files still works as in Python 2.
If not having a value has a meaning in your program (e.g. an optional value), you should use None. That's its purpose anyway.
If the value must be provided by the caller of __init__, I would recommend not to initialize it.
If "" makes sense as a default value, use it.
In Python the type is deduced from the usage. Hence, you can change the type by just assigning a value of another type.
>>> x = None
>>> print(type(x))
<class 'NoneType'>
>>> x = "text"
>>> print(type(x))
<class 'str'>
>>> x = 42
>>> print(type(x))
<class 'int'>
Another way to initialize an empty string is by using the built-in str() function with no arguments.
str(object='')
Return a string containing a nicely printable representation of an object.
...
If no argument is given, returns the empty string, ''.
In the original example, that would look like this:
def __init__(self, mystr=str()):
    self.mystr = mystr
Personally, I believe that this better conveys your intentions.
Notice by the way that str() itself sets a default parameter value of ''.
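A quick check of that behaviour (the class name Message is illustrative):

```python
class Message:
    def __init__(self, mystr=str()):
        # str() with no argument returns '' -- and since strings are
        # immutable, sharing this default across calls is harmless.
        self.mystr = mystr

print(repr(Message().mystr))      # ''
print(repr(Message("hi").mystr))  # 'hi'
print(str() == '')                # True
```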