Well, there are separate implementations of os.path for separate operating systems. This means that if the logic to extract the extension of a file differs on Mac from that on Linux, the distinction will be handled by those implementations. I don't know of any such distinction, so there might be none.
Edit: @Brian comments that an example like /directory.ext/file would of course not work with a simple .split('.') call. You would have to know both that directory names can contain extensions and that on some operating systems a forward slash is a valid directory separator.
This just emphasizes the "use a library routine unless you have a good reason not to" part of my answer.
Thanks @Brian.
Additionally, where a file doesn't have an extension, you would have to build in logic to handle that case. And what if the thing you try to split is a directory name ending with a backslash? Then there is neither a filename nor an extension.
The rule should be: unless you have a specific reason not to use a library function that does what you want, use it. This saves you from maintaining and bug-fixing code that others have already solved perfectly well.
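A minimal sketch of the points above, using the two standard-library flavours of os.path directly (ntpath for Windows, posixpath for Linux and macOS) so the result does not depend on where it runs. The sample paths are made up for illustration.

```python
import ntpath      # Windows flavour of os.path
import posixpath   # POSIX flavour of os.path (Linux, macOS)

# A directory name containing a dot, joined with a backslash.
windows_style = "dir.ext\\file"

# On Windows the backslash is a separator, so "file" has no extension.
win_result = ntpath.splitext(windows_style)

# On POSIX the backslash is an ordinary character, so the whole string
# is one file name and everything from the dot onward is the extension.
posix_result = posixpath.splitext(windows_style)

# A directory path ending in a separator has no file name and no extension.
trailing_sep = ntpath.splitext("C:\\some.dir\\")

# A naive split on the dot is fooled by the dotted directory name.
naive = "/directory.ext/file".split(".")

print(win_result)    # ('dir.ext\\file', '')
print(posix_result)  # ('dir', '.ext\\file')
print(trailing_sep)  # ('C:\\some.dir\\', '')
print(naive)         # ['/directory', 'ext/file']
```

In normal code you would just call os.path.splitext, which resolves to whichever flavour matches the running platform.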
Answer from Lasse V. Karlsen on Stack Overflow
os.path.splitext correctly handles a file that has no extension and returns an empty string for the extension. A plain .split('.') returns the name of the file as its last piece instead.
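A quick illustration of that difference, assuming the manual approach takes the last piece of .split('.') as the extension (the file name here is just an example):

```python
import os.path

name = "README"  # example file name with no extension

# Manual approach: take whatever follows the last dot.
naive_extension = name.split(".")[-1]

# Library approach: returns a (root, extension) pair.
root, extension = os.path.splitext(name)

print(repr(naive_extension))  # 'README'
print(repr(root))             # 'README'
print(repr(extension))        # ''
```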
I am trying to create a script that will create a dataframe of files in a directory and create two columns 'FileName' and 'Extension'.
I thought I was getting somewhere, but the output is not what I am expecting at all. Or more specifically, it is, but in an unexpected manner.
This is my current code:
import os
import pandas as pd
import pyodbc

driveO = "O:\Labels"
driveO_files = []
for filename in os.listdir(driveO):
    name, file_extension = os.path.splitext(filename)
    driveO_files.append({name, file_extension})

driveO_files = pd.DataFrame(driveO_files,columns=['FileName','Extension'])
print(driveO_files.head())
btw_filesO = driveO_files.loc[driveO_files['Extension'] =='.btw']
print(btw_filesO.head())

Expected output:
| {Index} | FileName | Extension |
|---|---|---|
| 0 | LabelDesing1 | btw |
| 1 | LabelDesing1 | dat |
| 2 | LabelDesign2 | btw |
| 3 | LabelDesign2 | dat |
| 4 | LabelDesign3 | btw |
However, what I am getting is:
| {Index} | FileName | Extension |
|---|---|---|
| 0 | LabelDesing1 | btw |
| 1 | dat | LabelDesing1 |
| 2 | LabelDesign2 | btw |
| 3 | LabelDesign2 | dat |
| 4 | btw | LabelDesign3 |
It is random which items get put on which side, and changes each time I run the script.
How do I tell os.path.splitext() which side of the dot to put into which variable?
For what it's worth:
- There are no files that do not have an extension.
- There is one folder within the list that always seems to go on the 'Extension' side.
- There are only 65 items in this list at present.
Any help is appreciated.
EDIT: Layout adjustments
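Going by the code in the question, a likely cause is the curly braces in driveO_files.append({name, file_extension}). That builds a set, and a set has no guaranteed order, so either value can land in either column; string hashing is randomized per run by default, which would explain why the result changes each time. os.path.splitext itself always returns (root, extension) in that order. A minimal sketch of the fix, with made-up file names standing in for os.listdir(driveO) and pandas left out so it runs anywhere:

```python
import os.path

# Hypothetical file names standing in for os.listdir(driveO).
filenames = ["LabelDesign1.btw", "LabelDesign1.dat", "LabelDesign2.btw"]

rows = []
for filename in filenames:
    name, file_extension = os.path.splitext(filename)
    # A tuple keeps its order; {name, file_extension} would be a set.
    rows.append((name, file_extension))

print(rows)
# [('LabelDesign1', '.btw'), ('LabelDesign1', '.dat'), ('LabelDesign2', '.btw')]
```

A list of tuples like rows can then be passed to pd.DataFrame(rows, columns=['FileName', 'Extension']). Note that the extension keeps its leading dot, so comparing against '.btw' as the question does is the right filter.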
Split with os.extsep.
>>> import os
>>> 'filename.ext1.ext2'.split(os.extsep)
['filename', 'ext1', 'ext2']
If you want everything after the first dot:
>>> 'filename.ext1.ext2'.split(os.extsep, 1)
['filename', 'ext1.ext2']
If you are using paths with directories that may contain dots:
>>> def my_splitext(path):
...     """splitext for paths with directories that may contain dots."""
...     li = []
...     path_without_extensions = os.path.join(os.path.dirname(path), os.path.basename(path).split(os.extsep)[0])
...     extensions = os.path.basename(path).split(os.extsep)[1:]
...     li.append(path_without_extensions)
...     # li.append(extensions) if you want extensions in another list inside the list that is returned.
...     li.extend(extensions)
...     return li
...
>>> my_splitext('/path.with/dots./filename.ext1.ext2')
['/path.with/dots./filename', 'ext1', 'ext2']
You could try:
names = pathname.split('.')
filename = names[0]
extensions = names[1:]
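If you want to use splitext, you can use something like:
import os

path = 'filename.es.txt'
# Peel one extension off the right-hand side on each pass.
while True:
    path, ext = os.path.splitext(path)
    if not ext:
        # Nothing left to strip: what remains is the bare file name.
        print(path)
        break
    else:
        print(ext)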
produces:
.txt
.es
filename
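The same loop can be wrapped in a function that returns the pieces instead of printing them. This is only a sketch; split_all_extensions is a hypothetical helper name, not something from the standard library.

```python
import os.path


def split_all_extensions(path):
    """Hypothetical helper: return (root, [extensions]) for a path.

    Extensions are peeled off right to left with os.path.splitext,
    then reversed so they appear in their original order.
    """
    extensions = []
    while True:
        path, ext = os.path.splitext(path)
        if not ext:
            break
        extensions.append(ext)
    extensions.reverse()
    return path, extensions


print(split_all_extensions("filename.es.txt"))
# ('filename', ['.es', '.txt'])
print(split_all_extensions("/path.with/dots./filename.ext1.ext2"))
# ('/path.with/dots./filename', ['.ext1', '.ext2'])
```

Because each step goes through os.path.splitext, dots in directory names are left alone.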
Use os.path.splitext:
>>> import os
>>> filename, file_extension = os.path.splitext('/path/to/somefile.ext')
>>> filename
'/path/to/somefile'
>>> file_extension
'.ext'
Unlike most manual string-splitting attempts, os.path.splitext will correctly treat /a/b.c/d as having no extension instead of having extension .c/d, and it will treat .bashrc as having no extension instead of having extension .bashrc:
>>> os.path.splitext('/a/b.c/d')
('/a/b.c/d', '')
>>> os.path.splitext('.bashrc')
('.bashrc', '')
New in Python 3.4: the pathlib module.
import pathlib
print(pathlib.Path('/foo/bar.txt').suffix)
# Outputs: .txt
print(pathlib.Path('/foo/bar.txt').stem)
# Outputs: bar
print(pathlib.Path("hello/foo.bar.tar.gz").suffixes)
# Outputs: ['.bar', '.tar', '.gz']
print(''.join(pathlib.Path("hello/foo.bar.tar.gz").suffixes))
# Outputs: .bar.tar.gz
print(pathlib.Path("hello/foo.bar.tar.gz").stem)
# Outputs: foo.bar.tar
I'm surprised no one has mentioned pathlib yet, pathlib IS awesome!
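For the edge cases discussed for os.path.splitext, pathlib appears to behave the same way. A small sketch, using PurePosixPath so the result does not depend on the platform it runs on:

```python
from pathlib import PurePosixPath

dotted_dir = PurePosixPath("/a/b.c/d")  # dot in a directory name only
dotfile = PurePosixPath(".bashrc")      # leading dot, no extension

print(repr(dotted_dir.suffix))  # ''
print(repr(dotfile.suffix))     # ''
print(dotfile.stem)             # .bashrc
```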