An introduction to Python's pathlib module

First of all, I would like to clarify something that may sound evident but that we sometimes forget: a path is not a string. Keeping this in mind saves us headaches on our journey with Python.  Since Python 3.4, we can count on a powerful and versatile module called pathlib.

In this short article, I want to share with you some specific functions that can help you write simpler and more agile code when handling OS tasks with Python.

I also want to mention the following points, which I believe are important to keep in mind whenever we interact with OS tasks:

  1. Absolute path: it begins with the root folder, for example /home/ on Linux and C:\ on Windows.
  2. Relative path: it is relative to your application’s working directory.
  3. Different operating systems (OS) use specific separators.  Linux and macOS use /, whereas Windows uses \.
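
The three points above can be checked from any machine. The following is only a minimal sketch: it assumes the pure path classes PurePosixPath and PureWindowsPath, which also belong to pathlib but are not used elsewhere in this article. They manipulate the text of a path without touching the file system, so the result does not depend on the OS you run it on. The sample paths are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical sample paths, used only for illustration.
linux_path = PurePosixPath('/home/your_username/notes.txt')
windows_path = PureWindowsPath('C:\\Users\\Your_UserName\\notes.txt')
relative_path = PurePosixPath('docs/notes.txt')

# An absolute path starts at the root; a relative one does not.
print(linux_path.is_absolute())     # True
print(windows_path.is_absolute())   # True
print(relative_path.is_absolute())  # False

# Each flavour renders itself with its own separator.
print(str(linux_path))    # /home/your_username/notes.txt
print(str(windows_path))  # C:\Users\Your_UserName\notes.txt

# The file name is extracted the same way regardless of the separator.
print(linux_path.name, windows_path.name)  # notes.txt notes.txt
```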

The pathlib module dramatically simplifies the code required to work with folders and files.  I want to show you, through a series of examples, some key benefits of this module and how it reduces the number of lines of code and, above all, makes our code more readable and easier to maintain.

Here is a simple, cross-platform example in which we need to get the home directory of the user who runs the Python code.  The first step is to import the module; for learning purposes and simplicity, I am going to import it in a way that gives us access to all the classes in the module.

import pathlib
pathlib.Path.home()

If you run the previous statement on Windows, you should get an output like:

C:\Users\Your_UserName

On Linux, it should be something like:

/home/Your_UserName

Another useful function in pathlib is cwd, which stands for “current working directory”. Here is an example:

pathlib.Path.cwd()

The output is your current working directory, the same location that the os.getcwd function returns, but as a Path object rather than a string.  To compose a path, we can use the forward slash operator to append a directory or a filename, much as if we were dealing with simple strings, for instance:

pathlib.Path.home() / 'mydirectory' / 'myfile.py'
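
To see what the / operator produces, here is a small sketch. It assumes a hypothetical base folder, /home/your_username, and uses PurePosixPath so that the output is the same on every OS. With pathlib.Path the separators would follow the OS you run it on.

```python
from pathlib import PurePosixPath

# Hypothetical base folder standing in for the home directory.
base = PurePosixPath('/home/your_username')

# The / operator joins path segments; plain strings are accepted on the right.
script = base / 'mydirectory' / 'myfile.py'

print(script)         # /home/your_username/mydirectory/myfile.py
print(script.parent)  # /home/your_username/mydirectory
print(script.name)    # myfile.py
print(script.suffix)  # .py
print(script.parts)   # ('/', 'home', 'your_username', 'mydirectory', 'myfile.py')
```

Note that the result is still a path object, not a string, which is why attributes such as name and suffix are available on it.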

As you can see in the previous examples, we do not have to worry about how Python deals with the OS to manage the path. This is the key point, and it might be one of the main reasons to start using the pathlib module.  You will probably recognize some common methods from older modules such as os.  Now, imagine that our project runs from a specific, configurable location and we want to get its absolute path; the following code does the job:

def absolutepath():
    return pathlib.Path().absolute()
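
As a quick check of what absolute() does, this sketch assumes a hypothetical relative path, data/input.csv, which does not need to exist. The method only anchors the path at the current working directory; it does not look at the file system.

```python
import pathlib

# Hypothetical relative path; the file does not need to exist.
relative = pathlib.Path('data') / 'input.csv'
absolute = relative.absolute()

print(relative.is_absolute())  # False
print(absolute.is_absolute())  # True

# absolute() anchors the relative path at the current working directory.
print(absolute == pathlib.Path.cwd() / relative)  # True
```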

Let me add a little dynamism and build a simple function that receives a filename and returns the complete path by joining it to the current working directory with the joinpath function.

def getPathFileName(fileName):
    path = pathlib.Path.cwd()
    return path.joinpath(fileName)
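
joinpath also accepts several segments at once and gives the same result as chaining the / operator. Here is a short sketch, with hypothetical folder and file names:

```python
import pathlib

base = pathlib.Path.cwd()

# Hypothetical folder and file names, used only for illustration.
with_joinpath = base.joinpath('reports', 'archive', 'summary.txt')
with_slash = base / 'reports' / 'archive' / 'summary.txt'

print(with_joinpath == with_slash)  # True
print(with_joinpath.name)           # summary.txt

# Careful: an absolute segment replaces everything before it.
replaced = pathlib.PurePosixPath('/home').joinpath('/etc', 'hosts')
print(replaced)  # /etc/hosts
```

The last lines show a detail worth remembering when the segment comes from user input: if it is already an absolute path, the base is discarded.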

If we need to verify that a given path points to a file before executing a command, we can count on the is_file function. We can use this check before deciding whether to create the file or take any other specific action.

(pathlib.Path.home() / 'mypython.txt').is_file()

Just as we have a function to know whether a specific path is a file, we also have a function that tells us whether a path is an existing directory. Here is an example:

(pathlib.Path.home() / 'mydirectory').is_dir()
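
To try both checks without touching your home directory, the following sketch works inside a temporary folder. It assumes the standard tempfile module plus the Path methods mkdir and write_text to create the test entries; all the names are hypothetical.

```python
import pathlib
import tempfile

# Work inside a throwaway directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Hypothetical names, created only for this check.
    folder = root / 'mydirectory'
    folder.mkdir()
    text_file = root / 'mypython.txt'
    text_file.write_text('hello')
    missing = root / 'not_here.txt'

    checks = {
        'file': (text_file.is_file(), text_file.is_dir()),
        'folder': (folder.is_file(), folder.is_dir()),
        'missing': (missing.is_file(), missing.is_dir()),
    }

for label, (is_file, is_dir) in checks.items():
    print(label, is_file, is_dir)
# file True False
# folder False True
# missing False False
```

Both functions return False for a path that does not exist, so they also work as a combined "exists and has the right type" check.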

In some cases, we need to verify whether a specific file exists in a given location. pathlib offers another simple method to do it: exists.

pathlib.Path(filename_expected_here).exists()

The following example combines an existence check with the removal of a file.

def removeFile(file_name):
    path = pathlib.Path(file_name)
    # Delete the file only when it is actually there
    if path.exists():
        path.unlink()
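
Here is one way to exercise this kind of helper safely. The sketch is written purely with pathlib, where Path.unlink is the method that deletes a file, and it runs inside a temporary folder. The function name remove_if_exists and the file name obsolete.log are hypothetical.

```python
import pathlib
import tempfile

def remove_if_exists(file_name):
    """Delete file_name if it is there; return True when something was removed."""
    path = pathlib.Path(file_name)
    if path.exists():
        path.unlink()
        return True
    return False

with tempfile.TemporaryDirectory() as tmp:
    target = pathlib.Path(tmp) / 'obsolete.log'  # hypothetical file name
    target.write_text('old data')

    first_call = remove_if_exists(target)   # the file is there, so it is deleted
    second_call = remove_if_exists(target)  # nothing left to delete
    still_there = target.exists()

print(first_call, second_call, still_there)  # True False False
```

From Python 3.8 onwards, path.unlink(missing_ok=True) performs the check and the removal in a single call.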

Finally, let us look at an effective way of listing and filtering all the files that have a specific extension. In this case, we will use our Python working directory and filter all the .py files. This time we are going to use the method Path.glob(pattern): given a pattern relative to the directory of our path, it yields all the matching files, and we then iterate over the result. Let me show you the complete example.

def getAllPythonFiles():
    return pathlib.Path('.').glob('**/*.py')

files = getAllPythonFiles()
for f in files:
    print(f.name)

As the official Python documentation says, the “**” pattern means “this directory and all subdirectories, recursively”. The forward slash (/) simply separates the directory part of the pattern from the file name part, so in this case *.py matches every file, whatever its name, whose extension is .py.
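
The difference between a plain pattern and a recursive one is easier to see side by side. This sketch builds a small, hypothetical project layout inside a temporary folder (using tempfile, mkdir and write_text) and compares both patterns:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)

    # Hypothetical project layout, created only for this comparison.
    (root / 'pkg' / 'sub').mkdir(parents=True)
    for relative in ('main.py', 'notes.txt', 'pkg/util.py', 'pkg/sub/deep.py'):
        (root / relative).write_text('')

    # '*.py' only looks at the directory itself.
    top_level = sorted(p.name for p in root.glob('*.py'))

    # '**/*.py' also walks every subdirectory.
    recursive = sorted(p.relative_to(root).as_posix() for p in root.glob('**/*.py'))

print(top_level)  # ['main.py']
print(recursive)  # ['main.py', 'pkg/sub/deep.py', 'pkg/util.py']
```

The results are sorted because glob does not promise any particular order. Path.rglob('*.py') is a shorthand for the recursive form.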

I hope you have learned something new and found some of the methods explained in this article useful. It is the first of a series dedicated to pathlib and other Python modules that are useful for interacting with OS tasks from Python.  Happy coding!!!

geohernandez
