An introduction to Python's pathlib module

Posted on February 22, 2021 by geohernandez

First of all, I would like to clarify something that may sound obvious but is easy to forget: a path is not a string. Keeping this fact in mind will save us headaches on our Python journey. Since Python 3.4, we can rely on a powerful and versatile module called pathlib.

In this short article, I want to share some specific functions that can help you write simpler, more agile code for managing OS tasks with Python.

I also want to mention the following points, which I believe are important to keep in mind whenever we interact with OS tasks:

  1. Absolute path: it begins with the root folder, e.g. /home/ on Linux or C:\ on Windows.
  2. Relative path: it is relative to your application's working directory.
  3. Different operating systems (OS) use different separators: Linux and macOS use /, while Windows uses \.
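
The three points above can be checked without touching the disk. The following is a minimal sketch using pathlib's pure path classes, which only manipulate path objects; the folder and file names are hypothetical and chosen just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths never access the file system, so both flavours
# can be built on any operating system.
linux_abs = PurePosixPath("/home/geo")           # hypothetical Linux folder
windows_abs = PureWindowsPath("C:\\Users\\geo")  # hypothetical Windows folder
relative = PurePosixPath("reports/2021.csv")     # hypothetical relative path

print(linux_abs.is_absolute())    # True
print(windows_abs.is_absolute())  # True
print(relative.is_absolute())     # False

# Each flavour renders the same components with its own separator.
posix_text = str(PurePosixPath("reports", "2021.csv"))
windows_text = str(PureWindowsPath("reports", "2021.csv"))
print(posix_text)    # reports/2021.csv
print(windows_text)  # reports\2021.csv
```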

The pathlib module dramatically simplifies the code required to work with folders and files. Through a series of examples, I want to show you some key benefits of this module and how it lets us reduce the lines of code and, above all, make our code more readable and easier to maintain.

Here is a simple, cross-platform example in which we need to get the home directory of the user who runs the Python script. The first step is to import the module; for learning purposes and simplicity, I am going to do it in a way that gives us access to all the classes in the module.

[crayon-681d921d935c0512883184/]
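
A minimal sketch of this step could look as follows. Note one assumption: the paragraph above describes importing everything from the module (which would be `from pathlib import *`), while this sketch imports only the `Path` class explicitly, since that is all the example needs.

```python
from pathlib import Path

# Path.home() returns the home directory of the current user
# as a Path object, not as a plain string.
home_dir = Path.home()
print(home_dir)
```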

If you run the previous statement on Windows, you should get output like:

[crayon-681d921d935c8547358125/]

On Linux, it should be something like:

[crayon-681d921d935cb726870293/]

Another useful function in pathlib is cwd, which stands for “current working directory”. Here is an example:

[crayon-681d921d935cc156067250/]
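
As a small sketch of how cwd can be used, the snippet below also compares the result with os.getcwd, on the assumption that you may want to confirm that both point to the same location.

```python
import os
from pathlib import Path

# Path.cwd() returns the current working directory as a Path object.
current_dir = Path.cwd()
print(current_dir)

# os.getcwd() returns the same location, but as a plain string.
same_location = current_dir == Path(os.getcwd())
print(same_location)  # True
```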

The expected output is your current working directory, similar to the value returned by the os.getcwd function. To compose a path, we can use the forward slash operator to append a directory or filename, much as if we were dealing with a simple string, for instance:

[crayon-681d921d935ce686047918/]
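
A minimal sketch of the slash operator, assuming a hypothetical `data` folder and `sales.csv` file (neither has to exist, because composing a path does not touch the disk):

```python
from pathlib import Path

# The / operator joins path components using the right
# separator for the operating system running the code.
data_file = Path.cwd() / "data" / "sales.csv"  # hypothetical names
print(data_file)

# The result is still a Path, so its parts are easy to inspect.
print(data_file.name)         # sales.csv
print(data_file.suffix)       # .csv
print(data_file.parent.name)  # data
```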

As you can see in the previous examples, we do not have to worry about how Python deals with the OS when handling paths. This is the key point, and perhaps one of the main reasons to start using the pathlib module. You will probably recognize some common methods from older modules such as os. Now, imagine that our project runs under a specific, parameterized location and we want to get its absolute path; the following code is a suitable option:

[crayon-681d921d935d0959133940/]
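
One possible way to do this is sketched below. It is an assumption that resolve() is the call intended here (absolute() is a close alternative that does not resolve symbolic links), and the `my_project` folder name is hypothetical.

```python
from pathlib import Path

# A relative location, e.g. one received as a parameter (hypothetical name).
project_dir = Path("my_project")

# resolve() makes the path absolute, based on the current working
# directory, and normalizes it; the folder does not need to exist.
absolute_dir = project_dir.resolve()
print(absolute_dir)
```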

Let me add a little dynamism and build a simple function that receives a filename as input and returns a complete path by joining it with the joinpath function.

[crayon-681d921d935d2824076373/]
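
A minimal sketch of such a function, assuming the base location is the current working directory; the function name `build_path` and the file name are hypothetical.

```python
from pathlib import Path


def build_path(filename: str) -> Path:
    """Return the complete path of `filename` inside the working directory."""
    # joinpath() does the same job as the / operator.
    return Path.cwd().joinpath(filename)


full_path = build_path("notes.txt")  # hypothetical file name
print(full_path)
```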

If we need to verify that a given path is a regular file before executing a command, we can rely on the is_file function, so we can use this check before deciding to create the file or raise an error.

[crayon-681d921d935d3702526223/]
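
To make the check easy to try, this sketch works inside a temporary folder, which is my own assumption so that nothing is left behind on your disk; the file names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "report.txt"  # hypothetical file, created below
    existing.write_text("hello")
    missing = Path(tmp) / "missing.txt"  # hypothetical file, never created

    existing_is_file = existing.is_file()  # a real file
    missing_is_file = missing.is_file()    # nothing at this path
    folder_is_file = Path(tmp).is_file()   # a directory, not a file

print(existing_is_file, missing_is_file, folder_is_file)  # True False False
```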

Just as we have a function to tell whether a path is a file, we also have one that tells us whether a path is an existing directory. Here is an example:

[crayon-681d921d935d5802932415/]
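
A small sketch of is_dir, again using a temporary folder as an assumption to keep the example self-contained; the names are hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    some_file = folder / "report.txt"  # hypothetical file
    some_file.write_text("hello")

    folder_is_dir = folder.is_dir()                 # an existing directory
    file_is_dir = some_file.is_dir()                # a file, not a directory
    missing_is_dir = (folder / "nothing").is_dir()  # does not exist

print(folder_is_dir, file_is_dir, missing_is_dir)  # True False False
```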

In some cases, we need to verify whether a specific file exists in a given location; pathlib offers another simple method for this.

[crayon-681d921d935d7783194860/]
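
The method in question is assumed here to be exists(), which returns True for any existing path, whether it is a file or a folder. A minimal sketch with hypothetical file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    config_file = Path(tmp) / "settings.ini"  # hypothetical file
    exists_before = config_file.exists()      # not created yet

    config_file.write_text("[main]\n")
    exists_after = config_file.exists()       # now it is on disk

print(exists_before, exists_after)  # False True
```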

The following example combines these ideas in a function that removes a file if it exists.

[crayon-681d921d935d8424404835/]
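
One way to write such a function is sketched below. The name `remove_if_exists` is hypothetical, and the choice of is_file() for the check (so that folders are never touched) is my own assumption.

```python
import tempfile
from pathlib import Path


def remove_if_exists(file_path: Path) -> bool:
    """Delete `file_path` if it is an existing file; report whether it was removed."""
    if file_path.is_file():
        file_path.unlink()  # unlink() is the pathlib way of deleting a file
        return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    old_log = Path(tmp) / "old.log"  # hypothetical file
    old_log.write_text("to be deleted")

    first_call = remove_if_exists(old_log)   # file is there, so it is removed
    second_call = remove_if_exists(old_log)  # already gone, nothing to do
    still_there = old_log.exists()

print(first_call, second_call, still_there)  # True False False
```

From Python 3.8 onwards, `file_path.unlink(missing_ok=True)` gives a similar result in a single call when you do not need to know whether the file was there.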

Finally, let us look at an effective way of listing and filtering all the files that have a specific extension. In this case, we will use our Python working directory and filter all the .py files. This time we use the method Path.glob(pattern): given a pattern relative to the directory our path represents, it yields all the matching files, and we then iterate over the result. Let me show you the complete example.

[crayon-681d921d935da813414475/]
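
A minimal sketch of this listing, assuming the recursive pattern `**/*.py`; the function name `list_python_files` is hypothetical, and the results are sorted only to make the output predictable.

```python
from pathlib import Path


def list_python_files(directory: Path) -> list:
    """Return every .py file in `directory` and its subdirectories."""
    # glob() yields matches lazily; sorted() collects them into a list.
    return sorted(directory.glob("**/*.py"))


if __name__ == "__main__":
    # Iterate over the .py files found under the working directory.
    for python_file in list_python_files(Path.cwd()):
        print(python_file)
```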

As the official Python documentation says, the pattern “**” means “this directory and all subdirectories, recursively”. The forward slash (/) is simply the separator between the components of the pattern, and *.py matches every file, whatever its name, whose extension is .py.

I hope you have learned something new and found some of the methods explained in this article useful. It is the first in a series dedicated to pathlib and other Python modules that are useful for interacting with OS tasks from Python. Happy coding!!!

Category: Chronicles from the trenches, Data Engineering, Python
