ZhangZhihui's Blog  

These are common programming practices that, while not necessarily wrong, often lead to less efficient, less readable, and less maintainable code. By understanding these pitfalls, you can write cleaner, more efficient code for your Python applications.

 

• Code style violations

The Python style guide, also known as Python Enhancement Proposal no 8 (PEP 8), provides recommendations for readability and consistency in your code, making it easier for developers to collaborate and maintain projects over time. You can find the style guide details on its official page here: https://peps.python.org/pep-0008.

Tools for fixing coding style violations

Note that we have formatting tools such as Black (https://black.readthedocs.io/en/stable/), isort (https://pycqa.github.io/isort/), and/or Ruff (https://docs.astral.sh/ruff/) that can help you fix code that does not follow the style guide recommendations.

Imports

The way you write, organize, and order your import lines is also important. According to the style guide, imports should be on separate lines and grouped into three categories in this order: standard library imports, related third-party imports, and local-specific imports within the application’s or library’s code base. Also, each group should be separated by a blank line.

import os
import sys

import numpy as np

from mymodule import myfunction

Naming conventions

You should use descriptive names for variables, functions, classes, and modules. The following are specific naming conventions for different types of cases:

Name for function and variable (including class attributes and methods): Use lower_case_with_underscores

Name for class: Use CapWords

Name for constant: Use ALL_CAPS_WITH_UNDERSCORES

Comments

Comments should be complete sentences, with the first word capitalized, and should be clear and concise. We have specific recommendations for two cases of comments—block comments and inline comments:

• Block comments generally apply to some (or all) code that follows them and are indented to the same level as that code. Each line of a block comment starts with # and a single space.

• Inline comments should be used sparingly. An inline comment is placed on the same line as a statement, separated by at least two spaces from the statement.

# This is a block comment.
# It spans multiple lines.
def foo():
    pass  # This is an inline comment.

Whitespace in expressions and statements

You should avoid extraneous whitespace in the following situations:

• Immediately inside parentheses, brackets, or braces
• Immediately before a comma, semicolon, or colon
• More than one space around an assignment operator to align it with another

• Correctness anti-patterns

These anti-patterns can lead to bugs or unintended behavior if not addressed. We are going to discuss the most common of these anti-patterns and alternative, recommended ways and approaches. We are going to focus on the following anti-patterns:

• Using the type() function for comparing types

• Mutable default argument

• Accessing a protected member from outside a class

Note that using IDEs such as Visual Studio Code or PyCharm or command-line tools such as Flake8 will help you spot such bad practices in your code, but it is important to know the recommendations and the reason behind each one.

Using the type() function for comparing types

Sometimes, we need to identify the type of a value through comparison, for our algorithm. The common technique one may think of for that is to use the type() function. But using type() to compare object types does not account for subclassing and is not as flexible as the alternative which is based on using the isinstance() function.

Imagine we have two classes, CustomListA and CustomListB, that are subclasses of the UserList class, which is the recommended class one should inherit from when defining a class for a custom list, as follows:

from collections import UserList


class CustomListA(UserList):
    pass


class CustomListB(UserList):
    pass

If we wanted to check if an object is of one of the custom list types, using the first approach, we would test the type(obj) in (CustomListA, CustomListB) condition.

Alternatively, we would simply test isinstance(obj, UserList), and that would be enough since CustomListA and CustomListB are subclasses of UserList.

Mutable default argument

When you define a function with a parameter that expects a mutable value, such as a list or a dictionary, you may be tempted to provide a default argument ([] or {} respectively). But such a function retains changes between calls, which will lead to unexpected behaviors.

The recommended practice is to use a default value of None and set it to a mutable data structure within the function if needed.

Let’s create a function called manipulate() whose mylist parameter has a default value of []. The function appends the "test" string to the mylist list and then returns it, as follows:

def manipulate(mylist=[]):
    mylist.append("test")
    return mylist

In another function called better_manipulate() whose mylist parameter has a default value of None, we start by setting mylist to [] if it is None, then we append the "test" string to mylist before returning it, as follows:

def better_manipulate(mylist=None):
    if not mylist:
        mylist = []

    mylist.append("test")
    return mylist

 

if __name__ == "__main__":
    print("function manipulate()")
    print(manipulate())
    print(manipulate())
    print(manipulate())

    print("function better_manipulate()")
    print(better_manipulate())
    print(better_manipulate())

 

function manipulate()
['test']
['test', 'test']
['test', 'test', 'test']
function better_manipulate()
['test']
['test']

As you can see, with the non-recommended way of doing this, we end up with the "test" string several times in the list returned; the string is accumulating because each subsequent time the function has been called, the mylist argument kept its previous value instead of being reset to the empty list. But, with the recommended solution, we see with the result that we get the expected behavior.

Accessing a protected member from outside a class

Accessing a protected member (an attribute prefixed with _) of a class from outside that class usually calls for trouble since the creator of that class did not intend this member to be exposed. Someone maintaining the code could change or rename that attribute later down the road, and parts of the code accessing it could result in unexpected behavior.

If you have code that accesses a protected member that way, the recommended practice is to refactor that code so that it is part of the public interface of the class.

To demonstrate this, let’s define a Book class with two protected attributes, _title and _author, as follows:

class Book:
    def __init__(self, title, author):
        self._title = title
        self._author = author

Now, let’s create another class, BetterBook, with the same attributes and a presentation_line() method that accesses the _title and _author attributes and returns a concatenated string based on them. The class definition is as follows:

class BetterBook:
    def __init__(self, title, author):
        self._title = title
        self._author = author

    def presentation_line(self):
        return f"{self._title} by {self._author}"

Finally, in the code for testing both classes, we get and print the presentation line for an instance of each class, accessing the protected members for the first one (instance of Book) and calling the presentation_line() method for the second one (instance of BetterBook), as follows:

if __name__ == "__main__":
    b1 = Book(
        "Mastering Object-Oriented Python",
        "Steven F. Lott",
    )
    print(
        "Bad practice: Direct access of protected members"
    )
    print(f"{b1._title} by {b1._author}")

    b2 = BetterBook(
        "Python Algorithms",
        "Magnus Lie Hetland",
    )
    print(
        "Recommended: Access via the public interface"
    )
    print(b2.presentation_line())

This shows that we get the same result, without any error, in both cases, but using the presentation_line() method, as done in the case of the second class, is the best practice. The _title and _author attributes are protected, so it is not recommended to call them directly. The developer could change those attributes in the future. That is why they must be encapsulated in a public method.

Also, it is good practice to provide an attribute that encapsulates each protected member of the class using the @property decorator.

• Maintainability anti-patterns

These anti-patterns make your code difficult to understand or maintain over time. We are going to discuss several anti-patterns that should be avoided for better quality in your Python application or library’s code base. We will focus on the following points:

• Using a wildcard import

This way of importing (from mymodule import *) can clutter the namespace and make it difficult to determine where an imported variable or function came from. Also, the code may end up with bugs because of name collision.

The best practice is to use specific imports or import the module itself to maintain clarity.

• Look Before You Leap (LBYL) versus Easier to Ask for Forgiveness than Permission (EAFP)

LBYL often leads to more cluttered code, while EAFP makes use of Python’s handling of exceptions and tends to be cleaner. 

For example, we may want to check if a file exists, before opening it, with code such as the following:

if os.path.exists(filename):
    with open(filename) as f:
        print(f.text)

This is LBYL, and when new to Python, you would think that it is the right way to treat such situations. But in Python, it is recommended to favor EAFP, where appropriate, for cleaner, more Pythonic code. So, the recommended way for the expected result would give the following code:

try:
    with open(filename) as f:
        print(f.text)
except FileNotFoundError:
    print("No file there")

 

• Overusing inheritance and tight coupling

Inheritance is a powerful feature of OOP, but overusing it – for example, creating a new class for every slight variation of behavior – can lead to tight coupling between classes. This increases complexity and makes the code less flexible and harder to maintain.

It is not recommended to create a deep inheritance hierarchy such as the following (as a simplified example):

class GrandParent:
    pass


class Parent(GrandParent):
    pass


class Child(Parent):
    Pass

The best practice is to create smaller, more focused classes and combine them to achieve the desired behavior, as with the following:

class Parent:
    pass


class Child:
    def __init__(self, parent):
        self.parent = parent

As you may remember, this is the composition approach.

• Using global variables for sharing data between functions

Global variables are variables that are accessible throughout the entire program, making them tempting to use for sharing data between functions—for example, configuration settings that are used across multiple modules or shared resources such as database connections.

However, they can lead to bugs where different parts of the application unexpectedly modify global state. Also, they make it harder to scale applications as they can lead to issues in multithreaded environments where multiple threads might attempt to modify the global variable simultaneously.

Here is an example of the non-recommended practice:

# Global variable
counter = 0


def increment():
    global counter
    counter += 1


def reset():
    global counter
    counter = 0

Instead of using a global variable, you should pass the needed data as arguments to functions or encapsulate state within a class, which improves the modularity and testability of the code. So, the best-practice equivalent for the counterexample would be defining a Counter class holding a counter attribute, as follows:

class Counter:
    def __init__(self):
        self.counter = 0

    def increment(self):
        self.counter += 1

    def reset(self):
        self.counter = 0

Next, we add code for testing the Counter class as follows:

if __name__ == "__main__":
    c = Counter()
    print(f"Counter value: {c.counter}")
    c.increment()
    print(f"Counter value: {c.counter}")
    c.reset()

This shows how using a class instead of global variables is effective and can be scalable, thus the recommended practice.

• Performance anti-patterns

These anti-patterns lead to inefficiencies that can degrade performance, especially noticeable in large-scale applications or data-intensive tasks. We will focus on the following such anti-patterns:

• Not using .join() to concatenate strings in a loop
• Using global variables for caching

Not using .join() to concatenate strings in a loop

Concatenating strings with + or += in a loop creates a new string object each time, which is inefficient. The best solution is to use the .join() method on strings, which is designed for efficiency when concatenating strings from a sequence or iterable.

Let’s create a concatenate() function where we use += for concatenating items from a list of strings, as follows:

def concatenate(string_list):
    result = ""
    for item in string_list:
        result += item
    return result

Then, let’s create a better_concatenate() function for the same result, but using the str.join() method, as follows:

def better_concatenate(string_list):
    result = "".join(string_list)
    return result

Using global variables for caching

Using global variables for caching can seem like a quick and easy solution but often leads to poor maintainability, potential data consistency issues, and difficulties in managing the cache life cycle effectively. A more robust approach involves using specialized caching libraries designed to handle these aspects more efficiently.

In this example, a global dictionary is used to cache results from a function that simulates a time-consuming operation (for example, a database query) done in the perform_expensive_operation() function. The complete code for this demonstration is as follows:

import time
import random


# Global variable as cache
_cache = {}


def get_data(query):
    if query in _cache:
        return _cache[query]
    else:
        result = perform_expensive_operation(query)
        _cache[query] = result
        return result


def perform_expensive_operation(user_id):
    time.sleep(random.uniform(0.5, 2.0))
    user_data = {
        1: {"name": "Alice", "email": "alice@example.com"},
        2: {"name": "Bob", "email": "bob@example.com"},
        3: {"name": "Charlie", "email": "charlie@example.com"},
    }
    result = user_data.get(user_id, {"error": "User not found"})
    return result


if __name__ == "__main__":
    print(get_data(1))
    print(get_data(1))

This works as expected, but there is a better approach. We can use a specialized caching library or Python’s built-in functools.lru_cache() function. The lru_cache decorator provides a least recently used (LRU) cache, automatically managing the size and lifetime of cache entries. Also, it is thread-safe, which helps prevent issues that can arise in a multithreaded environment when multiple threads access or modify the cache simultaneously. Finally, libraries or tools such as lru_cache are optimized for performance, using efficient data structures and algorithms to manage the cache.

Here’s how you can implement the functionality of caching results from a time-consuming function using functools.lru_cache.

import random
import time
from functools import lru_cache


@lru_cache(maxsize=100)
def get_data(user_id):
    return perform_expensive_operation(user_id)


def perform_expensive_operation(user_id):
    time.sleep(random.uniform(0.5, 2.0))
    user_data = {
        1: {"name": "Alice", "email": "alice@example.com"},
        2: {"name": "Bob", "email": "bob@example.com"},
        3: {"name": "Charlie", "email": "charlie@example.com"},
    }
    result = user_data.get(user_id, {"error": "User not found"})
    return result


if __name__ == "__main__":
    print(get_data(1))
    print(get_data(1))
    print(get_data(2))
    print(get_data(99))

 

posted on 2024-08-27 10:25  ZhangZhihuiAAA  阅读(3)  评论(0编辑  收藏  举报