Python - Performance Patterns - ZhangZhihuiAAA

Performance patterns address common bottlenecks and optimization challenges, providing developers with proven methodologies to improve execution time, reduce memory usage, and scale effectively.

Technical requirements

• Add the Faker module to your Python environment using the following command: python -m pip install faker
• Add the Redis module to your Python environment using the following command: python -m pip install redis
• Install the Redis server and run it using Docker: docker run --name myredis -p 6379:6379 redis

• The Cache-Aside pattern

In situations where data is more frequently read than updated, applications use a cache to optimize repeated access to information stored in a database or data store. In some systems, that type of caching mechanism is built in and works automatically. When this is not the case, we must implement it in the application ourselves, using a caching strategy that is suitable for the particular use case.

One such strategy is called Cache-Aside, where, to improve performance, we store frequently accessed data in a cache, reducing the need to fetch data from the data store repeatedly.

Use cases for the cache-aside pattern

The cache-aside pattern is useful when we need to reduce the database load in our application. By caching frequently accessed data, fewer queries are sent to the database. It also helps improve application responsiveness, since cached data can be retrieved faster.

Note that this pattern works best for data that doesn’t change often and for data storage that doesn’t depend on the consistency of a set of entries in the storage (multiple keys). For example, it might work for certain kinds of document stores or databases where keys are never updated and entries are only occasionally deleted, provided it is acceptable to keep serving deleted entries from the cache for some time (until the cache is refreshed).
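How long a stale entry can keep being served is bounded by the entry's time-to-live; the scripts later in this section pass ex=300 to cache.set() for that purpose. The following is a minimal sketch of that expiry behaviour using a plain dict instead of Redis. The TTLCache class and its injectable clock parameter are hypothetical names used only for illustration, not part of any library.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}  # key -> (value, expires_at)

    def set(self, key: str, value: str) -> None:
        # Store the value together with the moment it stops being valid.
        self._store[key] = (value, self.clock() + self.ttl)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            # Expired: drop it so the next read falls through to the database.
            del self._store[key]
            return None
        return value
```

Until the TTL elapses, get() keeps returning the cached value even if the underlying database row has been deleted in the meantime; once it has elapsed, get() returns None and the application goes back to the database.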

Implementing the cache-aside pattern

We can summarize the steps needed when implementing the Cache-Aside pattern, involving a database and a cache, as follows:

• Case 1 – When we want to fetch a data item: Return the item from the cache if found in it. If not found in the cache, read the data from the database. Put the item we got in the cache and return it.

• Case 2 – When we want to update a data item: Write the item in the database and remove the corresponding entry from the cache.
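Both cases can be sketched in a few lines. This is a minimal illustration under stated assumptions, separate from the implementation that follows: an in-memory SQLite database stands in for the data store, and a hypothetical DictCache class mimics the three redis-py calls involved (get, set with an ex expiry argument, and delete), so a redis.Redis(decode_responses=True) client could be passed in its place. The function names read_quote and update_quote are also hypothetical.

```python
import sqlite3
from typing import Optional


class DictCache:
    """Hypothetical stand-in for the subset of the redis-py client used here."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value, ex=None):
        # `ex` (expiry in seconds) is accepted for API compatibility but ignored.
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)


def read_quote(db: sqlite3.Connection, cache, quote_id: int) -> Optional[str]:
    """Case 1: try the cache first, fall back to the database, then fill the cache."""
    key = f"quote.{quote_id}"
    text = cache.get(key)
    if text is not None:
        return text  # cache hit
    row = db.execute("SELECT text FROM quotes WHERE id = ?", (quote_id,)).fetchone()
    if row is None:
        return None  # not in the database either
    cache.set(key, row[0], ex=300)
    return row[0]


def update_quote(db: sqlite3.Connection, cache, quote_id: int, text: str) -> None:
    """Case 2: write to the database, then invalidate the cached entry."""
    with db:  # commits the transaction on success, rolls back on error
        db.execute("UPDATE quotes SET text = ? WHERE id = ?", (text, quote_id))
    cache.delete(f"quote.{quote_id}")


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE quotes(id INTEGER PRIMARY KEY, text TEXT)")
    db.execute("INSERT INTO quotes VALUES (1, 'First version')")
    cache = DictCache()
    print(read_quote(db, cache, 1))  # read from the DB, then cached
    update_quote(db, cache, 1, "Second version")
    print(read_quote(db, cache, 1))  # entry was invalidated, so the DB is read again
```

Deleting the cache entry on update, rather than overwriting it, means the next read always repopulates the cache from the database, which remains the source of truth.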

Let’s try a simple implementation with a database of quotes from which the user can ask to retrieve some quotes via an application. Our focus here will be implementing the Case 1 part.

Here are our choices for the additional software dependencies we need to install on the machine for this implementation:

• An SQLite database, since we can query an SQLite database using Python’s standard module, sqlite3
• A Redis server and the redis-py Python module

We will use a script (populate_db.py file) to handle the creation of a database and a quotes table and add example data to it. For practical reasons, we also use the Faker module there to generate fake quotes that are used when populating the database.

import sqlite3
from pathlib import Path
from random import randint

import redis
from faker import Faker


fake = Faker()

DB_PATH = Path(__file__).parent / Path("quotes.sqlite3")
DB_INITIALIZED = DB_PATH.exists()
cache = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)
CACHE_KEY_PREFIX = "quote."


def setup_db():
    if not DB_INITIALIZED:
        try:
            with sqlite3.connect(DB_PATH) as db:
                cursor = db.cursor()
                cursor.execute("""CREATE TABLE quotes(id INTEGER PRIMARY KEY, text TEXT)""")
                db.commit()
                print("Table 'quotes' created")
        except Exception as e:
            print(e)


def add_quotes(quotes_list):
    added = []
    try:
        with sqlite3.connect(DB_PATH) as db:
            cursor = db.cursor()

            for quote_text in quotes_list:
                quote_id = randint(1, 100)
                quote = (quote_id, quote_text)

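                cursor.execute("""INSERT OR IGNORE INTO quotes(id, text) VALUES(?, ?)""", quote)
                # rowcount is 0 when the random id already exists and the insert is ignored,
                # so only quotes that were actually inserted are reported (and cached).
                if cursor.rowcount == 1:
                    added.append(quote)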

            db.commit()
    except Exception as e:
        print(e)

    return added


def main():
    msg = "Choose your mode! Enter 'init' or 'update_db_only' or 'update_all':"
    mode = input(msg)

    if mode.lower() == "init":
        if not DB_INITIALIZED:
            setup_db()
    elif mode.lower() == "update_all":
        if not DB_INITIALIZED:
            print("DB is not initialized. Please do 'init' first.")
        else:
            quotes_list = [fake.sentence() for _ in range(1, 11)]
            added = add_quotes(quotes_list)
            if added:
                print("New (fake) quotes added to the database:")
                for q in added:
                    print(f"Added to DB: {q}")
                    print("  - Also adding to the cache")
                    cache.set(CACHE_KEY_PREFIX + str(q[0]), q[1], ex=300)
    elif mode.lower() == "update_db_only":
        if not DB_INITIALIZED:
            print("DB is not initialized. Please do 'init' first.")
        else:
            quotes_list = [fake.sentence() for _ in range(1, 11)]
            added = add_quotes(quotes_list)
            if added:
                print("New (fake) quotes added to the database ONLY:")
                for q in added:
                    print(f"Added to DB: {q}")


if __name__ == "__main__":
    main()

zzh@ZZHPC:/zdata/Github/ztest$ python populate_db.py 
Choose your mode! Enter 'init' or 'update_db_only' or 'update_all':init
Table 'quotes' created
zzh@ZZHPC:/zdata/Github/ztest$ python populate_db.py 
Choose your mode! Enter 'init' or 'update_db_only' or 'update_all':update_all
New (fake) quotes added to the database:
Added to DB: (70, 'They represent speak well crime past require.')
  - Also adding to the cache
Added to DB: (56, 'Occur recognize husband enough.')
  - Also adding to the cache
Added to DB: (75, 'Military clearly just response again big.')
  - Also adding to the cache
Added to DB: (36, 'Last without skin throughout shake lot travel.')
  - Also adding to the cache
Added to DB: (15, 'Strong think movement hear than establish from.')
  - Also adding to the cache
Added to DB: (72, 'Skin itself term begin form investment huge.')
  - Also adding to the cache
Added to DB: (28, 'Discussion audience child.')
  - Also adding to the cache
Added to DB: (47, 'Test person hotel.')
  - Also adding to the cache
Added to DB: (50, 'Figure save fast young.')
  - Also adding to the cache
Added to DB: (45, 'Herself hit manage two certainly professional.')
  - Also adding to the cache

Now, we will create another module and script for the cache-aside-related operations themselves (cache_aside.py file).

import sqlite3
from pathlib import Path

import redis


CACHE_KEY_PREFIX = "quote"
DB_PATH = Path(__file__).parent / Path("quotes.sqlite3")
cache = redis.StrictRedis(host="localhost", port=6379, decode_responses=True)


def get_quote(quote_id: str) -> str:
    out = []
    quote = cache.get(f"{CACHE_KEY_PREFIX}.{quote_id}")

    if quote is None:
        # Get from the database
        query = "SELECT text FROM quotes WHERE id = ?"
        try:
            with sqlite3.connect(DB_PATH) as db:
                cursor = db.cursor()
                # Parameterized query: the id is bound, never formatted into the SQL string
                res = cursor.execute(query, (int(quote_id),)).fetchone()
                if not res:
                    return "There was no quote stored matching that id!"

                quote = res[0]
                out.append(f"Got '{quote}' FROM DB")
        except Exception as e:
            print(e)
            quote = ""

        # Add to the cache
        if quote:
            key = f"{CACHE_KEY_PREFIX}.{quote_id}"
            cache.set(key, quote, ex=300)
            out.append(f"Added TO CACHE, with key '{key}'")
    else:
        out.append(f"Got '{quote}' FROM CACHE")

    if out:
        return " - ".join(out)
    else:
        return ""


def main():
    while True:
        quote_id = input("Enter the ID of the quote:")
        if quote_id.isdigit():
            out = get_quote(quote_id)
            print(out)
        else:
            print("You must enter a number. Please retry.")


if __name__ == "__main__":
    main()

zzh@ZZHPC:/zdata/Github/ztest$ python cache_aside.py 
Enter the ID of the quote:45
Got 'Herself hit manage two certainly professional.' FROM CACHE
Enter the ID of the quote:50
Got 'Figure save fast young.' FROM CACHE
Enter the ID of the quote:20
There was no quote stored matching that id!
Enter the ID of the quote:

• The Memoization pattern

The Memoization pattern is a crucial optimization technique in software development that improves the efficiency of programs by caching the results of expensive function calls. This approach ensures that if a function is called with the same inputs more than once, the cached result is returned, eliminating the need for repetitive and costly computations.

Real-world examples

We can think of calculating Fibonacci numbers as a classic example of the memoization pattern. By storing previously computed values of the sequence, the algorithm avoids recalculating them, which drastically speeds up the computation of higher numbers in the sequence.

Another example is a text search algorithm. In applications dealing with large volumes of text, such as search engines or document analysis tools, caching the results of previous searches means that identical queries can return instant results, significantly improving user experience.

Use cases for the memoization pattern

1. Speeding up recursive algorithms: Memoization can dramatically lower the time complexity of recursive algorithms that solve the same subproblems repeatedly. This is particularly beneficial for algorithms such as those calculating Fibonacci numbers, where it turns the naive exponential running time into a linear one.
2. Reducing computational overhead: Memoization conserves CPU resources by avoiding unnecessary recalculations. This is crucial in resource-constrained environments or when dealing with high-volume data processing.
3. Improving application performance: The direct result of memoization is a noticeable improvement in application performance, making applications feel more responsive and efficient from the user’s perspective.

For our example, we will apply memoization to a classic problem where a recursive algorithm is used: calculating Fibonacci numbers.

We start with the import statements we need:

from datetime import timedelta
from functools import lru_cache

Second, we create a fibonacci_func1 function that does the Fibonacci numbers computation using recursion (without any caching involved). We will use this for comparison:

def fibonacci_func1(n):
    if n < 2:
        return n
    return fibonacci_func1(n - 1) + fibonacci_func1(n - 2)

Third, we define a fibonacci_func2 function, with the same code, but this one is decorated with lru_cache, to enable memoization. What happens here is that the results of the function calls are stored in a cache in memory, and repeated calls with the same arguments fetch results directly from the cache rather than executing the function’s code. The code is as follows:

@lru_cache(maxsize=None)
def fibonacci_func2(n):
    if n < 2:
        return n
    return fibonacci_func2(n - 1) + fibonacci_func2(n - 2)

Finally, we create a main() function to test calling both functions using n=30 as input and measuring the time spent for each execution. Note that the elapsed time must be passed to timedelta() as the seconds keyword argument: its first positional argument is days, so timedelta(time.time() - start_time) would display every duration 86,400 times too large. The testing code is as follows:

def main():
    import time

    n = 30

    start_time = time.time()
    result = fibonacci_func1(n)
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Fibonacci_func1({n}) = {result}, calculated in {duration}")

    start_time = time.time()
    result = fibonacci_func2(n)
    duration = timedelta(seconds=time.time() - start_time)
    print(f"Fibonacci_func2({n}) = {result}, calculated in {duration}")


if __name__ == "__main__":
    main()

You should get an output like the following one (exact timings depend on your machine):
Fibonacci_func1(30) = 832040, calculated in 0:00:00.318670
Fibonacci_func2(30) = 832040, calculated in 0:00:00.000032
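functools also lets you inspect and reset the cache that lru_cache attaches to the decorated function, which is a quick way to confirm that memoization is doing its job. The sketch below redefines fibonacci_func2 so that it runs on its own. cache_info() and cache_clear() are standard methods of lru_cache-wrapped functions, and functools.cache (Python 3.9+) is shorthand for lru_cache(maxsize=None).

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fibonacci_func2(n):
    if n < 2:
        return n
    return fibonacci_func2(n - 1) + fibonacci_func2(n - 2)


if __name__ == "__main__":
    fibonacci_func2(30)
    # Each n from 0 to 30 is computed once (a miss); repeated sub-calls are hits.
    print(fibonacci_func2.cache_info())
    # CacheInfo(hits=28, misses=31, maxsize=None, currsize=31)
    fibonacci_func2.cache_clear()  # empty the cache, e.g. before re-running a benchmark
    print(fibonacci_func2.cache_info())
```

Clearing the cache between runs matters when timing: a second call to fibonacci_func2(30) in the same process would be a single cache hit and would not measure the computation at all.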

• The Lazy Loading pattern

The Lazy Loading pattern is a critical design approach in software engineering, particularly useful in optimizing performance and resource management. The idea with lazy loading is to defer the initialization or loading of resources to the moment they are really needed. This way, applications can achieve more efficient resource utilization, reduce initial load times, and enhance the overall user experience.

Real-world examples

Browsing an online art gallery provides a first example. Instead of waiting for hundreds of high-resolution images to load upfront, the website loads only images currently in view. As you scroll, additional images load seamlessly, enhancing your browsing experience without overwhelming your device’s memory or network bandwidth.

Another example is an on-demand video streaming service, such as Netflix or YouTube. Such a platform offers an uninterrupted viewing experience by loading videos in chunks. This approach not only minimizes buffering times at the start but also adapts to changing network conditions, ensuring consistent video quality with minimal interruptions.

In applications such as Microsoft Excel or Google Sheets, working with large datasets can be resource-intensive. Lazy loading allows these applications to load only data relevant to your current view or operation, such as a particular sheet or a range of cells. This significantly speeds up operations and reduces memory usage.
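The chunk-by-chunk loading described for streaming services maps naturally onto Python generators, which produce each item only when the consumer asks for it. The read_chunks() helper below is a hypothetical sketch. io.BytesIO stands in for a network stream or a large file, and a tiny chunk size makes the behaviour visible.

```python
import io
from typing import BinaryIO, Iterator


def read_chunks(stream: BinaryIO, chunk_size: int = 4) -> Iterator[bytes]:
    """Lazily yield the stream's content, one chunk at a time."""
    while True:
        chunk = stream.read(chunk_size)  # runs only when the next chunk is requested
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    source = io.BytesIO(b"abcdefghij")
    chunks = read_chunks(source)  # nothing has been read yet
    print(next(chunks))  # b'abcd'
    print(source.tell())  # 4: only the first chunk has been loaded
    print(list(chunks))  # [b'efgh', b'ij']
```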

Use cases for the lazy loading pattern

1. Reducing initial load time: This is particularly beneficial in web development, where a shorter load time can translate into improved user engagement and retention rates.
2. Conserving system resources: In an era of diverse devices, from high-end desktops to entry-level smartphones, optimizing resource usage is crucial for delivering a uniform user experience across all platforms.
3. Enhancing user experience: Users expect fast, responsive interactions with software. Lazy loading contributes to this by minimizing waiting times and making applications feel more responsive.

Implementing the lazy loading pattern – lazy attribute loading

Consider an application that performs complex data analysis or generates sophisticated visualizations based on user input. The computation behind this is resource-intensive and time-consuming. Implementing lazy loading, in this case, can drastically improve performance. But for demonstration purposes, we will be less ambitious than the complex data analysis application scenario. We will use a function that simulates an expensive computation and returns a value used for an attribute on a class.

For this lazy loading example, the idea is to have a class that initializes an attribute only when it’s accessed for the first time. This approach is commonly used in scenarios where initializing an attribute is resource-intensive, and you want to postpone this process until it’s necessary.

We start with the initialization part of the LazyLoadedData class, where we set the _data attribute to None. Here, the expensive data hasn’t been loaded yet:

class LazyLoadedData:
    def __init__(self):
        self._data = None

We add a data() method, decorated with @property, making it act like an attribute (a property) with the added logic for lazy loading. Here, we check if _data is None. If it is, we call the load_data() method:

    @property
    def data(self):
        if self._data is None:
            self._data = self.load_data()
        return self._data

We add the load_data() method simulating an expensive operation, using sum(i * i for i in range(100000)). In a real-world scenario, this could involve fetching data from a remote database, performing a complex calculation, or other resource-intensive tasks:

    def load_data(self):
        print("Loading expensive data...")
        return sum(i * i for i in range(100000))

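Finally, we add a main() function that creates an instance and accesses the data attribute twice, so we can check that the expensive load happens only once:

def main():
    obj = LazyLoadedData()
    print("Object created, expensive attribute not loaded yet.")

    print("Accessing expensive attribute:")
    print(obj.data)

    print("Accessing expensive attribute again, no reloading occurs:")
    print(obj.data)


if __name__ == "__main__":
    main()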
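The standard library provides the same compute-once attribute through functools.cached_property (Python 3.8+): the getter runs on first access and its result is stored in the instance's __dict__, so later accesses skip the getter entirely. Unlike the is None check above, it also avoids reloading when the loaded value happens to be None. The class name LazyLoadedData2 and its load_count counter are hypothetical, added only to make the behaviour observable.

```python
from functools import cached_property


class LazyLoadedData2:
    def __init__(self):
        self.load_count = 0  # how many times the expensive load actually ran

    @cached_property
    def data(self):
        # Runs only on first access; the result is then stored on the instance.
        print("Loading expensive data...")
        self.load_count += 1
        return sum(i * i for i in range(100000))


if __name__ == "__main__":
    obj = LazyLoadedData2()
    print(obj.data)
    print(obj.data)  # no second "Loading expensive data..." message
    print(obj.load_count)  # 1
```

Deleting the attribute (del obj.data) clears the cached value, so the next access loads it again. cached_property needs a per-instance __dict__, so it does not work on classes that define __slots__ without one.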

posted on 2024-08-24 09:50 ZhangZhihuiAAA

Copyright © 2024 ZhangZhihuiAAA