From time to time, when I am hacking around, I need to find the checksum of a file. Reasons for this could be that you need to check whether a file has changed, or whether two files with the same filename have the same contents. Or you just need to get your fix of 32-character hexadecimal strings. So I wrote this little Python script that calculates the MD5 hash (also known as a checksum) of a file.
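
To be precise about the size: an MD5 digest is 16 bytes, and `hexdigest()` writes each byte as two hexadecimal characters, which is where the 32-character string comes from. A quick check using only the standard `hashlib` module:

```python
import hashlib

m = hashlib.md5(b"test")
raw = m.digest()       # the 16 raw bytes of the hash
text = m.hexdigest()   # the same bytes written as hex characters

print(m.digest_size)   # 16
print(len(raw))        # 16
print(len(text))       # 32
print(raw.hex() == text)  # True: hexdigest() is just the raw digest in hex
print(text)            # 098f6bcd4621d373cade4e832627b4f6
```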

import hashlib

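def md5Checksum(filePath):
    # Open in binary mode; the with block closes the file when we are done.
    with open(filePath, 'rb') as fh:
        m = hashlib.md5()
        while True:
            # Read 8192 bytes at a time instead of the whole file.
            data = fh.read(8192)
            if not data:  # empty bytes means end of file
                break
            m.update(data)
    return m.hexdigest()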

As you can see, the function takes a single parameter: the path to the file whose MD5 hash you want. It uses the hashlib module from Python's standard library. Keep in mind that this function might take a while to run on large files! On the other hand, you don't need to worry about the whole file being loaded into memory. The file is read in 8192-byte chunks, so the buffer holding file data never grows beyond 8 kilobytes, no matter how large the file is.
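
Reading in chunks does not change the result, because `update()` simply continues the hash over the new bytes: calling `m.update(a)` and then `m.update(b)` gives the same digest as `m.update(a + b)`. The sketch below checks this on an in-memory buffer. An `io.BytesIO` stands in for the open file, `md5_in_chunks` is a hypothetical helper name, and the chunk sizes tried are arbitrary.

```python
import hashlib
import io

def md5_in_chunks(fileobj, chunk_size):
    """Hash a binary file-like object, reading chunk_size bytes at a time."""
    m = hashlib.md5()
    while True:
        data = fileobj.read(chunk_size)
        if not data:  # read() returns b"" at end of file
            break
        m.update(data)
    return m.hexdigest()

payload = b"hello world\n" * 10000  # 120,000 bytes of sample data
one_shot = hashlib.md5(payload).hexdigest()

for size in (1, 7, 8192, 1 << 20):
    result = md5_in_chunks(io.BytesIO(payload), size)
    print(size, result == one_shot)  # True for every chunk size
```

The chunk size only trades memory for the number of `read()` calls; 8192 bytes is a reasonable middle ground, and larger buffers mostly help on very large files.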

Here’s an example of the function in action.


print('The MD5 checksum of test.txt is', md5Checksum('test.txt'))

And this is the output, for a test.txt that contains just the four characters test with no trailing newline:

The MD5 checksum of test.txt is 098f6bcd4621d373cade4e832627b4f6
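
Coming back to the use case from the start of the post, two files can be compared by comparing their checksums. Below is a minimal sketch with a hypothetical `same_contents` helper; the hashing loop is repeated (written with the `iter(callable, sentinel)` idiom) so the block runs on its own. Equal MD5 hashes are strong evidence of identical contents for everyday checks, but MD5 collisions can be constructed deliberately, so don't rely on it when someone might tamper with the files.

```python
import hashlib
import os
import tempfile

def md5_checksum(file_path, chunk_size=8192):
    """Return the MD5 hex digest of a file, read chunk_size bytes at a time."""
    m = hashlib.md5()
    with open(file_path, "rb") as fh:
        # iter() keeps calling fh.read until it returns b"" (end of file).
        for data in iter(lambda: fh.read(chunk_size), b""):
            m.update(data)
    return m.hexdigest()

def same_contents(path_a, path_b):
    """Hypothetical helper: True if both files have the same MD5 checksum."""
    # Comparing sizes first is cheap and rules out many mismatches early.
    if os.path.getsize(path_a) != os.path.getsize(path_b):
        return False
    return md5_checksum(path_a) == md5_checksum(path_b)

# Usage: write two small files to a temporary directory and compare them.
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, "a.txt")
    b = os.path.join(tmp, "b.txt")
    for path in (a, b):
        with open(path, "wb") as f:
            f.write(b"test")
    print(same_contents(a, b))  # True
    with open(b, "ab") as f:
        f.write(b"!")
    print(same_contents(a, b))  # False
```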