ZhangZhihui's Blog  

Problem: You want to generate random test data for running your test functions.

 

Solution: Use fuzzing, an automated testing technique that generates random test data for your test functions.

 

Fuzzing, or fuzz testing, is an automated testing technique that feeds random, unexpected data into your program in order to detect bugs. Fuzzing has been around for quite a while; the first paper on it was published in 1990. Go has had third-party fuzzing libraries for some time as well, but Go 1.18 made fuzzing a built-in feature, supported by both the go test tool and the testing package in the standard library.

You can use fuzzing to test a max heap implementation such as this one:

type Heap struct {
    elements []int
}

func (h *Heap) Push(ele int) {
    h.elements = append(h.elements, ele)
    i := len(h.elements) - 1
    for ; h.elements[i] > h.elements[parent(i)]; i = parent(i) {
        h.swap(i, parent(i))
    }
}

func (h *Heap) Pop() (ele int) {
    ele = h.elements[0]
    h.elements[0] = h.elements[len(h.elements)-1]
    h.elements = h.elements[:len(h.elements)-1]
    h.rearrange(0)
    return
}

func (h *Heap) rearrange(i int) {
    largest := i
    left, right, size := leftChild(i), rightChild(i), len(h.elements)
    if left < size && h.elements[left] > h.elements[largest] {
        largest = left
    }
    if right < size && h.elements[right] > h.elements[largest] {
        largest = right
    }
    if largest != i {
        h.swap(i, largest)
        h.rearrange(largest)
    }
}

Fuzzing is useful because it automatically feeds generated input into your test functions, so they exercise cases you might not have thought of. If you were testing the max heap implementation shown earlier, a typical test function for its Push and Pop methods might look like this:

import (
    "sort"
    "testing"
)

func TestHeap(t *testing.T) {
    h := &Heap{elements: []int{452, 23, 6515, 55, 313, 6}}
    h.Build()
    testCases := []int{51, 634, 9, 8941, 354}

    for _, tc := range testCases {
        h.Push(tc)
        // make a copy of the elements in the slice and sort it in
        // descending order
        elements := make([]int, len(h.elements))
        copy(elements, h.elements)
        sort.Slice(elements, func(i, j int) bool {
            return elements[i] > elements[j]
        })

        // pop the heap and check if the top of heap is the largest
        // element
        popped := h.Pop()
        if elements[0] != popped {
            t.Errorf("Top of heap %d is not the one popped %d\n heap is %v",
                elements[0], popped, elements)
        }
    }
}

First, create a max heap, prepopulate it with data, and call Build on it. Next, take a set of test cases (just a handful of integers) and push them into the heap one at a time. After each push, pop the heap, which should give you the largest integer it contains.

To check that this is the case, copy the slice that holds the heap's data and sort the copy in descending order. Its first element is the largest integer and should be the same as the integer you get from popping the heap.

When you run the test function with these test cases, everything works fine:
% go test -run=TestHeap -v
=== RUN TestHeap
--- PASS: TestHeap (0.00s)
PASS
ok github.com/sausheong/gocookbook/ch18_testing 0.229s

As you can see, the test exercises the heap only with this fixed input data. This is where fuzzing comes in. Since Go 1.18, fuzzing has been part of the go test toolset, and fuzz tests are written as fuzz functions in the same _test.go files you use for your test functions.

Each fuzz function's name must start with Fuzz, just as test function names start with Test, and each takes a single parameter, a pointer to testing.F.

There are two parts to creating a fuzz function:
• Seeding the input to the fuzz function using the f.Add method.
• Running the fuzz test itself by calling f.Fuzz and passing it a fuzz target: a function that takes a *testing.T as its first parameter, followed by a set of fuzzing arguments.

Take a look at how you can convert your test function to a fuzz function:

func FuzzHeap(f *testing.F) {
    h := &Heap{elements: []int{452, 23, 6515, 55, 313, 6}}
    h.Build()
    testCases := []int{51, 634, 9, 8941, 354}
    for _, tc := range testCases {
        f.Add(tc) // register each test case in the seed corpus
    }
    f.Fuzz(func(t *testing.T, n int) {
        h.Push(n)
        // make a copy of the elements in the slice and sort it in
        // descending order
        elements := make([]int, len(h.elements))
        copy(elements, h.elements)
        sort.Slice(elements, func(i, j int) bool {
            return elements[i] > elements[j]
        })
        // pop the heap and check if the top of heap is the largest
        // element
        popped := h.Pop()
        if elements[0] != popped {
            t.Errorf("Top of heap %d is not the one popped %d\n heap is %v",
                elements[0], popped, elements)
        }
    })
}

You create a function named FuzzHeap that accepts a pointer to testing.F. In this function, you start off by setting up the max heap as before. Then you take the test cases and add them, using f.Add, to the seed corpus: the collection of seed inputs for the fuzz test.

The fuzz target takes a *testing.T and a single integer. The fuzzing arguments must match, in type and in order, the values you pass to f.Add when you register inputs in the seed corpus. Here each call to f.Add passes a single integer, so the fuzz target has a single integer as its fuzzing argument.

The body of the fuzz target is the same as the loop body of the earlier test function, and you're done. Run it!
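One thing to watch in FuzzHeap: h is created once, outside the fuzz target, so every call pushes into and pops from the same heap, and the outcome for a given input can depend on which inputs ran before it. The testing package documentation for F.Fuzz asks that the fuzz function's behavior not depend on shared state, and shared state like this can make a saved failing input behave differently when it is re-run on its own. One way to avoid it, sketched here, is to build a fresh heap on every call. The helper name checkPushPop is hypothetical, and parent, leftChild, rightChild, swap and Build are assumed zero-indexed definitions, since the excerpt does not show them.

```go
package main

import (
	"fmt"
	"sort"
)

type Heap struct{ elements []int }

// Assumed helpers (not shown in the original excerpt).
func parent(i int) int     { return (i - 1) / 2 }
func leftChild(i int) int  { return 2*i + 1 }
func rightChild(i int) int { return 2*i + 2 }

func (h *Heap) swap(i, j int) {
	h.elements[i], h.elements[j] = h.elements[j], h.elements[i]
}

// Build sifts down every non-leaf node (assumed implementation).
func (h *Heap) Build() {
	for i := len(h.elements)/2 - 1; i >= 0; i-- {
		h.rearrange(i)
	}
}

func (h *Heap) Push(ele int) {
	h.elements = append(h.elements, ele)
	for i := len(h.elements) - 1; h.elements[i] > h.elements[parent(i)]; i = parent(i) {
		h.swap(i, parent(i))
	}
}

func (h *Heap) Pop() int {
	ele := h.elements[0]
	h.elements[0] = h.elements[len(h.elements)-1]
	h.elements = h.elements[:len(h.elements)-1]
	h.rearrange(0)
	return ele
}

func (h *Heap) rearrange(i int) {
	largest := i
	left, right, size := leftChild(i), rightChild(i), len(h.elements)
	if left < size && h.elements[left] > h.elements[largest] {
		largest = left
	}
	if right < size && h.elements[right] > h.elements[largest] {
		largest = right
	}
	if largest != i {
		h.swap(i, largest)
		h.rearrange(largest)
	}
}

// checkPushPop (hypothetical) holds the fuzz target's logic. It builds a
// fresh heap on every call, so its result depends only on input.
func checkPushPop(input int) error {
	h := &Heap{elements: []int{452, 23, 6515, 55, 313, 6}}
	h.Build()
	h.Push(input)
	// Expected top: first element of a descending sorted copy, as in the test.
	elements := make([]int, len(h.elements))
	copy(elements, h.elements)
	sort.Slice(elements, func(i, j int) bool { return elements[i] > elements[j] })
	if popped := h.Pop(); popped != elements[0] {
		return fmt.Errorf("top of heap %d is not the one popped %d", elements[0], popped)
	}
	return nil
}

func main() {
	for _, in := range []int{51, 634, 9, 8941, 354, -349} {
		fmt.Println(in, checkPushPop(in)) // each line ends in <nil>
	}
}
```

Inside f.Fuzz, the target would then just call checkPushPop with its integer argument and report any returned error with t.Error.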

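To run a fuzz function, use the -fuzz flag. It takes a regular expression that must match exactly one fuzz function in the package, so a fragment of the name such as Heap works here, and a period (.) works only when the package has a single fuzz function. You can also pass -fuzztime to limit how long fuzzing runs, either as a duration such as 30s or as an iteration count such as 1000x. Without it, a fuzz function keeps running until it finds a failing input or you stop it.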
% go test -v -fuzz=Heap -fuzztime=30s
=== RUN TestHeap
--- PASS: TestHeap (0.00s)
=== FUZZ FuzzHeap
fuzz: elapsed: 0s, gathering baseline coverage: 0/1484 completed
fuzz: elapsed: 0s, gathering baseline coverage: 1484/1484 completed, now fuzzing with 10 workers
fuzz: elapsed: 3s, execs: 692916 (230887/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 6s, execs: 1343416 (216901/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 9s, execs: 2078265 (244901/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 12s, execs: 2827429 (249737/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 15s, execs: 3527717 (233462/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 18s, execs: 4256457 (242874/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 21s, execs: 5014656 (252735/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 24s, execs: 5757659 (247697/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 27s, execs: 6447953 (230105/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 30s, execs: 7175096 (242388/sec), new interesting: 1 (total: 1485)
fuzz: elapsed: 30s, execs: 7175096 (0/sec), new interesting: 1 (total: 1485)
--- PASS: FuzzHeap (30.30s)
PASS
ok github.com/sausheong/gocookbook/ch18_testing 30.935s

The first fuzz: line shows the baseline coverage being gathered: before fuzzing begins, the fuzz target is run against the seed corpus and the generated corpus (inputs saved by earlier fuzzing runs) to measure the existing code coverage. If the test fails on these inputs in the first place, there's no point in fuzzing.

The number of workers indicates how many processes run the fuzz target in parallel. You can set this with the -parallel flag; if you don't, it defaults to GOMAXPROCS, which by default is the number of logical CPUs available.

In the lines that follow, elapsed shows how long fuzzing has been running, execs shows the total number of inputs run against the fuzz target (with the current rate per second), and new interesting shows how many inputs found during this run expanded code coverage beyond the existing corpora, followed in parentheses by the total size of the corpus.

Without the -fuzz flag, a fuzz function runs like a normal test function, using only the seed corpus. If you run it with go test as you would any other test function, you should get results like these:
% go test -run=FuzzHeap -v
=== RUN FuzzHeap
=== RUN FuzzHeap/seed#0
=== RUN FuzzHeap/seed#1
=== RUN FuzzHeap/seed#2
=== RUN FuzzHeap/seed#3
=== RUN FuzzHeap/seed#4
--- PASS: FuzzHeap (0.00s)
    --- PASS: FuzzHeap/seed#0 (0.00s)
    --- PASS: FuzzHeap/seed#1 (0.00s)
    --- PASS: FuzzHeap/seed#2 (0.00s)
    --- PASS: FuzzHeap/seed#3 (0.00s)
    --- PASS: FuzzHeap/seed#4 (0.00s)
PASS
ok github.com/sausheong/gocookbook/ch18_testing 0.246s

As you can see, you have five runs of the fuzz target against the five seed inputs in the seed corpus, and all of them pass.

This is all good, but it doesn't really show how fuzzing makes software more robust. A simple example can. Change the rearrange method slightly: instead of comparing h.elements[left], compare h.elements[left-1]. It's a small change that introduces a bug, and one that can easily go undetected:

func (h *Heap) rearrange(i int) {
    ...
    if left < size && h.elements[left-1] > h.elements[largest] {
        largest = left
    }
    ...
}

To confirm that, run your TestHeap test function against the modified code. It runs perfectly well and the test passes. Now run the FuzzHeap fuzz function:
% go test -v -fuzz=Heap -fuzztime=30s
=== RUN TestHeap
--- PASS: TestHeap (0.00s)
=== FUZZ FuzzHeap
fuzz: elapsed: 0s, gathering baseline coverage: 0/1484 completed
fuzz: elapsed: 0s, gathering baseline coverage: 19/1484 completed
--- FAIL: FuzzHeap (0.28s)
    --- FAIL: FuzzHeap (0.00s)
        testing_test.go:260: Top of heap 313 is not the one popped 158
            heap is [313 158 55 23 6 -327 -349]

    Failing input written to testdata/fuzz/FuzzHeap/03b1c861389a9c041082690dc8b25528f6ff6debab2a7fc99524a738895bea1f
    To re-run:
    go test -run=FuzzHeap/03b1c861389a9c041082690dc8b25528f6ff6debab2a7fc99524a738895bea1f
FAIL
exit status 1
FAIL github.com/sausheong/gocookbook/ch18_testing 0.540s

As you can see, it fails while gathering baseline coverage, because the element popped from the heap wasn't the maximum. The input that caused the failure is written to a file in the testdata directory. If you open it, you should see something like this:
go test fuzz v1
int(-349)
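The file format is simple enough to read and write by hand: a version line, then one line per fuzzing argument, written as a Go expression of the argument's type, in the same order as the fuzz target's parameters. Files in testdata/fuzz/FuzzHeap are run as part of the seed corpus on every plain go test run, so committing them keeps the regression case around. A small sketch that produces this text for int arguments (corpusEntry is a hypothetical helper and handles only int):

```go
package main

import (
	"fmt"
	"strings"
)

// corpusEntry (hypothetical) renders int fuzzing arguments in the corpus
// file format: a version header, then one int(...) line per argument.
func corpusEntry(args ...int) string {
	var b strings.Builder
	b.WriteString("go test fuzz v1\n")
	for _, a := range args {
		fmt.Fprintf(&b, "int(%d)\n", a)
	}
	return b.String()
}

func main() {
	fmt.Print(corpusEntry(-349))
	// go test fuzz v1
	// int(-349)
}
```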

If you now run FuzzHeap as a normal test function, you will immediately see that the seed inputs still pass, but with -349 the max heap no longer works:
% go test -run=FuzzHeap -v
=== RUN FuzzHeap
=== RUN FuzzHeap/seed#0
=== RUN FuzzHeap/seed#1
=== RUN FuzzHeap/seed#2
=== RUN FuzzHeap/seed#3
=== RUN FuzzHeap/seed#4
=== RUN FuzzHeap/03363930589906b56680eea723dd29e2744bd87e28b0995dd65209094ef3080d
    testing_test.go:260: Top of heap 313 is not the one popped 51
        heap is [313 55 51 48 23 9 6]
--- FAIL: FuzzHeap (0.00s)
    --- PASS: FuzzHeap/seed#0 (0.00s)
    --- PASS: FuzzHeap/seed#1 (0.00s)
    --- PASS: FuzzHeap/seed#2 (0.00s)
    --- PASS: FuzzHeap/seed#3 (0.00s)
    --- PASS: FuzzHeap/seed#4 (0.00s)
    --- FAIL: FuzzHeap/03363930589906b56680eea723dd29e2744bd87e28b0995dd65209094ef3080d (0.00s)
FAIL
exit status 1
FAIL github.com/sausheong/gocookbook/ch18_testing 0.475s

As you can imagine, a bug like this can be pretty hard to detect by hand! Once you fix the code, run the same FuzzHeap test again and you'll see that it passes all the tests, including a regression case that was automatically generated from the failed fuzz run:
% go test -run=FuzzHeap -v
=== RUN FuzzHeap
=== RUN FuzzHeap/seed#0
=== RUN FuzzHeap/seed#1
=== RUN FuzzHeap/seed#2
=== RUN FuzzHeap/seed#3
=== RUN FuzzHeap/seed#4
=== RUN FuzzHeap/03b1c861389a9c041082690dc8b25528f6ff6debab2a7fc99524a738895bea1f
--- PASS: FuzzHeap (0.00s)
    --- PASS: FuzzHeap/seed#0 (0.00s)
    --- PASS: FuzzHeap/seed#1 (0.00s)
    --- PASS: FuzzHeap/seed#2 (0.00s)
    --- PASS: FuzzHeap/seed#3 (0.00s)
    --- PASS: FuzzHeap/seed#4 (0.00s)
    --- PASS: FuzzHeap/03b1c861389a9c041082690dc8b25528f6ff6debab2a7fc99524a738895bea1f (0.00s)
PASS
ok github.com/sausheong/gocookbook/ch18_testing 0.283s

 

Fuzzing is a powerful tool. However, it is CPU intensive, so it can be expensive to run, especially in an automated continuous integration pipeline.

 

posted on 2023-10-18 18:50 ZhangZhihuiAAA