The Nine Indispensable Rules for HW/SW Debugging
I read this book over a weekend and decided to keep it on my nightstand. It is short and funny, with clear insights and good stories. I strongly recommend it to every engineer, from entry level to senior.
Introduction
This book tells you how to find out what’s wrong with stuff, quick. It is indeed short and fun: I finished it over a weekend and made some notes. It convinced me that the nine rules are powerful for hardware/software design and debugging. I was already following some of them, and they had helped me find bugs efficiently. Some engineers will probably say that it is an old book and no longer applicable, because HW/SW debugging tools and instruments have made big progress. Tools do help to reduce or find bugs, but they can also introduce latent bugs, which are harder to find. Besides, over-dependence on tools keeps a new engineer from becoming a clear thinker, and that side effect is harmful over an engineering career.
How can that work?
a. When it took us a long time to find a bug, it was because we had neglected some essential, fundamental rule; once we applied the rule, we quickly found the problem.
b. People who excelled at quick debugging inherently understood and applied these rules. Those who struggled to understand or use these rules struggled to find bugs.
Obvious vs Easy
These things are obvious (fundamentals usually are), but how they apply to a particular problem isn't always so obvious.
Don't confuse obvious with easy - these rules aren't always easy to follow, and thus they're often neglected in the heat of battle.
Debugging vs Troubleshooting
Debugging usually means figuring out why a design doesn’t work as planned. Troubleshooting usually means figuring out what’s broken in a particular copy of a product when the product’s design is known to be good.
The Nine Indispensable Rules
1. Understand the System
Rule No. 1 is the most important, and it deserves plenty of your time if you really want to fix bugs as quickly as possible.
1.1. Read the manual
A HW engineer has to read the chip datasheet, and a SW engineer has to read the API/framework documentation.
1.2. Read everything in depth
Read everything, cover to cover.
1.3. Know the fundamentals
1.4. Know the road map
1.5. Understand your tools
1.6. Look up the details
2. Make It Fail
It seems easy, but if you don't do it, debugging is hard.
2.1. Do it again
2.2. Start at the beginning
2.3. Stimulate the failure
2.4. But don't simulate the failure
2.5. Find the uncontrolled condition that makes it intermittent
2.6. Record everything and find the signature of intermittent bugs
2.7. Don't trust statistics too much
2.8. Know that "that" can happen
2.9. Never throw away a debugging tool
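To make rules 2.5 and 2.6 concrete for myself, here is a minimal sketch of a repeat-until-it-fails harness. Everything in it is hypothetical: `flaky_checksum` stands in for the unit under test, and the "uncontrolled condition" is modeled as a random jitter value. The idea is only that recording the seed of every run turns an intermittent failure into one that can be replayed on demand.

```python
import random

def flaky_checksum(data, rng):
    """Hypothetical unit under test that fails under an uncontrolled condition."""
    # Assumed bug, for illustration only: one rare jitter value corrupts the sum.
    jitter = rng.randint(0, 9)
    if jitter == 7:
        return sum(data) + 1  # the intermittent failure
    return sum(data)

def make_it_fail(runs=200):
    """Run the test many times, one recorded seed per run, and keep the failures."""
    failures = []
    for seed in range(runs):
        rng = random.Random(seed)  # the seed is the condition we now control
        result = flaky_checksum([1, 2, 3], rng)
        if result != 6:
            failures.append({"seed": seed, "result": result})
    return failures

failures = make_it_fail()
print(f"{len(failures)} failing runs out of 200")
if failures:
    # Replaying a recorded seed reproduces the same failure every time.
    seed = failures[0]["seed"]
    print("replay of seed", seed, "->", flaky_checksum([1, 2, 3], random.Random(seed)))
```

The list of failing seeds is the "signature": once it is written down, the failure no longer depends on luck.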
3. Quit Thinking and Look
3.1. See the failure
3.2. See the details
3.3. Build instrumentation in
Use source code debuggers, debug logs, status messages, flashing lights, and rotten egg odors.
3.4. Add instrumentation on
Use analyzers, scopes, meters, metal detectors, electrocardiography machines, and soap bubbles...
3.5. Don't be afraid to dive in
3.6. Watch out for Heisenberg
Don't let your instruments overwhelm your system.
3.7. Guess only to focus the search
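As one possible reading of rule 3.3 on the software side, the sketch below builds debug logs into a routine with Python's standard `logging` module. The `set_speed` function, the "motor" logger name, and the RPM limit are all made up for illustration. Because the log level can be raised in production, the instrumentation stays in the code while its output is switched off, which helps a little with rule 3.6 (the log calls are still not completely free).

```python
import logging

log = logging.getLogger("motor")  # "motor" is a hypothetical subsystem name

def set_speed(requested_rpm, max_rpm=3000):
    """Hypothetical control routine with instrumentation built in."""
    log.debug("set_speed called: requested=%d max=%d", requested_rpm, max_rpm)
    applied = min(requested_rpm, max_rpm)
    if applied != requested_rpm:
        # Record the detail that would otherwise have to be guessed at later.
        log.warning("request clamped: %d -> %d", requested_rpm, applied)
    return applied

# Turn the built-in instrumentation on for a debugging session.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
set_speed(4500)
```

Setting `level=logging.WARNING` instead would keep only the clamping message and silence the per-call trace.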
4. Divide and Conquer
4.1. Narrow the search with successive approximation
4.2. Get the range
4.3. Determine which side of the bug you are on
4.4. Use easy-to-spot test patterns
4.5. Start with the bad
4.6. Fix the bugs you know about
4.7. Fix the noise first
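Rules 4.1 to 4.3 can be sketched as a binary search. The example below is my own illustration, not code from the book: it assumes an ordered list of builds where the first is known good, the last is known bad, and every build after the first bad one is also bad. The build numbers and the `is_bad` test are hypothetical.

```python
def first_bad(versions, is_bad):
    """Return the index of the first bad version by successive approximation.

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    version is bad, every later version is bad too.
    """
    good, bad = 0, len(versions) - 1  # 4.2: get the range
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):     # 4.3: which side of the bug are we on?
            bad = mid
        else:
            good = mid
    return bad

builds = list(range(100, 120))  # hypothetical build numbers 100..119
tests_run = []

def is_bad(build):
    """Hypothetical test: pretend the bug was introduced in build 113."""
    tests_run.append(build)
    return build >= 113

index = first_bad(builds, is_bad)
print("first bad build:", builds[index])
print("builds tested:", tests_run)
```

With 20 builds the search needs only about five tests instead of up to twenty, which is the whole point of halving the range each time.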
5. Change One Thing at a Time
5.1. Isolate the key factor
5.2. Grab the brass bar with both hands
5.3. Change one test at a time
5.4. Compare it with a good one
5.5. Determine what you changed since the last time it worked
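For rules 5.4 and 5.5, a small helper that lists every difference between a known-good setup and the failing one can be handy. This is only a sketch under my own assumptions: the setups are flat dictionaries, and the keys (`baud`, `parity`, `fw`, `flow`) are invented for the example.

```python
ABSENT = "<absent>"  # marker for a setting that exists on one side only

def diff_settings(good, bad):
    """Return {key: (good_value, bad_value)} for every setting that differs."""
    changed = {}
    for key in sorted(set(good) | set(bad)):
        g = good.get(key, ABSENT)
        b = bad.get(key, ABSENT)
        if g != b:
            changed[key] = (g, b)
    return changed

# Hypothetical configurations of a working unit and a failing unit.
good_unit = {"baud": 115200, "parity": "none", "fw": "1.4.2"}
bad_unit = {"baud": 115200, "parity": "even", "fw": "1.4.2", "flow": "rts"}

differences = diff_settings(good_unit, bad_unit)
for key, (g, b) in differences.items():
    print(f"{key}: good={g} bad={b}")
```

Each reported difference is a candidate to change back, one at a time, until the failure disappears.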
6. Keep an Audit Trail
6.1. Write down what you did, in what order, and what happened as a result
6.2. Understand that any detail could be the important one
6.3. Correlate events
6.4. Write it down! No matter how horrible the moment, make a memorandum of it
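A paper notebook is enough for rule 6.1, but the same habit can be sketched in a few lines of code. The `AuditTrail` class below is a hypothetical helper of my own, assuming one JSON record per line in a plain file; the actions and results in the usage part are invented.

```python
import json
import os
import tempfile
import time

class AuditTrail:
    """Append-only log of what was done, in what order, and what happened."""

    def __init__(self, path):
        self.path = path
        self.step = 0

    def note(self, action, result):
        """Write one numbered, timestamped entry and return it."""
        self.step += 1
        entry = {
            "step": self.step,
            "time": time.strftime("%Y-%m-%d %H:%M:%S"),
            "action": action,
            "result": result,
        }
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return entry

    def read(self):
        """Return all entries in the order they were written."""
        with open(self.path, encoding="utf-8") as f:
            return [json.loads(line) for line in f]

# Usage with made-up debugging steps, written to a temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), "audit.jsonl")
trail = AuditTrail(log_path)
trail.note("swapped cable A for cable B", "failure still occurs")
trail.note("lowered clock from 48 MHz to 24 MHz", "failure gone after 50 runs")
for entry in trail.read():
    print(entry["step"], entry["time"], entry["action"], "->", entry["result"])
```

The step number and timestamp are what make rule 6.3 possible later: events from other logs can be lined up against them.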
7. Check the Plug
7.1. Question your assumptions
7.2. Start at the beginning
7.3. Test the tool
8. Get a Fresh View
8.1. Ask for fresh insight
8.2. Tap expertise
8.3. Listen to the voice of experience
8.4. Know that help is all around you
8.5. Don't be proud
8.6. Report symptoms, not theories
8.7. Realize that you don’t have to be sure
9. If You Didn't Fix It, It Ain't Fixed
9.1. Check that it’s really fixed
9.2. Check that it’s really your fix that fixed it
9.3. Know that it never just goes away by itself
9.4. Fix the cause
9.5. Fix the process
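Rules 9.1 and 9.2 describe a simple procedure: see the failure without the fix, see it gone with the fix, take the fix out and see it come back. Below is a minimal sketch of that procedure. `confirm_fix` and the toy `state` dictionary are hypothetical; in a real project `run_test`, `apply_fix`, and `remove_fix` would build, flash, or configure the actual system.

```python
def confirm_fix(run_test, apply_fix, remove_fix, runs=20):
    """Check that the failure disappears with the fix and returns without it."""
    def count_failures():
        return sum(1 for _ in range(runs) if not run_test())

    remove_fix()
    before = count_failures()         # the failure must be visible first
    apply_fix()
    with_fix = count_failures()       # 9.1: is it really fixed?
    remove_fix()
    after_removal = count_failures()  # 9.2: was it really this fix?
    apply_fix()                       # leave the system in the fixed state
    return {
        "before": before,
        "with_fix": with_fix,
        "after_removal": after_removal,
        "confirmed": before > 0 and with_fix == 0 and after_removal > 0,
    }

# Toy system for illustration: the test passes only when the patch is in.
state = {"patched": False}

def run_test():
    return state["patched"]

def apply_fix():
    state["patched"] = True

def remove_fix():
    state["patched"] = False

report = confirm_fix(run_test, apply_fix, remove_fix)
print(report)
```

If `before` is zero, the test never made it fail, so nothing has been proven about the fix; that case is reported as not confirmed.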