I have been designing electronic hardware, including FPGAs, controllers, and processors, for most of my 25 years of professional life. Debugging electronic circuits and performing root cause analysis can be a challenge, depending on the symptoms of the circuit, the device, or the system. It can feel like trying to find a needle in a haystack. Where do you start?
I do not believe universities spend much time, if any, on this subject. And since younger, less experienced engineers are no longer paired up with experienced engineers, they often need to figure this out for themselves. I hope this article helps by providing some possible strategies for zeroing in on root causes.
Debugging takes experience, instinct, and practice. Some issues are easier to spot than others.
Document the analysis. It makes it easier to follow a plan for getting to the root cause. Sometimes you have to pursue a possible root cause that does not pan out; having this written down helps you make better choices about next steps. I define a possible root cause as an interim step that explains the symptom(s) I am observing. It may or may not be the actual root cause.
It is often necessary to gain access to signals and power rails that are buried in the PCB. This is increasingly difficult today, as less and less copper is exposed. Allow for some debugging features in your design, even if that only means connecting a few pads to unused pins that signals can be routed to through the FPGA or processor fabric. If necessary, sacrifice one or more circuit boards so you can scratch solder resist away, cut traces, or solder wires to nets in order to observe them. Some creativity may be required.
I recently had a situation where some boards were reset by a micro during power-up (brown out). I was quickly able to identify a power dip caused by a combination of low impedance, high capacitance, and a DC-DC converter whose current limit put it into a semi-shutdown state during over-current events. The time of the reset, the power dip, and the enabling of the supply all coincided. The root cause in the end was a power switch used to power up a portion of the supply for the purpose of power sequencing. The fix was finding a different switch with the same footprint but a 10x slower soft start.
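A rough back-of-the-envelope calculation shows why a slower soft start can help in a case like this. Assuming the switch ramps its output roughly linearly into a mostly capacitive load, the charging current is approximately I = C * dV/dt, so stretching the ramp by 10x cuts the inrush current by about 10x. The Python sketch below illustrates this. The capacitance, rail voltage, ramp times, and converter current limit are hypothetical values chosen for illustration; they are not figures from the boards described here.

```python
def inrush_current(capacitance_f: float, delta_v: float, ramp_time_s: float) -> float:
    """Approximate charging current (A) for a linear voltage ramp: I = C * dV/dt."""
    if ramp_time_s <= 0:
        raise ValueError("ramp_time_s must be positive")
    return capacitance_f * delta_v / ramp_time_s


# Hypothetical values for illustration only (not taken from the article).
LOAD_CAPACITANCE_F = 470e-6      # total capacitance behind the power switch
RAIL_VOLTAGE_V = 3.3             # voltage the switched rail ramps up to
CURRENT_LIMIT_A = 3.0            # assumed DC-DC converter current limit
FAST_RAMP_S = 100e-6             # assumed soft-start time of the original switch
SLOW_RAMP_S = 10 * FAST_RAMP_S   # replacement switch, 10x slower

for label, ramp in (("fast", FAST_RAMP_S), ("slow", SLOW_RAMP_S)):
    current = inrush_current(LOAD_CAPACITANCE_F, RAIL_VOLTAGE_V, ramp)
    verdict = "exceeds" if current > CURRENT_LIMIT_A else "stays below"
    print(f"{label} soft start: {current:.2f} A, {verdict} the {CURRENT_LIMIT_A} A limit")
```

With these assumed numbers, the fast ramp demands far more current than the converter can supply, while the slower ramp stays within the limit. A real load also draws DC current and the ramp is rarely perfectly linear, so treat this as a first estimate to compare against a measurement.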
Using some creative soldering, I was able to prove the root cause by putting a series power resistor into the switch path and measuring the current and the voltage dip across that resistor.
Study the Symptom
To find the cause, we first need to get a better understanding of the symptom.
Follow the Symptom
Sometimes it is necessary to follow the symptom to the next level. Especially with system-level issues, a symptom may be caused by a series of events that were purposefully designed into the system but triggered in the wrong way. Understanding these mechanisms and finding the trigger is necessary to zero in on possible root causes.
Chasing a Possible Root Cause
There are a number of generic steps that can be taken to verify possible root causes. If at any of these steps you find a correlation to the symptom, follow the trail.
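One simple way to look for such a correlation is to compare event timestamps read off a scope or logic analyzer capture and see which events fall close to the symptom in time. The following Python sketch shows the idea. The event names, timestamps, and the 1 ms window are hypothetical and only serve as an illustration.

```python
def coincides(event_times_s: dict, reference_time_s: float, window_s: float) -> list:
    """Return the names of events that occur within +/- window_s of the reference time."""
    return sorted(
        name
        for name, t in event_times_s.items()
        if abs(t - reference_time_s) <= window_s
    )


# Hypothetical timestamps in seconds, as read off a single capture.
capture = {
    "supply_enable": 0.01250,
    "rail_dip": 0.01262,
    "mcu_reset": 0.01270,
    "fan_start": 0.25000,
}

symptom_time = capture["mcu_reset"]
related = coincides(capture, symptom_time, window_s=0.001)
print(related)  # events close enough in time to be worth following
```

Events that coincide in time are only candidates. A coincidence points to where to look next; it does not prove the root cause.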
Verifying the Root Cause
The importance of this step should not be underestimated. It is vital proof that the root cause has been found.
I wish you great success in debugging and root cause analysis. Make sure you use the right tools for the job. Too many times I have seen people use inadequate tools and miss the obvious root cause. As a result, problems propagated into other designs and went undetected for many months or even years.
This article was originally published on HaraldSiefkan.com and has been reposted here with permission.