Saturday, December 5, 2020

Reducing the bill of materials of my Hardware Watchdog timer

One nuance of the first few iterations of any new design is that it rarely comes optimized in every important aspect from the start.

It is very difficult to cover everything when we are creating something new from the ground up. The creation process is in itself a significant sink of cognitive energy and focus, leaving little room for a more pedantic mode of thought.

But once a design stabilizes and the functional aspects are addressed (e.g. no bugs left to fix), we can normally start thinking about optimizing it. That is the phase where we can look back with hindsight and work on the technical debt, or acknowledge the suboptimal aspects of our design and start addressing them. In the real world, the budget for this normally comes with some kind of business pressure that lets the engineering focus shift to optimization tasks. For example, a product that consumes too much battery power or is too slow is very likely to send engineers back to the workbench to optimize it. Likewise, a product that is too expensive to manufacture, because it requires too many components or expensive/exotic ones, will probably also need attention. In all these cases there is a business impact, either because the product provides a poor User Experience (UX) or because it doesn't meet the projected revenue targets due to the high cost of production.

Albeit at a less "real world" level of importance, optimization is the activity I ended up doing while revisiting this small project.

I found that my Home Assistant instance sometimes freezes. It is not a full-blown crash of the entire device (e.g. it keeps responding to pings), but due to some mostly unknown condition (at the time of writing, at least), the services running in its Docker containers seem to stop responding. This has become more noticeable over the last few updates of Home Assistant and HassOS, so it is possible that a bug was introduced along the way.

Given that this device provides automations and monitoring for my house, it needs to be available as much as possible in order to be dependable. I even invested in adding a UPS and moving storage to an SSD in an attempt to minimize catastrophic failures. This was covered in this post:

https://www.creationfactory.co/2020/10/configuring-home-assistant-to-run-off.html

On the other hand, sticking to a stable version of the software and never updating can also pose security risks and limit access to new integrations and to the evolution of the platform.

With that in mind, I considered that it would be important to also equip the Raspberry Pi 2 (where I run Home Assistant) with the dedicated watchdog timer (WDT) that I built in the past for another RPi-based solution, the Kiosk. This is detailed in this post:

https://www.creationfactory.co/2020/06/building-hardware-watchdog-timer-for.html

The reason behind doing this is twofold. For one, while most Raspberry Pis have a hardware watchdog built into their SoCs, HassOS doesn't currently provide the software required to take advantage of it.

If a different Linux distribution is used, for example Raspbian, it is possible to set up this internal watchdog and use it to reset the system. You may want to take a look at this blog post for guidance on how to do it:

https://diode.io/raspberry%20pi/running-forever-with-the-raspberry-pi-hardware-watchdog-20202/

Also, while HassOS may support this internal WDT in the future, that watchdog is limited to restarting the Raspberry Pi SoC. This means that if external peripherals are the cause of the system being hung, it may not be sufficient by itself to bring the system back to working order. Nevertheless it may be a good complement to the external watchdog timer I am describing here - e.g. use the internal watchdog timer for a faster response when the Pi itself freezes, and the external one as a secondary watchdog timer (with more conservative timeouts) to power cycle everything if some peripheral remains stuck.

The way the WDT works is basically this: if Home Assistant and its major components become unavailable, heartbeat messages stop being sent to the watchdog timer (either as a direct consequence or through logic deliberately added to Home Assistant), causing it to restart the Raspberry Pi and the connected peripherals.
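
To make the heartbeat logic concrete, here is a minimal Python model of what such a watchdog does. It is only a sketch for illustration, not the PIC firmware from this project: the class name, the method names and the 180 second timeout are hypothetical, and the only detail taken from this post is the "kiosk_wdtToken" heartbeat string.

```python
from dataclasses import dataclass


@dataclass
class WatchdogModel:
    """Toy model of an external watchdog that power cycles its load."""

    timeout_s: float          # hypothetical timeout, not the firmware's value
    token: bytes              # heartbeat string the watchdog expects
    last_feed_s: float = 0.0  # time of the last valid heartbeat
    resets: int = 0           # number of power cycles triggered so far

    def feed(self, data: bytes, now_s: float) -> bool:
        """Restart the countdown if the received data carries the token."""
        if self.token in data:
            self.last_feed_s = now_s
            return True
        return False

    def poll(self, now_s: float) -> bool:
        """Return True (and count a power cycle) once the timeout expires."""
        if now_s - self.last_feed_s >= self.timeout_s:
            self.resets += 1
            self.last_feed_s = now_s  # countdown restarts after the power cycle
            return True
        return False


if __name__ == "__main__":
    wdt = WatchdogModel(timeout_s=180.0, token=b"kiosk_wdtToken")
    wdt.feed(b"kiosk_wdtToken", now_s=60.0)   # heartbeat arrives in time
    print(wdt.poll(now_s=120.0))              # still within the timeout
    print(wdt.poll(now_s=240.0))              # no heartbeat for 180 s
```

In this model a hung Home Assistant simply never calls `feed()` again, so the next `poll()` past the timeout is what would trigger the power cycle.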

Because I needed to make a new board, I took the opportunity to revise the design and make it better. One aspect that rather bothered me in the previous design was that I had to add a level shifter circuit for the serial port pins.

Because I was using a baud rate of 115200 bps on the serial port, I could not use the internal oscillator. Instead I had to use an external 20 MHz crystal, which in turn required the PIC microcontroller to be powered from 5 Volts instead of 3.3 Volts. That prevented the serial port pins from using the 3.3 Volt levels that the RPi is rated for. Providing 5 Volts to the Raspberry Pi serial port will fry it.

So in this new iteration I decided to take a step back and figure out whether I could use another baud rate for the communication instead, for example 9600 bps. With this speed I could still use the internal oscillator (running it at 4 MHz), and on top of that, power the PIC at 3.3 Volts.
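
One plausible way to see why the clock and the baud rate are tied together is to look at the baud rate divider error. The sketch below assumes the baud rate generator formula used by many mid-range PIC USARTs in high-speed asynchronous mode, baud = Fosc / (16 * (n + 1)), where n is an 8-bit register value. The exact PIC model and USART mode are not stated in this post, so treat the formula, the function name and the divider of 16 as assumptions; the tolerance of the internal oscillator itself would add further error on top.

```python
from typing import Tuple


def brg_setting(fosc_hz: float, baud: int, divider: int = 16) -> Tuple[int, float, float]:
    """Return (register value, actual baud rate, relative error in percent).

    Assumes baud = fosc / (divider * (n + 1)), which is an assumption about
    the USART in use, not something taken from the project's firmware.
    """
    n = max(round(fosc_hz / (divider * baud)) - 1, 0)
    actual = fosc_hz / (divider * (n + 1))
    error_pct = 100.0 * (actual - baud) / baud
    return n, actual, error_pct


if __name__ == "__main__":
    for fosc, baud in [(4_000_000, 9600), (4_000_000, 115200), (20_000_000, 115200)]:
        n, actual, err = brg_setting(fosc, baud)
        print(f"Fosc={fosc} Hz, target={baud} bps: n={n}, "
              f"actual={actual:.1f} bps, error={err:+.2f}%")
```

Under these assumptions, 9600 bps at 4 MHz lands within about 0.2% of the target, while 115200 bps at 4 MHz ends up at 125000 bps, roughly 8.5% off, which is more than the few percent of mismatch a UART link is usually said to tolerate. With a 20 MHz clock the 115200 bps error drops to about 1.4%.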

And so I went and tested this approach. It basically allowed me to ditch both the level shifter for the serial port and the external crystal and capacitors from the design. In a commercial product this would have been a substantial saving.

For my application 9600 bps is a sufficiently high speed (Home Assistant just sends the "kiosk_wdtToken" string once every minute to reset the watchdog timer, so it is more than enough), which means this optimizes my previous design without any functional tradeoff.
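
As a rough check of how little bandwidth the heartbeat needs, the sketch below computes the time one token spends on the wire and shows a simple periodic sender. It is not how Home Assistant is actually configured in this project (that is not described here): the function names are hypothetical, 8N1 framing is an assumption, and since it is not stated whether the firmware expects a line terminator, none is sent. `port` can be any binary file-like object; on a real system it could be a serial device opened with a library such as pyserial.

```python
import io
import time
from typing import BinaryIO, Callable

TOKEN = b"kiosk_wdtToken"   # heartbeat string mentioned in this post
BAUD = 9600                 # serial speed chosen in this post
BITS_PER_CHAR = 10          # assumption: 8N1 framing (start + 8 data + stop)


def frame_time_s(payload: bytes, baud: int = BAUD) -> float:
    """Time one heartbeat spends on the wire, in seconds."""
    return len(payload) * BITS_PER_CHAR / baud


def send_heartbeats(port: BinaryIO, count: int, period_s: float = 60.0,
                    sleep: Callable[[float], None] = time.sleep) -> None:
    """Write the token `count` times, pausing `period_s` between writes."""
    for i in range(count):
        port.write(TOKEN)
        port.flush()
        if i < count - 1:
            sleep(period_s)


if __name__ == "__main__":
    fake_port = io.BytesIO()  # stands in for the serial port
    send_heartbeats(fake_port, count=2, sleep=lambda s: None)
    print(fake_port.getvalue())
    print(f"{frame_time_s(TOKEN) * 1000:.1f} ms per heartbeat")
```

With these assumptions the 14 character token takes under 15 ms to transmit, so one heartbeat per minute leaves the link idle almost all of the time.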

Because I was now powering the PIC from 3.3 Volts, the only tradeoff was that I needed a regulator to drop the voltage from 5 Volts to 3.3 Volts. While the Raspberry Pi could source that voltage, I could not use it, because the watchdog restarts the RPi by cycling its power. As such I added an LM1117 regulator to obtain the 3.3 Volts needed for the PIC.

To obtain the source code for the PIC and the schematic diagram for this device, you can go to my GitHub repository and clone the project:

https://github.com/teixeluis/generic-watchdog-timer

You will find all the details necessary to build the circuit and flash the PIC.
