Saturday, June 20, 2020

Building a hardware watchdog timer for a kiosk or other system that needs to run 24x7 - version 2.0

My previous post covered a first version of a watchdog timer that I used in the past for another project. 

You may check it here.

As I mentioned there, I suspected that a somewhat different design would be necessary, because the target device might not have GPIO pins available for sending a keep-alive signal to the watchdog timer, and even if it did, the underlying Linux OS on the Android system would be unlikely to have the necessary drivers or support for changing the output of such pins.

And so it seemed to be the case (at least on the software side, from inspecting the /proc filesystem and other similar clues, and on the hardware side, from looking for obvious pads or unpopulated components that could be traced back to the SoC of this device).

On the other hand, there was a set of pads that strongly suggested a serial port, of the kind used for factory testing or for recovering the device after a failed firmware update:

As such, I first checked with the oscilloscope which voltage levels were used, and then connected the port to the computer through a USB serial bridge set to 3.3 V logic levels.

After setting up a serial console (PuTTY) at 115200 bps, 8N1, and booting up the device, I could immediately see the usual Linux kernel log output as the system started. This was a very positive indication that this could be the way to go.

[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.0.36+ (hyz@ubuntu01) (gcc version 4.6.x-google 20120106 (prerelease) (GCC) ) #59 SMP PREEMPT Mon Dec 8 10:07:57 CST 2014
[    0.000000] CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=10c5387d
[    0.000000] CPU: VIPT nonaliasing data cache, VIPT aliasing instruction cache
[    0.000000] Machine: RK30board
[    0.000000] ddr size = 2048 M, set ion_reserve_size size to 125829120
[    0.000000] memory reserve: Memory(base:0x98800000 size:120M) reserved for
[    0.000000] memory reserve: Memory(base:0x97d00000 size:11M) reserved for
[    0.000000] memory reserve: Total reserved 131M
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] bootconsole [earlycon0] enabled
[    0.000000] CPU SRAM: copied sram code from c0ce2000 to fef00100 - fef02040
[    0.000000] CPU SRAM: copied sram data from c0ce3f40 to fef02040 - fef02970
[    0.000000] sram_log:   @      0q   @       P    1      /  !    aB@ 2   `" 2q    : 4q
[    0.000000] CLKDATA_MSG: pll_flag = 0x01
[    0.000000] L310 cache controller enabled
[    0.000000] l2x0: 16 ways, CACHE_ID 0x4100c0c8, AUX_CTRL 0x76050001, Cache size: 524288 B
[    0.000000] DDR DEBUG: version 1.00 20131106
[    0.000000] DDR DEBUG: DDR3 Device
[    0.000000] DDR DEBUG: Bus Width=32 Col=10 Bank=8 Row=16 CS=1 Total Capability=2048MB
[    0.000000] DDR DEBUG: init success!!! freq=222MHz
[    0.000000] DDR DEBUG: DTONE=0x1, DTERR=0x0, DTIERR=0x0, DTPASS=0x4, DGSL=0 extra clock, DGPS=270
[    0.000000] DDR DEBUG: DTONE=0x1, DTERR=0x0, DTIERR=0x0, DTPASS=0x4, DGSL=0 extra clock, DGPS=180
[    0.000000] DDR DEBUG: DTONE=0x1, DTERR=0x0, DTIERR=0x0, DTPASS=0x4, DGSL=0 extra clock, DGPS=270
[    0.000000] DDR DEBUG: DTONE=0x1, DTERR=0x0, DTIERR=0x0, DTPASS=0x4, DGSL=0 extra clock, DGPS=180

On the device side, I then tried to work out which serial device in the Linux OS this port corresponded to. This took a small amount of trial and error, because there were multiple devices in the /dev filesystem, all of which were good candidates for being serial ports:

root@rk3188:/dev # ls -las /dev/tty*
     0 crw-rw-rw-    1 root     root        5,   0 Jan  1  2011 /dev/tty
     0 crw-------    1 root     root      254,   0 Jun 19 21:42 /dev/ttyFIQ0
     0 crw-------    1 root     root      251,   0 Jan  1  2011 /dev/ttyGS0
     0 crw-------    1 root     root      251,   1 Jan  1  2011 /dev/ttyGS1
     0 crw-------    1 root     root      251,   2 Jan  1  2011 /dev/ttyGS2
     0 crw-------    1 root     root      251,   3 Jan  1  2011 /dev/ttyGS3
     0 crw-rw----    1 bluetoot 3008        4,  64 Jan  1  2011 /dev/ttyS0
     0 crw-------    1 root     root        4,  67 Jan  1  2011 /dev/ttyS3

I ended up determining that /dev/ttyFIQ0 was the correct port. By sending something to it:

echo "test" > /dev/ttyFIQ0

I could see it appear on the serial console. So all seemed good as a starting point.

The next challenge was how to build a hardware watchdog timer capable of receiving a message via this serial port and of controlling the Android device in order to reboot it.

Given that the PIC micro-controller from the previous Watchdog timer (WDT) project was too basic for this purpose (it didn't have a hardware UART, and bit banging would probably be too unreliable, especially at 115200 bps), I decided to consider a different micro-controller.

The option that was readily available in my case was the PIC16F628A because I had a stock of these at hand.

Like the PIC12F683, this is also an 8-bit micro-controller, but it features a hardware USART (like a UART but more versatile, as it can also be configured for interfacing with a synchronous serial bus).

It also features a 16-bit timer (and two other 8-bit timers), allowing the timer code from the previous project to be taken as a basis (not that it is anything especially complex).

And so I went on to code the firmware for the new micro-controller. The basic idea is that whenever the timer overflows, an interrupt is generated, and an interrupt service routine (ISR) either increments a counter variable or, when this variable overflows, performs the WDT action (sending a command to the target device and briefly switching the relay off and on). Then TIMER1 is restarted and the cycle repeats.
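The counting logic described above can be sketched in plain C, independent of the PIC registers. This is only an illustration and not the code from the repository: the names wdt_state, wdt_init, wdt_on_timer_overflow and wdt_feed are hypothetical, and a configurable threshold stands in for the natural overflow of the counter variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the software side of the watchdog. */
typedef struct {
    volatile uint16_t ticks;   /* timer overflows seen since the last keep-alive */
    uint16_t timeout_ticks;    /* overflows tolerated before the WDT action fires */
} wdt_state;

/* Start with an empty counter and the given timeout (in timer overflows). */
void wdt_init(wdt_state *w, uint16_t timeout_ticks)
{
    assert(w != 0 && timeout_ticks > 0);
    w->ticks = 0;
    w->timeout_ticks = timeout_ticks;
}

/*
 * Meant to be called from the timer overflow interrupt.
 * Returns true when the timeout has elapsed, i.e. when the caller should
 * perform the WDT action (send the command, pulse the relay).
 * The counter restarts from zero after firing, so the cycle repeats.
 */
bool wdt_on_timer_overflow(wdt_state *w)
{
    w->ticks++;
    if (w->ticks >= w->timeout_ticks) {
        w->ticks = 0;
        return true;
    }
    return false;
}

/* Called when a keep-alive is received: postpones the WDT action. */
void wdt_feed(wdt_state *w)
{
    w->ticks = 0;
}
```

On the real device, the ISR would also have to clear the timer interrupt flag and reload the timer, which is left out here because it depends on the compiler and register definitions in use.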

Meanwhile, while the timer is running, a software loop continuously reads from the serial port, comparing the stream of bytes with the expected string ("kiosk_wdtToken"). Every time a match is found, the counter variable of the timer is reset to 0, preventing the WDT from triggering.
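Matching a token inside a continuous byte stream has one subtlety: the token starts with 'k' and contains 'k' again further in, so a matcher that simply restarts from zero on a mismatch can miss a token that follows a partial one (for example "kioskiosk_wdtToken"). A minimal sketch of a byte-at-a-time matcher that handles this, using the Knuth-Morris-Pratt failure table, could look as follows. The names token_matcher, token_matcher_init and token_matcher_feed are hypothetical, and the firmware in the repository may well do this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

static const char WDT_TOKEN[] = "kiosk_wdtToken";
#define TOKEN_LEN (sizeof WDT_TOKEN - 1)   /* length without the trailing NUL */

typedef struct {
    unsigned char fail[TOKEN_LEN];  /* KMP failure table for WDT_TOKEN */
    size_t matched;                 /* how many token bytes match so far */
} token_matcher;

/* Build the failure table once and reset the match position. */
void token_matcher_init(token_matcher *m)
{
    size_t k = 0;
    assert(m != NULL);
    m->fail[0] = 0;
    for (size_t i = 1; i < TOKEN_LEN; i++) {
        while (k > 0 && WDT_TOKEN[i] != WDT_TOKEN[k])
            k = m->fail[k - 1];
        if (WDT_TOKEN[i] == WDT_TOKEN[k])
            k++;
        m->fail[i] = (unsigned char)k;
    }
    m->matched = 0;
}

/*
 * Feed one received byte. Returns true exactly when this byte completes
 * the token, which is the moment the timer counter should be reset.
 */
bool token_matcher_feed(token_matcher *m, char c)
{
    /* On a mismatch, fall back to the longest prefix that still fits. */
    while (m->matched > 0 && c != WDT_TOKEN[m->matched])
        m->matched = m->fail[m->matched - 1];
    if (c == WDT_TOKEN[m->matched])
        m->matched++;
    if (m->matched == TOKEN_LEN) {
        m->matched = m->fail[TOKEN_LEN - 1];
        return true;
    }
    return false;
}
```

In the main loop, each byte read from the USART would be passed to token_matcher_feed, and a true result would clear the timer counter. Any other console output arriving on the same line, such as kernel log messages, is simply skipped over.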

I have shared this code on GitHub, in this repository:

The microcontroller was then mounted on the perfboard I was already using as a base board for the Android board:

One caveat with this microcontroller is configuring an accurate enough approximation of the required baud rate. In the particular application for which I developed this WDT, I first intended to use the internal clock oscillator, which tops out at 4 MHz. But at this clock speed, the closest rate to 115200 bps that can be obtained is 125000 bps, which is about 8.5 % too fast and well outside what a serial link will tolerate.

As such the only option in this case was to add an external 20 MHz crystal, and configure the clock to run at that speed. At this clock rate, the USART can generate 113 636 bps, which for most purposes is close enough (only 1.36 % off).
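These figures follow from the asynchronous baud rate formula in the PIC16F628A datasheet: Fosc / (64 * (SPBRG + 1)) with BRGH = 0, or Fosc / (16 * (SPBRG + 1)) with BRGH = 1, where SPBRG is an 8-bit register. A small host-side sketch that searches for the best setting reproduces both numbers; the names brg_setting and best_brg are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Result of the search: register values and the baud rate they produce. */
typedef struct {
    uint8_t  brgh;         /* 0 = low speed (divide by 64), 1 = high speed (divide by 16) */
    uint8_t  spbrg;        /* value for the 8-bit SPBRG register */
    uint32_t actual_baud;  /* resulting rate, rounded to the nearest bps */
} brg_setting;

/* Baud rate produced by a given clock, BRGH bit and SPBRG value. */
static uint32_t brg_baud(uint32_t fosc_hz, uint8_t brgh, uint32_t spbrg)
{
    uint32_t divisor = (brgh ? 16u : 64u) * (spbrg + 1u);
    return (fosc_hz + divisor / 2u) / divisor;
}

/* Try every BRGH/SPBRG combination and keep the one closest to the target. */
brg_setting best_brg(uint32_t fosc_hz, uint32_t target_baud)
{
    brg_setting best = {0, 0, 0};
    uint32_t best_err = UINT32_MAX;

    assert(fosc_hz > 0 && target_baud > 0);
    for (uint8_t brgh = 0; brgh <= 1; brgh++) {
        for (uint32_t spbrg = 0; spbrg <= 255; spbrg++) {
            uint32_t baud = brg_baud(fosc_hz, brgh, spbrg);
            uint32_t err = baud > target_baud ? baud - target_baud
                                              : target_baud - baud;
            if (err < best_err) {
                best_err = err;
                best.brgh = brgh;
                best.spbrg = (uint8_t)spbrg;
                best.actual_baud = baud;
            }
        }
    }
    return best;
}
```

For a 115200 bps target, best_brg(4000000, 115200) gives BRGH = 1, SPBRG = 1 and 125000 bps, while best_brg(20000000, 115200) gives BRGH = 1, SPBRG = 10 and 113636 bps.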

For turning the Android board on and off, I added a relay which in turn is controlled by one of the GPIO outputs of the microcontroller:

The Android board then sits on top of most of these added electronics:

In the end it worked like a charm. While my kiosk used to crash every one or two days, I no longer have to worry about restarting it manually, and for practical purposes it is always available.
