Implementation of Serial Wire JTAG flash programming in ARM Cortex M3 Processors
The goal of the project was to use the Serial Wire JTAG (SWJ) protocol implemented
in ARM Cortex processors to program their flash memory.
JTAG was originally developed for testing printed circuit boards populated with complex integrated circuits.
Later, because of its capabilities, it was also used to access sub-blocks of integrated circuits
and for in-system programming, by adding protocols that give access to the internal parts of the processor.
2.2.1 JTAG
A JTAG interface is a special four/five-pin interface added to a chip,
designed so that multiple chips on a board can have their JTAG lines daisy-chained together
if specific conditions are met, and a test probe need only connect to a single “JTAG port”
to have access to all chips on a circuit board. The connector pins are:
- TDI (Test Data In)
- TDO (Test Data Out)
- TCK (Test Clock)
- TMS (Test Mode Select)
- TRST (Test Reset), optional
2.2.2 Serial Wire JTAG
Serial Wire Debug (SWD) provides a debug port for severely pin limited packages,
often the case for small package microcontrollers but also complex ASICs
where limiting pin-count is critical and can be the controlling factor in device costs.
SWD replaces the 5-pin JTAG port with a clock and a single bi-directional data pin,
providing all the normal JTAG debug and test functionality
plus real-time access to system memory without halting the processor or requiring any target resident code.
SWD uses an ARM standard bi-directional wire protocol, defined in the ARM Debug Interface v5,
to pass data to and from the debugger and the target system in a highly efficient and standard way.
As a standard interface for ARM processor-based devices,
the software developer can count on a wide choice of interoperable tools from ARM and third-party tool vendors.
SWD provides an easy and risk-free migration from JTAG, as the two signals SWDIO and SWCLK
are overlaid on the TMS and TCK pins, allowing for bi-modal devices that provide the other JTAG signals.
These extra JTAG pins can be switched to other uses when in SWD mode.
For initial testing, I was advised to interface the SWJ pins through the parallel port of an IBM PC.
2.2.3 Parallel Port
The parallel port is one of the communication ports provided on PCs for connecting printers,
which is why it is often called the printer port (LPT).
It can be used for other purposes as well.
Later versions of Windows do not give user programs direct access to the communication port registers.
However, a Dynamic Link Library (DLL) backed by a kernel-mode driver can provide that access.
The DLL that I used was inpout32.dll, which is freely licensed.
The Microsoft C run-time functions that access the port registers
are _inp() to read a port and _outp() to write to it.
When the Windows version does not allow direct register access,
the DLL provides equivalent functions that go through its kernel-mode driver.
In our case, these functions were Inp32() and Out32().
Figure 2.2.1 Parallel port pins
The parallel port of the PC contains:
- DATA pins, Data Register, address 0x378
- STATUS pins, Status Register, address 0x379
- CONTROL pins, Control Register, address 0x37A
In Standard Parallel Port mode (SPP),
- DATA pins – Output only
- CONTROL pins – Input/output
- STATUS pins – Input only
I used the following pins:
- DATA pin 1 as Clock (Pin 1 in the DB25 connector)
- DATA pin 2 as Output (Pin 2 in the DB25 connector)
- STATUS pin 5 as Input (Pin 11 in the DB25 connector)
Wiring: Pin 1 drives CLK; Pin 2 (output) and Pin 11 (input) are connected together to form the data line.
I built an open-collector output on Pin 2 using a transistor,
so that the pin can be kept in a high-impedance state while the data line is operating in input mode.
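As a minimal sketch (not the code used in the project), the pin handling can be reduced to bit manipulation on the port register values. The bit positions `CLK_BIT`, `DOUT_BIT` and `DIN_BIT` and the helper names are hypothetical. The wiring is assumed to be non-inverting at the logic level: writing 1 to `DOUT_BIT` turns the transistor off, so the pull-up holds the data line high. The caller would write the returned value to the data register and read the status value from the status register, for example through the DLL functions mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define LPT_DATA   0x378u   /* LPT1 data register (base)       */
#define LPT_STATUS 0x379u   /* LPT1 status register (base + 1) */

/* Hypothetical bit assignments; adjust to the actual wiring. */
#define CLK_BIT  0x01u      /* DATA bit that drives SWCLK                */
#define DOUT_BIT 0x02u      /* DATA bit that drives the SWDIO transistor */
#define DIN_BIT  0x80u      /* STATUS bit that reads SWDIO back          */

/* Returns the new DATA register value with `mask` set to `level`.
 * The caller keeps it as a shadow copy, so one bit can change without
 * disturbing the other, and writes it to LPT_DATA. */
static uint8_t lpt_update(uint8_t shadow, uint8_t mask, int level)
{
    assert(mask == CLK_BIT || mask == DOUT_BIT);
    return level ? (uint8_t)(shadow | mask) : (uint8_t)(shadow & ~mask);
}

/* Assumed wiring: DOUT_BIT = 1 switches the transistor off, so the
 * pull-up holds the line high and the target is free to drive it. */
static uint8_t lpt_release_swdio(uint8_t shadow)
{
    return lpt_update(shadow, DOUT_BIT, 1);
}

/* Extracts the data-line level from a value read from LPT_STATUS.
 * If the BUSY line (DB25 pin 11) is used, the port hardware inverts
 * it, so the result would have to be complemented. */
static int lpt_swdio_level(uint8_t status)
{
    return (status & DIN_BIT) != 0;
}
```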
2.2.4 Serial Wire JTAG protocol
Each successful SWD transfer consists of 3 parts:
- A header (always from the external debugger)
- An acknowledgement from the target (provided it recognizes the header)
- A data payload, the direction of which is determined by the header
A write transfer is shown below.
Figure 2.2.2 Serial Wire Write transfer
A start bit (always 1) marks the beginning of a transfer, which allows the line to idle between transfers; during idle the clock can be stopped or free-running
(note that the clock also does not need to have a fixed frequency).
One bit (APnDP) defines the transfer as an AP access or a DP access,
one bit (RnW) gives the direction of the data on the SWD interface, and 2 address bits (A[3:2]) are given.
These address bits allow a sequence of AP accesses to use the 4 registers in a bank of a specific AP
without having to change the AP select register in the DP.
A parity bit and a stop bit are added to provide some tolerance to data corruption and hot plugging.
The header ends with a park bit that drives the line high, where it should then be held by a pull-up.
After the header and a one-cycle turnaround, the target responds with a 3-bit acknowledgement indicating the status of the interface.
If the acknowledgement matches the OK pattern, then after another turnaround cycle the 32-bit write data is sent, followed by a parity bit (even parity is used).
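The header fields fit in one byte. The following sketch (the function names are illustrative) packs them with the first transmitted bit in bit 0, and also computes the even parity bit for a 32-bit data word. With these helpers, the IDCODE read header comes out as 0xA5, which matches the bit pattern 1 0 1 0 0 1 0 1 given for that register further on.

```c
#include <assert.h>
#include <stdint.h>

/* Builds the 8-bit SWD request header. Bit 0 is sent first on SWDIO:
 * Start(1), APnDP, RnW, A2, A3, Parity, Stop(0), Park(1). */
static uint8_t swd_header(int ap, int read, unsigned addr)
{
    unsigned a2, a3, parity;

    assert((addr & ~0xCu) == 0);   /* only A[3:2] are sent: 0x0/0x4/0x8/0xC */
    a2 = (addr >> 2) & 1u;
    a3 = (addr >> 3) & 1u;
    /* even parity over APnDP, RnW, A2 and A3 */
    parity = (unsigned)(ap != 0) ^ (unsigned)(read != 0) ^ a2 ^ a3;

    return (uint8_t)(1u                          /* Start           */
                   | ((unsigned)(ap != 0)   << 1)
                   | ((unsigned)(read != 0) << 2)
                   | (a2 << 3)
                   | (a3 << 4)
                   | (parity << 5)               /* Stop (bit 6) = 0 */
                   | (1u << 7));                 /* Park            */
}

/* Even parity bit for a data word: makes the total number of 1s
 * (32 data bits + parity) even. */
static unsigned swd_parity32(uint32_t w)
{
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return w & 1u;
}
```

The same function also produces, for example, 0x81 for an ABORT write and 0xBD for a READ BUFFER read.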
A successful read transfer is similar, as shown below.
Figure 2.2.3 Serial Wire Read Transfer
The turn-round cycle (TRN in the diagrams) is placed after the data phase for a read,
as there is no change of direction between ACK and RDATA.
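On the host side, a read ends with 32 data bits and a parity bit sampled from SWDIO. Below is a sketch of assembling and checking them. It assumes the samples were stored one per array element in the order received (LSB first). The function name and the error convention are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assembles the 32 data bits of a read (sent LSB first) plus the trailing
 * parity bit, sampled one per clock into bits[0..32]. Returns 0 on success
 * or -1 on a parity error (the debugger could then retry, or use READ
 * RESEND to fetch the value again). */
static int swd_unpack_read(const uint8_t bits[33], uint32_t *out)
{
    uint32_t w = 0;
    unsigned ones = 0;
    size_t i;

    assert(bits != NULL && out != NULL);
    for (i = 0; i < 32; i++) {
        if (bits[i]) {
            w |= (uint32_t)1 << i;
            ones++;
        }
    }
    ones += bits[32] ? 1u : 0u;   /* even parity: total count of 1s must be even */
    if (ones & 1u)
        return -1;
    *out = w;
    return 0;
}
```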
Figure 2.2.4 Serial Wire WAIT Response
This response indicates to the debugger that the debug port is still active,
but there is an outstanding transfer which has not completed.
When an AP read transfer is issued on the SWD interface,
the data returned is the result of the previous AP read (AP reads are posted).
Thus, to read a memory location in the target, typically 3 transfers are necessary:
- Write to the AP’s Transfer Address Register with the target address
- Read the AP’s Data Read/Write Register (DRW) to initiate the transaction.
- Read the DP’s READ BUFFER register (RDBUFF), which returns the required target data without starting another AP access.
Similarly, an AP write operation consists of the following phases:
- Write to the AP’s Transfer Address Register with the target address
- Write to the AP’s Data Read/Write Register (DRW) to initiate the transaction.
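The posted-read behaviour is easiest to see on a toy model. The sketch below is not a real debug port. `toy_target`, its 16-word RAM and the word-indexed TAR are assumptions made only to show two things: why the three-transfer read needs a final read of the READ BUFFER register, and why a write needs only two transfers.

```c
#include <assert.h>
#include <stdint.h>

/* Toy MEM-AP model: an AP read of DRW returns the *previous* posted
 * result and starts a new read; DP RDBUFF returns the posted result
 * without starting another read. No timing or error modelling. */
typedef struct {
    uint32_t tar;       /* Transfer Address Register (byte address)  */
    uint32_t posted;    /* result of the last AP read, seen in RDBUFF */
    uint32_t ram[16];   /* hypothetical 16-word target memory         */
} toy_target;

static void ap_write_tar(toy_target *t, uint32_t addr) { t->tar = addr; }

static uint32_t ap_read_drw(toy_target *t)
{
    uint32_t previous = t->posted;       /* stale value is returned now */
    assert(t->tar / 4 < 16);
    t->posted = t->ram[t->tar / 4];      /* the new read is only posted */
    return previous;
}

static void ap_write_drw(toy_target *t, uint32_t value)
{
    assert(t->tar / 4 < 16);
    t->ram[t->tar / 4] = value;
}

static uint32_t dp_read_rdbuff(toy_target *t) { return t->posted; }

/* The three-transfer read sequence. */
static uint32_t mem_read32(toy_target *t, uint32_t addr)
{
    ap_write_tar(t, addr);       /* 1. TAR <- address              */
    (void)ap_read_drw(t);        /* 2. start the read, discard data */
    return dp_read_rdbuff(t);    /* 3. fetch the posted result      */
}

/* The two-transfer write sequence. */
static void mem_write32(toy_target *t, uint32_t addr, uint32_t value)
{
    ap_write_tar(t, addr);       /* 1. TAR <- address        */
    ap_write_drw(t, value);      /* 2. DRW <- data, performs the write */
}
```

If consecutive words are read, each DRW read already returns the previous word, so RDBUFF is only needed once at the end.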
Figure 2.2.5 Serial Wire FAULT Response
The FAULT response indicates that the link is still active,
but the debug port will only respond to a read of its ID or Status registers,
or a write to its ABORT register which is used to clear the state of any sticky bits once they have been read.
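A debugger loop has to react to each acknowledgement. The sketch below maps the 3-bit ACK to an action. The ACK is assembled LSB first, so OK = 0b001, WAIT = 0b010 and FAULT = 0b100. The enum names and the choice of a line reset for any other value are assumptions, and a real loop should limit the number of WAIT retries. The ABORT bit positions follow the ARM Debug Interface v5.

```c
#include <assert.h>
#include <stdint.h>

/* 3-bit ACK values, first received bit in bit 0. */
#define SWD_ACK_OK    0x1u
#define SWD_ACK_WAIT  0x2u
#define SWD_ACK_FAULT 0x4u

/* DP ABORT register bits (ADIv5): writing 1 clears the sticky flag. */
#define ABORT_DAPABORT   (1u << 0)
#define ABORT_STKCMPCLR  (1u << 1)
#define ABORT_STKERRCLR  (1u << 2)
#define ABORT_WDERRCLR   (1u << 3)
#define ABORT_ORUNERRCLR (1u << 4)

typedef enum { ACT_DONE, ACT_RETRY, ACT_CLEAR_ERRORS, ACT_LINE_RESET } swd_action;

/* Decides what the debugger should do after receiving `ack`. */
static swd_action swd_on_ack(unsigned ack)
{
    assert(ack <= 7u);
    switch (ack) {
    case SWD_ACK_OK:    return ACT_DONE;
    case SWD_ACK_WAIT:  return ACT_RETRY;         /* repeat the same transfer      */
    case SWD_ACK_FAULT: return ACT_CLEAR_ERRORS;  /* read CTRL/STAT, write ABORT   */
    default:            return ACT_LINE_RESET;    /* no valid response: resync line */
    }
}

/* Value written to ABORT to clear all sticky error flags. DAPABORT is
 * left clear, since it would also cancel the pending AP transaction. */
static uint32_t abort_clear_all(void)
{
    return ABORT_STKCMPCLR | ABORT_STKERRCLR | ABORT_WDERRCLR | ABORT_ORUNERRCLR;
}
```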
That was an overview of the Serial Wire JTAG protocol and the timing between data packets.
The table below shows the address bits of the header for the DP (Debug Port) registers. Important: the column is written as A(3:2), i.e. bit 3 first and bit 2 next, whereas in the transmitted header A2 is sent before A3.
A(3:2) | R/W | CTRLSEL bit of SELECT register | Register | Notes |
00 | R | - | IDCODE | Identification code of the SW-DP |
00 | W | - | ABORT | Clears the sticky error bits |
01 | R/W | 0 | CTRL/STAT | Setting control bits and reading status bits |
01 | R/W | 1 | WIRE CONTROL | Configures the turnaround time |
10 | R | - | READ RESEND | Recovers a corrupted AP read result without repeating the AP transfer |
10 | W | - | SELECT | Selects the current AP and the four-word register bank |
11 | R | - | READ BUFFER | Returns the result of the previous (posted) AP read |
Table 2.2.1 DP Register information
For example, the bit pattern for reading the IDCODE register is:
Start | APnDP | RnW | A2 | A3 | Parity | Stop | Park | Trn | ACK | Data | Parity |
1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | – | 1 0 0 | IDCODE (32 bits) | P |
NOTE: In our implementation, we did not need to wait for the turnaround time; it may not have been implemented.
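Once the IDCODE has been read, its fields can be split out as shown below. The field layout is the ADIv5 one. Rejecting an all-ones value as "no target driving the line" is our own assumption. For example, 0x1BA01477 decodes to version 1, part number 0xBA01 and designer 0x23B (ARM's JEP106 code).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned version;   /* IDCODE[31:28]                       */
    unsigned partno;    /* IDCODE[27:12]                       */
    unsigned designer;  /* IDCODE[11:1], JEP106 designer code  */
} dp_idcode;

/* Splits an IDCODE value read from the DP. Returns -1 for values that
 * cannot be valid: bit 0 must read as 1, and all ones is treated here
 * (an assumption) as "nothing is driving SWDIO". */
static int idcode_decode(uint32_t id, dp_idcode *d)
{
    assert(d != NULL);
    if ((id & 1u) == 0 || id == 0xFFFFFFFFu)
        return -1;
    d->version  = (unsigned)(id >> 28) & 0xFu;
    d->partno   = (unsigned)(id >> 12) & 0xFFFFu;
    d->designer = (unsigned)(id >> 1) & 0x7FFu;
    return 0;
}
```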
When accessing the AP (Access Port) registers, in addition to the address bits in the header,
their selection depends on four more bits (APBANKSEL) in the SELECT register of the Debug Port.
Before accessing any AP register, we should make sure that the SELECT register has been written with the correct values.
Figure 2.2.6 SELECT Register
The figure shows the bits of the SELECT register. The four highlighted bits (APBANKSEL, bits [7:4]) should be written with the correct bank number.
The table below shows the AP registers and their respective SELECT register bits and address bits in the header.
MEM-AP Register | Address | Register Bank (APBANKSEL) | Offset(A[3:2]) |
CSW/Control Status Word | 0x00 | 0x0 | 00 |
TAR/Transfer Address Register | 0x04 | 0x0 | 01 |
DRW/Data Read And Write Register | 0x0C | 0x0 | 11 |
BD0/Bank Data Register 0 | 0x10 | 0x1 | 00 |
BD1/Bank Data Register 1 | 0x14 | 0x1 | 01 |
BD2/Bank Data Register 2 | 0x18 | 0x1 | 10 |
BD3/Bank Data Register 3 | 0x1C | 0x1 | 11 |
CFG/Configuration Register | 0xF4 | 0xF | 01 |
BASE/Base Address Register | 0xF8 | 0xF | 10 |
IDR/Identification Register | 0xFC | 0xF | 11 |
Table 2.2.2 AP Registers
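The table can be turned into simple arithmetic. Bits [7:4] of the register address give the bank written into SELECT, and bits [3:2] give A[3:2] for the header. The sketch below assumes the ADIv5 SELECT layout: APSEL in bits [31:24], APBANKSEL in bits [7:4], and CTRLSEL left at 0. The names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t select;    /* value to write to the DP SELECT register */
    unsigned a32;       /* A[3:2] for the SWD header, 0..3          */
} ap_addr;

/* Splits a MEM-AP register address from Table 2.2.2 into the SELECT
 * value (bank) and the header address bits (offset in the bank). */
static ap_addr ap_addr_split(unsigned apsel, unsigned reg)
{
    ap_addr r;

    assert(apsel <= 0xFFu && reg <= 0xFCu && (reg & 3u) == 0);
    r.select = ((uint32_t)apsel << 24) | (reg & 0xF0u);  /* APBANKSEL = reg[7:4] */
    r.a32    = (reg >> 2) & 3u;                          /* A[3:2]   = reg[3:2] */
    return r;
}
```

For example, DRW (0x0C) gives bank 0 with A[3:2] = 11, and IDR (0xFC) gives SELECT = 0xF0 with A[3:2] = 11, matching the table.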
Now we shall see how we can enable Serial Wire JTAG mode.
After a reset the default debug mode is JTAG.
We need to switch from JTAG to SWD mode.
1. Send more than 50 TCK cycles with TMS (SWDIO) = 1.
2. Send the 16-bit sequence 0111100111100111 on TMS (SWDIO), MSB transmitted first.
3. Send more than 50 TCK cycles with TMS (SWDIO) = 1.
4. Then drive the data line low and apply more than 2 clock cycles.
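The four steps can be expressed as a bit sequence. In this sketch, the callback type `swd_bit_fn` is hypothetical; on the parallel port it would set SWDIO and pulse SWCLK. The counts of 56 ones and 8 idle cycles are simply values that satisfy "more than 50" and "more than 2". A recording sink is included so the sequence can be checked without hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit sink: called once per SWCLK cycle with the level
 * to put on TMS/SWDIO for that cycle. */
typedef void (*swd_bit_fn)(void *ctx, int swdio);

static void send_ones(swd_bit_fn bit, void *ctx, unsigned n)
{
    while (n--)
        bit(ctx, 1);
}

/* JTAG-to-SWD switch: >50 ones, 0x79E7 MSB first, >50 ones, then idle
 * cycles with SWDIO low. 56 and 8 cycles are arbitrary margins. */
static void swj_jtag_to_swd(swd_bit_fn bit, void *ctx)
{
    const uint16_t seq = 0x79E7u;   /* 0111 1001 1110 0111 */
    int i;

    assert(bit != NULL);
    send_ones(bit, ctx, 56);
    for (i = 15; i >= 0; i--)       /* MSB first */
        bit(ctx, (seq >> i) & 1);
    send_ones(bit, ctx, 56);
    for (i = 0; i < 8; i++)
        bit(ctx, 0);
}

/* Recording sink for checking the sequence without hardware. */
typedef struct { uint8_t bits[256]; size_t n; } bit_log;

static void log_bit(void *ctx, int swdio)
{
    bit_log *rec = ctx;
    assert(rec->n < sizeof rec->bits);
    rec->bits[rec->n++] = (uint8_t)swdio;
}
```

On the target, the first SWD transfer after this sequence is expected to be a read of the IDCODE register.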
Once SWD mode is enabled, we can access the DP registers: read the IDCODE,
then write the necessary values to the ABORT and CTRL/STAT registers.
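The text does not list the exact values. The following is therefore a sketch of one common ADIv5 initialisation order, not necessarily the sequence used in the project:

1. Read IDCODE.
2. Clear the sticky flags through ABORT.
3. Reset SELECT.
4. Request debug and system power-up in CTRL/STAT.
5. Poll CTRL/STAT until both acknowledge bits are set.

Bit positions follow ADIv5. The step table and the names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* DP register addresses (Table 2.2.1) */
#define DP_IDCODE   0x0u   /* read                */
#define DP_ABORT    0x0u   /* write               */
#define DP_CTRLSTAT 0x4u   /* with CTRLSEL = 0    */
#define DP_SELECT   0x8u   /* write               */

/* CTRL/STAT power-up request/acknowledge bits (ADIv5) */
#define CDBGPWRUPREQ (1u << 28)
#define CDBGPWRUPACK (1u << 29)
#define CSYSPWRUPREQ (1u << 30)
#define CSYSPWRUPACK (1u << 31)

typedef enum { DP_READ, DP_WRITE } dp_op;
typedef struct { dp_op op; unsigned addr; uint32_t value; } dp_step;

/* One possible initialisation order (an assumption, not the only one). */
static const dp_step swd_init_steps[] = {
    { DP_READ,  DP_IDCODE,   0 },                           /* first access after the switch */
    { DP_WRITE, DP_ABORT,    0x1Eu },                       /* clear all sticky error flags  */
    { DP_WRITE, DP_SELECT,   0 },                           /* AP 0, bank 0, CTRLSEL = 0     */
    { DP_WRITE, DP_CTRLSTAT, CSYSPWRUPREQ | CDBGPWRUPREQ }, /* request power-up              */
    { DP_READ,  DP_CTRLSTAT, 0 },                           /* poll until both ACKs are set  */
};

/* True once both power-up requests have been acknowledged. */
static int ctrlstat_powered(uint32_t ctrlstat)
{
    const uint32_t acks = CSYSPWRUPACK | CDBGPWRUPACK;
    return (ctrlstat & acks) == acks;
}
```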
The next step is unlocking the program memory so that it can be written.
For that, we need to write key values into two of the flash interface registers.
Once the memory is unlocked, we can access it directly.
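For the STM32L152, our understanding is that the two registers are FLASH_PEKEYR and FLASH_PRGKEYR, and each takes two key words in order. The addresses and key values below come from our reading of the STM32L1 reference manual (RM0038) and should be checked against it. The function only builds the list of memory writes. Each write would be sent as a TAR write followed by a DRW write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* STM32L1 flash interface registers and keys, as we understand them
 * from RM0038; verify against the reference manual before use. */
#define FLASH_BASE    0x40023C00u
#define FLASH_PEKEYR  (FLASH_BASE + 0x0Cu)   /* unlocks FLASH_PECR     */
#define FLASH_PRGKEYR (FLASH_BASE + 0x10u)   /* unlocks program memory */

#define PEKEY1  0x89ABCDEFu
#define PEKEY2  0x02030405u
#define PRGKEY1 0x8C9DAEBFu
#define PRGKEY2 0x13141516u

typedef struct { uint32_t addr, value; } mem_write;

/* Fills `out` with the four word writes of the unlock sequence. The PECR
 * keys come first, since (per our reading of RM0038) the program-memory
 * keys are only accepted once PECR has been unlocked. */
static unsigned flash_unlock_sequence(mem_write out[4])
{
    static const mem_write seq[4] = {
        { FLASH_PEKEYR,  PEKEY1  }, { FLASH_PEKEYR,  PEKEY2  },
        { FLASH_PRGKEYR, PRGKEY1 }, { FLASH_PRGKEYR, PRGKEY2 },
    };
    unsigned i;

    assert(out != NULL);
    for (i = 0; i < 4; i++)
        out[i] = seq[i];
    return 4;
}
```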
Then, to erase a page of the flash memory:
- Write the Transfer Address Register with the address of the page that you want to erase.
- Write the Data Read/Write Register with an all-zero word (0x00000000); the full page starting at that address is erased.
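These two writes assume the flash controller is already in page-erase mode. On the STM32L1 this is, as far as we know, done by first setting the ERASE and PROG bits in FLASH_PECR. The sketch below adds that step and clears the bits afterwards. The PECR address, the bit positions and the 256-byte page size are assumptions to verify against the reference manual. Polling the BSY flag in FLASH_SR between the last two writes is omitted.

```c
#include <assert.h>
#include <stdint.h>

#define FLASH_PECR      0x40023C04u   /* assumed STM32L1 address  */
#define PECR_PROG       (1u << 3)     /* assumed bit position     */
#define PECR_ERASE      (1u << 9)     /* assumed bit position     */
#define FLASH_PAGE_SIZE 256u          /* assumed page size, bytes */

typedef struct { uint32_t addr, value; } mem_write;

/* Writes for erasing one page: select page-erase mode in PECR, write a
 * zero word to the first word of the page (TAR <- page, DRW <- 0), then
 * clear the mode bits again. This simplified version writes PECR as a
 * whole instead of doing a read-modify-write. */
static unsigned flash_page_erase_sequence(uint32_t page, mem_write out[3])
{
    assert(page % FLASH_PAGE_SIZE == 0);
    out[0].addr = FLASH_PECR; out[0].value = PECR_ERASE | PECR_PROG;
    out[1].addr = page;       out[1].value = 0x00000000u;
    out[2].addr = FLASH_PECR; out[2].value = 0;
    return 3;
}
```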
More importantly, the words must be written to memory in little-endian format.
I also had to read the contents of an Intel HEX file
and transfer the data to the flash memory. (The Intel HEX format is described in Annexure SWJ_A.)
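Here is a sketch of the file side: parsing one Intel HEX record and packing data bytes into little-endian words for DRW. Only the record fields and the checksum rule of the format are used, and the names are illustrative. Images for the STM32 flash at 0x08000000 also contain extended linear address records (type 04) that carry the upper 16 address bits. This sketch parses those records but does not interpret them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint8_t  len, type;   /* byte count and record type (00 data, 01 EOF, ...) */
    uint16_t addr;        /* 16-bit load offset                               */
    uint8_t  data[255];
} hex_record;

static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Parses one ":LLAAAATT<data>CC" record. Returns 0 on success, -1 on bad
 * syntax or checksum: all bytes, checksum included, must sum to 0 mod 256. */
static int hex_parse_line(const char *line, hex_record *r)
{
    size_t n = strlen(line), i;
    unsigned sum = 0;

    assert(r != NULL);
    while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r'))
        n--;                                   /* ignore the line ending */
    if (n < 11 || line[0] != ':' || (n - 1) % 2 != 0)
        return -1;
    for (i = 0; i < (n - 1) / 2; i++) {
        int hi = hex_nibble(line[1 + 2 * i]), lo = hex_nibble(line[2 + 2 * i]);
        uint8_t b;
        if (hi < 0 || lo < 0)
            return -1;
        b = (uint8_t)((hi << 4) | lo);
        sum += b;
        if (i == 0)      r->len = b;
        else if (i == 1) r->addr = (uint16_t)(b << 8);
        else if (i == 2) r->addr = (uint16_t)(r->addr | b);
        else if (i == 3) r->type = b;
        else if (i - 4 < r->len) r->data[i - 4] = b;   /* checksum not stored */
    }
    if (n - 1 != 2u * (5u + r->len) || (sum & 0xFFu) != 0)
        return -1;
    return 0;
}

/* Packs 4 consecutive bytes into the word written to DRW. The first byte
 * belongs at the lowest address, so it becomes bits [7:0]: little-endian. */
static uint32_t le_word(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

For example, the record `:10010000214601360121470136007EFE09D2190140` parses as 16 data bytes at offset 0x0100. Its first word is 0x36014621.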
The next part of my project was to implement the same functionality on a different platform:
Windows Mobile, where I had to use two GPIO pins instead of the parallel port.
2.2.5 Working on the Windows Mobile Platform
My next task was to use two GPIO pins in place of the parallel port pins,
so that we could drive the required signals.
The rest of the code can be left unchanged if we modify only the code that allocates and drives the pins.
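One way to keep the rest of the code unchanged is to route all pin access through a small interface, with one implementation per platform. The `swd_pins` structure below is a hypothetical design, not the project's code. The mock backend records the bit that the target would latch on each rising edge of SWCLK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pin interface: one implementation per platform (parallel
 * port, Windows Mobile GPIO); the protocol code only calls through it. */
typedef struct {
    void (*set_clk)(void *hw, int level);
    void (*set_dio)(void *hw, int level);
    void *hw;                       /* platform state, e.g. a mapped pointer */
} swd_pins;

/* Clocks out the low `n` bits of `value`, LSB first, on SWDIO. */
static void swd_write_bits(const swd_pins *p, uint32_t value, unsigned n)
{
    assert(p != NULL && n <= 32);
    while (n--) {
        p->set_clk(p->hw, 0);
        p->set_dio(p->hw, (int)(value & 1u));
        p->set_clk(p->hw, 1);       /* target samples SWDIO on the rising edge */
        value >>= 1;
    }
}

/* Mock backend that records what the target would sample. */
typedef struct { uint32_t sampled; unsigned count; int dio; } mock_hw;

static void mock_clk(void *hw, int level)
{
    mock_hw *m = hw;
    if (level) {                    /* rising edge: latch the data line */
        assert(m->count < 32);
        m->sampled |= (uint32_t)(m->dio & 1) << m->count;
        m->count++;
    }
}

static void mock_dio(void *hw, int level) { ((mock_hw *)hw)->dio = level; }
```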
So I had to find a way to access some of the GPIO pins of the new platform.
Windows Mobile is not like desktop Windows in this respect: we cannot simply access the registers of those hardware pins directly.
To access those pins, we first reserve a region of the virtual address space,
and then map the physical address range of the hardware registers onto it, which gives us a pointer to that memory.
With that done, we can access the actual physical hardware through that pointer.
To do the memory allocation and mapping, we can use the following Microsoft platform-dependent functions:
- VirtualAlloc() – Reserves a region in the virtual address space.
- VirtualCopy() – Maps the physical memory onto the reserved region (despite its name, it does not copy data).
- MmMapIoSpace() – Does both of the above in a single function.
- VirtualFree() – Frees the reserved region.
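After mapping, the GPIO registers are accessed through a `volatile` pointer. On Windows CE, the pointer would come from `MmMapIoSpace(PhysicalAddress, NumberOfBytes, CacheEnable)`, with caching disabled for device registers, and would be released with `MmUnmapIoSpace()`. Those calls are platform-specific and are left out here. The register offsets below are hypothetical, because the actual layout depends on the SoC. In testing, a plain array stands in for the mapped registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical GPIO register layout; the real offsets come from the SoC
 * documentation, and the physical base is what MmMapIoSpace maps. */
#define GPIO_OUT_SET 0x00u   /* write 1s to drive pins high */
#define GPIO_OUT_CLR 0x04u   /* write 1s to drive pins low  */
#define GPIO_IN      0x08u   /* read current pin levels     */

/* `base` is the virtual address obtained from the mapping call. volatile
 * keeps the compiler from caching or dropping the register accesses. */
static void gpio_write(volatile uint32_t *base, unsigned pin, int level)
{
    assert(pin < 32);
    base[(level ? GPIO_OUT_SET : GPIO_OUT_CLR) / 4] = (uint32_t)1 << pin;
}

static int gpio_read(volatile uint32_t *base, unsigned pin)
{
    assert(pin < 32);
    return (int)((base[GPIO_IN / 4] >> pin) & 1u);
}
```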
After gaining access to those GPIO pins, I was able to do the same thing that I had done using the parallel port.
I developed a Windows Mobile MFC application with a graphical user interface to select the hex file and interactive buttons
to erase and program the flash memory of the STM32L152 microcontroller.
2.2.6 Problems faced in the Project
- The first problem occurred when I was porting the program to the Windows Mobile platform.
Instead of the parallel port pins, two designated GPIO pins had to generate the signals.
As mentioned above, the physical addresses of those hardware pins had to be mapped into the virtual memory space.
For that we used the function MmMapIoSpace(), which takes 3 arguments,
one of which is the physical address of the hardware to be mapped.
We had a lot of trouble finding the physical addresses of those two pins;
it took almost three days to find them.
They were finally obtained from another of the company's branches, in America, which had actually developed the architecture.
- The second problem occurred just after fixing the first one.
After mapping those hardware pins, I only had to make a small change to the program that worked with the parallel port,
namely modifying its low-level hardware-access functions.
When I was working with the parallel port, the supply voltage of the development board was 3.3 V,
but on this platform the supply voltage is 1.8 V, and I was given another board that works at 1.8 V.
After making the changes in the code, unaware of the reduced capability caused by the drop in supply voltage,
I ran the application and got no response at all from the target.
The application clocked the signals at a frequency in the MHz range, which turned out to be too high for the microcontroller operating at 1.8 V.
I was not aware of that issue at the time. Later, without any particular reason, I tried a reduced frequency,
and, to my surprise, the target responded and worked perfectly.
Only then did I understand the reduced capability caused by the drop in supply voltage.
- The next problem occurred at the very last stage of the project.
Everything worked fine as long as I used my own hand-made wires to connect the host and the target.
But when I was asked to use the standard cables produced for this purpose, the application failed.
I had no idea what the problem was. When I asked my mentor about it, he gave me the solution.
The actual problem was in the sampling of the SWDIO signal with SWCLK.
The signals generated by my application changed DATA at exactly the same moment that CLK rose.
This works fine as long as the connecting wires are very short.
To be on the safe side, however, there must be some delay (setup time) between setting up DATA and the rising edge of CLK.
That was the actual reason why the application failed with the standard cables.
Such problems can be caused not only by long wires
but also by probe capacitance, which can delay and distort the signal edges and cause bits to be received incorrectly.
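Both timing lessons can be built into the bit-level routine: a setup delay between changing SWDIO and raising SWCLK, and a configurable high time that sets the clock frequency. The hooks and tick units below are hypothetical placeholders for the platform's pin and delay functions. The event-log mock shows the resulting ordering.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hooks for the two pins and a platform-specific wait. */
typedef struct {
    void (*set_clk)(void *hw, int level);
    void (*set_dio)(void *hw, int level);
    void (*wait)(void *hw, unsigned ticks);
    void *hw;
    unsigned setup_ticks;   /* DATA-to-rising-edge setup time            */
    unsigned high_ticks;    /* SWCLK high time; longer = lower frequency */
} swd_timing;

/* Drives one bit: DATA is changed while SWCLK is low, given time to settle
 * across the cable, and only then is SWCLK raised. */
static void swd_clock_out(const swd_timing *t, int bit)
{
    assert(t != NULL);
    t->set_clk(t->hw, 0);
    t->set_dio(t->hw, bit);
    t->wait(t->hw, t->setup_ticks);   /* setup: DATA stable before CLK rises */
    t->set_clk(t->hw, 1);
    t->wait(t->hw, t->high_ticks);    /* high phase of the clock period */
}

/* Event-log mock: c/C = clock low/high, d/D = data 0/1, w = wait. */
typedef struct { char ev[16]; size_t n; } ev_log;

static void push_ev(void *hw, char e)
{
    ev_log *l = hw;
    assert(l->n < sizeof l->ev);
    l->ev[l->n++] = e;
}

static void log_clk(void *hw, int level) { push_ev(hw, level ? 'C' : 'c'); }
static void log_dio(void *hw, int level) { push_ev(hw, level ? 'D' : 'd'); }
static void log_wait(void *hw, unsigned ticks) { if (ticks > 0) push_ev(hw, 'w'); }
```

With `setup_ticks` set to 0, the routine reproduces the original behaviour, where DATA changes immediately before the clock edge.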