Monday, September 1, 2014

How to detect and correct faults in NoC?

The use of systems on chip (SoC) has increased exponentially as more and more IP cores are integrated on a single chip. A larger number of IP cores needs a larger number of bus-based interconnections. Bus-based interconnection leads to parallel communication, which is inefficient in terms of bandwidth, latency and power consumption. To solve this problem a switching network, called a Network on Chip (NoC), is used. Growing complexity and technology scaling increase the occurrence of intermittent and transient faults.

To run a fault-tolerant system smoothly, the first thing to do is to locate the faults. The fault detection mechanism should also be able to distinguish transient faults from permanent faults. Transient link errors are detected with error coding techniques such as cyclic redundancy check (CRC) and parity codes. For permanent errors in a NoC there are two methods: an in-line test that checks each adjacent pair of wires, and a syndrome storing-based error detection method that evaluates consecutive code syndromes at the receiver. A few works also focus on detecting transient and permanent faults at the same time.
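
As a small illustration of the parity codes mentioned above, here is a minimal sketch of even parity on a single link word. The function names and the choice to append the parity bit as the least significant bit are assumptions made for this example, not details taken from any particular NoC design.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1s even."""
    return bin(word).count("1") % 2


def encode(word: int) -> int:
    """Append the parity bit as the least significant bit (assumed layout)."""
    return (word << 1) | even_parity_bit(word)


def has_error(codeword: int) -> bool:
    """A codeword with an odd number of 1s was corrupted on the link."""
    return bin(codeword).count("1") % 2 == 1


sent = encode(0b1011)            # three 1s in the data, so the parity bit is 1
single_flip = sent ^ 0b00100     # one wire glitches
double_flip = sent ^ 0b00110     # two wires glitch at once
```

A single flipped bit is detected, but a double flip passes unnoticed, which is one reason a CRC is usually preferred when stronger detection is needed.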

There are three main techniques to handle transient faults in a NoC: automatic repeat request (ARQ), forward error correction (FEC), and hybrid ARQ (HARQ). Transient faults can be handled at both the link level and the transport level. In ARQ-based error control, a packet that is found to contain errors is retransmitted, and retransmission continues until the packet is received error free. Error detection is usually implemented with a cyclic redundancy check (CRC). In a simple error-detecting scheme, the code is applied to the packet before transmission, and at the receiver side a checksum is calculated to ensure that no error has occurred. If the checksum does not match the expected value, the packet is retransmitted.
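
The ARQ loop described above can be sketched in a few lines of Python. This is only a software model under stated assumptions: it uses the 32-bit CRC from Python's `zlib` module, and the helper names (`add_crc`, `check_crc`, `send_with_arq`, `flaky_channel_factory`) and the retry limit are hypothetical, chosen for illustration.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Sender side: append a 4-byte CRC-32 to the payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Receiver side: recompute the CRC and compare with the received one."""
    payload, received = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


def send_with_arq(payload: bytes, channel, max_retries: int = 8):
    """Retransmit until the receiver sees an error-free frame."""
    frame = add_crc(payload)
    for attempt in range(1, max_retries + 1):
        received = channel(frame)
        if check_crc(received):
            return received[:-4], attempt
    raise RuntimeError("link still failing after max_retries attempts")


def flaky_channel_factory(bad_transfers: int):
    """Model a link that flips one bit in the first `bad_transfers` frames."""
    state = {"count": 0}

    def channel(frame: bytes) -> bytes:
        state["count"] += 1
        if state["count"] <= bad_transfers:
            corrupted = bytearray(frame)
            corrupted[0] ^= 0x01
            return bytes(corrupted)
        return frame

    return channel


data, attempts = send_with_arq(b"flit", flaky_channel_factory(2))
```

With two corrupted transfers the packet gets through on the third attempt. If the link never recovers, the retry limit is reached, which hints that the fault may be permanent rather than transient.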

· Each packet is 114 bits long and consists of a 34-bit header and an 80-bit payload. A valid bit (V) marks whether the packet is valid. Relative addressing is used for the source and destination address fields (SA and DA), which are 12 bits each. The HC field (9 bits) records the number of hops over which the packet has been routed.
· The number of inputs should be equal to the number of outputs.
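
The packet layout above (1 + 12 + 12 + 9 = 34 header bits plus 80 payload bits) can be modelled as a simple bit-packing routine. The field order V, SA, DA, HC, payload from the most significant bit down is an assumption for this sketch, as are the names `Packet`, `pack` and `unpack`; the source only gives the field widths.

```python
from dataclasses import dataclass

# (field name, width in bits); the order is assumed, the widths are from the text
FIELDS = (("valid", 1), ("sa", 12), ("da", 12), ("hc", 9), ("payload", 80))
PACKET_BITS = sum(width for _, width in FIELDS)


@dataclass
class Packet:
    valid: int
    sa: int
    da: int
    hc: int
    payload: int


def pack(p: Packet) -> int:
    """Concatenate the fields into one 114-bit integer, MSB first."""
    word = 0
    for name, width in FIELDS:
        value = getattr(p, name)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack(word: int) -> Packet:
    """Split a 114-bit integer back into its fields, LSB field first."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return Packet(**values)


p = Packet(valid=1, sa=5, da=9, hc=3, payload=0xABCDEF)
round_trip = unpack(pack(p))
```

Packing and then unpacking returns the original fields, and a value that is too wide for its field is rejected instead of silently overflowing into its neighbour.
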
A 2-hop fault information transmission mechanism is used to reduce the average hop count. In this mechanism, four additional signals are used to transmit fault information: fault from[d] (1 bit), fault to[d] (1 bit), FoN from[d] (3 bits) and FoN to[d] (3 bits), which add up to 8 bits for each direction of a switch. Each switch is responsible not only for transmitting its own link status to its four neighbours, but also for collecting the link status from three of its neighbours and transmitting it to the fourth. For example, switch A can get the status of 16 links within 2 hops.

Figure: Fault information transmission mechanism
The signal FoN to[d] collected by the current switch is a 3-bit vector that denotes the link status along the three directions other than d, and it is transmitted to the neighbour along d.
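
To make the FoN to[d] vector concrete, here is a minimal sketch of how a switch could assemble its outgoing fault signals for one direction. Several details are assumptions, not taken from the source: the direction names N, E, S, W, the polarity (1 means faulty), the bit order inside the 3-bit vector, and the function names `fon_to` and `outgoing_signals`.

```python
DIRECTIONS = ("N", "E", "S", "W")  # assumed port names of a mesh switch


def fon_to(faulty: dict, d: str) -> int:
    """Build the 3-bit vector for the links in every direction except d."""
    bits = 0
    for other in DIRECTIONS:
        if other != d:
            bits = (bits << 1) | int(faulty[other])
    return bits


def outgoing_signals(faulty: dict, d: str) -> dict:
    """Signals driven towards the neighbour along d (1 bit + 3 bits)."""
    return {"fault_to": int(faulty[d]), "FoN_to": fon_to(faulty, d)}


# Example: only the east link of this switch is faulty.
faulty = {"N": False, "E": True, "S": False, "W": False}
to_north = outgoing_signals(faulty, "N")
to_east = outgoing_signals(faulty, "E")
```

Under these assumptions the northern neighbour learns about the faulty east link through the 3-bit vector, one hop before a packet would run into it. This reading is also consistent with the count of 16 links: a switch knows its own 4 links and receives 3 more from each of its 4 neighbours.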

Tags: HDL, FPGA, ASIC, HSPICE, VHDL

Author: Dharmendra Kumar
(Research Associate at Silicon Mentor)