

National Institute for Research and Development of Isotopic and Molecular Technologies

# FPGA-Based Hardware Architectures for High Performance Computing Applications

Authors: Bogdan BELEAN | Sergiu POGACIAN | Adrian BOT







5th Romania Tier 2 Federation Grid, Cloud & High Performance Computing Science

Dr. Ing. Bogdan Ioan BELEAN


# Content



- **A** FPGA Technology Description
- **B** cDNA Microarray Image Processing
- **C** Low-Density Parity-Check Codes (Error Correction Codes)

# **FPGA Technology Description**

# FPGA


Field Programmable Gate Arrays (FPGAs) are digital logic chips containing:

- Configurable Logic Blocks (CLBs)
  - LUTs (Look-Up Tables)
  - Multiplexers
  - Flip-flops
- Programmable interconnects and switch matrices
- Programmable I/O Blocks
- Block RAMs
- Embedded processors (PowerPC)
- Clock resources




### Major advantage: spatial vs. temporal computation

- **Temporal** (serial) **computing**: only one computation can proceed at a time, and the CPU has to wait while program code or data are fetched.

- **Spatial** (parallel) **computing**: one set of gates processes one part of the algorithm while other gates perform other tasks at the same time.
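A rough cycle-count model makes the difference concrete. The sketch below assumes an idealized algorithm split into `n_stages` equal steps of one clock cycle each, with no memory stalls; the function names are hypothetical. In the temporal model a single unit runs every step of one item before starting the next; in the spatial model each step has its own hardware, so items flow through as a pipeline.

```python
def temporal_cycles(n_items: int, n_stages: int) -> int:
    """Serial model: each item passes through all stages before the next starts."""
    return n_items * n_stages


def spatial_cycles(n_items: int, n_stages: int) -> int:
    """Pipelined model: each stage has dedicated gates, so after the pipeline
    fills (n_stages cycles) one new result completes every clock cycle."""
    return n_stages + n_items - 1


# Assumed example: 1000 data items, algorithm split into 5 stages
print(temporal_cycles(1000, 5))  # 5000
print(spatial_cycles(1000, 5))   # 1004
```

Under these assumptions the spatial version approaches a speedup equal to the number of stages as the number of items grows.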



# **cDNA Microarray Image Processing**

### Microarray experiment:

- cDNA sample collection
- Probe labeling (fluorescent markers Cy3, Cy5)
- Hybridization of the cDNA probes on the microarray glass slide
- Microarray scanning
- Data analysis




- Processing platforms: Agilent Feature Extraction Software, Agilent GeneSpring
- Image and result sources: GEO (Gene Expression Omnibus) database
- Processing steps:
  - Preprocessing
    - Noise removal
    - Image enhancement
  - Addressing
  - Segmentation
  - Intensity extraction






#### Image processing methods for automatic microarray image processing

- Preprocessing
  - Enhancing weakly expressed spots

$$I_L(x, y) = \frac{\ln(I_0(x, y) + 1)}{\ln 2^n} \cdot 2^n$$

$$I_A(x, y) = \begin{cases} 2^n \dfrac{\operatorname{atgh}\left(\frac{I(x, y) - k}{k+1}\right)}{\operatorname{atgh}\left(\frac{-k}{k+1}\right)}, & I(x, y) \le k \\[2ex] 2^n \dfrac{\operatorname{atgh}\left(\frac{I(x, y) - k}{2^n}\right)}{\operatorname{atgh}\left(\frac{2^n - 1}{2^n}\right)}, & I(x, y) > k \end{cases}$$

- Addressing
  - Shock filters
  - Input: image profiles

$$VP(x) = \frac{1}{Y} \sum_{y=0}^{Y-1} I(x, y) \qquad HP(y) = \frac{1}{X} \sum_{x=0}^{X-1} I(x, y)$$

#### **Continuous model**

$$U_t = -\operatorname{sign}(U_{xx})\,|U_x|$$

#### **Discrete model**

$$U_i^{n+1} = U_i^n - |DU_i^n| \cdot \operatorname{sign}(D^2 U_i^n)$$

where $D$ and $D^2$ denote discrete first- and second-order differences along the profile.
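Because the logarithmic transform $I_L$ depends only on the pixel value, one hardware-friendly form is a precomputed lookup table with $2^n$ entries. This is an assumption for illustration, not necessarily how the presented design implements it (its resource table lists DSP48E slices for the transform). The function name is hypothetical, the default `n_bits = 16` assumes 16-bit scanner images, and saturating the top value $2^n$ to $2^n - 1$ is an added assumption about n-bit output storage.

```python
import math


def log_transform_lut(n_bits: int = 16) -> list:
    """Precompute I_L = ln(I_0 + 1) / ln(2^n) * 2^n for every n-bit input I_0.

    The formula maps the brightest input (2^n - 1) to exactly 2^n, which does
    not fit in n bits, so results are saturated to 2^n - 1 (assumption).
    """
    full_scale = 2 ** n_bits
    max_code = full_scale - 1
    denom = n_bits * math.log(2)  # ln(2^n)
    return [min(max_code, round(math.log(i + 1) / denom * full_scale))
            for i in range(full_scale)]


lut = log_transform_lut(8)
print(lut[0], lut[1], lut[255])  # 0 32 255
```

Dark pixels are stretched strongly (input 1 already maps to 32 in the 8-bit case), which is the intended effect for weakly expressed spots.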

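The profile computation and the discrete shock filter can be prototyped in software before being mapped to hardware. The sketch below works under stated assumptions: the image is a list of rows indexed `img[y][x]`, the first difference $DU$ uses the minmod operator (a common choice for shock filters, assumed here), $D^2U$ is the central second difference, borders are replicated, and a time step `dt` is added (the default `dt = 1.0` reproduces the discrete model as written). All function names are hypothetical.

```python
def vertical_profile(img):
    """VP(x) = (1/Y) * sum_y I(x, y): the mean of each column."""
    height = len(img)
    return [sum(row[x] for row in img) / height for x in range(len(img[0]))]


def horizontal_profile(img):
    """HP(y) = (1/X) * sum_x I(x, y): the mean of each row."""
    return [sum(row) / len(row) for row in img]


def minmod(a, b):
    """Smaller-magnitude difference when both have the same sign, else 0."""
    if a * b <= 0:
        return 0.0
    return a if abs(a) < abs(b) else b


def shock_filter_1d(profile, iterations=20, dt=1.0):
    """U_i^{n+1} = U_i^n - dt * |DU_i^n| * sign(D^2 U_i^n), borders replicated."""
    u = [float(v) for v in profile]
    last = len(u) - 1
    for _ in range(iterations):
        new = u[:]
        for i in range(len(u)):
            left = u[max(i - 1, 0)]
            right = u[min(i + 1, last)]
            d1 = minmod(right - u[i], u[i] - left)  # first difference DU
            d2 = right - 2 * u[i] + left            # second difference D^2 U
            sign = (d2 > 0) - (d2 < 0)
            new[i] = u[i] - dt * abs(d1) * sign
        u = new
    return u


img = [[1, 2], [3, 4]]
print(vertical_profile(img), horizontal_profile(img))  # [2.0, 3.0] [1.5, 3.5]
```

Applied to $VP$ or $HP$, the filter pushes each profile toward piecewise-constant steps with sharp transitions; the intent in addressing is that the boundaries between spot rows and columns become easier to locate.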





#### **Overall results**





| Hardware resources used | Logarithmic transform | Profile computation and addressing (shock filters) | Segmentation | Total | Available |
|-------------------------|-----------------------|----------------------------------------------------|--------------|-------|-----------|
| Slice registers         | 18                    | 355                                                | 1068         | 1441  | 69120     |
| Slice LUTs              | -                     | 8525                                               | 1736         | 10261 | 69120     |
| Block RAMs              | -                     | 4                                                  | 2            | 6     | 148       |
| BUFGs                   | 1                     | -                                                  | 1            | 2     | 32        |
| DSP48Es                 | 4                     | 2                                                  | -            | 6     | 64        |
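As a quick consistency check on the resource table, the snippet below re-adds each row and expresses the totals as a share of the available resources. Entries shown as "-" are treated as 0, and the names `resources` and `utilization` are hypothetical.

```python
# Counts copied from the table: (logarithmic transform, profiles and
# addressing, segmentation, reported total, available on the device)
resources = {
    "Slice registers": (18, 355, 1068, 1441, 69120),
    "Slice LUTs": (0, 8525, 1736, 10261, 69120),
    "Block RAMs": (0, 4, 2, 6, 148),
    "BUFGs": (1, 0, 1, 2, 32),
    "DSP48Es": (4, 2, 0, 6, 64),
}


def utilization(rows):
    """Verify each reported total and return used/available as a percentage."""
    result = {}
    for name, (log_t, addr, seg, total, available) in rows.items():
        assert log_t + addr + seg == total, f"{name}: column sum mismatch"
        result[name] = 100.0 * total / available
    return result


for name, pct in utilization(resources).items():
    print(f"{name}: {pct:.2f} %")
```

All columns add up, and slice LUTs are the most heavily used resource at roughly 15 % of the device, leaving most of the FPGA free.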

# **Low-Density Parity-Check (LDPC) Codes**

## Introduction

- Introduced by Gallager in 1962
- LDPC codes offer remarkable performance, falling only 0.04 dB short of the Shannon theoretical limit
- For decades, the computational power available was insufficient for the decoding process
- With FPGA/ASIC technologies and digital signal processors, LDPC codes are now considered a significant breakthrough in digital communications
- Standards that use LDPC codes:
  - WiMAX for wireless networks
  - DVB-S2 for satellite broadcasting services



- iterative decoding
- error correction capacity
- computationally expensive


### LDPC codes

- Linear block codes (m, n), where m = number of information bits and n = total number of bits in a codeword
- k = number of control (parity) bits added by encoding

$$n = m + k$$

- encoding relation: $[i] \cdot G_{m \times n} = [c]$, where $G$ is the generator matrix
- **decoding** based on the sparse parity-check matrix $H_{M,N}$
  - Hard-decision decoding
  - Soft-decision decoding: message-passing algorithm (probability propagation)
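The encoding relation, the parity check and soft decoding can be tried on a toy code. The sketch below uses a hypothetical systematic (6, 3) code with $G = [I \mid P]$ and $H = [P^T \mid I]$ (so $G H^T = 0$ over GF(2)); it is far smaller and denser than the WiMAX or DVB-S2 matrices. For soft decoding it uses min-sum, a common low-complexity approximation of message passing, assumed here for illustration and not necessarily the algorithm of the decoder presented in this work. Positive LLRs mean a bit is more likely 0.

```python
# Hypothetical toy code: m = 3 information bits, k = 3 parity bits, n = 6
P = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
M_INFO, K_PAR = 3, 3
N = M_INFO + K_PAR

# G = [I | P] (m x n) and H = [P^T | I] (k x n)
G = [[int(r == c) for c in range(M_INFO)] + P[r] for r in range(M_INFO)]
H = [[P[c][r] for c in range(M_INFO)] + [int(r == c) for c in range(K_PAR)]
     for r in range(K_PAR)]


def encode(info):
    """c = i * G over GF(2)."""
    return [sum(info[r] * G[r][j] for r in range(M_INFO)) % 2 for j in range(N)]


def syndrome(word):
    """s = H * c^T over GF(2); all zeros means every parity check holds."""
    return [sum(h * b for h, b in zip(row, word)) % 2 for row in H]


def min_sum_decode(llr, max_iter=10):
    """Min-sum message passing on the Tanner graph of H."""
    checks = [[j for j in range(N) if H[i][j]] for i in range(K_PAR)]
    # Variable-to-check messages start as the channel LLRs
    v2c = {(i, j): llr[j] for i in range(K_PAR) for j in checks[i]}
    hard = [0 if l >= 0 else 1 for l in llr]
    for _ in range(max_iter):
        # Check-node update: product of signs, minimum magnitude of the others
        c2v = {}
        for i, vars_i in enumerate(checks):
            for j in vars_i:
                others = [v2c[(i, t)] for t in vars_i if t != j]
                sign = 1
                for o in others:
                    if o < 0:
                        sign = -sign
                c2v[(i, j)] = sign * min(abs(o) for o in others)
        # Variable-node update: channel LLR plus all incoming check messages
        total = list(llr)
        for (i, j), msg in c2v.items():
            total[j] += msg
        hard = [0 if t >= 0 else 1 for t in total]
        if not any(syndrome(hard)):
            break  # valid codeword: stop iterating early
        for (i, j) in v2c:
            v2c[(i, j)] = total[j] - c2v[(i, j)]  # exclude check i's own message
    return hard


cw = encode([1, 0, 1])                       # [1, 0, 1, 0, 1, 1]
llr = [2.0 if b == 0 else -2.0 for b in cw]
llr[0] = 0.5                                 # weak, wrong-sign observation of bit 0
print(min_sum_decode(llr))                   # [1, 0, 1, 0, 1, 1]
```

A single flipped bit produces a nonzero syndrome (the corresponding column of $H$), which is what both hard- and soft-decision decoders use to detect that decoding is not yet finished.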




- **Standards:** 
  - WiMAX 576 x 288
  - DVB-S2 1022 x 8176

| Implementation approach | Code length (standard) | Throughput |
|-------------------------|------------------------|------------|
| GPU                     | 10 000 b               | 100 Mbps   |
| FPGA/ASIC               | 2048 b                 | 240 Mbps   |
| FPGA/ASIC               | 672 b (~WiMAX)         | 822 Mbps   |
| FPGA/ASIC               | 64800 b (DVB-S2)       | 520 Mbps   |
| ASIP                    | 1620 b                 | 300 Mbps   |
| ASIP                    | 1620 b (WiMAX)         | 100 Mbps   |
| ASIP                    | 2304 b (WiMAX)         | 62 Mbps    |


#### **FPGA based hardware architectures**



The total path delay for updating the variable-node (R3) register values during one decoding iteration is

$$T = u + v \ \text{(clock cycles)}$$

where $u$ and $v$ represent the number of 1 values within the rows of $H$ and the number of clock cycles necessary for the $Sum2^i$ addition, respectively.

### T = 10 for the WiMAX standard

#### WiMAX standard

- $M = 576$: codeword length
- $N_{iterations} = 10$: number of iterations of the decoding process
- $f_{clk} = 350$ MHz: clock frequency of the FPGA-based decoder

**Decoder throughput estimation**

$$Throughput = \frac{M \cdot f_{clk}}{N_{iterations} \cdot T} = \frac{576 \cdot 350 \cdot 10^6}{10 \cdot 10}\ \text{bit/s} \approx 2\ \text{Gbit/s}$$
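The throughput estimate is plain arithmetic and can be checked directly. The sketch assumes, as the formula does, that one M-bit codeword leaves the decoder every $N_{iterations} \cdot T$ clock cycles, with no overlap between codewords; the function name is hypothetical.

```python
def throughput_bps(codeword_len: int, f_clk_hz: float,
                   n_iterations: int, t_cycles: int) -> float:
    """Throughput = M * f_clk / (N_iterations * T), in bits per second."""
    return codeword_len * f_clk_hz / (n_iterations * t_cycles)


# WiMAX values: M = 576, f_clk = 350 MHz, N_iterations = 10, T = 10
wimax = throughput_bps(576, 350e6, 10, 10)
print(f"{wimax / 1e9:.3f} Gbps")  # 2.016 Gbps
```

The exact value, 2.016 Gbit/s, is about 2.5 times the fastest FPGA/ASIC figure in the comparison table (822 Mbps), although it is an estimate rather than a measured result.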

### Conclusions

- Two applications were presented
  - Microarray image processing embedded system
  - LDPC decoder implementation
- FPGA technology is well suited to:
  - Iterative algorithms
  - Large data volumes
  - Real-time systems
  - Efficient implementations for high performance computing applications