Multilayer Neuronal Network Hardware Implementation and Co-simulation Platform
Nabil Chouba, Khaled Ouali, Mohamed Moalla (LIP2, University of Tunis El Manar)

Abstract— The multilayer perceptron neuronal network is widely used in industrial embedded applications. In this paper, we present a new hardware architecture that is fully generic, allowing maximum flexibility in selecting the suitable design according to the resources available on the FPGA or ASIC: area, power consumption and execution time. We introduce the use of co-simulation and co-design to ensure the convergence of the learning step in a simulated environment. We choose the Khepera robot obstacle avoider as a validation test case.

Index Terms— neuronal network, multilayer perceptron, back-propagation, co-simulation, hardware implementation.

I. INTRODUCTION

The increase in the integration density of integrated circuits offers the opportunity to explore approaches that differ from the Von Neumann model. Research in cognitive science has developed formal, bio-inspired models called "neural networks". These models overcome the limits of the Von Neumann model in several respects: noise tolerance, learning from a limited number of examples and inherent parallelism. The model most widely used by industry is the multilayer perceptron trained by back-propagation, and it is the model adopted in our work. Since 1980, several neural network ASICs have been designed [1] by large companies such as Intel [2], AT&T, Hitachi, Philips, Siemens and IBM [3]. Each design differs from the others in the number of neurons emulated, the bit precision, the number of synapses and the speed of the learning process. Many implementations have also targeted FPGAs [4][5][6], following the evolution of these devices. Until 1995, FPGAs contained fewer than 5000 logic gates, allowing only the implementation of a MAC instruction [4][5]. In the years that followed, many implementations [7] were made using serial arithmetic [8][9][10]. Serial (i.e. bit-by-bit) arithmetic reduces the silicon area used, but it forces the computation to be carried out over several cycle iterations. Another answer to the FPGA density problem is to split the calculation into steps and to reconfigure the FPGA at each stage; such a solution implies a compromise between computation time and reconfiguration time [11], [12].

In this paper we present a new neural design that exploits the rapid evolution of the semiconductor field to surpass the existing solutions [1-12]. Our method proposes a generic and dynamically reconfigurable design that makes it possible to choose, at synthesis time, the arithmetic width used in the design. Another advantage of this architecture is the possibility of changing the neural network topology on line; the topology is characterized by the number of layers, the number of neurons in each layer, the connectivity weights between neurons and the shape of the neuron activation function. This design was implemented using a new refinement methodology based on SystemC. Great importance was given to the co-simulation of hardware and embedded software, for lack of another suitable way to fix the network topology and the arithmetic width that allow the back-propagation algorithm to converge. The rest of the paper is organized as follows. Section 2 briefly introduces the formal model of multilayer neural networks and the impact of reduced precision on back-propagation. Section 3 presents our generic, dynamically reconfigurable architecture, and Section 4 gives its synthesis results. Section 5 describes the co-simulation platform that simulates the IP driving the Khepera robot [17] obstacle avoider. Section 6 concludes the paper.

II. NEURONAL TECHNIQUES

This part presents the most popular neural learning algorithms; we focus on the multilayer perceptron network.

A. PROPAGATION ALGORITHM

Consider the neural network presented in Figure 1, composed of k layers (numbered 0 to k-1). Each layer i is composed of Pi neurons. The network takes n inputs (Ei) and returns q outputs (Sj).
Figure 1: Example of a neuronal network

The neuronal processing propagates the inputs Ei to the outputs Sj. The outputs of the neurons of the first layer are initialized directly with the inputs Ei. The propagation then proceeds through the successive layers: each neuron Nm,j of layer m calculates its output Xm,j from a weighted sum of the outputs Xm-1,i of the neurons of layer m-1 connected to it. This processing is summarized [16] by the following algorithm:

Begin
  Initialize the outputs of the neurons of the first layer: X0,i = Ei
  For each layer m, from layer 1 to k-1
    For each neuron j of layer m
      Calculate the output Xm,j of neuron j:

      Xm,j = f( Σi=1..Pm-1 (Xm-1,i × Wi,j) + bj )

where Wi,j is the weight of the synaptic connection from Nm-1,i to Nm,j and bj is the bias factor associated with the neuron. The output Sj is simply Xk-1,j, the output of the last layer.
End
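As an illustration, a minimal C++ sketch of this propagation step is given below; the container layout, the function name and the sigmoid activation are our own assumptions and are not part of the NEURONA implementation.

```cpp
#include <cmath>
#include <vector>

// Sketch of the propagation algorithm above: X[m][j] is the output of neuron j of
// layer m, W[m][i][j] the weight from neuron i of layer m to neuron j of layer m+1,
// and b[m][j] the bias of that target neuron. The layout and the sigmoid activation
// are assumptions made for the sketch.
std::vector<std::vector<double>> propagate(
    const std::vector<double>& E,
    const std::vector<std::vector<std::vector<double>>>& W,
    const std::vector<std::vector<double>>& b)
{
    auto f = [](double a) { return 1.0 / (1.0 + std::exp(-a)); };  // activation function

    std::vector<std::vector<double>> X(W.size() + 1);
    X[0] = E;                                            // layer 0 is initialized with the inputs Ei
    for (std::size_t m = 1; m <= W.size(); ++m) {        // layers 1 .. k-1
        X[m].resize(b[m - 1].size());
        for (std::size_t j = 0; j < X[m].size(); ++j) {  // each neuron j of layer m
            double sum = b[m - 1][j];
            for (std::size_t i = 0; i < X[m - 1].size(); ++i)
                sum += X[m - 1][i] * W[m - 1][i][j];     // weighted sum of the previous layer
            X[m][j] = f(sum);
        }
    }
    return X;                                            // X.back() holds the outputs Sj
}
```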

B. BACK-PROPAGATION ALGORITHM

The first stage of the learning algorithm [16] is the presentation of the input E and the desired output D to the network. We then calculate the error δ between the desired value D and the effective output of the network, and execute the back-propagation algorithm to find the error contribution δi of each neuron Ni (the processing starts at the output layer and goes back to the input layer, through the hidden layers) in order to correct the synaptic weights of each neuron. This algorithm involves five steps:
- Step 1: initialization of the network weights (usually at random).
- Step 2: initialization of the input and desired output vectors of the network.
- Step 3: propagation of the input E to the output (obtaining the propagated output).
- Step 4: calculation of the total error and back-propagation to update the synaptic weights.
  - Sub-phase 4.1: calculation of the total error: Error = ½ Σ (di – Xi)², i ∈ [1, ..., ns], where ns is the number of neurons of the output layer.
  - Sub-phase 4.2: calculation of the error of each neuron Ni belonging to the output layer: δi = Xi (1 – Xi) (di – Xi).
  - Sub-phase 4.3: for each hidden layer, calculation of the error of each neuron Ni belonging to this layer: δi = Xi (1 – Xi) Σj δj Wij, j ∈ [1, ..., np], where np is the number of neurons of the next layer connected to the output of neuron Ni.
  - Sub-phase 4.4: update of the synaptic weights: ∆Wki = η δk Xi, where Wki is the weight connecting neurons Nk and Ni and η is the learning rate; Wki(t+1) = Wki(t) + ∆Wki.
- Step 5: convergence analysis to decide the end of learning, or iteration from Step 2 with the next input.
Steps 1 and 2 are limited to initialization tasks, while Steps 3 and 4 perform the calculation and the synaptic weight correction. These two operations are time costly and are implemented in hardware. Finally, Step 5 is a test step; its decision is taken by software.
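The following C++ fragment sketches sub-phases 4.2 to 4.4 for one training example, reusing the conventions of the propagation sketch above; variable names are illustrative and eta stands for the learning rate η.

```cpp
#include <vector>

// Sketch of sub-phases 4.2 to 4.4 for one (input, desired output) pair, reusing the
// X, W and b conventions of the propagation sketch; eta is the learning rate η.
// The bias update follows common practice, although the text only details ∆Wki.
void backpropagate(std::vector<std::vector<double>>& X,
                   std::vector<std::vector<std::vector<double>>>& W,
                   std::vector<std::vector<double>>& b,
                   const std::vector<double>& d, double eta)
{
    const std::size_t last = X.size() - 1;
    std::vector<std::vector<double>> delta(X.size());

    // Sub-phase 4.2: error of each output-layer neuron.
    delta[last].resize(X[last].size());
    for (std::size_t i = 0; i < X[last].size(); ++i)
        delta[last][i] = X[last][i] * (1.0 - X[last][i]) * (d[i] - X[last][i]);

    // Sub-phase 4.3: error of each hidden-layer neuron, from the last hidden layer backwards.
    for (std::size_t m = last - 1; m >= 1; --m) {
        delta[m].resize(X[m].size());
        for (std::size_t i = 0; i < X[m].size(); ++i) {
            double s = 0.0;
            for (std::size_t j = 0; j < X[m + 1].size(); ++j)
                s += delta[m + 1][j] * W[m][i][j];
            delta[m][i] = X[m][i] * (1.0 - X[m][i]) * s;
        }
    }

    // Sub-phase 4.4: weight (and bias) update, W <- W + eta * delta * X.
    for (std::size_t m = 0; m < W.size(); ++m)
        for (std::size_t j = 0; j < b[m].size(); ++j) {
            for (std::size_t i = 0; i < X[m].size(); ++i)
                W[m][i][j] += eta * delta[m + 1][j] * X[m][i];
            b[m][j] += eta * delta[m + 1][j];
        }
}
```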

C. PRECISION LIMITATIONS

The back-propagation described previously uses continuous functions and real numbers. In a hardware implementation, real numbers are approximated by fixed-point values; the use of floating point is not recommended for hardware applications because of its silicon area cost and the number of execution cycles it requires. Much research has studied the possibility and the effects of using fixed-point or integer approximations. The analyses of fixed-point effects made in [15][13] show that a minimum synaptic weight precision of 20-22 bits is generally needed to guarantee the convergence of the back-propagation algorithm. This precision can be reduced to 14-16 bits through a judicious selection of the learning rate η, and the authors propose a method for choosing η. For the neuron outputs, a precision of 8 to 9 bits is sufficient. It has been shown in [14] that it is impossible to compensate the error due to reduced precision by increasing the number of neurons in the hidden layers. The study also shows that a higher precision is always necessary for the hidden layers.
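To make the precision discussion concrete, the fragment below shows one common way of emulating fixed-point arithmetic in software by quantizing values to a given number of fractional bits; the function name and the chosen format are our own illustration, not part of the cited studies.

```cpp
#include <cmath>

// Quantize a real value to a fixed-point format with frac_bits fractional bits,
// as is commonly done when emulating reduced-precision hardware in software.
double to_fixed(double v, int frac_bits)
{
    const double scale = std::ldexp(1.0, frac_bits);   // 2^frac_bits
    return std::round(v * scale) / scale;               // nearest representable value
}

// Example: with 12 fractional bits a weight of 0.1 becomes to_fixed(0.1, 12),
// about 0.100098; every multiply-accumulate of the emulated network would be
// followed by such a quantization (and by saturation, not shown here).
```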

III. HARDWARE IMPLEMENTATION

The designed IP is named NEURONA; it was implemented to be fully generic and dynamically reconfigurable in order to overcome the problems of precision limitation and of the choice of the neuronal network topology. These two problems remain the handicap of neuronal hardware implementations; in fact, they can cause the divergence of the back-propagation algorithm (see §II-C). The proposed design is illustrated in Figure 2.


Figure 2: Block schema of NEURONA

Our design is composed of:
- The ALU, which calculates the neuron outputs from the input values Xi and the synaptic weights Wij.
- Memory F, which synthesizes the activation function.
- Memory T, which contains the topology code of the neuronal network (number of layers and number of neurons per layer). This memory associates with each layer i (i ranging from 0 to k-1) the word at address i, whose content is the number of neurons forming that layer. The last word, at address k, contains a zero value indicating the end of the network (see the sketch after this list).
- Memory W, which contains the synaptic weights Wi,j of the connections between neurons. The storage order of the Wi,j is chosen to simplify address generation when executing the algorithm.
- Memory X, which stores the neuron output values in propagation mode. In back-propagation mode, this memory contains the errors made by the neurons. Memories T, W and X each have their own address generator (Figure 3).
- The Topology Explorer, which reads the information stored in Memory T. This exploration is controlled step by step by the Control Unit CTR: at every step, this block informs the CTR whether we move to the next neuron in the same layer or to the next layer.
- The Control Unit CTR, which uses the information coming from the previous block to command the ALU and the address generators gen_add_x (for Memory X) and gen_add_w (for Memory W).
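A minimal sketch of how the Topology Explorer could walk Memory T is given below; the std::vector standing in for the on-chip memory and the function name are assumptions made for illustration, not the NEURONA RTL.

```cpp
#include <cstdint>
#include <vector>

// Illustrative walk of Memory T: word i holds the number of neurons of layer i and a
// zero word marks the end of the network.
void explore_topology(const std::vector<uint16_t>& memory_T)
{
    for (std::size_t layer = 0; memory_T[layer] != 0; ++layer) {
        for (uint16_t n = 0; n < memory_T[layer]; ++n) {
            // Here the explorer would tell the CTR whether the next neuron is in the
            // same layer or in the next one, so that gen_add_x/gen_add_w can be driven.
        }
    }
}

// For the example of Figure 3, memory_T would contain {3, 2, 3, 0}.
```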

Figure 3 contents (for the studied example network): (3-a) Memory W holds the synaptic weights W1 to W17; (3-b) Memory X holds the neuron outputs X[0,0], X[0,1], X[0,2] (layer 0), X[1,0], X[1,1] (layer 1) and X[2,0], X[2,1], X[2,2] (layer 2), followed by the desired outputs D[2,0], D[2,1], D[2,2]; Memory T holds the topology words 3, 2, 3, 0 (end).

Figure 3: Memory coding of the studied network

In the first step, the CTR unit initializes the different blocks. It then commands the beginning of the propagation: it orders the Topology Explorer to read the first two words of Memory T. Once the blocks are initialized, the calculation of the weighted sum Σj Xj Wij begins, and the activation function output is finally stored in Memory X. The CTR repeats this task until the propagation is done. For the back-propagation step, we chose the following strategies:
- Processing of Steps 1 and 2 (§II-B): storage of the input values X0,i and of the desired output values di of the network in Memory X (Table 1).

- Processing of Sub-phase 4.2 (§II-B): the CTR controls the address generators (gen_add_x and gen_add_w) to calculate the errors δi of the output layer. These errors are then propagated through the rest of the network back to the input layer.

Table 1 - Memory X (propagation mode; row i holds X0,i ... Xf,i followed by the desired output di)
X0,0 | X1,0 | ... | Xα,0 | Xα+1,0 | ... | Xf,0 | d0
X0,1 | X1,1 | ... | Xα,1 | Xα+1,1 | ... | Xf,1 | d1
...  | ...  | ... | ...  | ...    | ... | ...  | ...
X0,n | X1,n | ... | Xα,n | Xα+1,n | ... | Xf,n | dn

- Processing of Sub-phase 4.3 (§II-B): calculation of the error of each neuron i of the hidden layers. To optimize memory space, the value δα,i is stored in the location of Xα+1,i in Memory X (Table 2). This choice progressively replaces the stored outputs by the errors once these are calculated, since those outputs are no longer useful for the back-propagation algorithm.

Table 2 - Memory X (back-propagation mode; the errors progressively overwrite the neuron outputs)
X0,0 | X1,0 | ... | Xα,0 | δα,0 | ... | δf-1,0 | δf,0
X0,1 | X1,1 | ... | Xα,1 | δα,1 | ... | δf-1,1 | δf,1
...  | ...  | ... | ...  | ...  | ... | ...    | ...
X0,n | X1,n | ... | Xα,n | δα,n | ... | δf-1,n | δf,n

- Processing of Sub-phase 4.4 (§II-B): update of the synaptic weights. To optimize both the hardware implementation and the execution time, sub-phases 4.3 and 4.4 are carried out simultaneously: each time the error δα,i of a neuron is calculated, it is stored in memory in place of Xα+1,i and the synaptic weights Wij connected to this neuron (the weights linking neuron i to the next layer) are immediately updated.
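The fragment below sketches this fused processing in C++: the buffer buf plays the role of Memory X, the error of a layer overwrites stored outputs as soon as they are no longer needed, and each weight is updated right after its last use in the error propagation. Names and indexing are our illustration of the idea, not the exact NEURONA memory map.

```cpp
#include <vector>

// Sketch of the fused sub-phases 4.3/4.4: buf[m] plays the role of Memory X for
// layer m and, on entry, buf.back() already holds the output-layer errors computed
// in sub-phase 4.2.
void fused_backprop_update(std::vector<std::vector<double>>& buf,
                           std::vector<std::vector<std::vector<double>>>& W,
                           double eta)
{
    for (std::size_t m = buf.size() - 2; m >= 1; --m) {
        std::vector<double> delta(buf[m].size(), 0.0);
        for (std::size_t j = 0; j < buf[m + 1].size(); ++j) {
            double dj = buf[m + 1][j];                  // error of layer m+1, read from the X buffer
            for (std::size_t i = 0; i < buf[m].size(); ++i) {
                delta[i] += dj * W[m][i][j];            // propagate the error through the old weight
                W[m][i][j] += eta * dj * buf[m][i];     // ... then update that weight immediately
            }
        }
        for (std::size_t i = 0; i < buf[m].size(); ++i)
            delta[i] *= buf[m][i] * (1.0 - buf[m][i]);  // X(1 - X) factor of sub-phase 4.3
        buf[m] = delta;                                 // overwrite layer m outputs with its errors
    }
    // Weights between the input layer and the first hidden layer are updated last,
    // using the errors now stored where the first hidden layer's outputs were.
    for (std::size_t j = 0; j < buf[1].size(); ++j)
        for (std::size_t i = 0; i < buf[0].size(); ++i)
            W[0][i][j] += eta * buf[1][j] * buf[0][i];
}
```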

IV. SYNTHESIS RESULTS

The neural architecture has been implemented on an FPGA. We give below the characteristics and the results of this implementation.
- Synthesis tool: Synplicity, version 7.3, Build 192R
- FPGA: Apex [ep20k100eqc240-2x]
- Synthesis constraint: frequency of 20.0 MHz (fixed by the studied board)

Table 3 - Synthesis results. Four configurations were synthesized, with and without on-line learning: two with 16-bit W and X widths (Memory W = 200 words, Memory X = 128 words, Memory T = 4 words) and two with 8-bit widths (Memory W = 32 words, Memory X = 32 words, Memory T = 4 words). The reported logic utilization, memory utilization and maximum frequency figures for the four configurations are (18%, 23%, 25.0 MHz), (56%, 30%, 23.9 MHz), (9%, 15%, 37.0 MHz) and (23%, 15%, 36.1 MHz); all of them meet the 20 MHz constraint.

V. PLATFORM AND CO-SIMULATION

The proposed platform is a CAD tool for embedded neuronal implementations. It makes it possible to simulate and select the three components of the neuronal system: the application environment, the hardware architecture and the embedded software. Figure 4 shows the architecture of this tool. The development is specifically dedicated to mobile robot navigation, but it can easily be transposed to other neuronal applications simply by redefining the environment model.

Figure 4: Neuronal platform for embedded systems (abstraction levels of the application model: standard algorithm, simulated arithmetic, RTL (SystemC) IP NEURONA, embedded software in the robot; environment model with virtual world manager and GUI for robot and obstacle visualization)

A. NEURONAL APPLICATION MODEL

The model consists of the neuronal architecture NEURONA coupled to an embedded processor that supervises the navigation and controls the learning. The aim of our tool is to reach the optimal NEURONA architecture ensuring the best behaviour. The optimization is performed on the number of layers, the number of neurons and the arithmetic width, and is achieved through several refinement iterations based on simulation at various abstraction levels. Initially, we work at a standard algorithmic level using the floating point of C++. Then we reduce the arithmetic by emulating floating-point or fixed-point formats; in this way we make sure that adequate arithmetic parameters have been chosen. Finally, the SystemC RTL level allows the synthesizable version to be optimized through the generic model of NEURONA. The tool also allows the learning rate η to be validated on the RTL model. The use of various abstraction levels provides a progressive refinement that gets closer and closer to the hardware constraints, and the interaction between NEURONA and the embedded software lets us validate the design and fix bugs earlier in the design phase.
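One simple way to organize such a refinement in C++ is to template the reference algorithm on its arithmetic type, so that the same code can be run with native floating point and then with an emulated reduced-precision type before moving to the SystemC RTL; the small Fixed class below is an illustrative stand-in, not the platform's actual type.

```cpp
#include <cmath>
#include <cstdint>

// Illustrative fixed-point stand-in with F fractional bits, used to re-run the
// reference algorithm at reduced precision before committing to the SystemC RTL.
template <int F>
struct Fixed {
    int32_t raw = 0;
    Fixed() = default;
    Fixed(double v) : raw(static_cast<int32_t>(std::lround(v * (1 << F)))) {}
    operator double() const { return static_cast<double>(raw) / (1 << F); }
    Fixed operator+(Fixed o) const { Fixed r; r.raw = raw + o.raw; return r; }
    Fixed operator*(Fixed o) const {
        Fixed r;
        r.raw = static_cast<int32_t>((static_cast<int64_t>(raw) * o.raw) >> F);
        return r;
    }
};

// The same weighted-sum kernel can then be instantiated at several abstraction levels.
template <typename T>
T weighted_sum(const T* x, const T* w, int n, T bias)
{
    T acc = bias;
    for (int i = 0; i < n; ++i) acc = acc + x[i] * w[i];
    return acc;
}
// weighted_sum<double>(...)     : standard algorithmic level (floating point)
// weighted_sum<Fixed<12>>(...)  : simulated reduced-precision arithmetic
```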

Using the same language to simulate the hardware and the software allows us to reduce development and debugging time. The works presented in [4] and [5] depend on an emulated hardware model: to emulate the hardware behaviour and its interaction with the software, a C++ hardware model must first be developed from the specification.

B. ENVIRONMENT MODEL


The modeling of the environment aims to create a virtual world in which the robot evolves, serving as a test bench for the embedded neuronal application. This virtual world is modeled by mathematical equations describing the 2D geometry of the navigation space with its obstacles, and by equations emulating the behaviour of the robot sensors and motors. The GUI allows us to create and modify a virtual environment and to plan and visualize navigation scenarios used as test benches. We used C++, SystemC and the Qt library [18] for the graphical interfaces. The migration of the embedded software is facilitated by the fact that all microcontrollers support C code. The implemented platform helps us to validate the overall system behaviour using the virtual world; the test benches are generated by the GUI and allow us to follow the evolution of the robot in its universe.

C. VALIDATION OF THE HARDWARE COMPONENT

The traditional IP validation method is to create a pattern generator based on a C++ or MATLAB reference algorithm, whose output is usually an HDL-written testbench. In this project we adopt a slightly different methodology: we compare the results of the reference algorithm and those of the SystemC implementation by running both algorithms in a single program, with conditional structures that allow this comparison. Debugging is based mainly on statistical measurements to find the source of errors. This method enables us to measure the difference between the outputs of the standard floating-point and fixed-point algorithms and those of our SystemC implementation, as in the sketch below.
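A minimal sketch of such a comparison harness is shown below; the function name and the error statistics chosen are our own assumptions, while the real platform compares the reference algorithm and the SystemC model inside one program as described above.

```cpp
#include <cmath>
#include <vector>

// Illustrative comparison harness: the floating-point reference outputs and the
// outputs of the implementation under test (here abstracted as plain vectors) are
// compared and simple error statistics are accumulated for debugging.
void compare_outputs(const std::vector<double>& reference,
                     const std::vector<double>& implementation,
                     double& max_err, double& sum_err, long& samples)
{
    for (std::size_t i = 0; i < reference.size(); ++i) {
        double e = std::fabs(reference[i] - implementation[i]);
        if (e > max_err) max_err = e;
        sum_err += e;
        ++samples;
    }
}
// After the whole test bench has been played, max_err and sum_err / samples give a
// first statistical picture of where the reduced-precision implementation diverges.
```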

D. SIMULATION RESULTS

The tool developed allows us to explore the hardware parameters and the possible neuronal topologies. It also generates graphs (Figure 5) which facilitate the analysis of the simulations. Several learning examples confirm the theoretical results already presented in §II-C: they show the impact of the arithmetic on the learning speed and on the convergence of the back-propagation algorithm. Figure 5 clearly shows that 18 bits are not sufficient for convergence, and that convergence improves as the arithmetic constraints are relaxed.

Figure 5: Learning path including hardware constraints

VI. CONCLUSION AND PERSPECTIVES

The encouraging synthesis results give us the possibility to go deeper into the exploration of other, more interesting hardware architectures that exploit the inherent parallelism and the growth in FPGA capacity and performance. The co-simulation used has ensured the convergence of the neuronal implementation, which encourages us to continue in the same direction. In future work, we plan to use the SystemC TLM library to accelerate the architecture exploration phase and to improve the refinement design method and the co-simulation.

REFERENCES
[1] Jan N. H. Heemskerk, "Overview of Neural Hardware", Unit of Experimental and Theoretical Psychology, Leiden University, Leiden, The Netherlands.
[2] J-P. LeBouquin, "IBM Microelectronics ZISC, Zero Instruction Set Computer", Proc. of the World Congress on Neural Networks, Supplement, San Diego, 1994.
[3] J. Hopfield and D. W. Tank, "Computing with neural circuits: a model", Science, vol. 233, pp. 625-633, August 1986.
[4] C. E. Cox and W. E. Blanz, "GANGLION - a fast field-programmable gate array implementation of a connectionist classifier", IEEE Journal of Solid-State Circuits, vol. 27, no. 3, March 1992.
[5] Marcelo H. Martine, "A Reconfigurable Hardware Accelerator for Back-Propagation Connectionist Classifiers", University of California, Santa Cruz, 1994.
[6] Aaron T. Ferrucci, "ACME: A Field-Programmable Gate Array Implementation of a Self-Adapting and Scalable Connectionist Network", University of California, Santa Cruz, 1994.
[7] Bernard Girau, "Du parallélisme des modèles connexionnistes à leur implantation parallèle", École Normale Supérieure de Lyon, 1999.
[8] Jean-Luc Beuchat, "Étude et conception d'opérateurs arithmétiques optimisés pour circuits programmables", thèse de doctorat No 2426, École Polytechnique Fédérale de Lausanne, 2001.
[9] Jean-Luc Beuchat and Arnaud Tisserand, "Opérateurs en-ligne sur FPGA pour l'implantation de quelques fonctions élémentaires", Actes de la conférence Sympa'8, Symposium en Architectures Nouvelles de Machines, pp. 267-274, 2002.
[10] K. S. Trivedi and M. D. Ercegovac, "On-line Algorithms for Division and Multiplication", IEEE Transactions on Computers, C-26(7):681-687, 1977.
[11] J. G. Eldredge and B. L. Hutchings, "RRANN: a hardware implementation of the backpropagation algorithm using reconfigurable FPGAs", IEEE Int. Conf. on Neural Networks, June 1994.
[12] J-L. Beuchat, J-O. Haenni and E. Sanchez, "Hardware Reconfigurable Neural Networks".
[13] J. L. Holt and J-N. Hwang, "Finite precision error analysis of neural network hardware implementations", IEEE Transactions on Computers, 42:1380-1389, 1993.
[14] Yun Xie and Marwan Jabri, "Analysis of the Effects of Quantization in Multi-Layer Neural Networks Using a Statistical Model", 1992.
[15] Leonardo M. Reyneri and Enrica Filippi, "An Analysis on the Performance of Silicon Implementations of Backpropagation Algorithms for Artificial Neural Networks", IEEE Transactions on Computers, 40(12):1380-1389, December 1991.
[16] B. Widrow and M. A. Lehr, "30 Years of Adaptive Neural Networks: Perceptron, Madaline and Backpropagation", Proc. IEEE, 78(9):1415-1442, 1990.
[17] K-TEAM S.A., Préverenges, Switzerland, Khepera User Manual (http://www.kteam.com).
[18] Trolltech, cross-platform software development frameworks and application platform (www.trolltech.com).

