Regular 2D NASIC-based Architecture and Design Space Exploration

Ciprian Teodorov∗, Pritish Narayanan†, Loic Lagadec∗, Catherine Dezan∗ and Csaba Andras Moritz†

∗ Lab-STICC, CNRS UMR 3192, Université de Bretagne Occidentale, Brest, France
† Nanoscale Computing Fabrics Laboratory, University of Massachusetts, Amherst, USA

Abstract—As CMOS technology approaches its physical limits, several emerging technologies are being investigated as replacements for future computing systems. A number of different fabrics and architectures are currently under investigation. Unfortunately, no unified modeling framework currently exists that offers sound support for algorithmic design space exploration without compromising device feasibility. This work presents a NASIC-compliant application-specific computing architecture template, along with its performance models and the optimization policies that support design space exploration. This architecture has up to 29X density advantage over CMOS, is fully compatible with the NASIC manufacturing pathway, and enables the creation of unique max-rate pipelined systems.

I. INTRODUCTION

Several nanowire-based fabric proposals have emerged, and they all share some key characteristics. Among these, their bottom-up fabrication process leads to regular assembly, which marks the end of custom-made computational fabrics in favor of regular structures designed with respect to the application needs. Hence, research activities in this area mainly focus on structures conceptually similar to today's reconfigurable PLA and/or FPGA architectures [1], [2]. A number of different fabrics and architectures are currently under investigation, for example CMOL [1], FPNI [2], and NASIC [3]. They are based on a variety of devices such as FETs [4], spin-based devices [5], diodes, and molecular switches [6]. All these fabrics include some CMOS support: some, like FPNI, would move the entire logic into CMOS, while others, like NASIC, would only provide the control circuitry in CMOS. The rationale for this varies but includes targeted application areas as well as manufacturability issues [7].

Apart from fabrication issues, another limitation lies in the tight coupling between each architecture and its exploitation tools. This prevents the reuse of algorithms and tools, thus hindering shared improvements across fabric designs. It also hampers the comparison of intrinsic performance across devices, whereas this ability to compare is key to driving design space exploration.

To summarize this short analysis of the nano-computing landscape, several proof-of-concept architectures exist that take into account some fabrication constraints and support fault-tolerance techniques. What is still missing is the ability to capitalize on these experiments while offering a one-stop shopping point for further research, especially research addressing new algorithms. Sharing metrics, tools, and exploration capabilities is the next challenge for the nanocomputing community.

This study presents a regular application-specific circuit architecture based on the concepts of the NASIC fabric architecture. This architecture, named R2D NASIC, shows a number of very promising characteristics, such as:
• regularity, which means an easier fabrication process in the context of nanoscale technologies that impose severe constraints on the custom placement and routing of wires;
• compatibility with the NASIC fabric manufacturing pathway [7];
• adaptability to a variety of technological and application constraints, such as nanowire length, logic density, physical delay, etc.;
• capacity to implement max-rate pipelined designs based on its pipelined routing architecture, which paves the way towards high-throughput digital circuits approaching the theoretical limits of max-rate pipelining presented by Cotten in [8];
• simplified delay estimation, due to the dynamic logic evaluation and pipelined routing architecture;
• compatibility with the fault-tolerance techniques presented in the context of the NASIC fabric [3].

Moreover, we propose a design automation flow, based on standard tools used in the reconfigurable architecture field, that maps standard logic netlists onto R2D NASIC clusters. We also show how this flow can be used without modification to provide a baseline evaluation of the architecture, thus bootstrapping the design space exploration. This study starts by presenting the NASIC fabric concepts (Section II) on which the design of the R2D NASIC architecture, presented in Section III, is based. The CAD flow used for mapping circuits onto this architecture is detailed in Section IV. In Section V, the architecture is evaluated and the results obtained by mapping circuits from the MCNC benchmark suite onto the R2D NASIC architecture are reported. We conclude in Section VI by presenting the principal characteristics of this design along with its trade-offs and future developments.

II. NASIC FABRIC

NASIC (Nanoscale Application Specific Integrated Circuits) [3] is a nanoscale fabric proposed for semiconductor nanowires and targeting datapaths. NASIC designs use crossed nanowire field-effect transistors (xnwFETs) [4] on 2-D semiconductor nanowire (NW) grids to implement logic functions. NASICs are based on a cascaded two-level logic style (e.g., AND-OR, NAND-NAND) or on heterogeneous two-level logic [9] (e.g., AND-OR/NOR). Microwires provide control signals generated by CMOS circuitry, with a dynamic control style that channels the flow of data through the nanowire tiles. NASIC designs are optimized for specific applications to achieve higher density and defect/fault masking. By using dynamic circuits and pipelining on the wires, NASICs eliminate the need for explicit flip-flops in many areas of the design [10] and achieve unique pipelining schemes.

Fig. 1. NAND-NAND based implementation of a NASIC 1-bit full adder [11]. White boxes are n-type transistors.

Fig. 1 shows a 1-bit full adder implemented on the NASIC fabric. This tile uses a two-level cascaded NAND-NAND scheme for logic implementation. The tile comprises n-type nanowires in the horizontal and vertical directions, surrounded by a small number of microwires carrying the power supplies (Vdd and Vss) and control. The hpre, heva, vpre, and veva control signals are used to dynamically control the flow of data through the tile.

Over the years, several computing architectures have been designed based on the NASIC fabric principles, starting with a nanoscale wire-streaming processor named WISP [3], a simple processor used as a case study in NASIC research. In [12], an FPGA-like reconfigurable architecture called NFPGA was proposed along with a layered approach to application design automation on this architecture. In [13], the authors present an image processing architecture based on an array of simple identical cells communicating locally. The architecture presented in this study can be seen as an extension of these developments.

III. REGULAR 2D NASIC ARCHITECTURE

The Regular 2D NASIC architecture (R2D NASIC) is a general-purpose NASIC-based architecture template. It is based on a regular array of identically sized cells interconnected by a flexible routing architecture, which enables arbitrary circuit placement and routing while maintaining all timing and signal integrity constraints. Moreover, the cell design enables the application to be logically partitioned into interconnected two-level logic functions.

Fig. 2. An R2D NASIC cluster, showing a 2x3 array of parametric cells. The CMOS I/O circuitry is shown on the left and right sides.

Fig. 2 presents a high-level view of this architecture, showing its main components: the logic block (LB), the connection block (CB), the routing block (RB), and the CMOS I/O infrastructure. These components form an R2D NASIC cell, detailed in Fig. 3. These cells are replicated regularly to form a cluster. Cells lacking the LB are added at the periphery to ensure the structural completeness of the cluster.
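As a functional illustration of the two-level NAND-NAND style used both in NASIC tiles (Fig. 1) and in the R2D NASIC logic block, the following Python sketch evaluates a tile described by its inputs, its first-stage product terms (the "minterms" of the text) and its outputs. It is a hypothetical model, not the authors' tooling: the `NandNandTile` name and the literal encoding are our own, both polarities of every input are assumed to be available, and the dynamic precharge/evaluate timing is ignored. The instance is a 1-bit full adder with IN = 3, seven product terms, and OUT = 2, relying on the identity that a NAND of NANDs of product terms equals their OR.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Tuple

# A literal is (input index, polarity): polarity True selects the input itself,
# False selects its complement (both rails are assumed to be available).
Literal = Tuple[int, bool]


@dataclass
class NandNandTile:
    """Hypothetical functional model of a two-level NAND-NAND tile."""
    n_inputs: int               # IN
    terms: List[List[Literal]]  # first-stage NAND terms (MTERMS of them)
    outputs: List[List[int]]    # per output (OUT), indices of the terms it uses

    def evaluate(self, x: List[bool]) -> List[bool]:
        if len(x) != self.n_inputs:
            raise ValueError("wrong number of inputs")
        # First stage: each term is the NAND of its literals.
        stage1 = [not all(x[i] == pol for i, pol in term) for term in self.terms]
        # Second stage: NAND of the selected first-stage outputs, which equals
        # the OR of the corresponding product terms (De Morgan).
        return [not all(stage1[t] for t in sel) for sel in self.outputs]


A, B, C = 0, 1, 2
full_adder = NandNandTile(
    n_inputs=3,
    terms=[
        [(A, False), (B, False), (C, True)],  # a'b'c
        [(A, False), (B, True), (C, False)],  # a'bc'
        [(A, True), (B, False), (C, False)],  # ab'c'
        [(A, True), (B, True), (C, True)],    # abc
        [(A, True), (B, True)],               # ab
        [(A, True), (C, True)],               # ac
        [(B, True), (C, True)],               # bc
    ],
    outputs=[[0, 1, 2, 3], [4, 5, 6]],        # sum, carry
)

if __name__ == "__main__":
    for a, b, c in product([False, True], repeat=3):
        s, cout = full_adder.evaluate([a, b, c])
        print(int(a), int(b), int(c), "->", int(s), int(cout))
```

Sharing product terms between outputs is not needed here; each output simply lists the first-stage terms it combines, which mirrors how the second NAND plane selects rows of the first.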

A. Logic and Interconnect

The logic block uses a NAND-NAND two-level scheme, implemented using two dynamic xnwFET stages forming a tile, as proposed in [11]. Such a tile is characterized by three parameters: the number of inputs, the number of minterms, and the number of outputs. These parameters are tuned according to the size of the application circuit and to the physical constraints; a custom tile instance is then created and replicated in a grid. The inputs and outputs of the tiles are connected to CBs, which link the logic tiles with the routing infrastructure. Fig. 3 presents the layout of a cell, which is composed of an LB (top-right), a CB, and an RB. The nanowires interface with the CMOS support circuitry through CMOS-controlled FETs, which are used mainly to provide good control signals for the dynamic evaluation stages. The CMOS I/O interface, detailed in the following section, is shown in the top-left part.

The routing architecture is built using routing elements based on dynamic logic evaluation stages that operate by signal inversion. The RB provides the connections between different routing channels. One particularity of the RB is that it has one set of vertical (and one set of horizontal) directional routing tracks, used to ease signal routing inside the RB but also to delay a given signal by a number of stages (e.g., the signal a−c in Fig. 4). The routing algorithm can use this feature to balance the pipeline stages and create high-throughput circuits approaching the max-rate pipeline limits.

Fig. 3. The layout of an R2D NASIC cell (logic block, connection block, routing block, and CMOS I/O with CMOS-gated NWFETs and multi-NW-gated FETs; cell dimensions Height_cell and Width_cell). The thinner wires represent the nanowires.

B. Cluster CMOS I/O

In an R2D NASIC cluster, the input and output signals are provided via lithographic-pitch wires. The lithographic circuitry and wiring (which is also used for the power, ground, and control signals) provides a reliable structure for supplying the input signals and collecting the computed results. The simplest arrangement uses the cells at the periphery of the cluster to attach lithographic-scale wires to the inter-switch vertical connections. To drive an input onto a given nanowire, a lithographically gated NWFET is placed at the crosspoint. Since the lithographic-scale wires are wider, the NWFET has a wider gate, and thus better control. For the outputs, a similar arrangement is possible, but this time the nanowires act as gates for a lithographic FET. One variation of this scheme, shown in Fig. 3, is to use multiple nanowires (carrying an identical signal) as a multiple-gate FET to provide strong switching for the lithographic-scale FET. The principal advantage of this approach, besides its simplicity, is that the fabrication process presented in [7] is not altered, the I/O resources being just a particular case of control signals.

C. Sequencing Schemes

Within a cluster, the logic blocks implement a two-level NAND-NAND logic style. The routing structure is based on cascaded inverting stages. Each routing element inverts the signal, but the routing architecture is designed to guarantee that signals are transmitted unchanged between logic blocks: by using an even number of routing elements, each signal always passes through 2n inverting stages. R2D NASIC uses a three-phase control scheme [14] (precharge, evaluate, hold) that precharges and evaluates a stage before the next one, with the control signals repeated every two stages. This offers the advantage of reusing the same control signals every two stages, and also enforces double signal inversion, thus guaranteeing correct signal transmission in the interconnect.

Fig. 4. R2D NASIC signal routing example.

Fig. 4 shows an example of routing three signals (b−d, d−c, and a−c) on a four-cell array. The propagation latency of signal b−d is 2, since it needs four evaluation stages to get from b to d and each evaluation period has 2 stages. The latency of signal d−c is 3. Consequently, the signal a−c needs a latency of 5 for logic block c to issue one result each period (each heva assertion, see Section III-C). Thus the signal a−c, which could be routed with a latency of 2, has been delayed by 6 evaluation stages inside the RB to satisfy the latency constraint.
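The latency arithmetic of the Fig. 4 example can be checked with a small sketch. Following the text, one evaluation period (one heva assertion) spans two evaluation stages, so a path's latency is its stage count divided by two, and a short path is padded until it matches the latency of the longer path into the same logic block. The helper names are hypothetical and only restate this arithmetic; `output_period` applies the P_output expression of Section III-E. We assume, for illustration, that d−c crosses 6 stages and that the path from b through d to c therefore crosses 4 + 6 = 10 stages.

```python
STAGES_PER_PERIOD = 2  # one evaluation period (heva assertion) spans two stages


def latency(stages: int) -> float:
    """Latency, in evaluation periods, of a path crossing `stages` stages."""
    return stages / STAGES_PER_PERIOD


def padding_stages(required_latency: int, routed_stages: int) -> int:
    """Extra evaluation stages needed for a path to reach `required_latency`."""
    required_stages = required_latency * STAGES_PER_PERIOD
    if routed_stages > required_stages:
        raise ValueError("path is already longer than the required latency")
    return required_stages - routed_stages


def output_period(s_critical: int, s_shortest: int) -> float:
    """P_output, in heva assertions, from the model of Section III-E."""
    return (s_critical - s_shortest) / STAGES_PER_PERIOD + 1


if __name__ == "__main__":
    # Fig. 4: b-d crosses 4 stages (latency 2); d-c has latency 3 (6 stages).
    required_a_c = int(latency(4) + latency(6))            # 5 periods
    extra = padding_stages(required_a_c, routed_stages=4)  # a-c routed in 4 stages
    print("a-c latency:", required_a_c, "extra stages:", extra)
    # Output c: a 10-stage path via d and a 4-stage a-c path before padding.
    print("P_output before padding:", output_period(10, 4))
    print("P_output after padding:", output_period(10, 4 + extra))
```

Padding a−c by 6 stages equalizes the two paths, which is exactly the condition for max-rate operation (P_output = 1); without it, the unbalanced paths would yield one valid result only every 4 heva assertions under this model.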

Fig. 5. Pipelined R2D NASIC circuit HSpice simulation results using the 3-phase sequencing scheme (waveforms of vpre, veva, hpre, heva, v(o), and v(¬o); voltage (V) versus time t(s)).

Fig. 5 shows HSpice simulation results for a multistage circuit based on the three-phase control scheme. The xnwFET device model presented in [4] was used for this simulation. During the first assertion of the heva signal, the circuit does not produce any result because of the pipeline latency. Once the pipeline is filled, one result is issued each evaluation period, as can be seen during the second and third assertions of the heva signal.

D. Parameters

Due to the simple design of R2D NASIC cells and their regular replication, a limited number of architectural parameters are used to describe the clusters:
• IN - the number of inputs of each logic block
• OUT - the number of outputs of each logic block
• MTERMS - the number of minterms of each logic block
• Wx - the number of horizontal routing segments
• Wy - the number of vertical routing segments
• N_I/O - the number of I/O lithographic-scale wires for each cell at the periphery
• Rows - the number of rows in the array
• Columns - the number of columns in the array

For this study, we consider a simple topology with the same number of routing segments in every direction, and a segment length L_seg = 2 (the routing segments span between two routing blocks). The following technological parameters are used for the cluster layout construction:
• P_litho - lithographic interconnect pitch
• P_nano - nanowire pitch

E. Evaluation Metrics

The metrics presented in this section are analytical models of three different aspects of the R2D NASIC architecture: area, nanowire length, and performance. They provide a quantitative basis for the evaluation of R2D NASIC, based on the technological and architectural parameters (see Section III-D).

a) Area: The area of each cell is derived as a function of the R2D NASIC parameters, considering the cell layout proposed in Fig. 3:

$$
\begin{aligned}
\mathit{Height}_{cell} &= 10\,P_{litho} + 6\,W_x\,P_{nano} + \max\left(N_{I/O}\,P_{litho},\ \mathit{MTERMS}\,P_{nano}\right)\\
\mathit{Width}_{cell} &= 10\,P_{litho} + \left(6\,W_y + \mathit{IN} + \mathit{OUT}\right)P_{nano}\\
\mathit{Area}_{cell} &= \mathit{Height}_{cell}\cdot\mathit{Width}_{cell}\\
\mathit{Area}_{array} &= \mathit{Rows}\cdot\mathit{Columns}\cdot\mathit{Area}_{cell}
\end{aligned}
$$
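The area model can be evaluated directly from the parameters of Section III-D. The Python sketch below is a hypothetical helper (the `CellParams` name and the use of micrometres as the unit are our own choices) that implements the Height_cell, Width_cell, Area_cell, and Area_array expressions above, together with the longest-nanowire expression NW_length given in paragraph b) of this section. The example values are illustrative only: P_nano = 10 nm and N_I/O = 2 follow the assumptions of Section V, P_litho = 45 nm is one of the studied nodes, and the (IN, MTERMS, OUT), Wx, Wy, Rows, and Columns values are arbitrary.

```python
from dataclasses import dataclass


@dataclass
class CellParams:
    """R2D NASIC parameters (Section III-D); pitches in micrometres."""
    IN: int
    OUT: int
    MTERMS: int
    Wx: int
    Wy: int
    N_IO: int
    p_litho: float
    p_nano: float


def cell_height(p: CellParams) -> float:
    return (10 * p.p_litho + 6 * p.Wx * p.p_nano
            + max(p.N_IO * p.p_litho, p.MTERMS * p.p_nano))


def cell_width(p: CellParams) -> float:
    return 10 * p.p_litho + (6 * p.Wy + p.IN + p.OUT) * p.p_nano


def cell_area(p: CellParams) -> float:
    return cell_height(p) * cell_width(p)


def array_area(p: CellParams, rows: int, columns: int) -> float:
    return rows * columns * cell_area(p)


def max_nanowire_length(p: CellParams) -> float:
    """Longest nanowire in a cluster, following the NW_length expression."""
    horizontal = 2 * cell_width(p) - (p.IN + p.OUT) * p.p_nano - 2 * p.p_litho
    vertical = (2 * cell_height(p)
                - max(p.N_IO * p.p_litho, p.MTERMS * p.p_nano)
                - 2 * p.p_litho)
    return max(horizontal, vertical)


if __name__ == "__main__":
    # Illustrative values: 10 nm nanowire pitch, 45 nm lithographic pitch.
    p = CellParams(IN=8, OUT=6, MTERMS=10, Wx=20, Wy=20, N_IO=2,
                   p_litho=0.045, p_nano=0.010)
    print(f"cell area  = {cell_area(p):.4f} um^2")
    print(f"array area = {array_area(p, rows=4, columns=4):.4f} um^2")
    print(f"max NW len = {max_nanowire_length(p):.4f} um")
```

Because both dimensions grow with 6·W·P_nano, the routing segment count dominates both the cell area and the nanowire length once W is large, which is the trend examined in Section V-A.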

The 10·P_litho term, present in both the height and the width of the cell, accounts for the 5 lithographic wires present all around the routing block. A high number of I/O wires at the periphery negatively impacts the cell height if the logic block has a small number of minterms. A fine-grained, directional tuning of the routing segments can reduce the surface area.

b) Nanowire length: The design of R2D NASIC takes nanowire length constraints into account. Under hard manufacturing constraints on nanowire length, the architectural parameters are tuned to meet these constraints. The exact length of the longest nanowire in a cluster is computed using the following formula, derived from the cell layout:

$$
\mathit{NW}_{length} = \max\Big(\big[2\,\mathit{Width}_{cell} - (\mathit{IN} + \mathit{OUT})\,P_{nano}\big] - 2\,P_{litho},\ \big[2\,\mathit{Height}_{cell} - \max\left(N_{I/O}\,P_{litho},\ \mathit{MTERMS}\,P_{nano}\right)\big] - 2\,P_{litho}\Big)
$$

c) Performance: Because signals are routed through dynamic logic stages, which create a pipeline structure between the cells, the physical delay plays only a secondary role in the application output frequency. Consequently:

$$
\begin{aligned}
\mathit{Latency} &= \frac{S_{critical\,path}}{2}\\
P_{output} &= \frac{S_{critical\,path} - S_{shortest\,path}}{2} + 1
\end{aligned}
$$

where Latency measures the pipeline latency, i.e., the time for an input signal to propagate to the output, and P_output is the application output period, defined as the duration between two correct output results, in terms of heva assertions. S_critical path and S_shortest path represent the number of evaluation stages on the critical path and on the shortest path, respectively, from inputs to outputs. P_output is inversely proportional to the output frequency. Moreover, both P_output and Latency depend greatly on the quality of the circuit placement and routing, which motivates the development of pipeline-aware placers and routers.

Nevertheless, the delay of the evaluation stage with the highest fan-in (T_HFIN) imposes an upper bound on the circuit frequency. There is a total order between the delays of the different evaluation stages:

$$
T_{HFIN} \geq T_{LS} \geq T_{RE}
$$

where T_LS is the delay of the other logic stages and T_RE is the delay of a routing stage. T_HFIN is computed using the basic precharge-evaluate model:

$$
T_{HFIN} = T_{precharge} + T_{eval}
$$

Thus, for circuit delay estimation in this architecture, it is not necessary to estimate the delay of the critical path; it suffices to estimate the delay of a single logic stage (T_HFIN) to obtain the clock frequency at which the whole cluster operates.

IV. PHYSICAL DESIGN AUTOMATION

In the context of nanoscale architectures, the CAD flow is typically implemented with specific tools created by the respective research groups [1], [2] in order to optimize the quality of the results for the targeted architecture. This approach has a number of drawbacks: a) a huge software development effort; b) the tools thus created cannot be reused in similar projects; and, most importantly, c) it closes, a priori, the algorithmic optimization axis of the design space exploration problem by directly providing a supposedly optimized solution. For R2D NASIC, we deliberately propose a routability-driven design automation flow which is suboptimal. This flow nevertheless enables us to produce a baseline evaluation of the architectural proposition, which is used to bootstrap the design space exploration (see Section V for details). Moreover, by reusing conventional tools and algorithms, such as [15]–[19], we reduced the software development effort to a minimum without losing the ability to evaluate some important aspects of the architecture (area, nanowire length). The flow, presented in Fig. 6, maps standard logic netlists (e.g., BLIF [15]) to R2D NASIC clusters.

Fig. 6. Design automation flow for R2D NASIC (blocks: BLIF, SIS, PLA Family Exploration, PLAMap, Madeo placement and routing with a yes/no routability decision, Architecture, Layout, and Metrics).

SIS [15] performs technology-independent logic optimizations and logic decomposition into small fan-in nodes, which are used for PLA family exploration and covering by PLAMap. The PLA Family Exploration step is based on the Run M Points algorithm presented in [20], which explores different PLA families by breaking the 3D exploration space defined by (IN, MTERMS, OUT) into three 1D spaces that are explored separately. At the end of this exploration step, we obtain the PLA family F_best^PLA, which offers the best mapping quality (Q_mapping^best). For the purpose of this study, Q_mapping^best is defined in terms of logic density and area as follows:

$$
Q^{best}_{mapping} = \max_{1 \le i \le n}\left[\left(1 - \frac{\mathit{Area}^{i}_{array}}{\mathit{Area}^{max}_{array}}\right) D^{i}_{logic}\right],
\qquad
\mathit{Area}^{max}_{array} = \max_{1 \le j \le n}\left(\mathit{Area}^{j}_{array}\right)
$$

where n represents the number of different PLA families explored, Area_array^i represents the R2D NASIC cluster area for the i-th family, Area_array^max is the maximum area obtained during the exploration, and D_logic^i represents the logic density obtained by partitioning the application for the i-th PLA family.

Based on F_best^PLA, an empty (no xnwFETs) R2D NASIC cluster is generated. At the same time, PLAMap [16] is used to cover the logic into PLAs defined by F_best^PLA. These PLAs are then placed and routed on the empty cluster using the Madeo framework [17], which implements VPR-like placement [18] and Pathfinder routing [19] algorithms. The choice of algorithms typically found in the reconfigurable architecture field might seem strange at first in the context of an ASIC architecture. However, in the case of R2D NASIC, the nanowire functionalisation [7] can be seen, from the tool-flow perspective, as a one-time configuration step (similar to antifuse FPGA architectures [21]). During this step, the empty R2D NASIC cluster is populated with the NFET devices used for logic evaluation and routing.

V. DESIGN SPACE EXPLORATION

The design space exploration problem is structured around four principal axes: architecture, application, algorithms, and metrics. In the context of this work, the application axis is fixed, since we are not looking at application tuning. The evaluation metrics used are defined in Section III-E. To explore the remaining two axes, architecture and algorithms, we need to bootstrap the evaluation by providing default hypotheses for one of them. The design automation flow presented in the previous section provides these hypotheses for the algorithm exploration axis. It enables us to map applications onto the proposed architecture and to explore different architectural configurations to find the most suitable one for the target application. The PLA Family Exploration step proposes a heuristic for finding most of the architectural parameters needed for the R2D NASIC architecture, namely IN, OUT, MTERMS, Rows, and Columns.
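The mapping-quality criterion Q_mapping^best used to retain F_best^PLA (Section IV) reduces to a simple scoring step once the mapping flow has produced an array area and a logic density for each explored family. The Python function below is a hypothetical illustration of that formula only: it assumes each explored family is supplied as a tuple ((IN, MTERMS, OUT), Area_array, D_logic), and it does not model the Run M Points search that generates the candidates. The numbers in the example are invented placeholders, not results from this study.

```python
from typing import List, Tuple

Family = Tuple[int, int, int]           # (IN, MTERMS, OUT)
Explored = Tuple[Family, float, float]  # (family, Area_array, D_logic)


def best_family(explored: List[Explored]) -> Tuple[Family, float]:
    """Return the family maximizing (1 - Area/Area_max) * D_logic, and its score."""
    if not explored:
        raise ValueError("no PLA family explored")
    area_max = max(area for _, area, _ in explored)
    scored = [(fam, (1 - area / area_max) * density)
              for fam, area, density in explored]
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Placeholder values for three hypothetical families.
    candidates = [((4, 10, 2), 500.0, 0.40),
                  ((8, 10, 6), 300.0, 0.55),
                  ((30, 20, 10), 1000.0, 0.90)]
    print(best_family(candidates))
```

As written, the formula assigns a score of zero to the family with the largest area, however dense its mapping, so the criterion favors families that are both compact and dense relative to the explored set.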
The final step is used for routing the application on the target architecture and for finding the routing segment sizes (the Wx and Wy parameters) that ensure the routability of the application. This process is based on a trial-and-error approach using a binary search decision tree. The R2D NASIC architecture created using these parameters serves as a baseline for future architectural and algorithmic explorations, which are outside the scope of this work. Some examples are: a) algorithmic exploration to improve the performance of the design without changing the architecture parameters; b) the development of a better model and tool for routing segment size estimation, to increase the density of the design; c) the creation of a design space exploration flow which enables the evaluation of the area/performance trade-off. In the remainder of this section, we present the impact of the routing segments on the metrics defined in Section III-E, and the results of mapping five circuits from the MCNC benchmark on the R2D NASIC architecture.
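The trial-and-error search for the routing segment count described above can be sketched as a binary search over W (with Wx = Wy, as assumed in this section). In the sketch below, the `routes` callback is a hypothetical stand-in for one placement-and-routing attempt of the flow (for instance, a Pathfinder run reporting success or failure); we also assume that routability is monotone in W, and that an upper bound is supplied, such as the limit imposed by the nanowire length constraint in Section V-A. The bound of 65 in the example is illustrative.

```python
from typing import Callable, Optional


def min_routable_width(routes: Callable[[int], bool],
                       lo: int, hi: int) -> Optional[int]:
    """Smallest W in [lo, hi] for which routes(W) succeeds, or None.

    Assumes routability is monotone: if routes(W) is True, so is routes(W + 1).
    """
    if lo > hi or not routes(hi):
        return None          # not routable even at the upper bound
    while lo < hi:
        mid = (lo + hi) // 2
        if routes(mid):
            hi = mid         # mid routes: keep it and search below
        else:
            lo = mid + 1     # mid fails: the answer is strictly above mid
    return lo


if __name__ == "__main__":
    attempts = []

    def fake_router(w: int) -> bool:
        # Toy stub: pretend the design needs at least 37 segments.
        attempts.append(w)
        return w >= 37

    w = min_routable_width(fake_router, lo=1, hi=65)
    print("W =", w, "after", len(attempts), "routing attempts")
```

Each call to `routes` is a full routing run, so the logarithmic number of attempts is what makes this search practical compared with a linear sweep over W.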

Fig. 7. R2D NASIC cell area (μm²) for 3 technology nodes (45nm, 32nm, 18nm) as a function of the number of routing segments (Wx = Wy).

Fig. 8. Maximum nanowire length (μm) for 3 technology nodes (45nm, 32nm, 18nm) as a function of the number of routing segments (Wx = Wy), with the technological limits indicated.

Fig. 9. PLA family exploration for optimal PLA sizing: mapping quality versus explored PLA families for alu4, bigkey, dsip, ex5p, and misex3, with points annotated by (IN, MTERMS, OUT) triples (42,58,32), (41,82,1), (39,84,15), (35,44,17), and (40,80,1).

For the purpose of this study, we assume that each cell at the periphery has at least two lithographic-scale wires providing the inputs (or reading the outputs). The nanowire pitch (P_nano) is set to 10 nm. The lithographic pitch (P_litho) is varied across 3 technology nodes (45nm, 32nm, and 18nm) according to the ITRS [22]. For simplicity, in this section we consider Wx = Wy.

A. Routing Segments Impact

This section shows the relation between the number of routing segments and two of the metrics presented in Section III-E, namely the area and the nanowire length.

a) Area: Fig. 7 shows that the difference in area between the 3 technology nodes is almost constant, around 18%, and that the area increases with the number of routing segments.

b) Nanowire length: Fig. 8 shows that the nanowire length increases linearly with the number of routing segments, and that it stays within technologically feasible ranges, as NWs of around 10–20 µm are expected to be reliably assembled. Moreover, the architectural parameters can be adjusted to accommodate different technological length constraints. For example, Fig. 8 shows that, if the nanowire length limit is 10 µm, we cannot accommodate more than 65 routing segments for the 45nm node (or more than 75 for the 18nm node). This constraint is integrated into the CAD flow as: a) a PLA family size constraint; b) a constraint on the number of nets between PLAs during PLA family exploration; c) an upper bound for the Wx (Wy) exploration during routing.

B. Circuit Layout Exploration and Evaluation

To assess the density benefit of the R2D NASIC cluster, we computed the layout of 5 circuits (alu4, bigkey, dsip, ex5p, and misex3) from the MCNC-20 benchmark suite [23].

a) PLA Family exploration: Fig. 9 shows the results obtained during the PLA Family Exploration step on 5 MCNC benchmark circuits using the Run M Points heuristic [20], which explored almost 90 different PLA families; F_best^PLA was retained and used in the next steps of the design automation flow. Since the exploration space is not linear, the mapping quality does not always improve as the algorithm progresses; instead, it sometimes jumps from one local maximum to another while trying to find the best possible combination of the 3 defining parameters (IN, MTERMS, OUT).

b) Surface Area: For the comparison with CMOS area, we used Cadence tools to compute the layout of the circuits using the Oklahoma State University FreePDK 45nm standard cell library [24]. The MCNC benchmark netlists were converted from BLIF to Verilog, synthesized using Cadence RTL Compiler, and placed and routed using Cadence Encounter. Fig. 10 compares the area density of 5 MCNC benchmark circuits with a CMOS implementation at 3 different technology nodes. At the 45nm node, R2D NASIC has up to 29X density advantage over CMOS. However, as technology advances towards smaller feature sizes, this advantage is gradually lost. Even so, the R2D NASIC architecture retains one advantage: regularity. Thus, as the CMOS technology pitch approaches the nanowire pitch, a trade-off will have to be made between density (custom design) and easier fabrication (architectural regularity). The large density difference between the alu4, bigkey, and misex3 circuits is due to the non-linearity of the PLA exploration heuristic, which in the latter cases could not find better mappings.

Fig. 10. R2D NASIC vs. standard cell design: normalized density advantage per technology node (45nm, 32nm, 18nm) for alu4, bigkey, dsip, ex5p, and misex3, relative to CMOS.

Fig. 11. Maximum nanowire length (μm) for 5 PLA-mapped MCNC circuits (alu4, bigkey, dsip, ex5p, misex3).

c) Nanowire length: Fig. 11 presents the maximum NW lengths obtained for the 5 MCNC benchmark circuits explored. All the circuits respect the hypothetical NW length limit of 10 µm, except the ex5p circuit, for which the solution space must be explored further until this constraint is met.

d) Performance: By measuring the application output rate and the latency of our designs, we observed that the quality of the placement and routing greatly influences these metrics. For example, for simple designs, max-rate pipelined systems were obtained using the flow presented in Section IV. However, as the design complexity increases (beyond a few R2D NASIC cells), the output rate degrades rapidly. Table I illustrates this problem by reporting the latency and the output period for 3 circuits decomposed such that the number of PLAs increases from one to the next. For the first circuit, which has only 6 PLAs and a low-complexity routing problem, a max-rate pipelined system is achieved, producing one result each clock cycle. However, for an alu4 decomposition with 67 PLAs, the output period rises to 79, which means one valid output every 79 clock cycles. This emphasizes the need for pipeline-aware tools, like the one presented in [25], [26], and even for specific tools focused on achieving max-rate pipelined designs through pipeline-balancing optimizations, which are the principal directions we are currently pursuing.

TABLE I
THE LATENCY AND OUTPUT RATE DEGRADATION AS THE # OF PLAS INCREASES, AND CONSEQUENTLY THE COMPLEXITY OF THE ROUTING PROBLEM.

Circuit    PLA Family (IN,MTERMS,OUT)   # of PLAs   Latency   Poutput
simple     (4,10,2)                     6           4         1
wisp-alu   (8,10,6)                     31          17        13
alu4       (30,20,10)                   67          108       79

VI. CONCLUSIONS AND FUTURE WORK

A general-purpose NASIC-based architecture was proposed. Relying on a regular array of parametric cells interconnected by a flexible routing architecture, it enables arbitrary circuit placement and routing and paves the way towards high-throughput nanoscale circuits through its unique pipelined routing architecture. Along with it, a design automation flow was presented, based extensively on standard tools currently used in the reconfigurable architecture field. Given that regularity of assembly is one of the principal constraints imposed by nanoscale technologies, this shows a way of transposing the experience of reconfigurable computing research to application-specific nanoscale circuits. One important trade-off offered by this architecture is the possibility of increasing the output rate of the circuit at the expense of higher area and latency, by implementing pipelined designs approaching the max-rate pipeline bound. This trade-off is impracticable with today's technology for generic applications due to the high area overhead, but it becomes realistic in the context of nanoscale technologies. The exploration of this trade-off is the principal axis of ongoing work on R2D NASIC. Based on the promising characteristics presented in this study, future research directions include the exploration of 3D integration techniques for the cluster I/O infrastructure, and the system-level integration of R2D NASIC clusters using automated controller generation for inter-cluster communication.

ACKNOWLEDGMENT

This work was partially supported by an international exchange grant awarded by the Collège Doctoral International (CDI) at the European University of Brittany (UEB), France.

REFERENCES

[1] D. B. Strukov and K. K. Likharev, “CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices,” Nanotechnology, vol. 16, pp. 888–900, April 2005.
[2] G. S. Snider and R. S. Williams, “Nano/CMOS Architectures Using a Field-Programmable Nanowire Interconnect,” Nanotechnology, vol. 18, no. 3, p. 035204 (11pp), 2007. [Online]. Available: http://stacks.iop.org/0957-4484/18/035204
[3] C. A. Moritz, T. Wang, P. Narayanan, M. Leuchtenburg, Y. Guo, C. Dezan, and M. Bennaser, “Fault-Tolerant Nanoscale Processors on Semiconductor Nanowire Grids,” IEEE Transactions on Circuits and Systems I, special issue on Nanoelectronic Circuits and Nanoarchitectures, November 2007.

[4] P. Narayanan, C. A. Moritz, K. W. Park, and C. O. Chui, “Validating cascading of crossbar circuits with an integrated device-circuit exploration,” Nanoscale Architectures, IEEE International Symposium on, pp. 37–42, 2009.
[5] P. Shabadi, A. Khitun, P. Narayanan, M. Bao, I. Koren, K. L. Wang, and C. A. Moritz, “Towards logic functions as the device,” in Proceedings of the 2010 IEEE/ACM International Symposium on Nanoscale Architectures, ser. Nanoarch ’10. Piscataway, NJ, USA: IEEE Press, 2010, pp. 11–16. [Online]. Available: http://portal.acm.org/citation.cfm?id=1835957.1835963
[6] M. Stan, P. Franzon, S. Goldstein, J. Lach, and M. Ziegler, “Molecular electronics: from devices and interconnect to circuits and architecture,” Proceedings of the IEEE, vol. 91, no. 11, pp. 1940–1957, Nov. 2003.
[7] P. Narayanan, K. Park, C. Chui, and C. Moritz, “Manufacturing pathway and associated challenges for nanoscale computational systems,” in 9th IEEE Nanotechnology Conference, 2009.
[8] L. W. Cotten, “Maximum-rate pipeline systems,” in AFIPS ’69 (Spring): Proceedings of the May 14-16, 1969, Spring Joint Computer Conference. New York, NY, USA: ACM, 1969, pp. 581–586.
[9] T. Wang, P. Narayanan, and C. Andras Moritz, “Heterogeneous Two-Level Logic and Its Density and Fault Tolerance Implications in Nanoscale Fabrics,” IEEE Transactions on Nanotechnology, vol. 8, pp. 22–30, Jan. 2009.
[10] C. A. Moritz and T. Wang, “Latching on the Wire and Pipelining in Nanoscale Designs,” 3rd Workshop on Non-Silicon Computation (NSC3), ISCA’04, Germany, Jun. 2004.
[11] P. Narayanan, M. Leuchtenburg, T. Wang, and C. Moritz, “CMOS Control Enabled Single-Type FET NASIC,” in Symposium on VLSI, 2008. ISVLSI ’08. IEEE Computer Society Annual, Apr. 2008, pp. 191–196.
[12] L. Lagadec, B. Pottier, and D. Picard, “Toolset for nano-reconfigurable computing,” Microelectronics Journal, vol. 40, no. 4-5, pp. 665–672, 2009, European Nano Systems (ENS 2007); International Conference on Superlattices, Nanostructures and Nanodevices (ICSNN 2008). [Online]. Available: http://www.sciencedirect.com/science/article/B6V444VXT49K-1/2/27d93aef994f08d8c5a7e77db31dc553
[13] P. Narayanan, T. Wang, M. Leuchtenburg, and C. Moritz, “Image processing architecture for semiconductor nanowire based fabrics,” in Nanotechnology, 2008. NANO ’08. 8th IEEE Conference on, Aug. 2008, pp. 677–680.
[14] P. Narayanan, J. Kina, P. Panchapakeshan, C. O. Chui, and C. A. Moritz, “Integrated device-fabric explorations and noise impact and mitigation in nanoscale fabrics,” to be submitted to ACM Journal on Emerging Technologies in Computing Systems (JETC), 2011.
[15] E. Sentovich, K. Singh, L. Lavagno, C. Moon, R. Murgai, A. Saldanha, H. Savoj, P. Stephan, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, “SIS: A System for Sequential Circuit Synthesis,” EECS Department, University of California, Berkeley, Tech. Rep. UCB/ERL M92/41, 1992. [Online]. Available: http://www.eecs.berkeley.edu/Pubs/TechRpts/1992/2010.html
[16] D. Chen, J. Cong, M. Ercegovac, and Z. Huang, “Performance-driven mapping for CPLD architectures,” Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 22, no. 10, pp. 1424–1431, Oct. 2003.
[17] L. Lagadec and B. Pottier, “Object-oriented meta tools for reconfigurable architectures,” in Reconfigurable Technology: FPGAs for Computing and Applications II, SPIE Proceedings 4212, 2000.
[18] V. Betz, J. Rose, and A. Marquardt, Eds., Architecture and CAD for Deep-Submicron FPGAs. Norwell, MA, USA: Kluwer Academic Publishers, 1999.
[19] L. McMurchie and C. Ebeling, “PathFinder: A negotiation-based performance-driven router for FPGAs,” in Field-Programmable Gate Arrays, 1995. FPGA ’95. Proceedings of the Third International ACM Symposium on, 1995, pp. 111–117.
[20] M. Holland and S. Hauck, “Automatic creation of domain-specific reconfigurable CPLDs for SoC,” Field-Programmable Custom Computing Machines, Annual IEEE Symposium on, vol. 0, pp. 289–290, 2005.
[21] I. Kuon, R. Tessier, and J. Rose, “FPGA architecture: Survey and challenges,” Found. Trends Electron. Des. Autom., vol. 2, pp. 135–253, Feb. 2008. [Online]. Available: http://portal.acm.org/citation.cfm?id=1454695.1454696
[22] International Technology Roadmap for Semiconductors, “[online],” http://public.itrs.net/, 2010.
[23] S. Yang, “Logic Synthesis and Optimization Benchmarks User Guide, Version 3.0,” MCNC Technical Report, Tech. Rep., Jan. 1991.

[24] J. E. Stine, I. Castellanos, M. Wood, J. Henson, F. Love, W. R. Davis, P. D. Franzon, M. Bucher, S. Basavarajaiah, J. Oh, and R. Jenkal, “FreePDK: An open-source variation-aware design kit,” Microelectronics Systems Education, IEEE International Conference on/Multimedia Software Engineering, International Symposium on, vol. 0, pp. 173–174, 2007.
[25] A. Sharma, C. Ebeling, and S. Hauck, “PipeRoute: a pipelining-aware router for FPGAs,” in Proceedings of the 2003 ACM/SIGDA Eleventh International Symposium on Field Programmable Gate Arrays, ser. FPGA ’03. New York, NY, USA: ACM, 2003, pp. 68–77. [Online]. Available: http://doi.acm.org/10.1145/611817.611829
[26] S. Li and C. Ebeling, “QuickRoute: a fast routing algorithm for pipelined architectures,” in Field-Programmable Technology, 2004. Proceedings. 2004 IEEE International Conference on, Dec. 2004, pp. 73–80.