VLSI DESIGN TECHNOLOGY
Crosspoint Switch: A PLD Approach by Jim Donnell, Intel Corp.
© Intel Corporation, 1986. Reprinted with permission from Digital Design.

Erasable programmable logic devices (EPLDs) combine the gate densities of low-end gate arrays with the short development time and low cost of EPROMs. This merging of technologies produces a device with features suited to a wide range of digital applications. In contrast to the long development times (and higher costs) for gate arrays, EPLDs require minimal front-end design time. In just a few hours, EPLD designs can be developed, modified and verified. Also, core elements from one EPLD design can be incorporated into new designs as quickly as standard software subroutines from one program can be modified and used in other programs.

The design of a digital crosspoint switch using an Intel 5C121 EPLD illustrates these features. Digital Design implemented a crosspoint switch in a gate array last year (see Digital Design, January through March, 1985). Applications that require a data transfer from one of several inputs to one of several outputs frequently use a digital crosspoint switch. Using the 5C121 EPLD, Intel Corp. (Santa Clara, CA) designed three different configurations of a crosspoint switch.

Offered in a 40-pin package that provides up to 36 inputs or 24 outputs, the 5C121 supports up to 28 macrocells (including four buried registers) and 236 product terms (p-terms). Logic density in the 5C121 is the equivalent of 1,200 usable NAND gates. Maximum power requirements are 100 mA active and 30 mA standby with TTL input levels. With CMOS input levels, a 5C121 requires 50 mA active and 3 mA standby.

Two major parameters determine the complexity and configuration of a digital crosspoint switch: the number of possible switching locations for each bit (inputs and outputs), and the number of bits transferred in one clock pulse (word width). The availability of I/O pins, macrocells and p-terms for a given EPLD device dictates the number of switches that can be designed into a single device.

Configuration 1

The first circuit (Figure 1) considered is a digital crosspoint switch with eight inputs and a 3-bit word width. This switch transfers a 3-bit word coming from one of eight sources to a particular output. The number of devices "OR-tied" to each output pin determines the number of outputs. Selecting one of eight data inputs from each of the three channels (A0 to A7, B0 to B7 and C0 to C7), the switch routes that data to a single output (QA, QB and QC). Each output can be OR-tied to more than one three-state input to complete the switch (only one input can be enabled at a time). Three additional control bits (D0 to D2) select one of the eight different inputs. All three channels operate in parallel.

Figure 1: Configuration 1 uses a three-channel eight-to-one multiplexer circuit with latching inputs. Each output can drive multiple, individually selected inputs to complete the digital crosspoint switch. By connecting inputs to the EPLD outputs in an "OR-tied" configuration, with only one input enabled at any time, the multiplexer circuit becomes a crosspoint switch.

Separate input and output clocks allow a high data rate and relax input set-up and hold times. Input data for all three channels, along with the three select bits, are latched by ILE. Data at the inputs can change state after being latched, and data is clocked out of the switch by CLK. Equation 1 shows the Boolean expression for a single channel in the sum-of-products form. (See Table 1 for all equations.) The Boolean expression for the remaining two channels is similar: the designer need only change the A in the equations to a B or C.
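As an illustration of the sum-of-products form described for one channel, the following is a minimal Python sketch of an eight-to-one multiplexer. It is not the article's Equation 1 (which is not reproduced here); the function names, the bit ordering of the select lines (D0 as least significant) and the omission of the ILE latches and CLK register are all assumptions made for the example.

```python
def mux8_sum_of_products(a, d):
    """One channel of an eight-to-one multiplexer in sum-of-products form.

    a: sequence of eight data bits (A0..A7), each 0 or 1
    d: tuple of three select bits (D0, D1, D2); D0 is assumed least significant
    """
    d0, d1, d2 = d
    q = 0
    for i in range(8):
        # Bits of the input index i that this product term must match.
        b0, b1, b2 = i & 1, (i >> 1) & 1, (i >> 2) & 1
        # Product term: A_i AND (select lines equal i); 1 - (x ^ y) is XNOR.
        term = a[i] & (1 - (d0 ^ b0)) & (1 - (d1 ^ b1)) & (1 - (d2 ^ b2))
        q |= term  # OR the eight product terms together
    return q


def configuration1(a, b, c, d):
    """Three channels sharing one set of select bits; returns (QA, QB, QC).

    Hypothetical helper: latching and clocking are left out of this model.
    """
    return tuple(mux8_sum_of_products(ch, d) for ch in (a, b, c))
```

Each loop iteration corresponds to one product term, so a channel written this way uses eight product terms, and the B and C channels reuse the same expression with different data inputs.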
Figure 3: Configuration 2 uses a single-bit eight-input/eight-output digital crosspoint switch. Designers can implement this for either optimal package count (see Figure 4) or for optimal speed (see Figure 5).
Timing Analysis

The internal delay paths determine the circuit's maximum operating frequency (fmax). In this configuration there is an input delay (Tin), an array delay (Tad), a register delay (Trd) and an output delay (Tod). The fmax is a function of the signals that must settle at the input of the output register before the rising edge of the clock. In this case, signals propagate only through the input latches and the array. Therefore, the data must be valid at the inputs Tin + Tad nanoseconds before the rising edge of the internal clock signal (CLK). However, because of the inherent delay of the CLK signal, this reference must be shifted to the rising edge of the external clock signal by subtracting the internal clock delay (Tic). The external data set-up time (Tsu) is shown in Equation 2. Inverting this time requirement yields the maximum operating frequency.

As the output flip-flops are clocked, data propagates through the register to the output pin. With reference to the external clock pin, data becomes valid at the outputs Tic + Trd + Tod nanoseconds after the rising edge of the clock. Figure 2 shows the timing requirements for this circuit, including the input latch signal.

Using a 5C121-50 (50-nsec propagation delay), data can be sent through this switch configuration at 25 Mbits/sec. This transfer rate remains independent of the word width. Since one 5C121 EPLD in this configuration can simultaneously transfer three bits of information, three 5C121s are required to transfer a byte of data during each clock cycle. This configuration of a digital crosspoint switch uses 86% of the 40 pins, 71% of the macrocells and 11% of the available p-terms in the 5C121 EPLD.
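The arithmetic behind these figures can be sketched in a few lines of Python. The set-up expression follows the prose description (Tin + Tad, shifted by Tic); the 40-nsec value is taken from the Figure 2 caption, and the function names are hypothetical. The individual values of Tin, Tad and Tic are not stated in the article, so none are assumed here.

```python
import math

T_SU_NS = 40.0        # set-up time quoted in the Figure 2 caption
BITS_PER_DEVICE = 3   # Configuration 1 moves a 3-bit word per device


def setup_time_ns(t_in, t_ad, t_ic):
    """External set-up time as described in the text: Tsu = Tin + Tad - Tic."""
    return t_in + t_ad - t_ic


def fmax_mhz(t_su_ns):
    """Maximum operating frequency is the inverse of the set-up time."""
    return 1000.0 / t_su_ns


def devices_for_word(word_bits, bits_per_device):
    """Whole devices needed to move word_bits in one clock cycle."""
    return math.ceil(word_bits / bits_per_device)
```

With these definitions, `fmax_mhz(T_SU_NS)` gives 25.0, matching the 25 Mbits/sec per data line quoted above, and `devices_for_word(8, BITS_PER_DEVICE)` gives 3, matching the three 5C121s needed for a byte.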
Figure 2: A 40-nsec internal set-up time (prior to clocking data through the output flip-flop) marks Configuration 1. Data clocked into all eight input latches at the rising edge of one ILE/CLK cycle is selected and clocked out of the output flip-flop on the next rising edge of ILE/CLK.

DIGITAL DESIGN • JULY 1986

Configuration 2

The second circuit (Figure 3) also selects one of eight inputs (I0 to I7), but this time data is routed to one of eight different outputs (Q0 to Q7). Six control bits are required for each transfer: three to select the input path (D0 to D2) and three to select the output path (D3 to D5). By selecting a single output path and clocking all output registers simultaneously, deselected outputs are automatically cleared. This is useful for designs where only the most current data is needed.

Equation 4 is the common equation to select one of eight input paths. Equations 5 to 12 complete the Boolean equations for this example. The previous equations would contain eight product terms if they were written in expanded form. However, by treating SELECTEQ as one signal, each equation contains only one product term. Both options are available in the 5C121, but there are advantages and disadvantages to the two methods.

If SELECTEQ is implemented as one signal through a combinational feedback option, one and one-half crosspoint switches can be implemented in one 5C121 (Figure 4). The trade-off is between faster speed and low chip count. By design, only 18 macrocells in the 5C121 can support eight product terms. On the other hand, selecting the combinational option reduces the p-terms but introduces an additional input mux delay. Figure 4 shows that an input signal must pass through four delays before reaching the input to the flip-flop. Again, subtracting the input clock delay to shift the reference point yields Equation 13 for the set-up time. Inverting Tsu gives the maximum operating frequency. In this configuration, data can be clocked through at 12 Mbits/sec. This layout utilizes '!I % of the available pins, 89% of the available macrocells and 13% of the product terms. Six 5C121s would be required to implement a byte-wide switch with this layout.

If the combinational feedback option is not used, there are eight output equations, each containing eight product terms. Assigning these equations to the macrocells that support eight p-terms shows that only a single, one-of-eight select line digital crosspoint switch fits into one 5C121. Thus, the design requires eight 5C121s to complete a byte-wide parallel transfer. Since the signal paths are identical to Configuration 1, the same timing analysis applies here. This layout (Figure 5) utilizes 65% of the pins, 39% of the macrocells and 30% of the p-terms.

Though the utilization numbers are lower for this example, the actual available pins and macrocells in the 5C121 are higher than initially visible. Since macrocells in the 5C121 are organized into groups of four, when one output structure in a macrocell group is defined the other three must be of the same structure. Many times, this results in unused pins being labeled "RESERVED" in the utilization report.
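The behavior described for Configuration 2 can be modeled with a short Python sketch. This is a behavioral illustration under stated assumptions, not the article's Equations 4 to 12: the select fields are passed as integers 0 to 7 rather than as individual D bits, and all function names are hypothetical.

```python
import math


def select_eq(inputs, d_in):
    """SELECTEQ: the one-of-eight input selection, d_in in 0..7 (D0 to D2)."""
    return inputs[d_in] & 1


def configuration2_clock(inputs, d_in, d_out):
    """State of Q0..Q7 after one clock edge.

    Only the output addressed by d_out (D3 to D5) receives SELECTEQ;
    every deselected output is clocked to 0, as the text describes.
    """
    s = select_eq(inputs, d_in)
    return [s if n == d_out else 0 for n in range(8)]


def pterms_per_output(use_combinational_feedback):
    """Product terms per output equation, per the counts given in the text."""
    return 1 if use_combinational_feedback else 8


def devices_for_byte(switches_per_device):
    """Whole 5C121s needed for eight single-bit switches in parallel."""
    return math.ceil(8 / switches_per_device)
```

Here `devices_for_byte(1.5)` returns 6 for the low-package-count layout and `devices_for_byte(1)` returns 8 for the speed-optimized layout, in line with the device counts quoted in the text.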
Configuration 3

The final circuit (Figure 6) again uses eight inputs (I0 to I7) and eight outputs (Q0 to Q7), though this time the deselected outputs "remember" their previously selected state. With the 5C121's register feedback option, deselected outputs can hold the last data bit sent to that output. New data appears when the output is selected again. Equations 14 to 22 express the Boolean terms necessary to implement this hold feature in the digital crosspoint switch. Note that each output is now a function of both the present inputs and the previous output (Qnfbk), which implements the registered feedback. Data bits D3, D4 and D5 determine which data bit will pass to the output. Again, the number of p-terms dictates the use of combinational feedback, as in Configuration 2.
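A minimal Python sketch of the hold behavior follows. It is an assumed two-term form (selected data OR held feedback), not a transcription of Equations 14 to 22; the select fields are passed as integers 0 to 7, and the names are hypothetical, with `prev_q[n]` standing in for the registered feedback Qnfbk.

```python
def configuration3_clock(prev_q, inputs, d_in, d_out):
    """State of Q0..Q7 after one clock edge, with register feedback.

    prev_q: previous output states (Qnfbk for each output)
    inputs: data bits I0..I7
    d_in:   input selection, 0..7 (D0 to D2)
    d_out:  output selection, 0..7 (D3 to D5)
    """
    s = inputs[d_in] & 1  # the shared one-of-eight input selection
    next_q = []
    for n in range(8):
        sel = 1 if n == d_out else 0
        # Two product terms: new data when selected, held data otherwise.
        next_q.append((s & sel) | (prev_q[n] & (1 - sel)))
    return next_q
```

Calling the function repeatedly with its own result as `prev_q` shows deselected outputs keeping their last value until they are addressed again.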
Figure 4: Configuration 2 features a low-package-count layout. Note that one and one-half switches fit into each 5C121 EPLD. This configuration uses combinatorial feedback to simplify the logic equations, thus eliminating the requirement for eight product terms per output.
This configuration's timing analysis is similar to Configuration 2's combinational feedback analysis, with the exception of a register feedback delay (Trf). Trf is the time from when data is present at the output of the flip-flop to when that data is available to the array. The total delay associated with the registered feedback consists of Trd, Trf and Tad. Data from the flip-flop output reaches the input in about 50 nsec. The delay associated with data coming from the input pins is the same as that of Configuration 2 with combinational feedback, approximately 83 nsec. Using this as the clock period, there is ample time to implement the register feedback without affecting the cycle time. In this configuration, data could be clocked through at 12 Mbits/sec.

Combinational feedback reduces the p-term requirement to two p-terms per equation. This allows one and one-half crosspoint switches to fit into one 5C121. The design utilizes 64% of the available pins, 42% of the macrocells and 11% of the product terms. Six devices would be required to implement a byte-wide switch.

All of the configurations function differently, and no one configuration is optimum for all applications. A designer can customize a device to meet the needs of an application, whether those needs include higher speed or lower chip count. A second device can be quickly developed for a different application. Designers are no longer restricted to a single device type that must be adapted to an application with additional logic devices.
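Returning to the timing figures quoted for Configuration 3, the margin left for the register feedback path can be checked with a short Python sketch. The two constants are the approximate values given in the text; the function names are hypothetical.

```python
CLOCK_PERIOD_NS = 83.0        # input-pin path delay, quoted as about 83 nsec
REGISTER_FEEDBACK_NS = 50.0   # Trd + Trf + Tad, quoted as about 50 nsec


def max_rate_mbits(period_ns):
    """Data rate per line when one bit is clocked through every period."""
    return 1000.0 / period_ns


def feedback_slack_ns(period_ns, feedback_ns):
    """Time left in the clock period after the register feedback settles."""
    return period_ns - feedback_ns
```

`max_rate_mbits(CLOCK_PERIOD_NS)` evaluates to roughly 12, consistent with the 12 Mbits/sec figure, and `feedback_slack_ns(CLOCK_PERIOD_NS, REGISTER_FEEDBACK_NS)` gives 33.0, which is why the feedback path does not lengthen the cycle.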
An original design can be developed in an afternoon. Additional devices derived from an original design can be developed in a few hours. Also, the ability to erase an EPLD and reprogram it allows design errors to be corrected immediately. Instead of several weeks' delay with gate arrays, a designer using EPLDs can have working silicon devices in one day. Both the flexibility and short design times associated with EPLDs make them a good choice for applications that benefit from custom silicon devices. Today, EPLDs offer designers the densities and configuration flexibility of gate arrays, along with the short development time and cost associated with EPROMs.

Figure 6: Configuration 3 shows the use of registered feedback to allow deselected outputs to retain their previously selected data. The logic for a representative channel is shown. As with Configuration 2, this configuration can be optimized for package count or speed.
Figure 5: This circuit (Configuration 2 optimized for speed) combines the multiplexer and demultiplexer functions for each channel in a single array. Since each output equation uses eight product terms, only one switching channel can fit into each 5C121 package.