
Instruction-Level Test Methodology for CPU Core Self-Testing

SAEED SHAMSHIRI, HADI ESMAEILZADEH, and ZAINALABDEIN NAVABI
University of Tehran

TIS is an instruction-level methodology for processor core self-testing that enhances the instruction set of a CPU with test instructions. Because test instructions are functionally equivalent to the NOP instruction, they can replace NOP instructions, so online testing can be accomplished without any performance penalty. TIS tests different parts of the processor and detects stuck-at faults. The method can be employed for offline and online testing of single-cycle, multicycle, and pipelined processors. However, TIS is best suited to online testing of pipelined architectures, in which NOP instructions are executed frequently because of data, control, and structural hazards. By running test instructions in place of these NOPs, TIS uses time that would otherwise be wasted. In this article, two implementations of TIS are presented. One employs dedicated hardware modules for test vector generation, while the other is a software-based approach that reads test vectors from memory. Both approaches are implemented on a pipelined processor core and their area overheads are compared. To demonstrate the suitability of the TIS technique, several programs are executed and fault coverage results are presented.

Categories and Subject Descriptors: B.5.3 [Register-Transfer-Level Implementation]: Reliability and Testing—built-in tests

General Terms: Verification, Design, Performance

Additional Key Words and Phrases: Instruction level testing, CPU core testing, software-based self testing, test instruction set, BIST, pipelined processor

Authors' address: School of Electrical and Computer Engineering, University of Tehran, North Kargar Ave., Tehran 14395-515, Iran; email: {shamshiri, hadi}@cad.ece.ut.ac.ir; [email protected]. edu.

Based on "TIS: An Instruction-Level Test Methodology for CPU Core Software-Based Self-Testing" by Saeed Shamshiri, Hadi Esmaeilzadeh, and Zainalabdein Navabi, which appeared in the Proceedings of the 2004 IEEE High Level Design Validation and Test Workshop (HLDVT 2004). IEEE.

© 2005 ACM 1084-4309/05/1000-0673 $5.00

ACM Transactions on Design Automation of Electronic Systems, Vol. 10, No. 4, October 2005, Pages 673–689.

1. INTRODUCTION

Embedded processor cores are widely used in SoCs because they offer several advantages over ASICs, including design reuse and portability. Core-based design allows processors to be used in a variety of applications in a cost-effective manner. On the other hand, design based on processor cores presents new challenges for testing, because embedded processors are further removed from the pins of the chip [Murray and Hayes 1996]. Self-testing of high-speed circuits has clear advantages over testing with external testers: the tester's OTA (Overall Timing Accuracy) does not keep pace with on-chip clock speeds, which implies more yield loss [Semiconductor Industry Association 1997]. One approach to self-testing is to run a test program on the processor that tests it with its own instructions. This pure software self-testing method has several disadvantages, including low fault coverage, a large program size that may not fit in on-chip memory, and long test time [Lai and Cheng 2001]. Several approaches have been proposed for self-testing a microprocessor for stuck-at or delay faults by test program generation [Lai et al. 2000; Chen and Dey 2000; Brahme and Abraham 1984; Distante and Piuri 1986; Shen and Abraham 1998; Batcher and Papachristou 1999; Lee and Patel 1994]. Another proposed method is an instruction-level DFT that adds instructions to improve the controllability and observability of processor cores for software-based self-testing [Lai and Cheng 2001]. In our proposed method [Shamshiri et al. 2004a, 2004b, 2004c], which we refer to as TIS, test instructions are added to the instruction set and used to test a processor core. This instruction-level test method can be used for both online and offline testing. In the offline testing phase, only test instructions run on the processor; therefore, all combinational and sequential parts of the processor can be tested with high fault coverage. In the online testing phase, the assembler or compiler inserts test instructions into the machine code in place of NOP instructions.
This way, the combinational parts and some sequential parts of the processor are tested while the processor performs its normal operation, without any performance penalty. Our method thus follows a single approach for both online and offline testing of processor cores. To test the processor core, it uses all the time that would otherwise be wasted by processor stalls caused by data, control, and structural hazards or by cache misses. TIS is most appropriate for online testing of pipelined architectures, which are widely used in SoC implementations of embedded systems. In a pipelined architecture, one or more NOP instructions are inserted as stalls between instructions that are data- or control-dependent. To realize TIS, we explored the hardware/software design space of the methodology and chose two hardware/software partitionings for its implementation. The hardware-oriented approach is based on the BIST architecture and employs LFSRs and MISRs for test vector generation and result compression, respectively. The software-oriented approach reduces the hardware overhead of the former by removing the LFSRs. In this approach, test vectors are generated by pseudorandom pattern generator software embedded in the processor's assembler, and these random test vectors are appended to the test instructions. During program execution, the test vectors are fetched as immediate data along with the test instruction.

The rest of the article is organized as follows. Section 2 illustrates the concepts of the proposed instruction-level testing. Section 3 discusses considerations about replacing NOP instructions with test instructions. Sections 4 and 5 explain the hardware- and software-oriented implementations of TIS, respectively, and discuss the implementation framework and challenges. Finally, experimental results are presented in Section 6, and the article concludes in Section 7.

2. THE PROPOSED INSTRUCTION LEVEL TESTING TECHNIQUE

TIS is an instruction-level test technique for processor core self-testing that uses NOP instructions for online testing. As a common practice, NOP instructions are inserted to provide stalls that resolve data and control dependencies in pipelined processor architectures. These instructions degrade the performance of the processor. Our technique uses these instruction slots to test the processor's various units while it performs its normal task (see Figure 1). In this method, NOP instructions are replaced with test instructions. Test instructions are functionally equivalent to the NOP instruction, so this replacement has no effect on the functionality or performance of the running program. The TIS method can also be used for at-speed offline testing; in that case, all executed instructions are test instructions.

Fig. 1. TIS implementation with separate LFSRs and parallel MISRs for each component. When a test instruction passes through a pipe stage, the BIST controller puts all the combinational units in that stage in test mode.

TIS supports both deterministic and random test approaches. In the hardware-oriented implementation, when TIS is used as a deterministic test method, test instructions load test vectors from a dedicated internal test memory and apply them to different combinational parts of the processor under test. In random test mode, on the other hand, test vectors are generated by LFSRs (Linear Feedback Shift Registers). In the software-oriented implementation, the randomness of test vectors depends on the processor's assembler: in the random test approach, the assembler generates test vectors with pseudorandom pattern generator software, while in the deterministic test approach, it appends predetermined test vectors to the test instructions. For both approaches and both implementations, test results are collected in MISRs (Multiple Input Signature Registers).

3. TEST INSTRUCTION INSTEAD OF STALL

In ordinary multicycle architectures, the NOP instruction is used to insert a delay in program execution. In pipelined architectures, NOP instructions are also inserted to eliminate hazards. There are three types of hazards: structural, data, and control. A structural hazard may occur when there are not enough hardware resources to execute consecutive instructions. In processors with simple architectures, this hazard is usually eliminated in the design phase, but it can occur in architectures that use more than one functional unit for instruction-level parallelism [Patterson and Hennessy 2003]. A data hazard occurs when the processor executes data-dependent instructions without enough latency between them. Two instructions are data dependent when the second requires the result of the first to begin its execution. A common way to prevent data hazards is a forwarding unit, which detects dependencies and forwards the required data from the running instruction to the dependent instructions. In some cases, forwarding is impossible because the result is not ready yet. In this situation, a NOP instruction is inevitable.
In the most common pipelined architectures, this situation arises after memory loads that are followed by instructions depending on the loaded value. The last type of hazard is the control hazard, which occurs when a branch is mispredicted or, more generally, when the system has no branch prediction mechanism. There are two mechanisms for handling misprediction. The first is flushing the pipe after the misprediction, up to the jump target instruction. Flush mechanisms are generally not cost-effective. A better solution is filling the pipe after the jump instruction with a specific number of NOPs. This is called the delayed jump mechanism and is widely used in DSP processors [Patterson and Hennessy 2003]. In most processor architectures, conditional jumps need more NOP instructions than unconditional jumps, because validating the branch prediction takes more time. Cache misses are another situation that usually forces a processor to insert stalls between instructions. Some processor architectures attempt to solve the cache miss problem with out-of-order execution. This approach reduces the cache miss penalty but does not eliminate the problem completely: when an instruction needs the result of another instruction that encounters a cache miss, the processor must suspend it. Whenever the processor freezes instruction execution because of structural, data, or control hazards or cache misses, one or more processor cycles are wasted by stalls or NOPs. TIS uses these wasted cycles to run test instructions. All logic units can be tested by one or more test instructions during the inactive periods caused by hazards or cache misses.

4. HARDWARE-ORIENTED IMPLEMENTATION

4.1 Implementation Framework

We have implemented our method on the Pipelined SAYEH (PAYEH) processor. SAYEH [Navabi 2004] is a multicycle RISC processor with 16-bit data and 16-bit address buses. The register file of SAYEH is composed of 16 windows, each containing four 16-bit registers R0, R1, R2, and R3. At any time, one window is active and its offset is held in the Window Pointer (WP). PAYEH is a pipelined version of SAYEH with a similar instruction set. Table IV shows the instruction set of PAYEH. The PAYEH processor has five pipe stages, illustrated in Figure 9. As instructions pass through the pipe stages, they may suffer from data dependencies. To resolve them, the PAYEH architecture uses a forwarding unit that can resolve all dependencies by forwarding the required values from later pipe stages to earlier ones. The exception is the LDA (load addressed) instruction: when the next instruction needs the result of a load, the forwarding unit cannot forward it because the value has not been loaded yet. In this situation, inserting one stall after the LDA instruction is inevitable. In the PAYEH processor, the instructions that generate control hazards are BRZ (branch if zero), BRC (branch if carry), JPA (jump addressed), and JPR (jump relative).
BRZ, BRC, and JPA need two stalls because their jump status is determined in the EXE stage, whereas JPR needs one stall because its jump status is determined in the ID stage. The PAYEH architecture does not generate any structural hazards, so the only instructions that need stalls are LDA, BRZ, BRC, JPA, and JPR. The PAYEH hardware cannot detect dependencies and inject stalls between instructions at run time. Therefore, these stalls must be inserted before run time by the programmer, compiler, or assembler. The PAYEH assembler examines the instruction after each LDA to detect a dependency. If there is one, it inserts a NOP instruction after the LDA. Control dependencies require no detection: the assembler simply inserts the required number of NOPs after each BRZ, BRC, JPA, and JPR instruction.
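The stall-insertion rules just described can be sketched in a few lines of Python. This is a minimal model, not the PAYEH assembler. The `Instr` record, its `dst`/`srcs` register fields, the `insert_stalls` helper, and the example instruction sequence are all hypothetical. Only the stall counts come from the text: one after a dependent LDA, two after BRZ, BRC, and JPA, and one after JPR. Passing `filler="TST"` shows the TIS idea of emitting a test instruction wherever a NOP would go.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Stall slots stated in the text for PAYEH control-hazard instructions.
CONTROL_STALLS = {"BRZ": 2, "BRC": 2, "JPA": 2, "JPR": 1}


@dataclass
class Instr:
    op: str
    dst: Optional[str] = None       # register written (hypothetical field)
    srcs: Tuple[str, ...] = ()      # registers read (hypothetical field)


def insert_stalls(program: List[Instr], filler: str = "NOP") -> List[Instr]:
    """Insert stall slots after LDA and control-transfer instructions."""
    out: List[Instr] = []
    for i, ins in enumerate(program):
        out.append(ins)
        if ins.op == "LDA":
            nxt = program[i + 1] if i + 1 < len(program) else None
            # Data hazard: one slot only if the next instruction reads the loaded register.
            if nxt is not None and ins.dst in nxt.srcs:
                out.append(Instr(filler))
        elif ins.op in CONTROL_STALLS:
            # Control hazard: no dependency check, always a fixed number of slots.
            out.extend(Instr(filler) for _ in range(CONTROL_STALLS[ins.op]))
    return out


if __name__ == "__main__":
    prog = [                                        # hypothetical instruction sequence
        Instr("LDA", dst="R0", srcs=("R1",)),
        Instr("ADD", dst="R2", srcs=("R0", "R3")),  # reads the LDA result
        Instr("BRZ"),
        Instr("JPR"),
    ]
    print([i.op for i in insert_stalls(prog)])         # ordinary assembler output
    print([i.op for i in insert_stalls(prog, "TST")])  # TIS: same slots, test instructions
```

Because the slots are chosen before the filler is decided, the TIS variant produces exactly the same instruction timing as the NOP version. That property is why the replacement costs no performance.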


Having described the PAYEH processor as a framework for TIS implementation, we discuss the trade-offs and challenges of the TIS implementation in the next subsection.

4.2 Trade-Offs and Challenges

Our method is based on introducing a test instruction that can test the combinational and some sequential parts of a processor. When the processor loads a test instruction instead of a NOP, the test instruction activates all combinational parts of the processor along its path, injecting a test vector into each part and collecting the results. The test instruction overwrites some bits of the internal pipe registers and validates their correctness. It does not write to the register file or to other bits of the internal registers, because these registers hold the processor state, which must remain unchanged while test instructions run. Note that test instructions are functionally equivalent to the NOP instruction. We select a BIST (Built-In Self-Test) strategy for test vector generation and result collection. That is, an LFSR and a MISR are inserted before and after each combinational component to inject random test vectors and to hold the component's output results, respectively. The MISR captures the component's results and compresses them into a short signature, which is then compared with the expected signature to validate the component. To reduce hardware overhead, all components can share the same LFSR. For example, in our case study, PAYEH, a 32-bit LFSR is sufficient to feed test vectors to all combinational components, including the adder, ALU, control unit, and branch unit. Furthermore, all of these components can be tested with the same random input data. However, for better fault coverage, it is preferable to use a dedicated LFSR for each component. Its polynomial and starting seed can then be tuned to cover most of the predetermined test vectors of that component.
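The LFSR-to-MISR flow described above can be illustrated with a small bit-level model. It is a sketch under stated assumptions, not the PAYEH hardware. The 32-bit LFSR uses taps 32, 22, 2, 1, a polynomial commonly listed as maximal-length. The 16-bit MISR uses the CRC-CCITT feedback polynomial 0x1021. The seed, the function names, and the split of each LFSR state into two 16-bit operands are hypothetical choices. With this split, one 32-bit LFSR feeds a two-operand 16-bit unit, echoing the shared-LFSR option in the text, and a parallel MISR absorbs one full response word per clock.

```python
MASK16 = 0xFFFF
MASK32 = 0xFFFFFFFF


def lfsr32_step(state: int) -> int:
    """One clock of a 32-bit Fibonacci LFSR with assumed taps 32, 22, 2, 1."""
    fb = ((state >> 31) ^ (state >> 21) ^ (state >> 1) ^ state) & 1
    return ((state << 1) | fb) & MASK32


def misr16_step(sig: int, response: int, poly: int = 0x1021) -> int:
    """One clock of a 16-bit parallel MISR: shift with feedback, then XOR in the response word."""
    msb = (sig >> 15) & 1
    sig = (sig << 1) & MASK16
    if msb:
        sig ^= poly
    return sig ^ (response & MASK16)


def lfsr_vectors(seed: int, n: int):
    """Yield n operand pairs (a, b) taken from the two halves of each LFSR state."""
    state = seed & MASK32
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    for _ in range(n):
        yield state >> 16, state & MASK16
        state = lfsr32_step(state)


def misr_signature(responses, seed: int = 0) -> int:
    """Compact a sequence of 16-bit responses into one signature."""
    sig = seed
    for r in responses:
        sig = misr16_step(sig, r)
    return sig


def run_bist(unit, seed: int = 1, n: int = 8192) -> int:
    """Apply n pseudorandom vectors to `unit` and return the MISR signature."""
    return misr_signature(unit(a, b) for a, b in lfsr_vectors(seed, n))


def adder16(a: int, b: int) -> int:           # fault-free 16-bit adder model
    return (a + b) & MASK16


def adder16_sa0_bit0(a: int, b: int) -> int:  # same adder with sum bit 0 stuck-at-0
    return (a + b) & MASK16 & ~1


if __name__ == "__main__":
    golden = run_bist(adder16)                # expected signature from a fault-free model
    print(hex(golden), run_bist(adder16_sa0_bit0) == golden)
```

Here the expected (golden) signature is assumed to come from simulating a fault-free model. A unit passes if its signature matches. A faulty unit such as `adder16_sa0_bit0` should normally produce a different signature. Any MISR can alias, however; the usual estimate for a 16-bit register is about 2^-16.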
The number of MISRs must also be chosen with hardware overhead in mind. Each component in a pipe stage needs a dedicated MISR, since all of them work simultaneously during the same clock period. For example, PAYEH requires two MISRs in the ID stage: one for the adder and one for the control unit. There is a trade-off between the number of MISRs in a pipe stage and the number of test instructions. For example, by introducing two test instructions, the adder and the control unit can put their results into the same MISR. When the first test instruction passes through the ID stage, the adder enters test mode and the MISR collects its results. When the second test instruction arrives at the ID stage, the control unit enters test mode and the MISR collects its results. Using two test instructions resolves the resource conflict by distributing it in time. In other words, there is a time (efficiency) versus space (area or cost) trade-off. Increasing the number of test instructions decreases the required hardware resources and hence the total cost, but it increases the test time and hence degrades test performance. Therefore, to reduce the hardware cost from multiple MISRs to one MISR per pipe stage, the number of test instructions must equal the maximum number of combinational units in any pipe stage. Since all pipe stages may run test instructions at the same time, different pipe stages require separate MISRs.

Our method collects the result of the unit under test within one pipe stage, which means that the MISRs must complete their task in one clock cycle. Therefore, a parallel MISR implementation is employed. Serial MISRs can be used instead of parallel ones, but they reduce test performance. With a serial MISR, the parallel result of the unit under test must be captured by a shift register that feeds the corresponding MISR over the following clock cycles. During this time, the MISR cannot be used for any other purpose. Specifically, once a test instruction is issued, no other test instruction that uses the same MISR can be issued until the first one completes. Thus, if the MISR length is 16, there must be at least 16 cycles between two consecutive test instructions that use this MISR.

4.3 Realization

In pipelined architectures like PAYEH, the processor controller is a combinational unit that uses some slices of the pipe registers to hold its state. Hence, our test instruction can also test the controller without affecting the state of the system. To test the control unit, the BIST controller detects the test instruction by its opcode and then puts the control unit in test mode. The combinational parts of PAYEH are its control unit and a 16-bit adder in the ID stage, and a 16-bit ALU and a branch unit for branch evaluation in the EXE stage. In this work, we used separate LFSRs and parallel MISRs for each component (see Figure 1). With a parallel MISR implementation, four MISRs are needed for these combinational parts, and with four MISRs, one test instruction is sufficient.
This single test instruction, called TST, has the same functionality as the NOP instruction but tests all combinational components of PAYEH (see Figure 2).

Fig. 2. Execution of test instructions (TSTs) instead of NOPs in the PAYEH processor. (a) A simple program that calculates the power function after all NOPs are replaced with TSTs. (b) TST instructions passing through the pipe stages. When TST passes through the ID stage (clock cycles 2, 3, and 5), the control unit and the adder enter test mode. When TST passes through the EXE stage (clock cycles 3, 4, and 6), the branch unit and the ALU enter test mode.

5. SOFTWARE-ORIENTED IMPLEMENTATION

The software-oriented implementation uses the same framework as the hardware-oriented implementation, namely the PAYEH processor. We now discuss the feasibility and challenges of this kind of implementation.

5.1 Trade-Offs and Challenges

In the software-oriented implementation of TIS, the assembler inserts test vectors (random or deterministic) along with the test instruction opcode. At run time, these test vectors are fetched from memory immediately after the test instruction opcode. They are then applied to different combinational parts of the processor, and the test results are collected and compressed by MISRs. As explained before, PAYEH is a 16-bit processor and all of its instructions are 16 or 8 bits long. This 16-bit space may be insufficient for a test instruction. For example, a test instruction that tests the 16-bit ALU of PAYEH requires 16 bits for each ALU input and 4 bits for the instruction opcode. These 36 bits require three memory words, so loading and executing such an instruction takes three clock cycles. Therefore, the software-oriented implementation of TIS needs a controller to handle the execution of these multicycle test instructions.

Another challenge in the software-oriented implementation is deciding on the number of test instructions. If a single test instruction is designed to test all combinational parts of the processor, it will be very long. Because it carries the test vectors as immediate data, several clock cycles are needed to fetch it from instruction memory. This degrades processor performance in online testing, and such an instruction is too long compared with normal PAYEH instructions and normal stalls. In this case study, test instructions that are one or two words long are preferred, since PAYEH instructions need at most two stalls. It is therefore better to define a separate test instruction for each combinational unit of PAYEH. The control unit, the branch unit, the 16-bit adder, and the 16-bit ALU are the four combinational units of PAYEH that can be tested with the TIS methodology. For the test instruction dedicated to the control unit, 16 bits are sufficient, since the control unit has no more than 12 inputs. The same holds for the branch unit, but not for the adder and the ALU. In these two cases, the test instruction must be at least 36 bits long, comprising the opcode bits and two 16-bit immediate test vectors. Each of these test instructions therefore needs three words of instruction memory and three memory cycles to execute. Replacing NOP instructions with these three-cycle instructions degrades processor performance.
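The instruction-length arithmetic above is easy to check mechanically. This small helper is illustrative only. It assumes a 4-bit opcode and 16-bit memory words, as stated for PAYEH, and counts how many words, and therefore fetch cycles, a test instruction occupies for a given number of immediate test-data bits.

```python
import math

WORD_BITS = 16    # PAYEH memory word width
OPCODE_BITS = 4   # opcode field of a test instruction, as stated in the text


def words_needed(data_bits: int, opcode_bits: int = OPCODE_BITS, word_bits: int = WORD_BITS) -> int:
    """Memory words (and hence fetch cycles) occupied by opcode plus immediate test data."""
    return math.ceil((opcode_bits + data_bits) / word_bits)


if __name__ == "__main__":
    print("control unit (12 inputs):", words_needed(12))
    print("adder / ALU (2 x 16 bits):", words_needed(32))
    print("adder / ALU with 28 data bits:", words_needed(28))
```

The 28-bit case corresponds to the two-word format discussed next, which matches the two stall slots available after BRZ, BRC, and JPA.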

To reduce these instructions to two words, a 36-bit instruction must be shortened by 4 bits, leaving 28 bits for test data. There are two ways to supply the missing 4 bits. The first is to use a 4-bit LFSR that generates a 4-bit random pattern as part of one input test vector of the 16-bit functional unit. In this case, 28 bits of the required 32-bit test vector come from the instruction and the remaining 4 bits come from the LFSR. The test instruction then fits into two memory words and requires two clock cycles to execute. In the PAYEH processor, such an instruction can replace the two consecutive NOP instructions that follow BRZ, BRC, and JPA without any performance penalty, at the cost of some hardware overhead. The second solution is to let the two 16-bit test vectors share 4 bits, that is, to overlap them by 4 bits. In this work, the latter approach is implemented because of its lower hardware overhead. However, it cannot be used for deterministic test data, since arbitrary deterministic vector pairs generally do not agree in the shared bits. Test results of different combinational units can be captured with only one MISR, because only one combinational unit is tested at any given time, so no conflict over this single MISR can occur. Since one MISR captures the results of more than one unit, the assembler must follow a predetermined order when inserting test instructions into the machine code. To test the sequential parts of the processor in addition to the combinational parts, the test results are also written to the corresponding part of the pipe register in the next clock cycle. Then, to validate the register, its output is compared with its input by XOR gates.

5.2 Realization

Four test instructions are defined to test the four combinational units of PAYEH: TST1, TST2, TST3, and TST4.
TST1 and TST2 test the control unit and the branch unit, respectively. They fit into one memory word, and their results are captured by a shared MISR. TST3 and TST4 test the adder and the ALU, respectively, and their results are captured by another shared MISR. As mentioned, by overlapping two test vectors in some bits, these test instructions fit into two memory words, so the processor assembler can insert them in place of two consecutive NOPs. This software-oriented approach has lower hardware overhead than the hardware-oriented implementation: the only requirements are two parallel MISRs, a test controller, and some additional discrete gates. On the other hand, its test time is longer by a factor of six. In the hardware-oriented implementation, a single test instruction tests all parts of the processor in one clock cycle, whereas in the software-oriented implementation four test instructions must be executed over six clock cycles. Figure 3 shows the proposed architecture for the software-based TIS realization. Comparing it with the architecture of Figure 1 shows the reduction in hardware resources. Figure 4(a) shows a simple program in PAYEH assembly language in which all stalls are replaced with test instructions.
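The overlapped two-word format can be sketched as an encoder/decoder pair. The exact bit layout is not specified, so the one below is an assumption. The opcode occupies the top 4 bits of a 32-bit instruction. Operand A is data bits 27..12 and operand B is data bits 15..0, so bits 15..12 are shared: A's low nibble equals B's high nibble. The opcode value 0xE and the names `encode`, `decode`, and `random_test_instruction` are hypothetical. Drawing 28 random bits always satisfies the overlap constraint, while an arbitrary deterministic pair generally does not. This is why the variant suits only random test data.

```python
import random
from typing import Tuple

TST3_OPCODE = 0xE  # hypothetical opcode value, not taken from PAYEH


def encode(opcode: int, a: int, b: int) -> Tuple[int, int]:
    """Pack a 4-bit opcode and two 16-bit vectors sharing 4 bits into two 16-bit words."""
    if (a & 0xF) != ((b >> 12) & 0xF):
        raise ValueError("low nibble of A must equal high nibble of B")
    data = ((a & 0xFFFF) << 12) | (b & 0x0FFF)     # 28 bits of test data
    inst = ((opcode & 0xF) << 28) | data
    return (inst >> 16) & 0xFFFF, inst & 0xFFFF   # first word, second word


def decode(w0: int, w1: int) -> Tuple[int, int, int]:
    """Recover (opcode, A, B) as a test controller might after fetching both words."""
    inst = ((w0 & 0xFFFF) << 16) | (w1 & 0xFFFF)
    data = inst & 0x0FFFFFFF
    return inst >> 28, (data >> 12) & 0xFFFF, data & 0xFFFF


def random_test_instruction(opcode: int, rng: random.Random) -> Tuple[int, int]:
    """Assembler-side generation: 28 random bits always satisfy the overlap."""
    data = rng.getrandbits(28)
    return encode(opcode, (data >> 12) & 0xFFFF, data & 0xFFFF)


if __name__ == "__main__":
    w0, w1 = encode(TST3_OPCODE, 0x1234, 0x4ABC)
    print(hex(w0), hex(w1), [hex(x) for x in decode(w0, w1)])
```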


Fig. 3. Software-based TIS implementation with two parallel MISRs.

Fig. 4. Execution of test instructions instead of NOPs in the PAYEH processor using the software-based implementation. (a) A simple program that calculates the power function after all NOPs are replaced with test instructions. (b) TST instructions passing through the pipe stages. When TST3 passes through the ID stage (clock cycles 2 and 3), the control unit enters test mode, and when it passes through the EXE stage (clock cycles 3 and 4), the adder enters test mode. When TST1 passes through the ID stage (clock cycle 5), the branch unit enters test mode, and when it passes through the EXE stage (clock cycle 6), the ALU enters test mode.


Fig. 5. Calculating R2 = R0^R1 while R0 and R1 are loaded from data memory locations 0 and 1, respectively. (a) Hardware-oriented implementation. (b) Software-oriented implementation.

Fig. 6. Calculating R1 = R0! while R0 is loaded from the 0 location of the data memory. (a) Hardware-oriented implementation. (b) Software-oriented implementation.

Figure 4(b) depicts the execution of this program and shows how test instructions pass through the pipe stages and put different components of the processor in test mode.

6. EXPERIMENTAL RESULTS

Several experiments were performed to evaluate TIS. The first objective is to illustrate the role of test instructions in online testing of the processor, and the second is to measure the fault coverage of the method. For the first objective, we used the following programs:

Power. This program calculates a^b for natural numbers a and b (see Figure 5). The two stalls after BRZ and the one stall after the JPR instruction are filled with test instructions.

Factorial. This program calculates a! (see Figure 6). The two stalls after BRZ and the one stall after the JPR instruction are filled with test instructions.


Fig. 7. Calculating the R0th term of the Fibonacci series while R0 is loaded from location 0 of the data memory. (a) Hardware-oriented implementation. (b) Software-oriented implementation.

Fibonacci. This program calculates the nth term of the Fibonacci series (see Figure 7). The two stalls after each BRZ and the one stall after each JPR instruction are filled with test instructions.

Vector Addition. This program adds two vectors from memory and stores the results in memory (see Figure 8). The two stalls after BRZ, the one stall after JPR, and the one stall after a dependent LDA instruction are filled with test instructions.

In the hardware-oriented implementation, all stalls are filled with TST instructions. In the software-oriented implementation, the assembler fills the two consecutive stalls after BRZ, BRC, and JPA with TST3 or TST4, and the single stall after JPR and LDA with TST1 or TST2. In the software-oriented implementation, it takes six clock cycles for all CPU parts to be exposed to a test vector, while in the hardware-oriented implementation it takes only one clock cycle. To illustrate the effect of each program on online testing of the processor, we define a parameter called the test period: the time it takes to test the whole processor with one test vector during the normal operation of a program. The test period depends on the program context and can be calculated as follows:


Fig. 8. Adding two vectors from the data memory and saving the results. The size of the vectors is specified in the R0 register of the second window of the register file. (a) Hardware-oriented implementation. (b) Software-oriented implementation.

Table I. The Test Period of the Benchmark Programs Executed on PAYEH Processor

The test period can be calculated based on the loop bodies of the benchmark programs. Table I summarizes this parameter for all benchmark programs in both types of implementation. Test period shows the relation between the online testing time and the offline testing time of each separate component.

Because jump and branch instructions occur frequently, using their stalls for online testing of the processor core yields a high online test rate without any performance penalty.

In the fault coverage measurement process, the fault coverage of each combinational component is measured separately.
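Measuring the fault coverage of one combinational component amounts to counting the single stuck-at faults whose effect reaches an output for at least one applied vector. The article does this with a faulty library [Zolfy et al. 2001]; the sketch below instead uses a simpler, different technique, plain serial fault simulation with fault dropping, on a hypothetical gate-level full adder. It models faults only on whole nets, not on individual fanout branches.

```python
from itertools import product
import random

# Hypothetical 1-bit full-adder netlist in topological order:
# (output_net, gate_type, input_nets).
NETLIST = [
    ("x1", "XOR", ("a", "b")),
    ("s", "XOR", ("x1", "cin")),
    ("a1", "AND", ("a", "b")),
    ("a2", "AND", ("x1", "cin")),
    ("cout", "OR", ("a1", "a2")),
]
INPUTS = ("a", "b", "cin")
OUTPUTS = ("s", "cout")
GATE_FN = {"AND": lambda x, y: x & y, "OR": lambda x, y: x | y, "XOR": lambda x, y: x ^ y}


def simulate(vector, fault=None):
    """Evaluate the netlist for one input vector.

    `fault` is (net_name, stuck_value), or None for the fault-free circuit.
    """
    values = dict(zip(INPUTS, vector))
    if fault is not None and fault[0] in values:
        values[fault[0]] = fault[1]  # stuck-at fault on a primary input
    for out, kind, ins in NETLIST:
        values[out] = GATE_FN[kind](*(values[n] for n in ins))
        if fault is not None and fault[0] == out:
            values[out] = fault[1]  # stuck-at fault on a gate output
    return tuple(values[o] for o in OUTPUTS)


def fault_coverage(vectors):
    """Serial single-stuck-at fault simulation; returns (coverage, undetected)."""
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    faults = [(net, v) for net in nets for v in (0, 1)]
    undetected = set(faults)
    for vec in vectors:
        good = simulate(vec)
        for f in list(undetected):
            if simulate(vec, f) != good:
                undetected.discard(f)  # fault dropping: detected faults are not re-simulated
    return 1 - len(undetected) / len(faults), sorted(undetected)


if __name__ == "__main__":
    rng = random.Random(1)
    rand_vectors = [tuple(rng.randint(0, 1) for _ in INPUTS) for _ in range(4)]
    print("random:", fault_coverage(rand_vectors))
    print("exhaustive:", fault_coverage(product((0, 1), repeat=len(INPUTS))))
```

With four random vectors the result depends on the seed; the exhaustive set of eight vectors detects all 16 modeled faults of this small circuit.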

Fig. 9. The data path and controller of PAYEH with its five pipeline stages. This processor was designed and implemented by Saeed Shamshiri.

Table II. Fault Coverage of Each Combinational Component after Testing with 8192 Randomly Generated Test Vectors

Table III. The Area Overhead of Both TIS Implementations in PAYEH Processor
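Both TIS implementations compact test responses with parallel MISRs, and the hardware-oriented one also generates test vectors with LFSRs. A minimal sketch of the two structures follows, assuming a 16-bit width and the primitive polynomial x^16 + x^14 + x^13 + x^11 + 1 in Galois form; the register widths and polynomials of the PAYEH implementation are not given here, and the byte adder used as the circuit under test is hypothetical.

```python
MASK16 = 0xFFFF
TAPS = 0xB400  # Galois form of x^16 + x^14 + x^13 + x^11 + 1 (assumed polynomial)


def lfsr_step(state):
    """One right-shift step of a 16-bit Galois LFSR."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state


def lfsr_vectors(seed, count):
    """Yield `count` pseudo-random 16-bit test vectors from a nonzero seed."""
    state = seed & MASK16
    if state == 0:
        raise ValueError("LFSR seed must be nonzero")
    for _ in range(count):
        yield state
        state = lfsr_step(state)


def misr_signature(responses, init=0):
    """Compact a stream of 16-bit responses into one signature (parallel-input MISR)."""
    sig = init
    for r in responses:
        sig = lfsr_step(sig) ^ (r & MASK16)
    return sig


def adder_cut(vector, stuck_bit0=False):
    """Hypothetical circuit under test: adds the two bytes of a 16-bit vector.

    With stuck_bit0=True, output bit 0 is stuck-at-0.
    """
    result = (vector & 0xFF) + (vector >> 8)
    return result & ~1 if stuck_bit0 else result


if __name__ == "__main__":
    vectors = list(lfsr_vectors(seed=0xACE1, count=8192))
    good = misr_signature(adder_cut(v) for v in vectors)
    bad = misr_signature(adder_cut(v, stuck_bit0=True) for v in vectors)
    print(f"good signature {good:#06x}, faulty signature {bad:#06x}")
```

Because the MISR is linear and its state transition is invertible, an error confined to a single response word always changes the signature; errors spread over many words can alias, with a probability of roughly 2^-16 for a 16-bit register under the usual random-error assumption.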

Fault coverage is measured by synthesizing the design into a faulty library, in which each gate reports the faults it detects during the test procedure [Zolfy et al. 2001]. Table II shows the fault coverage achieved for each component in both implementations after testing with 8192 test vectors.

To achieve high fault coverage for the complete processor, its sequential parts must also be tested. Some sequential parts can be tested in normal mode, but others can be tested only offline, when the processor is in test mode and no system state needs to be preserved in the registers. The internal memory modules (data cache, instruction cache, and register file) can also be tested offline using a memory testing method.

To measure the hardware overhead of the method, both TIS implementations on the PAYEH processor were synthesized with a 0.5 µm ASIC technology: the hardware-oriented implementation with four LFSRs and four parallel MISRs, and the software-oriented implementation with two parallel MISRs. Table III shows the post-synthesis hardware overhead of both implementations.

7. CONCLUSION AND FUTURE WORK

This article presented an instruction-level test methodology for embedded processor core self-testing. The proposed method, which we refer to as TIS, adds test instructions that enable the processor to test its various parts. We explained the implementation challenges of both the hardware-oriented and the software-oriented implementations of TIS and presented real implementations of the method on the PAYEH processor. Sample programs were used to demonstrate that the method is appropriate for at-speed online testing of pipelined processors, and the fault coverage of each component was measured for each of these programs. These measurements show that the method can achieve a desirable level of fault coverage for at-speed online and offline self-testing. The hardware overheads of the two implementations were also measured and compared. Applying the method to processors with more complex architectures, such as VLIW and superscalar processors, is part of our future work.

Table IV. PAYEH Instruction Set

REFERENCES

BATCHER, K. AND PAPACHRISTOU, C. 1999. Instruction randomization self test for processor cores. In Proceedings of VLSI Test Symposium. 34–40.

BRAHME, D. AND ABRAHAM, J. 1984. Functional testing of microprocessors. IEEE Trans. Computers C-33, 6, 475–485.

CHEN, L. AND DEY, S. 2000. Defuse: A deterministic functional self-test methodology for processors. In Proceedings of VLSI Test Symposium. 255–262.

DISTANTE, F. AND PIURI, V. 1986. Optimum behavioral test procedure for VLSI devices: A simulated annealing approach. In Proceedings of the IEEE International Conference on Computer Design. IEEE Computer Society Press, Los Alamitos, CA, 31–35.

LAI, W.-C. AND CHENG, K.-T. T. 2001. Instruction-level DFT for testing processor and IP cores in system-on-a-chip. In Proceedings of Design Automation Conference. IEEE Computer Society Press, Los Alamitos, CA.

LAI, W.-C., KRSTIC, A., AND CHENG, K.-T. 2000. Test program synthesis for path delay faults in microprocessor. In Proceedings of International Test Conference. IEEE Computer Society Press, Los Alamitos, CA, 1080–1089.

LEE, J. AND PATEL, J. 1994. Architectural level test generation for microprocessors. IEEE Trans. Comput.-aid. Des. Integ. Circuits Syst. 13, 10 (Oct.), 1288–1300.

MURRAY, B. T. AND HAYES, J. P. 1996. Testing ICs: Getting to the core of the problem. IEEE Design Test Comput. 29, 11 (Nov.), 32–38.

NAVABI, Z. 2004. Digital Design and Implementation with Field Programmable Devices. Kluwer Academic Publishers.

PATTERSON, D. A. AND HENNESSY, J. L. 2003. Computer Architecture: A Quantitative Approach, 3rd Edition. Morgan Kaufmann, San Francisco, CA.

SEMICONDUCTOR INDUSTRY ASSOCIATION. 1997. The National Technology Roadmap for Semiconductors.

SHAMSHIRI, S., ESMAEILZADEH, H., ALISAFAEE, M., LOTFIKAMRAN, P., AND NAVABI, Z. 2004a. Test instruction set (TIS): An instruction level CPU core self-testing method. In Proceedings of 9th IEEE European Test Symposium (ETS'04) (Corsica, France). IEEE Computer Society Press, Los Alamitos, CA, 15–16.

SHAMSHIRI, S., ESMAEILZADEH, H., AND NAVABI, Z. 2004b. Test instruction set (TIS) for high level self-testing of CPU cores. In Proceedings of IEEE 13th Asian Test Symposium (ATS'04) (Kenting, Taiwan). IEEE Computer Society Press, Los Alamitos, CA, 158–163.

SHAMSHIRI, S., ESMAEILZADEH, H., AND NAVABI, Z. 2004c. TIS: An instruction level test methodology for CPU core software-based self-testing. In Proceedings of IEEE International High Level Design Validation and Test Workshop (HLDVT'04) (Sonoma, CA). IEEE Computer Society Press, Los Alamitos, CA, 25–29.

SHEN, J. AND ABRAHAM, J. 1998. Native mode functional test generation for processors with applications to self test and design validation. In Proceedings of International Test Conference. IEEE Computer Society Press, Los Alamitos, CA, 990–999.

ZOLFY, M., MIRKHANI, S., AND NAVABI, Z. 2001. SPC-FC: A new method for fault simulation implemented in VHDL. In Proceedings of North Atlantic Test Workshop (NATW'01). 17–21.

Received February 2005; revised May 2005; accepted July 2005
