Banked Register Files for SMT Processors

Jessica H. Tseng and Krste Asanović
MIT Computer Science and Artificial Intelligence Laboratory
200 Technology Square, Cambridge, MA 02139
fjhtseng,[email protected]

Abstract

Multiported register files are a critical component of high-performance superscalar microprocessors. Deeper pipeline speculation and higher instruction-level parallelism (ILP) in current processor designs increase both the number of ports and the number of registers required. These demands cause the area of a conventional multiported regfile to grow more than quadratically with issue width [7]. The trend towards simultaneous multithreading (SMT) further increases register count, as separate architectural registers are needed for each thread. For example, the proposed eight-issue Alpha 21464 design had a regfile that occupied over five times the area of the 64 KB primary data cache [3]. Hence, we examine designs of banked multiported register files, which consist of multiple interleaved banks of register cells with fewer ports, to reduce power, area, and access time.

In this talk, we will present results that extend our previous work [4] on banked register files to SMT processors. Banked register file designs have been shown to provide sufficient bandwidth for a superscalar machine, but previously proposed designs had complex control structures that would likely limit cycle time and add to design complexity [6, 1, 2]. We present a banked multiported regfile design together with much simpler and faster control logic suitable for a deeply pipelined high-frequency superscalar processor [4]. Our control scheme does not place any register bank arbitration in the critical wakeup-select loop but instead speculatively issues potentially conflicting instructions. A bank conflict occurs when too many instructions try to read or write the same bank in the same cycle.
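To make the definition of a bank conflict concrete, the following is a minimal sketch of read-port conflict detection for one issue group. It is illustrative only and not the authors' control logic: the function name `find_read_conflicts`, the modulo mapping from register number to bank, and the oldest-first priority order are all assumptions introduced here.

```python
from collections import Counter


def find_read_conflicts(issued, num_banks=16, read_ports_per_bank=4):
    """Return indices of issued instructions that cannot get their read ports.

    issued: list of tuples of source physical register numbers, one tuple
            per instruction, in priority order (assumed oldest first).
    The bank of a register is assumed to be `reg % num_banks`.
    """
    ports_used = Counter()  # read ports already granted in each bank
    conflicting = []
    for idx, sources in enumerate(issued):
        # Ports this instruction needs in each bank.
        needed = Counter(reg % num_banks for reg in sources)
        fits = all(
            ports_used[bank] + count <= read_ports_per_bank
            for bank, count in needed.items()
        )
        if fits:
            ports_used.update(needed)
        else:
            # Too many same-cycle reads to one bank: this instruction
            # would have to be reissued.
            conflicting.append(idx)
    return conflicting
```

With four banks and two read ports per bank, `find_read_conflicts([(0, 4), (8, 1), (2, 3)], num_banks=4, read_ports_per_bank=2)` flags only the second instruction, because registers 0, 4, and 8 all map to bank 0 under the assumed mapping.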
If any conflicts are found after issue, a pipelined recovery scheme quickly repairs the issue window and reissues the conflicting instructions. In contrast to previous work [6, 1, 2], all conflicts are detected and resolved in one pipeline stage, so no write buffering or pipeline stalls are required.

The extra pipeline stage used for port arbitration and the possibility of bank conflicts in our design can impact processor performance. The additional pipeline stage increases branch misprediction latency, while each bank conflict adds penalty cycles to repair the pipeline and delays the issue of dependent instructions. Fortunately, the number of bank conflicts can be kept within a few percent of total instructions if we remove the correlation between accesses to the same bank. We observe that instructions that become ready in the same cycle tend to be issued together, and that some architectural registers are used more frequently than others. To avoid unnecessary read port contention and to remove the correlation between same-cycle read accesses, we implement two optimization techniques: bypass-skip and read-sharing. Bypass-skip avoids competing for register read ports for operands that will be sourced from the bypass network. Read-sharing fetches each register value from the regfile only once and shares it among the instructions that request it.

To evaluate our work, we modified the SMTSIM [5] simulator to keep track of a unified physical register file organized into banks for both superscalar and SMT processors. One might expect that extending banked register file schemes to SMT processors would degrade performance more than in superscalar microprocessors because of SMT's higher IPC and register counts. Surprisingly, our initial data reveal that banked register files work better for SMT processors than for single-threaded superscalar processors. This result is due to SMT's ability to hide the branch misprediction penalty: when one thread experiences a misprediction, other threads can continue to execute instructions. For an eight-issue SMT processor with 512 physical registers, adopting a 16-bank register file design with four read ports and two write ports per bank reduces regfile area by a factor of seven over a monolithic design while decreasing IPC by less than 2%. The ability to reduce area significantly with minimal performance degradation should make this approach attractive for chip multiprocessors that aim to provide the highest possible thread throughput at low cost.
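The effect of bypass-skip and read-sharing on read-port demand can be sketched as follows. This is an illustration under stated assumptions, not the simulator code: `read_port_requests` is a hypothetical helper, the bank of a register is assumed to be `reg % num_banks`, and the set of operands supplied by the bypass network is assumed to be known when requests are formed.

```python
def read_port_requests(issued, bypassed, num_banks=16):
    """Count read-port requests per bank after bypass-skip and read-sharing.

    issued:   list of tuples of source physical register numbers,
              one tuple per instruction issued this cycle.
    bypassed: set of registers whose values will come from the bypass
              network (assumed known at this point).
    """
    unique_regs = set()
    for sources in issued:
        for reg in sources:
            if reg in bypassed:
                # Bypass-skip: no read port is requested for this operand.
                continue
            # Read-sharing: a register wanted by several instructions is
            # fetched once, so it is recorded only once.
            unique_regs.add(reg)

    requests = [0] * num_banks
    for reg in unique_regs:
        requests[reg % num_banks] += 1  # assumed interleaved mapping
    return requests
```

For `issued = [(0, 4), (4, 8), (8, 1)]` with four banks, the unoptimized group makes five reads to bank 0 (registers 0, 4, 4, 8, 8). With `bypassed={8}` the function reports two requests to bank 0 and one to bank 1, since register 4 is shared and register 8 is skipped.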

References

[1] R. Balasubramonian, S. Dwarkadas, and D. H. Albonesi. Reducing the complexity of the register file in dynamic superscalar processors. In MICRO-34, December 2001.
[2] I. Park, M. D. Powell, and T. N. Vijaykumar. Reducing register ports for higher speed and lower energy. In MICRO-35, Istanbul, Turkey, November 2002.
[3] R. P. Preston et al. Design of an 8-wide superscalar RISC microprocessor with simultaneous multithreading. In ISSCC Digest and Visuals Supplement, February 2002.
[4] J. H. Tseng and K. Asanović. Banked multiported register files for high-frequency superscalar microprocessors. In ISCA-30, June 2003.
[5] D. M. Tullsen, S. Eggers, and H. M. Levy. Simultaneous multithreading: Maximizing on-chip parallelism. In ISCA-22, 1995.
[6] S. Wallace and N. Bagherzadeh. A scalable register file architecture for dynamically scheduled processors. In Proc. PACT, October 1996.
[7] V. Zyuban and P. Kogge. The energy complexity of register files. In Proceedings 1998 International Symposium on Low Power Electronics and Design, pages 305–310, August 1998.

