
In the 2nd Workshop on CMP Memory Systems and Interconnects (CMP-MSI), June 2008

Double-DIP: Augmenting DIP with Adaptive Promotion Policies to Manage Shared L2 Caches

Jonathan D. Kron†, Brooks Prumo‡, and Gabriel H. Loh†

Georgia Institute of Technology
College of Computing†
School of Electrical and Computer Engineering‡
{jonathan.kron,prumob}@gatech.edu, [email protected]

Abstract

In this paper, we study how the Dynamic Insertion Policy (DIP) cache mechanism behaves in a multi-core shared-cache environment. Based on our observations, we explore a new direction in the design space of caches called the promotion policy. In a conventional LRU-based cache, a hit causes the line to be promoted to the MRU position in the recency stack. Instead, we suggest an incremental promotion policy where each hit on a cacheline progressively moves it toward the MRU position. We describe a generalization of the DIP approach that can simultaneously adapt both the insertion and promotion policies of a shared multi-core cache. Our preliminary results indicate that promotion policies are a promising avenue to further improve the behavior of shared L2 caches.

1. Introduction

Careful management of the on-chip cache resources has always been an important research topic for high-performance microprocessors. Despite decades of work on cache optimizations, this continues to be a thriving research topic, as evidenced by several recent studies on building better L2 caches [2, 7, 8, 10, 11]. The advent of multi-core processors has made L2 cache design more challenging due to the interactions and contention introduced by multiple cores competing for the same shared resources. As a result, a substantial research effort has gone into finding new and better ways to manage multi-core shared caches [3, 4, 6, 9, 12].

In this paper, we start by examining Qureshi et al.'s recently proposed Dynamic Insertion Policy (DIP) cache management scheme [8]. The original evaluation of DIP was only in the context of a single-core processor. We use simulations to study the impact of using this technique in a multi-core system. Competition between cores for the shared L2 cache increases the overall L2 miss rate; this increase in cache activity might seem to create more opportunities for DIP to adapt and provide more benefit. This turns out not to be the case, and we provide a qualitative explanation for why this is so. We then introduce a new design parameter for cache management called the promotion policy, and we propose Double-DIP, a generalization of Qureshi et al.'s DIP. Our Double-DIP approach simultaneously considers both insertion and promotion policies, can dynamically adapt both of these, and also provides a finer level of granularity of adaptivity.

2. Dynamic Insertion Policy (DIP)

In this section, we first review the original Dynamic Insertion Policy (DIP) for cache management. We then briefly explain our simulation methodology and present performance results for how DIP performs in both single-core and multi-core contexts.

2.1. Review of the Original DIP

Qureshi et al. observed that for some applications, there exist many cache lines that get brought into the cache and then are never used again before they are evicted [8]. Ideally, we would want to minimize the amount of time that these lines remain in the cache, as they are simply occupying space without providing any performance benefit (i.e., keeping them cached will not result in any additional cache hits). The traditional LRU (least recently used) cache policy places all newly installed cachelines at the MRU (most recently used) position of the recency stack, which makes sense since the newly installed line is in fact the most recently used line in the set. For a cacheline with no reuse, however, such a policy ensures that the cacheline will remain in the cache for at least n more accesses to this set (for an n-way set-associative cache). That is, it will require at least n cache misses to this set before the cacheline gets demoted to the LRU position and then gets evicted from the cache.

Qureshi et al. introduced the novel concept of an insertion policy. That is, when the processor first installs a new cacheline into a cache set, the processor need not automatically place the line in the MRU position. In particular, they first introduce the LRU Insertion Policy (LIP), which places newly installed cachelines in the LRU position of the recency stack. If the line gets accessed again, then the cache promotes the line to the MRU position. In some situations, the cacheline may be used again, but not before another cacheline gets installed.
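To make the insertion-policy idea concrete, the sketch below (our own illustration, not code from the original DIP work) models a single set's recency stack as an ordered list and contrasts traditional MRU insertion with LIP; the class and member names are purely illustrative.

```cpp
#include <cstdint>
#include <vector>

// Toy model of one cache set's recency stack: index 0 is the MRU position,
// index (ways - 1) is the LRU position. Names and structure are illustrative.
struct CacheSet {
    std::vector<uint64_t> stack;  // tags ordered from MRU to LRU
    size_t ways;

    explicit CacheSet(size_t w) : ways(w) {}

    // The victim is always the line currently in the LRU position.
    void evictIfFull() {
        if (stack.size() == ways) stack.pop_back();
    }

    // Traditional insertion: the newly installed line becomes MRU, so at
    // least 'ways' misses to this set are needed before it can be evicted.
    void insertMRU(uint64_t tag) {
        evictIfFull();
        stack.insert(stack.begin(), tag);
    }

    // LIP: the new line is installed directly at the LRU position, so one
    // further miss evicts it unless it is reused (and promoted) first.
    void insertLIP(uint64_t tag) {
        evictIfFull();
        stack.push_back(tag);
    }
};
```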

2

In the 2nd Workshop on CMP Memory Systems and Interconnects (CMP-MSI), June 2008


Figure 1. Tracking the better of two insertion policies using (a) per-policy shadow tags and (b) set dueling.

In this case, inserting in the LRU position will cause the line to get evicted and therefore cause an additional miss on its next access. To combat this, Qureshi et al. introduced the Bimodal Insertion Policy (BIP), which acts the same as LIP except that a line may occasionally be inserted into the MRU position with some non-zero probability. Finally, Qureshi et al. show that for some applications BIP works very well, while for other applications traditional MRU insertion is still the better option. To deal with this diversity in behaviors, they proposed the Dynamic Insertion Policy (DIP), which tracks which of the two insertion policies performs better and then dynamically chooses the better approach.

The naive way to track which policy is performing better would be to implement two additional sets of "shadow tags." Each set of shadow tags is responsible for simulating what the contents of the cache would have been had a particular insertion policy been used all along. A small integer counter (called PSEL, for Policy Selector) can then be used to track which policy would have resulted in a higher number of cache misses. This is illustrated in Figure 1(a). The main cache array uses this counter value to determine which insertion policy should be employed. Full duplication of all of a cache's tags (plus LRU counters), however, would result in an unacceptably high overhead.

To estimate the performance of the two insertion policies with reasonable overhead, Qureshi et al. propose a novel monitoring mechanism called set dueling. The idea is to dedicate a few sets in the cache to always employ a single insertion policy. Figure 1(b) depicts the sets in a cache: the lightly shaded sets always make use of a traditional MRU insertion policy (i.e., a true traditional LRU replacement policy), while the darker shaded sets always make use of the Bimodal insertion policy. A miss in an LRU set causes the PSEL counter to be incremented, while a miss in a BIP set causes PSEL to be decremented. This approach effectively simulates the different insertion policies in vivo within the cache, and then extrapolates the overall benefit of the two insertion policies based on these "leader" sets. As a result, the hardware overhead is reduced to the trivial amount required to implement the single PSEL counter. They also describe a simple complement-select mechanism for identifying which sets are leader sets without storing any extra information.
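A minimal sketch of the set-dueling logic, written from our reading of the description above; the 10-bit counter width and the 1-in-32 bimodal probability are illustrative assumptions rather than values taken from the original paper.

```cpp
#include <cstdlib>

// Sketch of DIP-style set dueling; counter width and bimodal probability
// are illustrative assumptions.
enum class SetRole { LRU_LEADER, BIP_LEADER, FOLLOWER };

const int PSEL_BITS = 10;
const int PSEL_MAX  = (1 << PSEL_BITS) - 1;
int psel = PSEL_MAX / 2;  // saturating policy-selection counter

// Every miss in a leader set nudges PSEL toward the competing policy.
void onMiss(SetRole role) {
    if (role == SetRole::LRU_LEADER && psel < PSEL_MAX) psel++;  // LRU leader missed
    if (role == SetRole::BIP_LEADER && psel > 0)        psel--;  // BIP leader missed
}

// Insertion decision. Follower sets consult the PSEL MSB: a high count means
// the LRU leaders are missing more often, so BIP is the better choice
// (this polarity follows the increment/decrement convention in the text).
bool insertAtMRU(SetRole role) {
    bool useBIP = (role == SetRole::BIP_LEADER) ||
                  (role == SetRole::FOLLOWER && (psel >> (PSEL_BITS - 1)) != 0);
    if (!useBIP) return true;            // traditional MRU insertion
    return (std::rand() % 32) == 0;      // BIP: insert at MRU only occasionally
}
```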

3. Experimental Methodology

We make use of simulation to generate all of our experimental results. In particular, we use the pre-release version of the x86 SimpleScalar toolset [1]. Our cycle-level simulator includes detailed models of all of the caches, the inter-cache buses, miss-status handling registers (MSHRs), off-chip bus (FSB) contention, and memory-controller queuing delays. Our baseline processor configuration is based on the Intel Core 2 Duo (65nm/Merom version); Table 1 lists the microarchitecture details.

We make use of programs from the SPECcpu 2006 suite. In particular, we selected benchmarks with varying L2 cache miss rates. In this fashion, we could explore the impact of having two co-scheduled programs with different L2 cache access/miss rates and see how that impacts overall cache behavior and system performance. For each benchmark, we make use of the SimPoint 3.2 toolkit to select representative samples [5]. For the single-core runs, we first fast-forward while warming all caches for 500 million instructions, and then perform detailed cycle-level simulation for 100 million committed instructions. For the multi-core runs, we fast-forward by 500 million instructions per core in a round-robin fashion, and then perform detailed simulation for 100 million instructions per core. When one core reaches its instruction limit, we freeze its relevant statistics, but we continue to simulate its execution so that it still competes with the other core for shared resources. This is similar to previous methodologies employed in multi-core cache studies [9].






Figure 2. Performance of DIP compared to LRU as measured by (a) L2 miss rate in MPKI and (b) IPC.

Table 1. Baseline microarchitecture details.

Fetch Width: 32 bytes per cycle
Decode Width: up to 4 x86 insts per cycle
Decoders: 4-1-1-1 fused μops
Issue Width: 6 μops per cycle
Commit Width: 4 μops per cycle
RS: 32 fused entries
ROB: 96 fused entries
LDQ/STQ: 32/20 entries
IL1/DL1: 32KB, 8-way
Shared L2: 4MB, 16-way
Memory Latency: 250 cycles + queuing

Table 2. IPC performance metrics used in this study.

Weighted Speedup [13]: (1/n) * sum_{i=1..n} ( IPC_enhanced[i] / IPC_base[i] )
Throughput (IPCs): sum_{i=1..n} IPC_enhanced[i]

4. DIP Performance

We evaluated the performance benefit of DIP for both single-core and multi-core configurations. Figure 2 shows the L2 miss rates (measured in misses per thousand instructions, or MPKI) and the IPC improvements of our benchmarks for both a conventional LRU policy (i.e., MRU insertion) and the DIP policy. Overall, DIP reduces the L2 miss rate and increases performance. The magnitude of the benefits is not as large as previously reported, but this is primarily because our simulation makes use of a larger and more highly set-associative contemporary L2 cache organization (4MB/16-way, like that used in the 65nm Intel Core 2 Duo), as opposed to the 1MB/8-way L2 used in the original DIP study.

Figure 3 shows the results for our dual-core workloads. For the L2 miss rates, we report the average between the two programs. For the IPC improvements, we use both the increase in the total IPC throughput and the weighted speedup metric [13]; details are listed in Table 2. Compared to the single-core runs, the number of L2 misses has increased quite dramatically. This makes sense since the two cores now have to compete with each other for the shared cache resources. With so many more misses, one might expect DIP to perform better than in the single-core case because it has more dynamic behavior to adapt over (i.e., there should be more opportunities for adaptation). The results, however, show that the relative benefit of DIP is still about the same as in the single-core case (this can still be viewed as a positive result: going to multi-core shared caches does not break DIP).
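For clarity, the two IPC-based metrics listed in Table 2 can be computed as in the sketch below; the function and vector names are our own placeholders for the per-program IPCs measured in a given run.

```cpp
#include <vector>

// Weighted speedup from Table 2 [13]:
// (1/n) * sum_i (IPC_enhanced[i] / IPC_base[i])
double weightedSpeedup(const std::vector<double>& ipcEnhanced,
                       const std::vector<double>& ipcBase) {
    double sum = 0.0;
    for (size_t i = 0; i < ipcEnhanced.size(); ++i)
        sum += ipcEnhanced[i] / ipcBase[i];
    return sum / static_cast<double>(ipcEnhanced.size());
}

// Throughput from Table 2: sum_i IPC_enhanced[i]
double throughput(const std::vector<double>& ipcEnhanced) {
    double sum = 0.0;
    for (double ipc : ipcEnhanced) sum += ipc;
    return sum;
}
```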

5. Double-DIP

5.1. Cache Promotion Policies

For the BIP component of DIP, most cachelines will be inserted at the LRU position of the recency stack. On a subsequent hit, the line will then be promoted to the MRU position. In a dual-core situation, the previously most-recently-used line may belong to the other core. Now consider the example shown in Figure 4(a), where one core (P0) accesses one set in the cache more frequently than the other (P1). The multiple accesses by P0 rapidly push all of P1's cachelines down to the less-recently-used end of the LRU recency stack, leaving them far more vulnerable to eviction. When combined with insertion at the LRU position (LIP/BIP/DIP), P1 will have a very difficult time trying to keep its data in the cache.

To help mitigate the pathological access patterns described above, we propose a new design parameter for cache management: the cache promotion policy. A traditional cache uses an MRU promotion policy, meaning that anytime there is a cache hit, the line is instantly promoted to the MRU position in the recency stack. One possible alternative promotion policy is to incrementally move the cache line toward the MRU position.¹

¹ We should technically not be calling the "leftmost" position (with respect to the recency stack) the most-recently used, as it no longer holds the most recently used line, but we will still refer to the leftmost position as MRU to avoid introducing new terminology. This is also effectively consistent with the terminology used in the original DIP paper.


Figure 3. Performance of DIP compared to LRU on our dual-core workloads as measured by (a) L2 miss rate in MPKI, (b) weighted IPC speedup, and (c) IPC throughput improvement.




Figure 4. Example updates of a cache set using (a) a conventional MRU promotion policy and (b) a promotion policy that only moves accessed lines by a single position in the recency stack.

For example, Figure 4(b) shows the same access sequence, but on a hit, we only move the cacheline by one position to the left. As a result, it takes several more accesses for a cacheline to find its way to the MRU position in the recency stack. The net result is that P1's cachelines are able to stay further to the left in the recency stack, which means that P1's cachelines are, on average, in less danger of eviction than under the conventional MRU-promotion policy. This helps to level the playing field between programs with different L2 cache access frequencies. Based on this intuition, we explore the potential for better promotion policies.
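The difference between the two promotion policies can be expressed compactly. The following sketch (our own, with illustrative names, not the authors' implementation) applies each policy to a recency stack on a hit at position hitPos.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Recency stack for one set: index 0 = MRU, last index = LRU.
using RecencyStack = std::vector<uint64_t>;

// Conventional MRU promotion: a hit moves the accessed line all the way to
// the MRU position, pushing everything above it down by one.
void promoteToMRU(RecencyStack& stack, size_t hitPos) {
    uint64_t tag = stack[hitPos];
    stack.erase(stack.begin() + hitPos);
    stack.insert(stack.begin(), tag);
}

// Single-step incremental promotion: a hit moves the line only one position
// toward MRU, so a line needs several hits before reaching the top.
void promoteSingleStep(RecencyStack& stack, size_t hitPos) {
    if (hitPos > 0) std::swap(stack[hitPos], stack[hitPos - 1]);
}
```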

5.2. Double-DIP: Combining Cache Insertion and Promotion Policies

We propose to augment the original DIP approach to handle cache promotion policies other than the conventional MRU promotion approach. In particular, we make use of Qureshi et al.'s set-dueling approach to monitor the potential of two different promotion policies. The first promotion policy is the conventional MRU approach. The second policy is the Single-step Incremental Promotion Policy (SIPP), which was illustrated in Figure 4(b). Based on the misses observed in the leader sets, we use a PSEL counter to determine the promotion policy employed in the remaining follower sets. Together, this implements a Dynamic Promotion Policy (DPP).

We actually implement both DPP and DIP at the same time. This requires that we make use of twice as many leader sets: half of the sets track the component insertion policies, and the other half of the sets track the component promotion policies. We make use of two PSEL counters as well: IPSEL (insertion policy selector) and PPSEL (promotion policy selector). This slightly increases the hardware overhead (from one small counter to two), but this is still a very small amount of additional state. Note that insertion and promotion are orthogonal design considerations. The insertion policy is only invoked on the installation of a new cacheline, and the promotion policy is only considered when hitting on an already cached line. As a result, the leader sets for the promotion policies are actually follower sets with respect to the dynamic insertion policy. Similarly, the insertion policy leader sets are follower sets for the dynamic promotion policy.

Our leader set allocation method is a simple extension of the original DIP complement-select strategy. In complement-select, the N sets in the cache are divided into K constituencies, where each constituency has one leader set per policy. The most significant k = log2(K) bits of the set index determine the constituency number. The leader set for the first policy is the set whose least significant k bits are equal to the constituency number; that is, the mth set of constituency m is the leader set for the first policy. For the second policy, the leader set is the set whose k least significant bits are equal to the bitwise complement of the constituency number. Figure 5(a) shows an example of the leader sets chosen by complement-select. In our approach, we take the k+1 least significant bits and right-shift this number by one position (equivalent to taking k bits after omitting the least significant bit), and then compare this to the constituency number (or its complement).




Figure 5. Example of leader set allocation (a) for two policies using Qureshi et al.'s complement-select, and (b) for four policies using our extension to complement-select.

Figure 6. Overall organization of the Double-DIP cache management scheme.

We then make use of the remaining least significant bit to determine whether the set is a leader for the insertion policy or the promotion policy. Figure 5(b) illustrates an example. The original DIP policy recommended using at least 16 leader sets per policy; our experiments confirmed this to be a good number.

We initially had our L2 cache use set-dueling to choose between an MRU Promotion Policy (MPP) and the Single-step Incremental Promotion Policy (SIPP). For some workloads, this proved to be an effective approach, but for others the results were not as good. The main problem is that even with dynamic adaptation, direct promotion to the MRU position is still frequently too aggressive and can cause bad shared-cache interactions. As a result, we propose the Dynamic Promotion with Interpolated Increments Policy (D-PIIP), which can choose different promotion increment amounts based on the strength of the PPSEL counter. Normally, we would use the most significant bit of PPSEL to choose between a promotion of +1 position (SIPP) and a promotion of +n positions (MPP, where n is the set associativity). With D-PIIP, we use the two most significant bits to select promotion increments of +1, +2, +4, and +n. Figure 6 shows the final overall organization of the Double-DIP scheme. Due to the strong resemblance of our DPP (and the derivative D-PIIP) scheme to the original DIP approach, we call the overall combined scheme "Double-DIP" to properly acknowledge the genesis of our proposal.
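Putting the pieces together, the sketch below shows one plausible encoding of the extended complement-select allocation and the D-PIIP increment choice. The bit widths, which value of the leftover bit marks a promotion leader, and the mapping from PPSEL bits to increments are illustrative assumptions; the text above does not pin these details down.

```cpp
#include <cstdint>

// Roles a set can play under simultaneous insertion/promotion dueling.
enum class Role { FOLLOWER, LRU_LEADER, BIP_LEADER, MPP_LEADER, SIPP_LEADER };

// setBits = log2(number of sets); k = log2(number of constituencies).
Role classifySet(uint32_t setIndex, uint32_t setBits, uint32_t k) {
    uint32_t mask         = (1u << k) - 1;
    uint32_t constituency = (setIndex >> (setBits - k)) & mask; // k MSBs of the index
    uint32_t midBits      = (setIndex >> 1) & mask;             // k bits after dropping the LSB
    bool     promoLeader  = (setIndex & 1) != 0;                // leftover LSB: insertion vs. promotion duel

    if (midBits == constituency)               // "match" pattern -> MRU-insert / MRU-promote leaders
        return promoLeader ? Role::MPP_LEADER : Role::LRU_LEADER;
    if (midBits == (~constituency & mask))     // "complement" pattern -> BIP / SIPP leaders
        return promoLeader ? Role::SIPP_LEADER : Role::BIP_LEADER;
    return Role::FOLLOWER;
}

// D-PIIP: the two most significant bits of the saturating PPSEL counter
// select the promotion increment (+1, +2, +4, or +n = associativity).
int promotionIncrement(uint32_t ppsel, uint32_t ppselBits, int associativity) {
    switch (ppsel >> (ppselBits - 2)) {
        case 0:  return 1;
        case 1:  return 2;
        case 2:  return 4;
        default: return associativity;  // behaves like full MRU promotion
    }
}
```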

5.3. Double-DIP Results

We evaluated the performance of Double-DIP on our dual-core workloads. Figure 7 shows the L2 MPKI and the IPC performance improvements using conventional LRU as the baseline. For reference, the original DIP approach, as well as DIP/SIPP (adaptive promotion between +1 and +n only), are also included. For DIP/SIPP, we can see that there are a few cases where the promotion policy provides some striking reductions in miss rates (e.g., astar(bl)::milc and astar(bl)::soplex(ref)), but for many other situations it performs even worse than the baseline LRU policy. When we make use of Double-DIP's variable promotion increments, the overall average miss rate improves even further, and the improvements across the workloads are far more consistent (i.e., there are fewer workloads where Double-DIP performs worse than original DIP or LRU).

The performance results are similar for both the weighted IPC and throughput metrics. The DIP/SIPP approach achieves strong speedups on only three of the workloads. Even worse, DIP/SIPP causes a performance slowdown on a substantial number of workloads. The Double-DIP approach works far better than both the original DIP and the DIP/SIPP schemes. The only outlier is the bzip2(c)::mcf workload, where we suffer an 8.6% (9.2%) reduction in performance as measured by the weighted IPC speedup (throughput).

There are many potential reasons for the performance anomalies of DIP/SIPP and Double-DIP. They could be due to cross-interactions/contamination between the two groups of leader sets for the insertion and promotion policies. Another possibility is that our simple extension to the complement-select leader-set allocation scheme may have introduced some unexpected interference. Further analysis is required to fully understand how insertion policies, promotion policies, and the access patterns of the multi-core workload all affect each other. Such work will be explored in the future, but these preliminary results are very encouraging and strongly suggest that the cache promotion policy is a new dimension in the cache design space that warrants further investigation.


Figure 7. Performance of DIP, DIP/SIPP, and Double-DIP compared to LRU on our dual-core workloads as measured by (a) L2 miss rate in MPKI, (b) weighted IPC speedup, and (c) IPC throughput improvement.



The performance benefits of our Double-DIP scheme appear to be unique to multi-core shared-cache scenarios. We ran additional experiments in which we used Double-DIP with single-program workloads, and the overall performance was very similar to that of the original DIP approach. We believe that this provides evidence to back up our hypothesis that Double-DIP helps deal with the recency-stack interference caused by two programs with different L2 access rates fighting over the shared cache resources.

The results in Figure 7 also show that the original DIP has much more stable behavior. That is, when DIP works, it works; when it is not beneficial, it does not have much negative impact on the workloads either. Double-DIP, on the other hand, works very well in some cases, but it still causes some relatively significant performance decreases on certain workloads. Our current implementation of Double-DIP has not been extensively tuned and optimized, and we are confident that with some further tweaks, we can minimize the cases where Double-DIP hurts performance. One straightforward adjustment is to simply change the promotion increments to have more MRU-promotion-like behavior (e.g., use increments of +4/+8/+12/+16). Another possible modification is to provide per-core PSEL counters. The idea is that on a miss in a leader set, we also check which core initiated the cache access and then update its corresponding PSEL counter. For a dual-core configuration, this would add up to four PSEL counters (one PPSEL and one IPSEL per core). In this fashion, each core can adopt different levels of aggressiveness with respect to both insertion and promotion policies.
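A rough sketch of how such per-core selectors might be organized follows; the structure, names, and update convention are illustrative assumptions only (counter saturation is omitted for brevity), since this refinement is a suggestion rather than an evaluated design.

```cpp
#include <array>

const int NUM_CORES = 2;

// One insertion selector and one promotion selector per core.
struct PolicySelectors { int ipsel = 0; int ppsel = 0; };
std::array<PolicySelectors, NUM_CORES> selectors{};

// On a leader-set miss, only the counters of the core that issued the
// access are updated. MRU-insert / MRU-promote leaders increment, and
// BIP / SIPP leaders decrement, mirroring the shared-counter convention.
void onLeaderMiss(int core, bool insertionLeader, bool mruLikeLeader) {
    int delta = mruLikeLeader ? +1 : -1;
    if (insertionLeader) selectors[core].ipsel += delta;
    else                 selectors[core].ppsel += delta;
}
```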

6. Conclusions

In this work, we have introduced the concept of a cache's promotion policy as a new design parameter. Our preliminary results indicate that this is a promising avenue for improving the behavior of caches in a multi-core system. The Double-DIP approach presented in this paper is only a first step in exploring ways to exploit different promotion policies for shared caches.

Future research directions include optimization and refinement of the Double-DIP scheme. The idea of interpolated increments can potentially be applied to insertion policies as well, which may work for both single- and multi-core (including more than two cores) scenarios. Additional research is required to evaluate these techniques with multi-threaded workloads (as opposed to multi-programmed ones), where two or more threads may share some cache lines. For all scenarios, fairness also needs to be studied, and the policies may need further refinements to strike the right balance between fairness and performance. As mentioned earlier, much more work is required to analyze promotion policies and understand why and how they work. Most modern caches do not make use of true LRU replacement policies due to the complexity of updating all of the recency counters, especially for modern highly-associative caches. It may be fruitful to explore how to adapt promotion policies to LRU-approximation algorithms such as NMRU and pseudo-LRU replacement.

Acknowledgments

Funding and equipment were provided by a grant from Intel Corporation.

References

[1] Todd Austin, Eric Larson, and Dan Ernst. SimpleScalar: An Infrastructure for Computer System Modeling. IEEE Micro Magazine, pages 59–67, February 2002.

[2] Arkaprava Basu, Nevin Kirman, Meyrem Kirman, Mainak Chaudhuri, and Jose Martinez. Scavenger: A New Last Level Cache Architecture with Global Block Priority. In Proceedings of the 40th International Symposium on Microarchitecture, Chicago, IL, December 2007.

[3] Bradford M. Beckmann, Michael R. Marty, and David A. Wood. ASR: Adaptive Selective Replication for CMP Caches. In Proceedings of the 39th International Symposium on Microarchitecture, Orlando, FL, December 2006.

[4] Haakon Dybdahl and Per Stenström. An Adaptive Shared/Private NUCA Cache Partitioning Scheme for Chip Multiprocessors. In Proceedings of the 13th International Symposium on High Performance Computer Architecture, Phoenix, AZ, USA, February 2007.

[5] Greg Hamerly, Erez Perelman, Jeremy Lau, and Brad Calder. SimPoint 3.0: Faster and More Flexible Program Analysis. In Proceedings of the Workshop on Modeling, Benchmarking and Simulation, Madison, WI, USA, June 2005.

[6] Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques, pages 111–122, Antibes Juan-les-Pins, France, September 2004.

[7] Moinuddin K. Qureshi, Daniel Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. In Proceedings of the 33rd International Symposium on Computer Architecture, pages 167–178, Boston, MA, USA, June 2006.

[8] Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely Jr., and Joel Emer. Adaptive Insertion Policies for High-Performance Caching. In Proceedings of the 34th International Symposium on Computer Architecture, pages 381–391, San Diego, CA, USA, June 2007.

[9] Moinuddin K. Qureshi and Yale N. Patt. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. In Proceedings of the 39th International Symposium on Microarchitecture, pages 423–432, Orlando, FL, December 2006.

[10] Kaushik Rajan and Govindarajan Ramaswamy. Emulating Optimal Replacement with a Shepherd Cache. In Proceedings of the 40th International Symposium on Microarchitecture, Chicago, IL, December 2007.

[11] Ranjith Subramanian, Yannis Smaragdakis, and Gabriel H. Loh. Adaptive Caches: Effective Shaping of Cache Behavior to Workloads. In Proceedings of the 39th International Symposium on Microarchitecture, pages 385–396, Orlando, FL, December 2006.

[12] G. Edward Suh, Larry Rudolph, and Srinivas Devadas. Dynamic Partitioning of Shared Cache Memory. Journal of Supercomputing, 28(1):7–26, 2004.

[13] Dean M. Tullsen and J. Brown. Handling Long-Latency Loads in a Simultaneous Multithreaded Processor. In Proceedings of the 34th International Symposium on Microarchitecture, pages 318–327, Austin, TX, USA, December 2001.
