IMPROVING L2 CACHE PERFORMANCE THROUGH STREAM-DIRECTED OPTIMIZATIONS

SOHONI, SOHUM

2004, PhD, University of Cincinnati, Engineering : Computer Science and Engineering.
Research on caches has traditionally concentrated on the L1 cache. Most improvements to L2 cache design have been rather simple: increases in size and associativity. We believe that the L2 offers some unique opportunities for improvement. First, the L2 does not lie on the critical path, so it can be made more complex. Second, the L1 filters out many temporal references to data blocks, so that accesses to the L2 are more uniform and show streaming patterns. Third, the miss penalty of the L2 cache is very high, since the next level in the memory hierarchy is usually main memory (system RAM). In this work, we propose to improve overall performance by detecting streaming references at the L2 access level and performing stream-based optimizations. The first optimization targets the LRU replacement policy. We identify streaming blocks that will not be accessed again before they are evicted, and we move them to the LRU position. By removing such blocks from the cache immediately after their last access, we retain potentially useful cache blocks. This leads to better cache utilization and fewer capacity and conflict misses. The second optimization is a prefetching scheme triggered by streaming accesses. Accesses to the streaming blocks are continuously monitored, and prefetching stops as soon as a streaming access ends. This eliminates the two most significant drawbacks of prefetching: it ensures that there is no cache pollution, and it keeps the increase in memory traffic to a minimum. Prefetching reduces compulsory L2 misses. The key element in our optimizations is a low-overhead technique for identifying streaming accesses. Our stream detector is less than 0.002% of the size of the L2 cache, and thus has an extremely low hardware overhead. Our preliminary results for SPECfp2000 and multimedia applications show that there is significant sequentiality at the L2 access level.
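The abstract does not describe how the stream detector or the modified LRU policy is built, so the following is only a minimal sketch under stated assumptions. It assumes that a "stream" means a run of consecutive block addresses, that a block counts as streaming once the run reaches a threshold length, and that streaming blocks are placed at the LRU position of their set. The names `StreamDetector`, `LRUSet`, `threshold`, and `table_size` are hypothetical and do not come from the dissertation.

```python
from collections import OrderedDict


class StreamDetector:
    """Hypothetical detector: an access is 'streaming' when it extends a run
    of consecutive block addresses to at least `threshold` blocks."""

    def __init__(self, threshold=3, table_size=8):
        self.threshold = threshold
        self.table_size = table_size
        self.runs = OrderedDict()  # next expected block -> run length so far

    def access(self, block):
        # If this block continues a tracked run, extend it; else start a new run.
        length = self.runs.pop(block, 0) + 1
        self.runs[block + 1] = length
        if len(self.runs) > self.table_size:
            self.runs.popitem(last=False)  # drop the oldest tracked entry
        return length >= self.threshold


class LRUSet:
    """One cache set. Index 0 is the LRU position; the end of the list is MRU."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = []

    def access(self, block, streaming=False):
        hit = block in self.blocks
        if hit:
            self.blocks.remove(block)
        elif len(self.blocks) == self.ways:
            self.blocks.pop(0)  # evict the block in the LRU position
        if streaming:
            # Assumed policy: a streaming block goes to the LRU position,
            # so it is the next block to be evicted.
            self.blocks.insert(0, block)
        else:
            self.blocks.append(block)
        return hit
```

In this sketch, a two-way set that sees blocks 1, 2, a streaming block 3, and then block 4 still holds block 2 afterwards, whereas plain LRU would have evicted it. Whether that helps in practice depends on how accurately streaming blocks are identified.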
Trace-driven simulations for cache miss rates show that our first optimization, modified LRU, does not perform well. More importantly, the results show that our second optimization, stream-based prefetching, performs quite well: it yields substantial reductions in the miss rates of most applications without increasing the memory traffic. Full-system, execution-driven simulations show significant reductions in cache miss rates: 22% for SPECfp2000 and 33% for multimedia applications. The simulations also show an 18% reduction in the overall L2 access time. Thus it is clear that stream-based prefetching is highly effective in improving system performance. Moreover, its low hardware cost makes it ideal for a real-world implementation.
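To illustrate what a trace-driven miss-rate comparison of stream-triggered prefetching might look like, here is a small sketch. It is not the simulator used in the dissertation. It assumes a fully associative LRU cache, a next-block prefetch, and a stream defined as at least `threshold` consecutive block addresses; the function `simulate` and its parameters are hypothetical names introduced for this example.

```python
from collections import OrderedDict


def simulate(trace, capacity, prefetch=False, threshold=2):
    """Return (demand_misses, prefetches) for a fully associative LRU cache
    holding `capacity` blocks. With prefetch=True, block+1 is fetched whenever
    the current access extends a run of >= `threshold` consecutive blocks."""
    cache = OrderedDict()  # block -> None, ordered from LRU to MRU
    runs = {}              # next expected block -> run length (unbounded here)
    misses = 0
    prefetches = 0

    def fill(block):
        # Insert the block at the MRU end and evict the LRU block if needed.
        cache[block] = None
        cache.move_to_end(block)
        if len(cache) > capacity:
            cache.popitem(last=False)

    for block in trace:
        if block in cache:
            cache.move_to_end(block)
        else:
            misses += 1
            fill(block)
        # Track sequential runs; a non-sequential access starts a new run,
        # so prefetching stops as soon as the stream ends.
        length = runs.pop(block, 0) + 1
        runs[block + 1] = length
        if prefetch and length >= threshold and block + 1 not in cache:
            fill(block + 1)
            prefetches += 1
    return misses, prefetches


sequential = list(range(100))
print(simulate(sequential, capacity=16))                  # (100, 0)
print(simulate(sequential, capacity=16, prefetch=True))   # (2, 99)
```

On a purely sequential trace the sketch turns all but the first two misses into prefetch hits, and on a trace with no consecutive addresses it issues no prefetches at all. A real hardware detector would bound the run table, and real traces mix both kinds of access, so these toy numbers say nothing about the 22% and 33% reductions reported above.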
Dr. Yiming Hu (Advisor)
116 p.

Recommended Citations

  • SOHONI, S. (2004). IMPROVING L2 CACHE PERFORMANCE THROUGH STREAM-DIRECTED OPTIMIZATIONS [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1092932892

    APA Style (7th edition)
