Chip Multi-Processor (CMP) designs have become dominant in the processor market, so the evaluation and development of CMPs is essential for product improvement. To date, CMPs have presented many challenges for system designers, including the scalability of the cache memory system. My research aims to implement a highly scalable CMP cache memory system using an associative cache with an enhanced replacement policy and a scalable cache coherence protocol.
This thesis establishes a novel Adaptive Hashing and Replacement Cache (AHRC) design, which maintains high associativity with an advanced replacement policy. The AHRC design improves associativity while keeping the number of possible locations for each block (its ways) to a minimum. The AHRC uses the Adaptive Reuse Interval Prediction (ARIP) replacement policy because of its ability to resist both scanning and thrashing.
This research involved simulating several workloads on a large-scale CMP with AHRC as the last-level cache. The results demonstrated that AHRC has better energy efficiency and higher performance than conventional caches. Additionally, larger caches that utilise AHRC are the most suitable for many-core CMPs because, unlike smaller caches, they support scalability.
Scalable cache coherence protocols are essential for CMP systems in order to meet the growing demand for high-performance chips with shared memory. However, the limited size of the directory cache in larger systems may result in frequent evictions of directory entries and invalidations of cached blocks, thus compromising system performance.
This thesis proposes PROI, a scalable Private/Shared, Read-Only/Read-Write, Invalid/Valid coherence protocol. This novel protocol makes a slight modification to the cache tags, allowing it to differentiate between private and shared data at block granularity. PROI also employs a dynamic write policy with self-invalidation and self-downgrade for each L1 cache. It sustains system coherence and performance, scales with an increasing number of cores, and reduces the area, energy, and performance costs associated with the coherence mechanism. The results indicate that PROI reduces the miss ratio of the private L1 cache by 17%, application runtime by approximately 6%, and energy consumption by about 35%, and that it also reduces network traffic. Therefore, utilising AHRC, ARIP, and PROI can significantly mitigate the cache scalability constraints and maintain the performance level while improving the energy efficiency of the CMP cache.