
AMD further strengthens its technical computing leadership with the world’s first data center CPU with true 3D die stacking

On March 22, AMD announced the general availability of its 3rd Gen AMD EPYC processors with AMD 3D V-Cache, the world’s highest-performing x86 server processor for technical computing and the first data center CPU to use 3D die stacking. Codenamed Milan-X, these processors expand the 3rd Gen EPYC CPU family and deliver up to a 66 percent performance uplift on targeted technical computing workloads versus comparable, non-stacked 3rd Gen AMD EPYC processors.

Designed for Technical Computing 

These processors were engineered to work within the power and thermal design specifications of existing high-frequency Milan parts, so customers have the flexibility to choose high-frequency Milan or large-cache Milan-X in the same server platform, with the same number of memory and I/O channels.

The 3rd Gen EPYC with 3D V-Cache is designed for technical computing. AMD has tripled the L3 cache available on existing 3rd Gen EPYC processors, bringing the total to a massive 768 MB per socket, or 1.5 GB per 2P server. Because Milan-X is built on the Milan foundation, it remains compatible with SP3 systems: a BIOS upgrade is all that’s needed to enable existing Milan servers to support Milan-X, and customers still get the security, I/O bandwidth, and compatibility benefits of Milan.

AMD has introduced four Milan-X SKUs to the market, each offering unique value for particular workloads. Every Milan-X model has 8 Core Complex Dies (CCDs), each containing 96 MB of L3 cache, for a total of 768 MB. Under the Zen 3 shared L3 cache model, any single core can access the full 96 MB of L3 on its CCD.
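The cache figures above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that "tripled" means a standard Milan CCD carries one third of the 96 MB per-CCD figure (32 MB), with the remainder supplied by the stacked 3D V-Cache die; the constant and function names are illustrative only.

```python
# Arithmetic check of the Milan-X L3 cache figures.
# Assumption (not stated in the text): "tripled" implies a non-stacked Milan
# CCD has 96 / 3 = 32 MB of L3, and 3D V-Cache adds the other 64 MB.

CCDS_PER_SOCKET = 8          # Core Complex Dies per Milan-X package (from the text)
L3_PER_CCD_MB = 96           # L3 per CCD on Milan-X (from the text)
SOCKETS_PER_2P_SERVER = 2    # a "2P" server has two sockets


def l3_per_socket_mb(ccds: int = CCDS_PER_SOCKET, per_ccd_mb: int = L3_PER_CCD_MB) -> int:
    """Total L3 capacity of one socket, in MB."""
    return ccds * per_ccd_mb


def l3_per_server_mb(sockets: int = SOCKETS_PER_2P_SERVER) -> int:
    """Total L3 capacity of a multi-socket server, in MB."""
    return sockets * l3_per_socket_mb()


# Implied split of each CCD's L3 under the "tripled" assumption.
baseline_per_ccd_mb = L3_PER_CCD_MB // 3
stacked_per_ccd_mb = L3_PER_CCD_MB - baseline_per_ccd_mb

if __name__ == "__main__":
    print("L3 per socket:", l3_per_socket_mb(), "MB")
    print("L3 per 2P server:", l3_per_server_mb(), "MB")
    print("Implied base / stacked L3 per CCD:", baseline_per_ccd_mb, "/", stacked_per_ccd_mb, "MB")
```

Note that 1,536 MB is the "1.5 GB per 2P server" figure, while a single core still sees only the 96 MB on its own CCD as its L3.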

Milan-X is designed specifically for technical computing applications, which are some of the most complex and demanding workloads in the data center. These applications typically underpin product design.

  • Computational Fluid Dynamics is used to simulate physical interactions across a broad range of applications from consumer product designs to aerospace engineering. 
  • Finite Element Analysis simulates strength and vibration of products such as engines and tires as well as medical devices like heart valves. 
  • Structural Analysis explores high-impact situations such as crashes or explosions in order to predict cascading damage to components. 

These tools are used to simulate and improve the design of physical systems. Just as these software solutions simulate the physical world around us, EDA (electronic design automation) tools are used to simulate and optimize chip designs.

Workloads that may be a fit for Milan-X

  • are sensitive to L3 cache size
  • have high L3 cache capacity misses – the data set is often too large for L3 cache
  • have high L3 cache conflict misses – i.e., too much of the data maps to the same cache sets for the cache’s associativity to hold it

Workloads that likely won’t benefit from Milan-X

  • already have L3 cache miss rates near zero
  • have high L3 cache coherency misses – i.e., data is highly shared between cores
  • may be CPU-intensive, but only “stream” data or use it once rather than operating on it iteratively
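The difference between the two lists can be illustrated with a toy cache model. This is a minimal sketch, not a model of Milan-X: it assumes a hypothetical fully associative LRU cache measured in abstract "lines", so it shows capacity effects only and does not model conflict or coherency misses. An iterative workload whose working set overflows the small cache benefits sharply from a 3x larger cache, while a streaming workload and a workload that already fits see no change.

```python
from collections import OrderedDict


class LRUCache:
    """Toy fully associative LRU cache that counts hits and misses per line."""

    def __init__(self, capacity_lines: int):
        self.capacity = capacity_lines
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, line: int) -> None:
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)          # mark as most recently used
        else:
            self.misses += 1
            self.lines[line] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)    # evict least recently used

    def miss_rate(self) -> float:
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def iterative(working_set: int, passes: int):
    """Reuse the same working set on every pass, like an iterative solver."""
    for _ in range(passes):
        yield from range(working_set)


def streaming(n: int):
    """Touch each line exactly once, like a streaming workload."""
    yield from range(n)


def miss_rate(capacity: int, trace) -> float:
    """Replay an access trace through a fresh cache and return its miss rate."""
    cache = LRUCache(capacity)
    for line in trace:
        cache.access(line)
    return cache.miss_rate()


if __name__ == "__main__":
    small, large = 1000, 3000  # hypothetical capacities in lines; large = 3x small
    print("iterative, working set 2000:",
          miss_rate(small, iterative(2000, 10)), "->", miss_rate(large, iterative(2000, 10)))
    print("streaming, 20000 lines:",
          miss_rate(small, streaming(20000)), "->", miss_rate(large, streaming(20000)))
    print("already fits, working set 500:",
          miss_rate(small, iterative(500, 100)), "->", miss_rate(large, iterative(500, 100)))
```

In this sketch the overflowing iterative case drops from a 100% miss rate to 10% (only the first pass misses), while the streaming and already-fitting cases are identical at both sizes.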

Milan-X vs Milan – Performance benefits 

EDA applications are typically lightly threaded. They benefit from a large cache-per-core ratio and from considerable memory bandwidth. EDA software is also licensed on a per-core basis, so while both the 16- and 24-core Milan-X parts offer significant benefits to EDA customers, the structure of their software license agreements may be the deciding factor in which model they choose.

CFD, FEA and Structural Analysis applications are all highly threaded and scale well to many cores. The CFD software market is broad, with applications for every size of organization and problem. CFD software users typically want the flexibility to choose the number of cores that best suits both their workload and their licensing model. While FEA and Structural Analysis tools are both highly threaded, the size of a customer’s data set may determine which processor they choose. Larger data sets benefit from greater core density, but again, customers value the flexibility to choose the number of cores best suited to their needs.

EDA is the core of the semiconductor industry, and RTL (register-transfer level) simulation makes up the majority of the work in digital circuit simulation. The 16-core Milan-X processor accelerates RTL simulation by 66% over the fastest 16-core Milan part. To put that into context, the expected performance uplift for EDA tools is 8–12% per generation. And this comes without a new compute architecture: Milan-X is still based on Milan, with the same Zen 3 cores and a massive L3 cache.
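One way to read the 66% versus 8–12% comparison is to ask how many typical generations, compounded, would add up to the same gain. This is a back-of-the-envelope sketch, assuming generational gains compound multiplicatively and that the 8–12% range applies to RTL simulation; the function name is illustrative.

```python
import math


def generations_equivalent(uplift: float, per_gen: float) -> float:
    """Number of compounding generations of `per_gen` gain that equal `uplift`.

    Both arguments are fractional gains, e.g. 0.66 for a 66% uplift.
    Solves (1 + per_gen) ** n == 1 + uplift for n.
    """
    return math.log(1 + uplift) / math.log(1 + per_gen)


MILAN_X_RTL_UPLIFT = 0.66  # 16-core Milan-X vs fastest 16-core Milan (from the text)

if __name__ == "__main__":
    for per_gen in (0.08, 0.12):
        n = generations_equivalent(MILAN_X_RTL_UPLIFT, per_gen)
        print(f"at {per_gen:.0%} per generation: ~{n:.1f} generations")
```

Under these assumptions, the 66% uplift corresponds to roughly four and a half to six and a half generations of typical EDA gains.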

Competitive benchmarking with Intel Xeon 

Imagine an organization with limited data center space. It uses CFX to design, model and test its products, and it has determined that it can complete 4,600 jobs a day with a rack of 20 Intel servers using Intel’s fastest 32-core Xeon. With 32-core Milan-X processors, it could get that work done with half the servers, using 49% less power. Dropping from 178 kWh to 91 kWh of energy used per year is the equivalent of 81 acres of US forest carbon sequestration per year. This nets the organization a 51% total cost of ownership (TCO) savings over a 3-year deployment. And that figure does not include software license costs: going from 1,280 cores down to 640 cores with AMD will dramatically reduce per-core license costs, on top of the already significant TCO savings.
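The scenario’s headline numbers are internally consistent, which a short calculation can confirm. This sketch assumes every server is a two-socket (2P) system; the text does not say so, but it is implied by 1,280 cores / 20 servers / 32 cores per CPU = 2. The energy figures are used only as a ratio, so the calculation does not depend on their unit; the names are illustrative.

```python
# Arithmetic behind the CFX rack scenario.
# Assumption (implied, not stated): each server has 2 sockets of 32-core CPUs.

CORES_PER_CPU = 32
SOCKETS_PER_SERVER = 2
JOBS_PER_DAY = 4600          # daily CFX jobs in the scenario (from the text)


def total_cores(servers: int) -> int:
    """Licensed cores for a rack of identical 2P servers."""
    return servers * SOCKETS_PER_SERVER * CORES_PER_CPU


def pct_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return 100.0 * (before - after) / before


intel_servers = 20
amd_servers = intel_servers // 2          # "half the servers"
energy_before, energy_after = 178, 91     # annual energy figures quoted in the text

if __name__ == "__main__":
    print("cores:", total_cores(intel_servers), "->", total_cores(amd_servers))
    print(f"core-license reduction: {pct_reduction(total_cores(intel_servers), total_cores(amd_servers)):.0f}%")
    print(f"energy reduction: {pct_reduction(energy_before, energy_after):.1f}%")
    print("jobs/day per server:", JOBS_PER_DAY / intel_servers, "->", JOBS_PER_DAY / amd_servers)
```

The energy ratio works out to about a 48.9% reduction, matching the quoted 49%, and halving the server count halves the per-core license count.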

With this new innovation, AMD will continue to deliver a new level of performance to its large global customer base.
