INTEL XEON PHI
Introducing Intel’s Knights Landing
Knights Landing is the codename for Intel’s 2nd generation Xeon Phi Product Family, which delivers massive thread parallelism, data parallelism and memory bandwidth – with improved single-thread performance and Intel Xeon processor binary-compatibility in a standard CPU form factor. Additionally, Knights Landing will offer integrated Intel Omni-Path fabric technology and also be available in the traditional PCIe coprocessor form factor.
2U Intel Xeon Phi Processor (KNL) Quad Module Server/Omni-Path Host Fabric Interface
This 2U server is designed for parallelized workflows in the HPC market and features four Intel Compute Modules, each with support for the Intel Xeon Phi Processor. The Intel Omni-Path Host Fabric Interface Adapter offers up to 100 Gbps of bandwidth per port, delivering performance that scales with high node and core counts. The hot-swappable compute modules, 3.5″ drive bays, and redundant power supply modules offer easy serviceability.
The most distinguishing feature of the chip is that it’s a bootable host CPU — unlike its predecessor Knights Corner, which is a coprocessor that connects over PCIe. The Knights Landing Phi is the first chip to offer an integrated fabric, Intel’s Omni-Path Architecture (OPA), in the package.
Knights Landing also integrates on-package memory (MCDRAM) into the processor, which benefits memory bandwidth and overall application performance. A six-channel memory controller supports up to 384 GB of DDR4-2400 memory (~90 GB/s sustained bandwidth). There are 36 PCI Express 3.0 lanes for connecting PCIe coprocessors, PCIe SSDs or discrete graphics cards. The MIC (Many Integrated Core) design fits 8 billion transistors on a die, using 14 nm process technology. The Phi product family comes in three variants: a PCIe coprocessor form factor; a stand-alone CPU; and a stand-alone CPU with integrated Omni-Path fabric technology. The SKU stack that Intel is launching includes four base parts with different core counts, frequencies, TDPs and price points; each is listed below both without and with (F suffix) Omni-Path on chip.
|Processor Number||Cache (MB)||Clock Speed||# of Cores/# of Threads||TDP (W)||OPA on Chip|
|Xeon Phi Processor 7290F (16GB, 1.50 GHz, 72 core)||36||1.50 GHz||72/72||260||Yes|
|Xeon Phi Processor 7290 (16GB, 1.50 GHz, 72 core)||36||1.50 GHz||72/72||245||No|
|Xeon Phi Processor 7250F (16GB, 1.40 GHz, 68 core)||34||1.40 GHz||68/68||230||Yes|
|Xeon Phi Processor 7250 (16GB, 1.40 GHz, 68 core)||34||1.40 GHz||68/68||215||No|
|Xeon Phi Processor 7230F (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/64||230||Yes|
|Xeon Phi Processor 7230 (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/64||215||No|
|Xeon Phi Processor 7210F (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/64||230||Yes|
|Xeon Phi Processor 7210 (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/64||215||No|
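The ~90 GB/s sustained DDR4 figure quoted above can be put in context with a quick calculation of the theoretical peak for a six-channel DDR4-2400 controller. This is a minimal sketch, not vendor data: the 8-byte (64-bit) data bus per channel is an assumption taken from standard DDR4, and the function name is hypothetical.

```python
def ddr4_peak_gb_per_s(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bytes=8 assumes a standard 64-bit DDR4 data bus per channel
    (an assumption; the text above does not state the bus width).
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


# Values from the text: six channels of DDR4-2400, ~90 GB/s sustained.
peak = ddr4_peak_gb_per_s(channels=6, mega_transfers_per_s=2400)
sustained = 90.0

print(f"theoretical peak: {peak:.1f} GB/s")
print(f"sustained / peak: {sustained / peak:.0%}")
```

Under these assumptions the peak works out to about 115 GB/s, so the quoted sustained figure is roughly 78 percent of the theoretical maximum.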
Many Trailblazing Improvements in Knights Landing
|Improvements||What / Why|
|Self-Boot Processor||No PCIe bottleneck|
|Binary Compatibility with Xeon||Runs all legacy software. No recompilation.|
|New Core: Silvermont (SLM) based||~3x higher single-thread (ST) performance over Knights Corner (KNC)|
|Improved Vector density||3+ TFLOPS (DP) peak per chip|
|AVX 512 ISA||New 512-bit Vector ISA with Masks|
|Scatter/Gather Engine||Hardware support for gather and scatter|
|New memory technology: MCDRAM + DDR||Large high-bandwidth memory: MCDRAM. Huge bulk memory: DDR|
|New on-die interconnect: Mesh||High BW connection between cores and memory|
- Knights Landing (KNL) is the first self-boot Intel Xeon Phi processor
- Many improvements for performance and programmability
- Significant leap in scalar and vector performance
- Significant increase in memory bandwidth and capacity
- Binary compatible with Intel Xeon processor
- Common programming models between Intel Xeon processor and Intel Xeon Phi processor
- KNL offers immense amount of parallelism (both data and thread)
- Future trend is further increase in parallelism for both Intel Xeon processor and Intel Xeon Phi processor
- Developers need to prepare software to extract full benefits from this trend
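The "3+ TFLOPS (DP) peak per chip" figure in the improvements table can be sanity-checked from the core counts and clock speeds in the SKU table. The sketch below is illustrative only: it assumes two 512-bit vector units per core and one fused multiply-add (two floating-point operations) per lane per cycle, neither of which is stated on this page, and it uses nominal clocks, which sustained clocks under heavy vector load may not match.

```python
def peak_dp_gflops(cores, clock_ghz, vpus_per_core=2, vector_bits=512, fma_flops=2):
    """Nominal peak double-precision GFLOPS for one chip.

    vpus_per_core=2 and fma_flops=2 are assumptions (not from this page).
    A 512-bit vector holds 512 // 64 = 8 double-precision lanes.
    """
    dp_lanes = vector_bits // 64
    return cores * clock_ghz * vpus_per_core * dp_lanes * fma_flops


# (cores, nominal GHz) taken from the SKU table above.
skus = {"7290": (72, 1.50), "7250": (68, 1.40), "7210": (64, 1.30)}

for name, (cores, ghz) in skus.items():
    tflops = peak_dp_gflops(cores, ghz) / 1000
    print(f"Xeon Phi {name}: {tflops:.2f} TFLOPS (DP, nominal)")
```

With these assumptions the 72-core, 1.50 GHz part comes out at about 3.46 TFLOPS, consistent with the "3+" claim, while the 64-core, 1.30 GHz parts land nearer 2.66 TFLOPS.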
Intel Xeon Phi
Highly-Parallel Processing for Unparalleled Discovery
Intel Xeon Phi – Extracting extreme performance from highly-parallel applications just got easier. Xeon Phi coprocessors, based on Intel Many Integrated Core (MIC) Architecture, complement the industry-leading performance and energy-efficiency of the Intel Xeon processor E5 family to enable dramatic performance gains for some of today’s most demanding applications.
You can now achieve optimized performance for even your most highly-parallel, technical computing workloads, while maintaining a unified hardware and software environment.
Even Higher Efficiency for Parallel Processing
While a majority of applications will continue to achieve maximum performance using Intel Xeon processors, certain highly-parallel applications will benefit dramatically by using Intel Xeon Phi coprocessors. Each coprocessor features many more and smaller cores, many more threads and wider vector units. The high degree of parallelism compensates for the lower speed of each individual core to deliver higher aggregate performance for highly-parallel code.
You can use Xeon processors and Xeon Phi coprocessors together to optimize performance for almost any workload. To take full advantage of Xeon Phi coprocessors, an application must scale well to over one hundred threads, and either make extensive use of vectors or efficiently use more local memory bandwidth than is available on an Intel Xeon processor.
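One way to see why an application "must scale well to over one hundred threads" is Amdahl's law, which bounds the speedup by the fraction of the runtime that can run in parallel. Amdahl's law is brought in here as an illustrative model rather than something this page specifies, and the parallel fractions below are made-up examples, not measurements.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of the runtime parallelizes."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical workloads: 90%, 99% and 99.9% of runtime is parallelizable.
for fraction in (0.90, 0.99, 0.999):
    bound = amdahl_speedup(fraction, threads=100)
    print(f"parallel fraction {fraction:.1%}: at most {bound:.1f}x on 100 threads")
```

In this model, code that is 90 percent parallel gains less than 10x from 100 threads, whereas code that is 99 percent parallel gains about 50x, which is why only highly-parallel workloads are good candidates for the coprocessor.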
Intel Parallel Studio XE – Optimized Tools to Build Fast Code
Boost your application's performance with Intel C++ Compiler and Intel Fortran Compiler for Windows, Linux and OS X. The built-in OpenMP and Intel Cilk Plus parallel models, combined with performance libraries, simplify the implementation of fast, parallel code. Available in 3 editions: Cluster, Professional and Composer. As processors evolve, it is becoming more and more critical to both vectorize (use SIMD instructions such as AVX) and thread software to realize the full performance potential of the processor. In some cases, code that is vectorized and threaded can be more than 175X faster than unthreaded / unvectorized code and about 7X faster than code that is only threaded or vectorized. And that gap is growing with every new processor generation.
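The two ratios quoted above imply a third number that the text leaves unstated: how much faster code is with only one of the two optimizations applied. A minimal sketch of that arithmetic, taking the 175X and 7X figures at face value (the function name is hypothetical):

```python
def implied_single_optimization_speedup(combined: float, gap_to_single: float) -> float:
    """Speedup over baseline for code that is only threaded or only vectorized.

    combined:      speedup of threaded + vectorized code over the baseline
    gap_to_single: how much faster the combined code is than the
                   single-optimization code
    """
    return combined / gap_to_single


# Figures from the paragraph above.
single = implied_single_optimization_speedup(combined=175, gap_to_single=7)
print(f"only threaded or only vectorized: about {single:.0f}X over baseline")
```

Read this way, applying just one optimization yields roughly 25X, and applying the second multiplies that by a further 7X.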
The Intel Xeon Phi coprocessors can dramatically accelerate performance for your highly-parallel applications to help you push the boundaries of innovation and scientific discovery, without requiring your developers to reinvent the wheel.
|Processor Number||Cache||Clock Speed||# of Cores/# of Threads||TDP (W)||OPA on Chip|
|Xeon Phi Coprocessor 7240P (16GB, 1.3 GHz, 68 Core)||34 MB||1.30 GHz||68/68||275||No|
|Xeon Phi Coprocessor 7220P (16GB, 1.2 GHz, 68 Core)||34 MB||1.20 GHz||68/68||275||No|
|Xeon Phi Coprocessor 7220A (16GB, 1.2 GHz, 68 Core)||34 MB||1.20 GHz||68/68||275||No|
|Xeon Phi Coprocessor 7120X (16GB, 1.238 GHz, 61 core)||30.5 MB||1.24 GHz||61/61||300||No|
|Xeon Phi Coprocessor 7120P (16GB, 1.238 GHz, 61 core)||30.5 MB||1.24 GHz||61/61||300||No|
|Xeon Phi Coprocessor 7120D (16GB, 1.238 GHz, 61 core)||30.5 MB||1.24 GHz||61/61||270||No|
|Xeon Phi Coprocessor 7120A (16GB, 1.238 GHz, 61 core)||30.5 MB||1.24 GHz||61/61||300||No|