INTEL XEON PHI
Introducing Intel’s Knights Landing
Knights Landing is the codename for Intel’s 2nd generation Xeon Phi Product Family, which delivers massive thread parallelism, data parallelism and memory bandwidth – with improved single-thread performance and Intel Xeon processor binary-compatibility in a standard CPU form factor.
2U Intel Xeon Phi Processor (KNL) Quad Module Server/Omni-Path Host Fabric Interface
This 2U server is designed for parallelized workflows in the HPC market and features four Intel Compute Modules, each supporting an Intel Xeon Phi Processor. The Intel Omni-Path Host Fabric Interface Adapter offers up to 100 Gbps of bandwidth per port, delivering performance that scales with high node and core counts. The hot-swappable compute modules, 3.5″ drive bays, and redundant power supply modules make the system easy to service.
The chip’s most distinguishing feature is that it is a bootable host CPU, unlike its predecessor Knights Corner, which is a coprocessor that connects over PCIe. The Knights Landing Phi is also the first chip to offer an integrated fabric, Intel’s Omni-Path Architecture (OPA), in the package.
Knights Landing also integrates high-bandwidth memory (MCDRAM) on the processor package, which benefits memory bandwidth and overall application performance. A six-channel memory controller supports up to 384 GB of DDR4-2400 memory (~90 GB/s sustained bandwidth). There are 36 PCI Express 3.0 lanes for connecting PCIe SSDs or discrete graphics cards. The MIC (Many Integrated Core) design fits 8 billion transistors on a die, using 14 nm process technology. The Phi product family comes in two variants: a stand-alone CPU, and a stand-alone CPU with integrated Omni-Path fabric technology (the F-suffix parts). The SKU stack that Intel is launching includes four base parts with different core counts, frequencies, TDPs and price points, each offered with or without the integrated fabric.
|Processor Number||Cache (MB)||Clock Speed||# of Cores/# of Threads||TDP (W)||OPA on Chip|
|Xeon Phi Processor 7290F (16GB, 1.50 GHz, 72 core)||36||1.50 GHz||72/288||260||Yes|
|Xeon Phi Processor 7290 (16GB, 1.50 GHz, 72 core)||36||1.50 GHz||72/288||245||No|
|Xeon Phi Processor 7250F (16GB, 1.40 GHz, 68 core)||34||1.40 GHz||68/272||230||Yes|
|Xeon Phi Processor 7250 (16GB, 1.40 GHz, 68 core)||34||1.40 GHz||68/272||215||No|
|Xeon Phi Processor 7230F (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/256||230||Yes|
|Xeon Phi Processor 7230 (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/256||215||No|
|Xeon Phi Processor 7210F (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/256||230||Yes|
|Xeon Phi Processor 7210 (16GB, 1.30 GHz, 64 core)||32||1.30 GHz||64/256||215||No|
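The DDR4 figures quoted earlier (six channels of DDR4-2400, ~90 GB/s sustained) can be sanity-checked against a theoretical peak. The rough estimate below assumes the standard 64-bit (8-byte) data path per DDR4 channel, which the text above does not state. It is not a measured result.

```latex
\[
B_{\text{peak}}
  = 6\ \text{channels} \times 2400 \times 10^{6}\ \tfrac{\text{transfers}}{\text{s}} \times 8\ \tfrac{\text{B}}{\text{transfer}}
  = 115.2\ \text{GB/s},
\qquad
\frac{90\ \text{GB/s}}{115.2\ \text{GB/s}} \approx 0.78
\]
```

Under that assumption, the quoted sustained figure corresponds to roughly 78% of the theoretical DDR4 peak.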
Many Trailblazing Improvements in Knights Landing
|Improvements||What / Why|
|Self-Boot Processor||No PCIe bottleneck|
|Binary Compatibility with Xeon||Runs all legacy software. No recompilation.|
|New Core: Silvermont (SLM) based||~3x higher single-thread (ST) performance over Knights Corner (KNC)|
|Improved Vector density||3+ TFLOPS (DP) peak per chip|
|AVX-512 ISA||New 512-bit Vector ISA with Masks|
|Scatter/Gather Engine||Hardware support for gather and scatter|
|New memory technology: MCDRAM + DDR||Large high-bandwidth memory: MCDRAM. Huge bulk memory: DDR.|
|New on-die interconnect: Mesh||High BW connection between cores and memory|
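The “3+ TFLOPS (DP) peak per chip” entry can be reproduced with back-of-envelope arithmetic for the 72-core, 1.50 GHz part. The estimate below rests on assumptions the table does not spell out: two 512-bit vector units per core, eight double-precision values per 512-bit vector, and a fused multiply-add counted as two floating-point operations.

```latex
\[
P_{\text{peak}}
  = 72\ \text{cores} \times 1.5\ \text{GHz}
    \times 2\ \tfrac{\text{vector units}}{\text{core}}
    \times 8\ \tfrac{\text{DP lanes}}{\text{vector unit}}
    \times 2\ \tfrac{\text{FLOP}}{\text{FMA}}
  = 3456\ \text{GFLOPS} \approx 3.46\ \text{TFLOPS}
\]
```

Treat this as an upper bound. Clock speeds under sustained vector load may differ from the listed frequency, and real applications rarely keep every vector lane busy on every cycle.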
- Knights Landing (KNL) is the first self-booting Intel Xeon Phi processor
- Many improvements for performance and programmability
- Significant leap in scalar and vector performance
- Significant increase in memory bandwidth and capacity
- Binary compatible with Intel Xeon processor
- Common programming models between Intel Xeon processor and Intel Xeon Phi processor
- KNL offers an immense amount of parallelism (both data and thread)
- The future trend is a further increase in parallelism for both Intel Xeon processors and Intel Xeon Phi processors
- Developers need to prepare software to extract full benefits from this trend
Intel Parallel Studio XE – Optimized Tools to Build Fast Code
Boost your application performance with the Intel C++ Compiler and Intel Fortran Compiler for Windows, Linux and OS X. The built-in OpenMP and Intel Cilk Plus parallel models, combined with performance libraries, simplify the implementation of fast, parallel code. The suite is available in three editions: Cluster, Professional and Composer. As processors evolve, it is becoming more and more critical to both vectorize (use SIMD instructions such as AVX) and thread software to realize the full performance potential of the processor. In some cases, code that is both vectorized and threaded can be more than 175X faster than unthreaded, unvectorized code and about 7X faster than code that is only threaded or only vectorized. That gap is growing with every new processor generation.
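As a minimal sketch of what “both vectorized and threaded” can look like in practice, the C fragment below uses OpenMP’s combined `parallel for simd` construct (OpenMP 4.0 or later). The function names are hypothetical and chosen only for illustration. Nothing here is specific to Intel’s tools, and the speedup figures quoted above are not something this sketch demonstrates.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative helper (hypothetical name): y[i] = a * x[i] + y[i].
 * "parallel for" splits the iterations across threads; "simd" asks the
 * compiler to vectorize each thread's chunk. If OpenMP is not enabled,
 * the pragma is ignored and the loop runs serially with the same result. */
void saxpy_threaded_vectorized(size_t n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);
    #pragma omp parallel for simd
    for (long i = 0; i < (long)n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Illustrative helper (hypothetical name): dot product of x and y.
 * The reduction clause gives each thread and vector lane a private
 * partial sum and combines them when the loop finishes. */
double dot_threaded_vectorized(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    assert(x != NULL && y != NULL);
    #pragma omp parallel for simd reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += x[i] * y[i];
    }
    return sum;
}
```

To enable the pragmas, compile with the compiler’s OpenMP switch, for example `-fopenmp` with GCC or `-qopenmp` with the Intel compilers. Whether the loops actually vectorize, and with which instruction set, depends on the target flags and on what the compiler can prove about the code.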