NVIDIA GPU – One Platform. Unlimited Data Center Acceleration.

Accelerating scientific discovery, visualizing big data for insights, and providing smart services to consumers are everyday challenges for researchers and engineers. Solving these challenges takes increasingly complex and precise simulations, the processing of tremendous amounts of data, and the training of sophisticated deep learning networks. Meeting the exponentially growing demand for this kind of computing requires accelerated data centers. NVIDIA Tesla is the world’s leading platform for accelerated data centers, deployed by some of the world’s largest supercomputing
centers and enterprises. It combines GPU accelerators, accelerated computing systems, interconnect technologies, development tools, GPU-accelerated applications, and compilers such as PGI to enable faster scientific discoveries and big data insights. At the heart of the Tesla platform are massively parallel GPU accelerators that provide dramatically higher throughput for compute-intensive workloads without increasing a data center’s power budget or physical footprint.

NVIDIA Tesla V100

The Most Advanced Data Center GPU Ever Built.

NVIDIA® Tesla® V100 is the most advanced data center GPU ever built, designed to accelerate AI, HPC, and graphics. Powered by NVIDIA Volta™, the latest GPU architecture, Tesla V100 offers the performance of up to 100 CPUs in a single GPU, enabling data scientists, researchers, and engineers to tackle challenges that were once impossible. Read more about the NVIDIA Tesla V100 Volta GPU.

Three Reasons to Upgrade to the NVIDIA Tesla V100


1. Be Prepared for the AI Revolution

NVIDIA Tesla V100 is the computational engine driving the AI revolution and enabling HPC breakthroughs. For example, researchers at the University of Florida and the University of North Carolina leveraged GPU deep learning to develop ANAKIN-ME (ANI), which reproduces molecular energy surfaces at extremely high (DFT) accuracy for one to ten millionths of the cost of current computational methods.

2. Boost Data Center Productivity & Throughput

Data center managers all face the same challenge: how to meet a demand for computing resources that often exceeds the available cycles in the system. NVIDIA Tesla V100 dramatically boosts the throughput of your data center with fewer nodes, completing more jobs and improving data center efficiency. A single server node with V100 GPUs can replace up to 50 CPU nodes. For example, for HOOMD-blue, a single node with four V100s does the work of 43 dual-socket CPU nodes, while for MILC a single V100 node can replace 14 CPU nodes. With lower networking, power, and rack space overheads, accelerated nodes provide higher application throughput at substantially reduced costs.
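The consolidation claims above are easy to sanity-check with back-of-the-envelope arithmetic. The sketch below uses the 43:1 HOOMD-blue ratio quoted above; the per-node power figures are purely illustrative assumptions, not measured values.

```python
# Back-of-the-envelope node-consolidation estimate. The 43:1 replacement
# ratio comes from the HOOMD-blue example in the text; the wattage numbers
# below are assumptions chosen for illustration only.

def consolidation_savings(cpu_nodes_replaced, cpu_node_watts, gpu_node_watts):
    """Return (nodes saved, net power saved in watts) when one GPU node
    replaces `cpu_nodes_replaced` CPU nodes of equal throughput."""
    nodes_saved = cpu_nodes_replaced - 1
    power_saved = cpu_nodes_replaced * cpu_node_watts - gpu_node_watts
    return nodes_saved, power_saved

# Assumed: ~400 W per dual-socket CPU node, ~1,600 W for a 4x V100 node.
nodes, watts = consolidation_savings(43, 400, 1600)
print(f"Nodes saved: {nodes}, net power saved: {watts} W")
```

Even with generous assumptions for the GPU node's draw, the rack space and power deltas dominate the cost picture at this replacement ratio.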

3. Top Applications Are GPU-Accelerated

Over 450 HPC applications are already GPU-optimized, in a wide range of areas including quantum chemistry, molecular dynamics, climate and weather, and more. In fact, an independent study by Intersect360 Research shows that 70% of the most popular HPC applications, including all 10 of the top 10, have built-in support for GPUs. With the most popular HPC applications and all major deep learning frameworks GPU-accelerated, nearly every HPC customer can expect most of their data center workload to benefit from GPU-accelerated computing.

NVIDIA Tesla V100 Volta Specifications

NVLink:
  • Performance with NVIDIA GPU Boost: 7.5 TFLOPS double-precision, 15 TFLOPS single-precision, 120 TFLOPS deep learning
  • Bi-directional interconnect bandwidth: 300 GB/s
  • Power: 300 W
  • Memory: 16 or 32 GB HBM2, 900 GB/s

PCIe:
  • Performance with NVIDIA GPU Boost: 7 TFLOPS double-precision, 14 TFLOPS single-precision, 112 TFLOPS deep learning
  • Bi-directional interconnect bandwidth: 32 GB/s
  • Power: 250 W
  • Memory: 16 or 32 GB HBM2, 900 GB/s

Choose the Right NVIDIA Tesla Solution for You

V100 NVLink (16 GB or 32 GB)
  • Form Factor: SXM2 (NVLink)
  • Cores: 640 Volta Tensor Cores, 5,120 CUDA® cores
  • Interconnect Bandwidth: 300 GB/s
  • Performance: 15.7 TFLOPS single-precision, 7.8 TFLOPS double-precision, 125 TFLOPS Tensor
  • Memory: 16 or 32 GB HBM2, 900 GB/s, ECC
  • Key Requirements: double- and single-precision performance, interconnect bandwidth, programmability
  • Workload Profile: Volta architecture with Tensor Cores and next-generation NVLink; hyperscale and HPC data centers running applications that scale to multiple GPUs; mixed workloads; TensorFlow training; specific applications such as 3D RTM; deep learning frameworks such as Caffe and TensorFlow

V100 PCIe (16 GB or 32 GB)
  • Form Factor: PCI Express x16, full height/length
  • Cores: 640 Volta Tensor Cores, 5,120 CUDA® cores
  • Interconnect Bandwidth: 32 GB/s
  • Performance: 14 TFLOPS single-precision, 7 TFLOPS double-precision, 112 TFLOPS Tensor
  • Memory: 16 or 32 GB HBM2, 900 GB/s, ECC
  • Key Requirements: double- and single-precision performance, form factor, programmability
  • Workload Profile: Volta architecture with Tensor Cores; hyperscale and HPC data centers running applications that scale to multiple GPUs; mixed workloads; TensorFlow training; specific applications such as 3D RTM; deep learning frameworks such as Caffe and TensorFlow

T4
  • Form Factor: PCI Express x16/x8, low-profile
  • Cores: 320 Turing Tensor Cores, 2,560 CUDA® cores
  • Interconnect Bandwidth: 32 GB/s
  • Performance: 8.1 TFLOPS single-precision, 65 TFLOPS mixed-precision (FP16/FP32), 130 TOPS INT8, 260 TOPS INT4
  • Memory: 16 GB GDDR6, 300 GB/s, ECC
  • Key Requirements: mixed-precision performance, low-precision performance, power footprint, form factor
  • Workload Profile: Turing architecture with Tensor Cores; mixed inference workloads such as image, video, or data processing; high-density VDI clusters; TensorRT (TensorFlow inference)

Begin Building Your New V100 NVLink System | Begin Building Your New V100 PCIe System | Begin Building Your New T4 System

The Exponential Growth of GPU Computing

For more than two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics. With a singular focus on this field, NVIDIA offers specialized GPU platforms for the gaming, professional visualization, data center, GPU server, and automotive markets. NVIDIA’s work is at the center of the most consequential mega-trends in GPU cluster technology: virtual reality, artificial intelligence, and self-driving cars.
GPU servers have become an essential part of the computational research world. From bioinformatics to weather modeling, GPUs have delivered speedups of over 70x on researchers’ code. With hundreds of applications already accelerated by these cards, check to see if your favorite applications are on the GPU applications list.

Tools for GPU Computing.

TensorFlow: Developed by Google, TensorFlow is an open-source library for high-performance numerical computation. It has quickly become an industry standard for artificial intelligence and machine learning applications, and its flexibility has made it useful across many scientific disciplines.
GPU-Accelerated Libraries: There are a number of GPU-accelerated libraries that developers can use to speed up applications. Many of them are NVIDIA CUDA libraries (such as cuBLAS and the CUDA Math Library), but there are others, such as the IMSL Fortran libraries and HiPLAR (High Performance Linear Algebra in R). These libraries can be linked in place of the standard libraries commonly used in CPU-only code.
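As an illustration of this drop-in pattern, the sketch below uses CuPy, a GPU-accelerated library (not mentioned above) that mirrors much of NumPy’s API, so the same code targets a GPU when one is available and falls back to the CPU otherwise.

```python
# Drop-in replacement pattern: CuPy exposes a NumPy-compatible interface,
# so numerical code can be written once against either backend. CuPy is
# used here only as an example of a GPU-accelerated library.
try:
    import cupy as xp          # GPU-accelerated, NumPy-compatible API
except ImportError:
    import numpy as xp         # CPU fallback with the same interface

def normalize(v):
    """Scale a vector to unit length using whichever backend was imported."""
    return v / xp.linalg.norm(v)

v = xp.asarray([3.0, 4.0])
u = normalize(v)               # [0.6, 0.8] on either backend
```

Because both libraries share the same call signatures, no application logic changes when moving between CPU and GPU; only the import does.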
CUDA: NVIDIA has created an entire toolkit devoted to computing on its CUDA-enabled GPUs. The CUDA Toolkit, which includes the CUDA libraries, is the core of many GPU-accelerated programs, and CUDA is one of the most widely used toolkits in the GPGPU world today.
NVIDIA Deep Learning SDK: Deep learning is becoming essential in many segments of industry, for instance in voice and image recognition, where the machine must learn as it receives input. Writing algorithms for machines to learn from data is a difficult task, so NVIDIA’s Deep Learning SDK provides the tools needed to design such code to run on GPUs.
OpenACC: OpenACC directives can be a powerful tool for porting an application to GPU servers. OpenACC has two key strengths: it is easy to use, and it is portable. Applications that use OpenACC can run not only on NVIDIA GPUs but also on other GPUs and on CPUs.

NVIDIA Accelerators dramatically lower data center costs by delivering exceptional performance with fewer, more powerful servers. This increased throughput means more scientific discoveries delivered to researchers every day.


Keep it Cool

Asetek direct-to-chip liquid cooling focuses on removing heat from the hottest locations in servers. GPUs and other coprocessors are a growing hot spot in high-performance servers as manufacturers offload processor-intensive tasks from the main processor for more performance. Power consumption of greater than 300 watts per GPU (or GPGPU coprocessor) is becoming the norm and can easily be addressed with Asetek technology. Learn more about Asetek and liquid cooling.

Choose from Some of Our Most Popular GPU-Capable Servers


4U SuperWorkstations

Up to 4 GPUs or Coprocessors.
Shop 4U 7048GR-TR

1U SuperServer 1029GQ-TXRT

Up to 4 P100s with 10GBase-T Ethernet.
Shop 1U 1029GQ-TXRT

NVIDIA Tesla T4 Tensor Core GPU: The Price Performance Leader

The next level of acceleration, the NVIDIA T4 is a single-slot, 6.6-inch, Gen3 PCIe universal deep learning accelerator based on the NVIDIA TU104 GPU. It supports both x8 and x16 PCI Express, with 32 GB/s interconnect bandwidth. Despite its small form factor, the T4 is highly energy efficient, consuming only 70 watts of power. Its passive thermal design supports bi-directional airflow (right-to-left or left-to-right). The T4 uses the Turing™ architecture, with 320 Turing Tensor Cores and 2,560 CUDA® cores, and supports the CUDA™, TensorRT™, and ONNX compute APIs.

Multi-precision performance specifications

  • Single-precision (8.1 TFLOPS)
  • Mixed-Precision FP32 and FP16 (65 TFLOPS)
  • INT8 (130 TOPS)
  • INT4 (260 TOPS)
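To make the INT8 figure above concrete, here is a minimal sketch of generic symmetric 8-bit quantization, the kind of arithmetic INT8 inference relies on: values are mapped to 8-bit integers with a per-tensor scale, the multiply-accumulate runs in integer arithmetic, and the result is scaled back to floating point. This is an illustration only, not TensorRT’s actual quantization scheme.

```python
import numpy as np

# Generic symmetric int8 quantization sketch (illustrative; TensorRT's
# real calibration and quantization scheme is more sophisticated).

def quantize(x, scale):
    """Map float values to int8 using a symmetric per-tensor scale."""
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8)

def int8_dot(a, b, scale_a, scale_b):
    """Dot product computed in integer arithmetic, de-quantized to float."""
    qa, qb = quantize(a, scale_a), quantize(b, scale_b)
    # Accumulate in int32, as int8 hardware pipelines do, then rescale.
    return int(qa.astype(np.int32) @ qb.astype(np.int32)) * scale_a * scale_b

a = np.array([0.5, -1.0, 2.0])
b = np.array([1.0, 0.25, -0.5])
approx = int8_dot(a, b, scale_a=2.0 / 127, scale_b=1.0 / 127)
exact = float(a @ b)           # exact result is -0.75; int8 lands close
```

Because each 8-bit multiply-accumulate is far cheaper than its FP32 equivalent, hardware like the T4 can sustain much higher throughput (TOPS rather than TFLOPS) at a small, usually acceptable, loss of precision.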


The T4 has 16 GB of GDDR6 ECC memory, with a 256-bit memory bus, a memory clock of up to 5001 MHz, and peak memory bandwidth of up to 320 GB/s.

The T4 provides up to 9.3X higher performance than CPUs on training and up to 36X on inference.

Inference Performance vs. CPU*


Training Performance vs. CPU*

*Comparison of dual NVIDIA T4 GPUs versus servers with dual-socket Xeon Gold 6140 CPUs.

Some of Our Most Popular Tesla Capable Servers


1U 1029GQ-TRT SuperServer

Holds 4 GPUs and has Dual Port 10GbE.
Shop 1U 1029GQ-TRT

2U 2028GR-TRT SuperServer

Up to 4 GPUs or Coprocessors.
Shop 2U 2028GR-TRT

4U 4028GR-TR2 SuperServer

Dual Socket Intel Xeon E5-2600 v3/v4.
Shop 4U 4028GR-TR2