NVIDIA GPU – ONE PLATFORM. UNLIMITED DATA CENTER ACCELERATION.
Accelerating scientific discovery, visualizing big data for insights, and providing smart services to consumers are everyday challenges for researchers and engineers. Solving these challenges takes increasingly complex and precise simulations, the processing of tremendous amounts of data, and the training of sophisticated deep learning networks. These workloads also require accelerated data centers to meet the exponentially growing demand for computing.
NVIDIA Ampere is the world’s leading platform for accelerated data centers, deployed by some of the world’s largest supercomputing centers and enterprises. It combines GPU accelerators, accelerated computing systems, interconnect technologies, development tools, GPU applications, and compilers such as PGI to enable faster scientific discoveries and big data insights.
Ampere is incredibly fast for training and inference, and it can run as a single large GPU for maximum scale-up performance or partition itself into as many as seven independent GPU instances to accelerate multiple smaller applications (scale-out). The Ampere architecture yields a new data center architecture for acceleration that is flexible, delivers high throughput, and enables higher utilization.
Speak with One of Our System Engineers Today
THE EXPONENTIAL GROWTH OF GPU COMPUTING
For more than two decades, NVIDIA has pioneered visual computing, the art and science of computer graphics. With a singular focus on this field, NVIDIA offers specialized GPU platforms for the gaming, professional visualization, data center, GPU server, and automotive markets. NVIDIA’s work is at the center of the most consequential mega-trends in GPU cluster technology: virtual reality, artificial intelligence, and self-driving cars.
GPU servers have become an essential part of the computational research world. From bioinformatics to weather modeling, GPUs have delivered speedups of over 70x on researchers’ code. With hundreds of applications already accelerated by these cards, check to see if your favorite applications are on the GPU applications list.
NVIDIA AMPERE A100
8th Generation Data Center GPU for the Age of Elastic Computing
The NVIDIA A100 Tensor Core GPU adds many new features while delivering significantly faster performance for HPC, AI, and data analytics workloads. Powered by NVIDIA’s latest Ampere GPU architecture, the A100 features third-generation Tensor Cores, sparsity acceleration, Multi-Instance GPU (MIG), and third-generation NVLink and NVSwitch.

Our Services
High Performance Computing Professional Services

A100 Tensor Cores Accelerate HPC
The performance needs of HPC applications are growing rapidly. The A100 GPU supports Tensor operations that accelerate IEEE-compliant FP64 computations, delivering up to 2.5x the FP64 performance of the NVIDIA Tesla V100 GPU. Each SM in A100 computes a total of 64 FP64 FMA operations/clock (or 128 FP64 operations/clock), which is twice the throughput of Tesla V100.
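As a rough sanity check (assuming the A100’s 108 streaming multiprocessors and its roughly 1.41 GHz boost clock, neither of which is stated above), the per-SM rate quoted here reproduces the 19.5 TF peak FP64 Tensor Core figure listed in the specification table below:

$$108\ \text{SMs} \times 128\ \tfrac{\text{FP64 ops}}{\text{clock}} \times 1.41 \times 10^{9}\ \tfrac{\text{clocks}}{\text{s}} \approx 19.5\ \text{TFLOPS}$$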

Multi-Instance GPU (MIG)
The new MIG feature can partition each A100 into as many as seven GPU instances, each fully isolated with its own high-bandwidth memory, cache, and compute cores, for optimal utilization, effectively expanding access to every user and application. With an NVIDIA Ampere architecture-based GPU, you can see and schedule jobs on these GPU instances as if they were physical GPUs.
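As an illustration (not an official NVIDIA workflow): once an administrator has created MIG instances with the `nvidia-smi mig` commands and a job has been pointed at one of them (for example via `CUDA_VISIBLE_DEVICES`), ordinary CUDA device-query code sees that instance as just another GPU. A minimal sketch:

```cpp
// Minimal CUDA device-query sketch: a MIG instance exposed to this process
// is enumerated like any other GPU. Build with: nvcc mig_query.cu -o mig_query
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        // Each visible device (full GPU or MIG instance) reports its own
        // SM count and memory size.
        std::printf("Device %d: %s, %d SMs, %.1f GB\n",
                    i, prop.name, prop.multiProcessorCount,
                    prop.totalGlobalMem / 1.0e9);
    }
    return 0;
}
```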

AI Training and Inference
From scaling up AI training and scientific computing, to scaling out inference applications, to enabling real-time conversational AI, NVIDIA GPUs provide the horsepower needed to accelerate numerous complex and unpredictable workloads. The NVIDIA A100 GPU delivers exceptional speedups over the V100 for AI training and inference workloads.
NVIDIA RTX A6000 AND A5000 GPUS
Perfectly Balanced. Blazing Performance.
Spearhead innovation from your desktop with the NVIDIA RTX™ A6000 and A5000 graphics cards, the perfect balance of power, performance, and reliability to tackle complex workflows. Built on the latest NVIDIA Ampere architecture and featuring 48 GB and 24 GB of GPU memory, respectively, they’re everything designers, engineers, and artists need to realize their visions for the future, today.
WHAT TYPE OF GPU ARE YOU LOOKING FOR?
Double Precision & Compute GPUs
| Name | A100 (80GB SXM) | A100 (40GB SXM) | A100 (80GB PCIe) | A100 (40GB PCIe) | A30 |
| --- | --- | --- | --- | --- | --- |
| Architecture | Ampere | Ampere | Ampere | Ampere | Ampere |
| FP64 | 9.7 TF | 9.7 TF | 9.7 TF | 9.7 TF | 5.2 TF |
| FP64 Tensor Core | 19.5 TF | 19.5 TF | 19.5 TF | 19.5 TF | 10.3 TF |
| FP32 | 19.5 TF | 19.5 TF | 19.5 TF | 19.5 TF | 10.3 TF |
| Tensor Float 32 (TF32) | 156 TF (312 TF*) | 156 TF (312 TF*) | 156 TF (312 TF*) | 156 TF (312 TF*) | 82 TF (165 TF*) |
| BFLOAT16 Tensor Core | 312 TF (624 TF*) | 312 TF (624 TF*) | 312 TF (624 TF*) | 312 TF (624 TF*) | 165 TF (330 TF*) |
| FP16 Tensor Core | 312 TF (624 TF*) | 312 TF (624 TF*) | 312 TF (624 TF*) | 312 TF (624 TF*) | 165 TF (330 TF*) |
| INT8 Tensor Core | 624 TOPS (1,248 TOPS*) | 624 TOPS (1,248 TOPS*) | 624 TOPS (1,248 TOPS*) | 624 TOPS (1,248 TOPS*) | 330 TOPS (661 TOPS*) |
| GPU Memory | 80 GB HBM2e | 40 GB HBM2 | 80 GB HBM2e | 40 GB HBM2 | 24 GB HBM2 |
| GPU Memory Bandwidth | 2,039 GB/s | 1,555 GB/s | 1,935 GB/s | 1,555 GB/s | 933 GB/s |
| TDP | 400 W | 400 W | 300 W | 250 W | 165 W |
| Interconnect | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | NVLink: 600 GB/s; PCIe Gen4: 64 GB/s | NVIDIA® NVLink® Bridge for 2 GPUs: 600 GB/s; PCIe Gen4: 64 GB/s | NVIDIA® NVLink® Bridge for 2 GPUs: 600 GB/s; PCIe Gen4: 64 GB/s | Third-gen NVIDIA® NVLink®: 200 GB/s; PCIe Gen4: 64 GB/s |

\* With structural sparsity enabled.
Single Precision & Visualization GPUs
| Name | A40 | A6000 | A5000 | A4000 | A2000 |
| --- | --- | --- | --- | --- | --- |
| Architecture | Ampere | Ampere | Ampere | Ampere | Ampere |
| FP64 | | 1,250 GF | 867.8 GF | 599 GF | 125 GF |
| FP32 | 37.4 TF | 38.7 TF | 27.8 TF | 19.2 TF | 8 TF |
| Tensor Float 32 (TF32) | 74.8 TF (149.6 TF*) | 309.7 TF | 222.2 TF | 153.4 TF | 63.9 TF |
| BFLOAT16 Tensor Core | 149.7 TF (299.4 TF*) | | | | |
| FP16 Tensor Core | | | | | |
| INT8 Tensor Core | 299.3 TOPS (598.6 TOPS*) | | | | |
| GPU Memory | 48 GB GDDR6 with ECC | 48 GB GDDR6 | 24 GB GDDR6 | 16 GB GDDR6 | 6 GB GDDR6 |
| GPU Memory Bandwidth | 696 GB/s | 768 GB/s | 768 GB/s | 448 GB/s | 288 GB/s |
| TDP | 300 W | 300 W | 230 W | 140 W | 70 W |
| Interconnect | NVIDIA® NVLink®: 112.5 GB/s (bidirectional); PCIe Gen4: 31.5 GB/s (bidirectional) | NVLink: 112.5 GB/s; PCIe Gen4: 64 GB/s | NVLink: 112.5 GB/s; PCIe Gen4: 64 GB/s | PCIe Gen4: 64 GB/s | PCIe Gen4: 64 GB/s |

\* With structural sparsity enabled.
Virtualization GPUs
| Name | A16 | A10 | T4 |
| --- | --- | --- | --- |
| Architecture | Ampere | Ampere | Turing |
| FP64 | 271.2 GF | | |
| FP32 | 8.678 TF | 31.2 TF | 8.1 TF |
| Tensor Float 32 (TF32) | | 62.5 TF (125 TF*) | 65 TF |
| BFLOAT16 Tensor Core | 8.678 TF | 125 TF (250 TF*) | |
| FP16 Tensor Core | | | |
| INT8 Tensor Core | | 250 TOPS (500 TOPS*) | 130 TOPS |
| GPU Memory | 4x 16 GB GDDR6 with ECC | 24 GB GDDR6 | 16 GB GDDR6 |
| GPU Memory Bandwidth | 4x 232 GB/s | 600 GB/s | 300 GB/s |
| TDP | 250 W | 150 W | 70 W |
| Interconnect | PCIe Gen4 x16 | PCIe Gen4: 64 GB/s | PCIe 3.0 x16 |

\* With structural sparsity enabled.
Speak with One of Our System Engineers Today
Software Tools for GPU Computing

TensorFlow Artificial Intelligence Library
TensorFlow, developed by Google, is an open-source symbolic math library for high-performance computation. It has quickly become an industry standard for artificial intelligence and machine learning applications and, known for its flexibility, is used across many scientific disciplines. It is based on the concept of a tensor, which, as you may have guessed, is where NVIDIA’s Tensor Cores (introduced with Volta) get their name.

GPU Accelerated Libraries
There are a handful of GPU-accelerated libraries that developers can use to speed up applications on GPUs. Many of them are NVIDIA CUDA libraries (such as cuBLAS and the CUDA Math Library), but there are others, such as the IMSL Fortran libraries and HiPLAR (High Performance Linear Algebra in R). These libraries can be linked in place of the standard libraries commonly used in non-accelerated code.
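For example, a dense double-precision matrix multiply that would normally go through a CPU BLAS can be routed through cuBLAS instead (on the A100 this also engages the FP64 Tensor Cores). A minimal, illustrative sketch; the matrix size and file names here are arbitrary:

```cpp
// Illustrative cuBLAS DGEMM: C = alpha*A*B + beta*C on the GPU.
// Build with: nvcc gemm.cu -lcublas -o gemm
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 1024;  // square matrices, arbitrary size
    std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0), hC(n * n, 0.0);

    double *dA, *dB, *dC;
    cudaMalloc(&dA, n * n * sizeof(double));
    cudaMalloc(&dB, n * n * sizeof(double));
    cudaMalloc(&dC, n * n * sizeof(double));
    cudaMemcpy(dA, hA.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);
    cudaMemcpy(dC, hC.data(), n * n * sizeof(double), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    const double alpha = 1.0, beta = 0.0;
    // Column-major GEMM with no transposes; the library chooses the fastest
    // kernels available on the device.
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                &alpha, dA, n, dB, n, &beta, dC, n);

    cudaMemcpy(hC.data(), dC, n * n * sizeof(double), cudaMemcpyDeviceToHost);
    std::printf("C[0] = %f (expect %d)\n", hC[0], 2 * n);

    cublasDestroy(handle);
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```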

CUDA Development Toolkit
NVIDIA has created an entire toolkit devoted to computing on its CUDA-enabled GPUs. The CUDA Toolkit, which includes the CUDA libraries, is the core of many GPU-accelerated programs. CUDA is one of the most widely used toolkits in the GPGPU world today.
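A minimal example of what CUDA code looks like: a vector-add kernel compiled with the toolkit’s `nvcc` compiler (the function and variable names are just for illustration):

```cpp
// Minimal CUDA vector addition. Build with: nvcc vector_add.cu -o vector_add
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *a, *b, *c;
    // Unified (managed) memory keeps the example short.
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    std::printf("c[0] = %f (expect 3.0)\n", c[0]);
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```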

NVIDIA Deep Learning SDK
In today’s world, deep learning is becoming essential in many segments of industry. For instance, deep learning is key in voice and image recognition, where the machine must learn from the input it receives. Writing algorithms for machines to learn from data is a difficult task, but NVIDIA’s Deep Learning SDK provides the tools needed to design code that runs on GPUs.

OpenACC Parallel Programming Model
OpenACC is a user-driven, directive-based, performance-portable parallel programming model. It is designed for scientists and engineers interested in porting their codes to a wide variety of heterogeneous HPC hardware platforms and architectures with significantly less programming effort than a low-level model requires. OpenACC directives can be a powerful tool for porting a user’s application to run on GPU servers. OpenACC has two key strengths: ease of use and portability. Applications that use OpenACC can run not only on NVIDIA GPUs but also on other GPUs, x86 CPUs, and POWER CPUs.
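For instance, a plain C/C++ loop can be offloaded by adding a single directive and recompiling with an OpenACC-capable compiler such as NVIDIA’s `nvc++` (formerly PGI). The example below is a generic sketch, not code from any particular application:

```cpp
// SAXPY with OpenACC: the pragma asks the compiler to parallelize the loop
// on an available accelerator (GPU) or fall back to the CPU.
// Build with, e.g.: nvc++ -acc saxpy.cpp -o saxpy
#include <cstdio>
#include <vector>

int main() {
    const int n = 1 << 20;
    std::vector<float> x(n, 1.0f), y(n, 2.0f);
    const float a = 3.0f;

    float* xp = x.data();
    float* yp = y.data();

    // copyin: x is only read; copy: y is read and written back to the host.
    #pragma acc parallel loop copyin(xp[0:n]) copy(yp[0:n])
    for (int i = 0; i < n; ++i) {
        yp[i] = a * xp[i] + yp[i];
    }

    std::printf("y[0] = %f (expect 5.0)\n", y[0]);
    return 0;
}
```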