NVIDIA A800 40GB HBM2 ECC PCIe GPU | 6912 CUDA Cores, NVLink Support

Part#: VCNA800-PB
  • CUDA Cores: 6912

  • Tensor Cores: 432 third-generation Tensor Cores

  • GPU Memory: 40 GB HBM2 ECC

  • Memory Interface: 5,120-bit

  • Interface: PCI Express 4.0

  • Display Outputs: None (no display connectors)

  • NVLink: NVIDIA NVLink support

  • Cooling: Active fansink

  • Form Factor: CEM5

  • Power Connector: 16-pin auxiliary power connector

  • Included Cables: 16-pin to dual 8-pin PCIe auxiliary power cable

  • Maximum Power Consumption: 240 W

NVIDIA A800 40GB ACTIVE 

PERFORMANCE AND USABILITY FEATURES

 

NVIDIA Ampere Architecture

 

NVIDIA A800 40GB Active is one of the world's most powerful data center GPUs for AI, data analytics, and high-performance computing (HPC) applications. Building upon the major SM enhancements of the Turing GPU, the NVIDIA Ampere architecture enhances tensor matrix operations and the concurrent execution of FP32 and INT32 operations.

More Efficient CUDA Cores

 

The NVIDIA Ampere architecture's CUDA® cores bring up to 2.5x the single-precision floating point (FP32) throughput compared to the previous generation, providing significant performance improvements for any class of algorithm or application that can benefit from embarrassingly parallel acceleration techniques.
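
As an illustration of the kind of embarrassingly parallel FP32 work these cores accelerate, here is a minimal CUDA sketch of a SAXPY kernel that assigns one array element to each thread; the array size and launch configuration are illustrative assumptions rather than tuning guidance for the A800.

    // Minimal sketch: SAXPY (y = a*x + y), one array element per thread.
    #include <cuda_runtime.h>

    __global__ void saxpy(int n, float a, const float *x, float *y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 24;                          // ~16M elements (assumed)
        float *x, *y;
        cudaMalloc(&x, n * sizeof(float));
        cudaMalloc(&y, n * sizeof(float));
        cudaMemset(x, 0, n * sizeof(float));            // placeholder data
        cudaMemset(y, 0, n * sizeof(float));
        saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y); // 256 threads per block
        cudaDeviceSynchronize();
        cudaFree(x); cudaFree(y);
        return 0;
    }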

Third-Generation Tensor Cores

 

Purpose-built for deep learning matrix arithmetic at the heart of neural network training and inferencing functions, the NVIDIA A800 40GB Active includes enhanced Tensor Cores that accelerate more data types (TF32 and BF16), along with a new Fine-Grained Structured Sparsity feature that delivers up to 2x the throughput for tensor matrix operations compared to the previous generation.
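
As a sketch of how an application opts in to TF32 Tensor Core math, the snippet below sets the cuBLAS math mode before an FP32 GEMM; the matrix size is an illustrative assumption and error checking is omitted for brevity.

    // Minimal sketch: TF32 Tensor Core math for an FP32 GEMM via cuBLAS.
    #include <cublas_v2.h>
    #include <cuda_runtime.h>

    int main() {
        const int n = 4096;                             // assumed square matrices
        float *a, *b, *c;
        cudaMalloc(&a, (size_t)n * n * sizeof(float));
        cudaMalloc(&b, (size_t)n * n * sizeof(float));
        cudaMalloc(&c, (size_t)n * n * sizeof(float));

        cublasHandle_t handle;
        cublasCreate(&handle);
        // Allow FP32 GEMMs to run on Tensor Cores using TF32 (Ampere and later).
        cublasSetMathMode(handle, CUBLAS_TF32_TENSOR_OP_MATH);

        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
                    &alpha, a, n, b, n, &beta, c, n);
        cudaDeviceSynchronize();

        cublasDestroy(handle);
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }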

Multi-Instance GPU (MIG): Secure, Isolated Multi-Tenancy

 

Every AI and HPC application benefits from acceleration, but not all require a full A800 40GB GPU. With Multi-Instance GPU (MIG), a single A800 can be partitioned into up to seven fully isolated instances, each with dedicated memory, cache, and compute. This enables guaranteed performance, efficient GPU utilization, and right-sized acceleration for multiple users and applications.
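
MIG partitioning itself is configured with NVIDIA's system tools (for example nvidia-smi) rather than in application code; once a process is pointed at an instance, such as by setting CUDA_VISIBLE_DEVICES to that instance's UUID, the instance appears to CUDA as an ordinary device. A minimal sketch of what the runtime then reports:

    // Minimal sketch: enumerate the device(s) visible to this process. When
    // CUDA_VISIBLE_DEVICES names a MIG instance, that instance shows up here
    // as a regular device with its own dedicated memory and SM count.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        cudaGetDeviceCount(&count);
        for (int d = 0; d < count; ++d) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, d);
            printf("device %d: %s, %zu MiB, %d SMs\n",
                   d, prop.name, prop.totalGlobalMem >> 20, prop.multiProcessorCount);
        }
        return 0;
    }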

Ultra-Fast HBM2 Memory

 

The NVIDIA A800 40GB Active GPU features 40GB of high-speed HBM2 memory with 1,555GB/s of bandwidth and a 40MB L2 cache (nearly 7x larger than the previous generation's), delivering extreme performance for compute-intensive AI workloads.
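
As a rough way to observe that bandwidth, the sketch below times a large on-device copy with CUDA events; the 1 GiB buffer size is an assumption, and a measured figure will approach, but not match, the 1,555GB/s peak.

    // Minimal sketch: time a device-to-device copy to estimate HBM2 bandwidth.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        const size_t bytes = 1ULL << 30;                // 1 GiB per buffer (assumed)
        void *src, *dst;
        cudaMalloc(&src, bytes);
        cudaMalloc(&dst, bytes);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        // The copy reads and writes every byte, so count 2x the buffer size.
        printf("~%.0f GB/s\n", 2.0 * bytes / (ms * 1e6));
        cudaFree(src); cudaFree(dst);
        return 0;
    }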

PCIe Gen 4

 

The NVIDIA A800 40GB Active supports PCI Express Gen 4, which provides double the bandwidth of PCIe Gen 3, improving data-transfer speeds from CPU memory for data-intensive tasks like AI and data science.

 


NVIDIA A800 40GB Active

 

High-Performance Data Science and AI Platform

 

Rapid growth in workload complexity and data size, together with the proliferation of emerging workloads like generative AI, is ushering in a new era of computing, accelerating scientific discovery, improving productivity, and revolutionizing content creation. As models continue to explode in size and complexity to take on next-level challenges, an increasing number of workloads will need to run on local devices. Next-generation workstation platforms will need to deliver high-performance computing capabilities to support these complex workloads.

The NVIDIA A800 40GB Active GPU accelerates data science, AI, and HPC workflows with 432 third-generation Tensor Cores that maximize AI performance and deliver ultra-fast, efficient inference. With third-generation NVIDIA NVLink technology, A800 40GB Active offers scalable performance for heavy AI workloads, doubling the effective memory footprint and enabling GPU-to-GPU data transfers at up to 400 gigabytes per second (GB/s) of bidirectional bandwidth. Paired with NVIDIA AI Enterprise, the board is an AI-ready development platform that delivers workstations ideally suited to the needs of skilled AI developers and data scientists.

 

 

 

MULTI-GPU TECHNOLOGY SUPPORT

 

Connect a pair of NVIDIA A800 40GB Active cards with NVLink to increase the effective memory footprint and scale application performance. Scaling applications across multiple GPUs requires extremely fast movement of data, and the third-generation NVLink in A800 40GB Active provides up to 400GB/s of bidirectional GPU-to-GPU bandwidth.
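
A minimal CUDA sketch of the peer-to-peer path that NVLink accelerates, assuming two A800 boards joined by a bridge at device indices 0 and 1; the buffer size is illustrative and error handling is omitted.

    // Minimal sketch: enable peer access between two GPUs, then copy a buffer
    // directly GPU-to-GPU. Over an NVLink bridge this bypasses host memory.
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 0, 1);      // can GPU 0 reach GPU 1?
        if (!canAccess) { printf("no peer access between devices 0 and 1\n"); return 1; }

        const size_t bytes = 256ULL << 20;              // 256 MiB (assumed)
        void *buf0, *buf1;

        cudaSetDevice(0);
        cudaDeviceEnablePeerAccess(1, 0);               // flags must be 0
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaDeviceEnablePeerAccess(0, 0);
        cudaMalloc(&buf1, bytes);

        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);        // direct GPU-to-GPU copy
        cudaDeviceSynchronize();

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }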

SOFTWARE SUPPORT

  • Software Optimized for AI
    Deep learning frameworks such as Caffe2, MXNet, CNTK, TensorFlow, and others deliver dramatically faster training times and higher multi-node training performance. GPU-accelerated libraries such as cuDNN, cuBLAS, and TensorRT deliver higher performance for both deep learning inference and High-Performance Computing (HPC) applications.

 

  • NVIDIA CUDA Parallel Computing Platform
    Natively execute standard programming languages like C/C++ and Fortran, and APIs such as OpenCL, OpenACC, and DirectCompute, to accelerate techniques such as ray tracing, video and image processing, and computational fluid dynamics.

 

  • Unified Memory
    A single, seamless 49-bit virtual address space allows for the transparent migration of data between the full allocation of CPU and GPU memory; a minimal sketch follows this list.

 

  • NVIDIA AI Enterprise
    Enterprise adoption of AI is now mainstream and leading to an increased demand for skilled AI developers and data scientists. Organizations require a flexible, high-performance platform consisting of optimized hardware and software to maximize productivity and accelerate AI development. NVIDIA A800 40GB Active and NVIDIA AI Enterprise provide an ideal foundation for these vital initiatives.
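
Picking up the Unified Memory item above, a minimal CUDA sketch of a single managed allocation touched first on the host and then on the device; the array size is an illustrative assumption.

    // Minimal sketch: cudaMallocManaged returns a pointer that is valid on both
    // the CPU and the GPU; the runtime migrates pages between them on demand.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(int n, float s, float *data) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= s;
    }

    int main() {
        const int n = 1 << 20;                          // ~1M floats (assumed)
        float *data;
        cudaMallocManaged(&data, n * sizeof(float));    // one allocation, two processors

        for (int i = 0; i < n; ++i) data[i] = 1.0f;     // touched first on the host

        scale<<<(n + 255) / 256, 256>>>(n, 3.0f, data); // then on the device
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);              // and on the host again
        cudaFree(data);
        return 0;
    }
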
Specifications

Product: NVIDIA A800 40GB Active
Architecture: NVIDIA Ampere
Foundry: TSMC
Process Size: 7 nm NVIDIA custom process
Die Size: 826 mm²
CUDA® Cores: 6,912
Streaming Multiprocessors: 108
Tensor Cores (Gen 3): 432
FP64 Performance: 9.7 TFLOPS
FP32 Performance: 19.5 TFLOPS
TF32 Tensor Core: 311.8 TFLOPS*
BFLOAT16 Tensor Core: 312 TFLOPS | 624 TFLOPS*
FP16 Tensor Core: 312 TFLOPS | 624 TFLOPS*
INT8 Tensor Core: 1,247.4 TOPS*
INT4 Tensor Core: 1,248 TOPS | 2,496 TOPS*
NVLink: 2-way, low profile (2-slot and 3-slot bridges)
NVLink Bandwidth: 400 GB/s bidirectional
GPU Memory: 40 GB HBM2
Memory Interface: 5,120-bit
Memory Bandwidth: 1,555.2 GB/s
Multi-Instance GPU Support: Up to 7 MIG instances
System Interface: PCIe 4.0 x16
Display Support: None provided; use a companion NVIDIA T1000 or RTX A4000 board for video output
Thermal Solution: Active
Form Factor: 4.4" H x 10.5" L, dual slot
Power Connector: CEM5 16-pin
Maximum Power Consumption: 240 W

* With structured sparsity
$13,125.00