June 17, 2025 - NVSHMEM and GDRCopy support

NVSHMEM and GDRCopy support is now available for enhanced GPU communication and performance

Support for NVSHMEM and GDRCopy is now available in SUNK, enabling high-performance GPU-to-GPU communication.

Overview

SUNK now supports NVSHMEM and GDRCopy as optional features. NVSHMEM provides GPU-to-GPU communication within and across nodes, and GDRCopy provides low-latency CPU access to GPU memory. Together they can reduce communication overhead for distributed GPU workloads and multi-GPU applications.

NVSHMEM support

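NVSHMEM (NVIDIA SHMEM) is a parallel programming interface, based on OpenSHMEM, that provides high-performance communication between NVIDIA GPUs within and across nodes. It exposes a partitioned global address space over GPU memory, so one GPU can read or write another GPU's memory with one-sided put and get operations.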
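To make the one-sided communication style concrete, here is a toy, single-process Python model of the classic "ring shift" pattern found in OpenSHMEM-style programs. It is only an illustration of the idea: `ToySymmetricHeap`, `put`, and `ring_shift` are hypothetical names invented for this sketch, not part of the NVSHMEM API, and no GPU or network is involved.

```python
# Toy, single-process model of the one-sided "put" pattern used by
# OpenSHMEM-style libraries. This is NOT the NVSHMEM API; all names
# here are hypothetical and exist only for illustration.


class ToySymmetricHeap:
    """Models a symmetric buffer: every PE owns a same-sized copy."""

    def __init__(self, n_pes: int, size: int) -> None:
        self.n_pes = n_pes
        # One buffer per processing element (PE), all with the same shape.
        self.buffers = [[0] * size for _ in range(n_pes)]

    def put(self, target_pe: int, offset: int, value: int) -> None:
        """One-sided write into target_pe's copy; the target takes no action."""
        self.buffers[target_pe][offset] = value


def ring_shift(n_pes: int) -> list[int]:
    """Each PE writes its own id into its right-hand neighbour's buffer."""
    heap = ToySymmetricHeap(n_pes, size=1)
    for my_pe in range(n_pes):
        neighbour = (my_pe + 1) % n_pes
        heap.put(neighbour, 0, my_pe)
    # A real program would synchronize here (for example with a barrier)
    # before any PE reads its own buffer.
    return [buf[0] for buf in heap.buffers]


if __name__ == "__main__":
    print(ring_shift(4))  # [3, 0, 1, 2]
```

In a real NVSHMEM program the buffers live in GPU memory on different processes, and the writes travel over NVLink or the network rather than through a Python list.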

Key features

Feature | Description
High-performance communication | Optimized GPU-to-GPU communication across nodes
Memory consistency | Strong memory consistency guarantees
Scalability | Efficient communication patterns for large-scale deployments
Ease of use | Familiar programming model similar to OpenSHMEM

Use cases

NVSHMEM is ideal for:

  • Distributed training: Large-scale machine learning model training
  • Scientific computing: High-performance computing applications
  • Data analytics: GPU-accelerated data processing workflows
  • Multi-node applications: Applications requiring GPU communication across nodes

GDRCopy support

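GDRCopy is a low-latency GPU memory copy library built on NVIDIA GPUDirect RDMA technology. It lets the CPU map GPU memory and read or write it directly, which reduces latency for small transfers. Communication libraries, including NVSHMEM, can use it as an optional dependency.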

Key features

Feature | Description
Direct memory access | Direct CPU access to GPU memory, built on GPUDirect RDMA
Reduced latency | Bypass CPU memory for faster data transfer
Higher bandwidth | Optimized data transfer performance
CPU offloading | Reduce CPU overhead in data transfer operations

Use cases

GDRCopy is beneficial for:

  • High-frequency trading: Low-latency data processing
  • Real-time analytics: Fast data ingestion and processing
  • Streaming applications: Continuous data processing workflows
  • Network-intensive workloads: Applications with high network I/O requirements

Performance benefits

Communication performance

  • Reduced latency: Direct GPU-to-GPU communication
  • Higher bandwidth: Optimized data transfer paths
  • Better scalability: Efficient communication patterns for large clusters
  • CPU offloading: Reduced CPU overhead in communication operations

Application performance

  • Faster training: Accelerated distributed machine learning
  • Improved throughput: Higher data processing rates
  • Better resource utilization: More efficient use of GPU resources
  • Enhanced scalability: Better performance at scale

Configuration

Enabling NVSHMEM and GDRCopy

To enable these features in your SUNK deployment:

  1. Update your SUNK Helm values to include NVSHMEM and GDRCopy support
  2. Configure the necessary network settings
  3. Deploy the updated SUNK configuration
  4. Verify the features are working correctly
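For the verification step, a best-effort preflight check run on a compute node might look like the sketch below. It rests on assumptions that are not stated in this release note: that the GDRCopy kernel driver (`gdrdrv`) exposes `/dev/gdrdrv`, that the user-space libraries are discoverable as `gdrapi` and `nvshmem_host`, and that your image puts them on the default library search path. The function name `gdrcopy_nvshmem_preflight` is hypothetical. A passing check does not prove that a job actually uses either feature.

```python
import ctypes.util
import os
from typing import Callable, Optional


def gdrcopy_nvshmem_preflight(
    device_path: str = "/dev/gdrdrv",
    modules_path: str = "/proc/modules",
    find_library: Callable[[str], Optional[str]] = ctypes.util.find_library,
) -> dict:
    """Return a dict of best-effort checks for one node or container."""
    # Assumption: the GDRCopy kernel driver exposes this character device.
    device_present = os.path.exists(device_path)

    # Look for the gdrdrv module in the kernel's loaded-module list.
    module_loaded = False
    try:
        with open(modules_path) as fh:
            module_loaded = any(line.split()[:1] == ["gdrdrv"] for line in fh)
    except OSError:
        pass  # /proc may be hidden or restricted inside a container

    return {
        "gdrdrv_device": device_present,
        "gdrdrv_module": module_loaded,
        # Assumption: library names; they may differ in your image.
        "libgdrapi": find_library("gdrapi") is not None,
        "libnvshmem_host": find_library("nvshmem_host") is not None,
    }


if __name__ == "__main__":
    for check, ok in gdrcopy_nvshmem_preflight().items():
        print(f"{check}: {'ok' if ok else 'missing'}")
```

The paths and the library lookup are parameters so the check can be pointed at non-default locations, or exercised in a test without the real hardware.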

Requirements

  • Compatible NVIDIA GPUs with NVSHMEM and GDRCopy support
  • Appropriate network infrastructure (InfiniBand recommended)
  • Updated SUNK version with support for these features

Documentation

For detailed setup and configuration instructions, see the SUNK documentation.

Migration notes

Existing deployments

  • Existing SUNK deployments will continue to work without changes
  • NVSHMEM and GDRCopy are optional features that can be enabled as needed
  • No migration is required for existing workloads

Planning considerations

When planning to use NVSHMEM and GDRCopy:

  1. Hardware compatibility: Ensure your GPUs support these features
  2. Network requirements: Verify network infrastructure compatibility
  3. Application compatibility: Check if your applications can benefit from these features
  4. Performance testing: Test performance improvements in your specific use case
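For the performance testing item, one simple approach is to run the same job several times with and without the features enabled and compare median wall-clock times. The helper below is a minimal sketch of that comparison; `compare_runs` is a hypothetical name, and the timings must come from your own measurements.

```python
from statistics import median


def compare_runs(baseline_s: list[float], candidate_s: list[float]) -> dict:
    """Summarize two sets of wall-clock timings (seconds) for the same job."""
    if not baseline_s or not candidate_s:
        raise ValueError("both runs need at least one timing")
    base = median(baseline_s)
    cand = median(candidate_s)
    return {
        "baseline_median_s": base,
        "candidate_median_s": cand,
        # A speedup above 1.0 means the candidate configuration ran faster.
        "speedup": base / cand,
    }
```

Medians are used rather than means so that a single slow run (for example, a cold cache) does not dominate the comparison. Collect enough repetitions on the same nodes for the difference to be meaningful.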