June 17, 2025 - NVSHMEM and GDRCopy support
NVSHMEM and GDRCopy support now available for enhanced GPU communication and performance
Overview
SUNK now supports NVSHMEM and GDRCopy, which enable low-latency, high-performance GPU communication. These technologies can improve performance for distributed GPU workloads and multi-GPU applications.
NVSHMEM support
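NVSHMEM (NVIDIA SHMEM) is a parallel programming interface based on OpenSHMEM. It creates a global address space that spans the memory of multiple GPUs, which GPUs within a node and across nodes can access through GPU-initiated operations, CPU-initiated operations, and operations on CUDA streams.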
Key features
Feature | Description |
---|---|
High-performance communication | Optimized GPU-to-GPU communication across nodes |
Memory ordering | Explicit ordering and completion operations, such as `nvshmem_fence` and `nvshmem_quiet` |
Scalability | Efficient communication patterns for large-scale deployments |
Ease of use | Familiar programming model similar to OpenSHMEM |
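To illustrate the one-sided communication model, the following pure-Python toy reproduces the ring-shift pattern from NVSHMEM's introductory examples. In that pattern, each processing element (PE) writes its own ID into a symmetric variable on its right-hand neighbor and then waits at a barrier. The `SymmetricHeap` class and its methods are hypothetical stand-ins that model the semantics of `nvshmem_int_p` and `nvshmem_barrier_all` in a single process. This is a sketch of the model only and does not call NVSHMEM.

```python
class SymmetricHeap:
    """Hypothetical toy model: every PE holds its own copy of the same named variables."""

    def __init__(self, npes):
        self.npes = npes
        self.memory = [{} for _ in range(npes)]  # one address space per PE
        self.pending = []  # puts issued but not yet guaranteed visible

    def alloc(self, symbol, initial=0):
        # Symmetric allocation: the variable exists under the same name on every PE.
        for pe_memory in self.memory:
            pe_memory[symbol] = initial

    def put(self, symbol, value, target_pe):
        # One-sided write into another PE's copy; the target PE takes no action.
        self.pending.append((symbol, value, target_pe))

    def barrier_all(self):
        # After the barrier, all previously issued puts have completed.
        for symbol, value, target_pe in self.pending:
            self.memory[target_pe][symbol] = value
        self.pending.clear()

    def get_local(self, symbol, pe):
        return self.memory[pe][symbol]


def ring_shift(npes):
    heap = SymmetricHeap(npes)
    heap.alloc("destination", initial=-1)
    for mype in range(npes):
        peer = (mype + 1) % npes  # right-hand neighbor in the ring
        heap.put("destination", mype, peer)
    heap.barrier_all()
    # Each PE now holds the ID of its left-hand neighbor.
    return [heap.get_local("destination", pe) for pe in range(npes)]


if __name__ == "__main__":
    print(ring_shift(4))  # [3, 0, 1, 2]
```

The buffered `pending` list mirrors the point that a put is not guaranteed to be visible at the target until a completion or synchronization operation, such as a barrier, has run.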
Use cases
NVSHMEM is ideal for:
- Distributed training: Large-scale machine learning model training
- Scientific computing: High-performance computing applications
- Data analytics: GPU-accelerated data processing workflows
- Multi-node applications: Applications requiring GPU communication across nodes
GDRCopy support
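GDRCopy is a low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology. It lets the CPU map GPU memory directly into its own address space, so small copies between host and GPU memory avoid the overhead of a `cudaMemcpy` call. Communication libraries such as NVSHMEM can use GDRCopy to speed up these small transfers.
Key features
Feature | Description |
---|---|
Direct CPU access | Maps GPU memory into the CPU's address space through GPUDirect RDMA |
Reduced latency | Small host-to-GPU and GPU-to-host copies avoid the setup cost of `cudaMemcpy` |
Small-message performance | Best suited to small, latency-sensitive transfers such as signals and small buffers |
Lower software overhead | Fewer driver calls on the critical path of each transfer |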
Use cases
GDRCopy is beneficial for:
- High-frequency trading: Low-latency data processing
- Real-time analytics: Fast data ingestion and processing
- Streaming applications: Continuous data processing workflows
- Network-intensive workloads: Applications with high network I/O requirements
Performance benefits
Communication performance
- Reduced latency: Direct GPU-to-GPU communication
- Higher bandwidth: Optimized data transfer paths
- Better scalability: Efficient communication patterns for large clusters
- CPU offloading: Reduced CPU overhead in communication operations
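A standard way to reason about the latency and bandwidth items above is the Hockney (α-β) cost model. This model is general and not specific to SUNK. It treats the time to send an n-byte message as a fixed per-message latency α plus a transfer term governed by the achievable bandwidth β:

```latex
T(n) = \alpha + \frac{n}{\beta},
\qquad
n_{1/2} = \alpha\beta
\quad \text{(the size at which the effective bandwidth } n / T(n) = \beta / 2\text{)}
```

Messages much smaller than n½ are latency-bound, so lowering α helps most. Much larger messages are bandwidth-bound, so raising β helps most. With hypothetical values α = 2 µs and β = 25 GB/s, n½ = 2×10⁻⁶ s × 25×10⁹ B/s = 5×10⁴ bytes, or about 50 kB.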
Application performance
- Faster training: Accelerated distributed machine learning
- Improved throughput: Higher data processing rates
- Better resource utilization: More efficient use of GPU resources
- Enhanced scalability: Better performance at scale
Configuration
Enabling NVSHMEM and GDRCopy
To enable these features in your SUNK deployment:
- Update your SUNK Helm values to include NVSHMEM and GDRCopy support
- Configure the necessary network settings
- Deploy the updated SUNK configuration
- Verify the features are working correctly
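For the verification step, the following sketch shows one quick node-level check. It assumes a Linux compute node that uses the `nvidia_peermem` module for GPUDirect RDMA and the GDRCopy `gdrdrv` driver, which exposes `/dev/gdrdrv`. The script parses `/proc/modules` and reports anything missing. The function names are hypothetical. Some systems use the older `nv_peer_mem` module or DMA-BUF instead of `nvidia_peermem`, so adjust `REQUIRED_MODULES` to match your nodes. The check confirms that the drivers are present but does not exercise NVSHMEM itself.

```python
from pathlib import Path

# Assumption: modules expected on a node that uses GPUDirect RDMA with GDRCopy.
REQUIRED_MODULES = {
    "gdrdrv": "GDRCopy kernel driver",
    "nvidia_peermem": "GPUDirect RDMA peer-memory module",
}


def loaded_modules(proc_modules_text):
    """Return module names; the first field of each /proc/modules line is the name."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def check_node(proc_modules_text, gdrdrv_present, required=REQUIRED_MODULES):
    """Return human-readable problems; an empty list means every check passed."""
    loaded = loaded_modules(proc_modules_text)
    problems = [
        f"module {name} not loaded ({description})"
        for name, description in required.items()
        if name not in loaded
    ]
    if not gdrdrv_present:
        problems.append("/dev/gdrdrv not found (GDRCopy driver device)")
    return problems


if __name__ == "__main__":
    modules_file = Path("/proc/modules")
    if modules_file.exists():
        for problem in check_node(modules_file.read_text(), Path("/dev/gdrdrv").exists()):
            print("WARNING:", problem)
    else:
        print("/proc/modules not found; run this on a Linux compute node")
```

Because `check_node` takes text and a boolean instead of reading files itself, you can test it without GPU hardware. On a cluster, you can run the script on each compute node, for example from a Slurm job.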
Requirements
- Compatible NVIDIA GPUs with NVSHMEM and GDRCopy support
- Appropriate network infrastructure (InfiniBand recommended)
- Updated SUNK version with support for these features
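To check the network requirement on a node, the following sketch reads the standard Linux RDMA sysfs tree. Each port's `state` file there contains text such as `4: ACTIVE`. The helper names are hypothetical, and tools such as `ibstat` or `ibv_devinfo` report the same information. Note that this tree also lists RoCE devices. To distinguish them from InfiniBand, check each port's `link_layer` file.

```python
from pathlib import Path


def parse_port_state(state_text):
    """Convert sysfs port-state text such as '4: ACTIVE' into 'ACTIVE'."""
    return state_text.strip().split(":", 1)[-1].strip()


def active_ib_ports(sysfs_root="/sys/class/infiniband"):
    """List 'device/port' names whose state is ACTIVE under the RDMA sysfs tree."""
    root = Path(sysfs_root)
    if not root.is_dir():
        return []  # No RDMA devices are registered on this node.
    active = []
    for state_file in sorted(root.glob("*/ports/*/state")):
        if parse_port_state(state_file.read_text()) == "ACTIVE":
            device = state_file.parents[2].name  # for example, mlx5_0
            port = state_file.parent.name  # for example, 1
            active.append(f"{device}/{port}")
    return active


if __name__ == "__main__":
    ports = active_ib_ports()
    print("Active RDMA ports:", ports if ports else "none found")
```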
Documentation
For detailed setup and configuration instructions, see:
Migration notes
Existing deployments
- Existing SUNK deployments will continue to work without changes
- NVSHMEM and GDRCopy are optional features that can be enabled as needed
- No migration is required for existing workloads
Planning considerations
When planning to use NVSHMEM and GDRCopy:
- Hardware compatibility: Ensure your GPUs support these features
- Network requirements: Verify network infrastructure compatibility
- Application compatibility: Check whether your applications can benefit from these features
- Performance testing: Benchmark your workloads with and without these features to measure the improvement for your specific use case
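For performance testing, a small helper such as the hypothetical `summarize` function below can compare latency samples collected with the features disabled and enabled. It assumes you already have per-iteration timings, in microseconds, from your own benchmark or application. The sample numbers in the usage line are placeholders, not measured results.

```python
from statistics import median


def summarize(baseline_us, enabled_us):
    """Compare latency samples (microseconds) from runs without and with the features.

    Medians are used because latency samples often contain outliers,
    such as warm-up iterations, that would skew a mean.
    """
    if not baseline_us or not enabled_us:
        raise ValueError("each run needs at least one sample")
    base = median(baseline_us)
    new = median(enabled_us)
    return {
        "baseline_median_us": base,
        "enabled_median_us": new,
        "speedup": base / new,
        "latency_reduction_pct": 100.0 * (base - new) / base,
    }


if __name__ == "__main__":
    # Placeholder numbers for illustration only; substitute your own measurements.
    report = summarize([10.0, 12.0, 11.0], [5.0, 6.0, 5.5])
    print(report)  # speedup 2.0, latency_reduction_pct 50.0
```

Run enough iterations in both configurations, on the same nodes, so that run-to-run noise does not hide or exaggerate the difference.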