Functio Memory Tier Optimization Demo

Analyze and optimize CUDA kernels to make better use of each memory tier.

About This Demo

This demo focuses on optimizing data movement between system RAM (host memory) and GPU VRAM (device memory).
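A common first step when optimizing RAM-to-VRAM traffic is to copy from pinned (page-locked) host memory, which the GPU can read by DMA directly. With ordinary pageable memory, the driver first stages the data through an internal pinned buffer. The sketch below is not part of the original demo. It times one host-to-device cudaMemcpy from a malloc'd buffer and one from a cudaMallocHost buffer using CUDA events. The helper names time_h2d_ms and compare_h2d, the CHECK macro, and the 256 MiB buffer size are hypothetical choices made for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Time one host-to-device copy of `bytes` bytes with CUDA events (GPU-side ms).
static float time_h2d_ms(void* d_dst, const void* h_src, size_t bytes) {
    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));
    CHECK(cudaEventRecord(start));
    CHECK(cudaMemcpy(d_dst, h_src, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    return ms;
}

// Copy `bytes` from a pageable and from a pinned host buffer to the device and
// store each copy's elapsed time. Returns 0 on success, 1 if malloc fails.
static int compare_h2d(size_t bytes, float* pageable_ms, float* pinned_ms) {
    void* d_buf = nullptr;
    CHECK(cudaMalloc(&d_buf, bytes));

    void* h_pageable = std::malloc(bytes);      // ordinary pageable memory
    if (h_pageable == nullptr) { CHECK(cudaFree(d_buf)); return 1; }
    std::memset(h_pageable, 1, bytes);          // touch pages so they are resident

    void* h_pinned = nullptr;                   // page-locked, directly DMA-able
    CHECK(cudaMallocHost(&h_pinned, bytes));
    std::memset(h_pinned, 1, bytes);

    // Warm-up copy so one-time driver setup is not included in the timings.
    CHECK(cudaMemcpy(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice));

    *pageable_ms = time_h2d_ms(d_buf, h_pageable, bytes);
    *pinned_ms = time_h2d_ms(d_buf, h_pinned, bytes);

    CHECK(cudaFreeHost(h_pinned));
    std::free(h_pageable);
    CHECK(cudaFree(d_buf));
    return 0;
}

int main() {
    const size_t bytes = size_t(256) << 20;    // 256 MiB (arbitrary test size)
    float pageable_ms = 0.0f, pinned_ms = 0.0f;
    if (compare_h2d(bytes, &pageable_ms, &pinned_ms) != 0) return EXIT_FAILURE;

    const double gib = double(bytes) / double(1u << 30);
    std::printf("pageable: %.2f ms (%.2f GiB/s)\n", pageable_ms, gib / (pageable_ms / 1e3));
    std::printf("pinned:   %.2f ms (%.2f GiB/s)\n", pinned_ms, gib / (pinned_ms / 1e3));
    return 0;
}
```

Build it with nvcc, for example `nvcc -O2 pinned_vs_pageable.cu -o pinned_vs_pageable` (the file name is hypothetical). Pinned copies are typically faster, but the actual numbers depend on the PCIe link and system load, so none are quoted here. Pinned memory is a limited resource, so reserve it for buffers that are transferred repeatedly.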

CPU: AMD EPYC 7R32 (8 cores / 16 threads), 64 GiB system RAM
GPU: NVIDIA A10G Tensor Core (24 GB VRAM)

Each kernel must compile as a standalone program: include a main() function and all of its dependencies in the source.
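The following is a minimal sketch of what "standalone compilable" can look like. A single .cu file contains the kernel, a main() function, host-side setup, error checking, and a correctness check, and it needs nothing beyond the CUDA runtime to build. The SAXPY kernel, the run_saxpy helper, the CHECK macro, and the 256-thread launch configuration are illustrative assumptions, not requirements stated by the demo.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                         cudaGetErrorString(err_));                     \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// y[i] = a * x[i] + y[i]. Consecutive threads touch consecutive addresses,
// so global-memory accesses are coalesced. Grid-stride loop covers any n.
__global__ void saxpy(int n, float a, const float* __restrict__ x,
                      float* __restrict__ y) {
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += blockDim.x * gridDim.x) {
        y[i] = a * x[i] + y[i];
    }
}

// Run saxpy on n > 0 elements (x = 1, y = 2) and return the mismatch count.
static int run_saxpy(int n, float a) {
    std::vector<float> h_x(n, 1.0f), h_y(n, 2.0f);
    const size_t bytes = size_t(n) * sizeof(float);

    float *d_x = nullptr, *d_y = nullptr;
    CHECK(cudaMalloc(&d_x, bytes));
    CHECK(cudaMalloc(&d_y, bytes));
    CHECK(cudaMemcpy(d_x, h_x.data(), bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_y, h_y.data(), bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    saxpy<<<blocks, threads>>>(n, a, d_x, d_y);
    CHECK(cudaGetLastError());       // launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    CHECK(cudaMemcpy(h_y.data(), d_y, bytes, cudaMemcpyDeviceToHost));
    CHECK(cudaFree(d_x));
    CHECK(cudaFree(d_y));

    const float expected = a * 1.0f + 2.0f;
    int errors = 0;
    for (int i = 0; i < n; ++i) {
        if (h_y[i] != expected) ++errors;
    }
    return errors;
}

int main() {
    const int errors = run_saxpy(1 << 20, 3.0f);
    std::printf("%s (%d mismatches)\n", errors == 0 ? "PASS" : "FAIL", errors);
    return errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

A single command such as `nvcc -O2 -arch=sm_86 saxpy_demo.cu -o saxpy_demo` builds it. The file name is hypothetical, and sm_86 matches the A10G's compute capability 8.6. On a working CUDA setup, the program prints `PASS (0 mismatches)`. Checking both cudaGetLastError and cudaDeviceSynchronize separates launch failures from execution failures, which makes a submitted kernel easier to diagnose.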