- Please open the IA3_2022 folder, which contains the code and instructions for the IA3 2022 paper.
- BibTeX to cite this work:
@INPROCEEDINGS{10027548,
  author={Shovon, Ahmedur Rahman and Dyken, Landon Richard and Green, Oded and Gilray, Thomas and Kumar, Sidharth},
  booktitle={2022 IEEE/ACM Workshop on Irregular Applications: Architectures and Algorithms (IA3)},
  title={Accelerating Datalog applications with cuDF},
  year={2022},
  volume={},
  number={},
  pages={41-45},
  doi={10.1109/IA356718.2022.00012}
}
- Configuration data were collected using `nvidia-smi`, `lscpu`, and `free -h`.
- ThetaGPU (single-GPU node):
- GPU 0: NVIDIA A100-SXM4-40GB
- CPU Model name: AMD EPYC 7742 64-Core Processor
- CPU(s): 256
- Thread(s) per core: 2
- Core(s) per socket: 64
- Socket(s): 2
- CPU MHz: 3399.106
- L1d cache: 4 MiB
- L1i cache: 4 MiB
- L2 cache: 64 MiB
- L3 cache: 512 MiB
- Total memory: 1.0Ti
- Short CUDA tutorial
- NVIDIA CUDA C Programming Guide
- Theta GPU nodes
- Getting started video
- Getting Started on ThetaGPU
- Submit a Job on ThetaGPU
- Running jobs at Theta
- Chapter 39. Parallel Prefix Sum (Scan) with CUDA
- GPGPU introduction
- CUDA 2D thread block
- CUB
- Documentation on cuDF Drop
- Documentation on cuDF Drop Duplicates
- Documentation on cuDF concatenate
- Open addressing hash table
- Open addressing techniques
- A Simple GPU Hash Table
- CUDA Pro Tip: Occupancy API Simplifies Launch Configuration
- CUDA Thrust
- Count in Thrust
- Basics of Hash Tables
- MurmurHash3
- Thrust stable_sort()