About Experiments with Large-Scale Data #1

dcm-nakashima · 2022-03-14T02:24:38Z

Thanks for sharing the script.
I am trying to run an experiment on my own large dataset:

| #Nodes | #Edges |
|---|---|
| 700,000 | 6,000,000 |

I would like to obtain node embeddings for this graph.

Unfortunately, I get a CUDA out-of-memory error even on the collab dataset in my environment.
I used a g4dn.16xlarge AWS instance (GPU: one NVIDIA T4 with 16 GB).

Is there a way to run PEG with less GPU memory?

Have you run any experiments on datasets of this size? For example:

- ogbl-ppa
- ogbl-citation2

dcm-nakashima · 2022-03-16T02:34:05Z

@ZoomWang666 @lipan00123
I forgot to add the mentions.
If you know anything about this, please let me know.

zoom-wang112358 · 2022-03-16T05:11:05Z

Hello,

You could try reducing the PE dimension to save GPU memory. PEG-LE+ and PEG-DW+ need more GPU memory than PEG-LE and PEG-DW, and we are planning to address this issue.
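To illustrate why the PE dimension matters: assuming, as the name suggests, that PEG-LE uses Laplacian-eigenmap positional encodings, the encoding is an N × pe_dim matrix built from the smallest eigenvectors of the normalized graph Laplacian, so both the eigensolver cost and the matrix kept on the GPU grow with pe_dim. The sketch below is not the repository's code; the function name `laplacian_eigenmap_pe`, the `pe_dim` parameter, and the (2, E) edge-array input are hypothetical choices made for this example, and SciPy is assumed to be available.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh


def laplacian_eigenmap_pe(edge_index: np.ndarray, num_nodes: int, pe_dim: int) -> np.ndarray:
    """Return a (num_nodes, pe_dim) float32 Laplacian-eigenmap encoding.

    edge_index is a hypothetical (2, E) integer array; edges are treated as
    undirected. The result needs num_nodes * pe_dim * 4 bytes.
    """
    row, col = edge_index
    data = np.ones(row.shape[0], dtype=np.float64)
    adj = sp.coo_matrix((data, (row, col)), shape=(num_nodes, num_nodes)).tocsr()
    adj = (adj + adj.T).tocsr()
    adj.data[:] = 1.0  # symmetrised and binarised: undirected, unweighted

    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = np.zeros_like(deg)
    nonzero = deg > 0
    d_inv_sqrt[nonzero] = deg[nonzero] ** -0.5
    d = sp.diags(d_inv_sqrt)

    # Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}.
    lap = sp.identity(num_nodes, format="csr") - d @ adj @ d

    # Smallest pe_dim + 1 eigenpairs; the first one (eigenvalue 0 for a
    # connected graph) carries no positional information and is dropped.
    vals, vecs = eigsh(lap, k=pe_dim + 1, which="SA")
    order = np.argsort(vals)
    return vecs[:, order[1:]].astype(np.float32)


if __name__ == "__main__":
    n = 30
    src = np.arange(n - 1)
    path = np.stack([src, src + 1])  # a 30-node path graph
    pe = laplacian_eigenmap_pe(path, n, pe_dim=3)
    print(pe.shape)
```

Halving pe_dim halves the N × pe_dim matrix; on a graph with hundreds of thousands of nodes, an iterative solver like this can also become slow, which is one more reason to keep pe_dim small.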

dcm-nakashima · 2022-03-16T05:41:26Z

@ZoomWang666
Thank you for your prompt reply.
I will lower hidden_channels (default: 256), e.g. to 64 or 128.
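A back-of-the-envelope check of what lowering hidden_channels buys, assuming float32 tensors: every dense node-level tensor of shape N × hidden_channels costs N · hidden_channels · 4 bytes. Training typically keeps several such tensors at once (for example, per-layer activations saved for the backward pass), so the real peak is a multiple of these figures. The helper name `dense_matrix_bytes` is hypothetical, and this is an estimate, not a measurement of PEG.

```python
def dense_matrix_bytes(num_rows: int, num_cols: int, bytes_per_value: int = 4) -> int:
    """Bytes for one dense num_rows x num_cols matrix (4 bytes = float32)."""
    return num_rows * num_cols * bytes_per_value


NUM_NODES = 700_000  # from the dataset described in this issue
NUM_EDGES = 6_000_000

# An int64 edge_index of shape (2, E) is a fixed cost, independent of hidden_channels.
edge_gib = dense_matrix_bytes(2, NUM_EDGES, bytes_per_value=8) / 2**30
print(f"edge_index (int64): {edge_gib:.2f} GiB")

for hidden in (256, 128, 64):
    gib = dense_matrix_bytes(NUM_NODES, hidden) / 2**30
    print(f"hidden_channels={hidden}: {gib:.2f} GiB per N x hidden float32 matrix")
```

With these numbers, going from 256 to 64 channels shrinks each such matrix from about 0.67 GiB to about 0.17 GiB, while the edge list stays at roughly 0.09 GiB.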

dcm-nakashima · 2022-03-16T05:45:14Z

> Have you done any experiments with datasets of this size?
> For example,
> - ogbl-ppa
> - ogbl-citation2

What do you think about this one?

zoom-wang112358 · 2022-03-16T20:59:11Z

This project is still ongoing. The ICLR paper mainly focuses on the theoretical understanding of positional encoding.

We are now working on a standard framework of PEG for large-scale networks (100M+ nodes), and the standard framework will be released later.

dcm-nakashima · 2022-03-17T04:42:04Z

Thank you for your reply.
I understand the situation.
I am also aware that the work is still at a theoretical stage.

> We are now working on a standard framework of PEG for large-scale networks (100M+ nodes), and the standard framework will be released later.

I am looking forward to it!

lipan00123 · 2022-04-03T01:02:33Z

@dcm-nakashima Thanks for checking out our work. May I ask a follow-up question? When you decreased the hidden_channels dimension and used PEG-DW or PEG-LE instead of PEG-DW++ or PEG-LE++, were you able to run the code on your dataset with 700k nodes?

I am also curious whether you can run a standard GCN on your network with your 16 GB GPU. If even a standard GCN does not fit, the current PEG pipeline probably will not either, and the graph partitioning/downsampling-based pipeline we are working on would be needed.
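For reference when running that check, a standard full-batch GCN layer (Kipf & Welling) computes ReLU(Â X W) with Â = D̃^{-1/2}(A + I)D̃^{-1/2}, so it has to hold the whole sparse Â plus N × hidden activations at once. The sketch below shows only that propagation step on CPU with SciPy; it is not PEG's code, the function names `gcn_normalized_adjacency` and `gcn_layer` are hypothetical, and it assumes the input edge list has no self-loops.

```python
import numpy as np
import scipy.sparse as sp


def gcn_normalized_adjacency(edge_index: np.ndarray, num_nodes: int) -> sp.csr_matrix:
    """Return D^{-1/2} (A + I) D^{-1/2}, the propagation matrix of a standard GCN."""
    row, col = edge_index
    data = np.ones(row.shape[0], dtype=np.float32)
    adj = sp.coo_matrix((data, (row, col)), shape=(num_nodes, num_nodes)).tocsr()
    adj = (adj + adj.T).tocsr()
    adj.data[:] = 1.0  # undirected and unweighted
    adj = adj + sp.identity(num_nodes, dtype=np.float32, format="csr")  # add self-loops
    deg = np.asarray(adj.sum(axis=1)).ravel()  # every degree >= 1 thanks to self-loops
    d_inv_sqrt = sp.diags(deg ** -0.5)
    return (d_inv_sqrt @ adj @ d_inv_sqrt).tocsr()


def gcn_layer(a_hat: sp.csr_matrix, x: np.ndarray, weight: np.ndarray) -> np.ndarray:
    """One full-batch GCN layer: ReLU(A_hat @ X @ W)."""
    return np.maximum(a_hat @ (x @ weight), 0.0)
```

Computing X W before multiplying by Â keeps the intermediate at N × hidden rather than N × input_dim, which is the cheaper order whenever hidden is smaller than the input feature size.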

dcm-nakashima · 2022-04-05T05:30:24Z

@lipan00123
Thank you for your question.
I have not tried that yet, but I will.
I will share the results with you.

dcm-nakashima · 2022-05-12T05:43:20Z

@lipan00123
Since we could not use the large dataset above, we used smaller sampled datasets instead.
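The thread does not say how the sampled datasets were built. One simple option, shown here only as an illustration, is uniform node sampling followed by taking the induced subgraph. The function name `sample_induced_subgraph` is hypothetical. Note the trade-off: an edge survives only if both endpoints are kept, so sampling a fraction p of nodes keeps roughly p² of the edges (10% of nodes keeps about 1% of edges), which thins the graph; random-walk or BFS-style sampling tends to preserve more local structure.

```python
import numpy as np


def sample_induced_subgraph(edge_index: np.ndarray, num_nodes: int, num_sampled: int, seed: int = 0):
    """Uniformly sample num_sampled nodes and keep only edges between them.

    Returns (kept, sub_edge_index): sub_edge_index uses new ids 0..num_sampled-1,
    and kept[new_id] gives the original node id.
    """
    rng = np.random.default_rng(seed)
    kept = np.sort(rng.choice(num_nodes, size=num_sampled, replace=False))
    new_id = np.full(num_nodes, -1, dtype=np.int64)  # -1 marks dropped nodes
    new_id[kept] = np.arange(num_sampled)
    src, dst = edge_index
    mask = (new_id[src] >= 0) & (new_id[dst] >= 0)  # both endpoints sampled
    sub_edge_index = np.stack([new_id[src[mask]], new_id[dst[mask]]])
    return kept, sub_edge_index
```

Relabelling to contiguous ids lets the subgraph be fed to the same training code as the full graph.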
