About Experiments with Large-Scale Data #1

Open
dcm-nakashima opened this issue Mar 14, 2022 · 9 comments

Comments

@dcm-nakashima

dcm-nakashima commented Mar 14, 2022

Thanks for sharing the script.
I am trying to do an experiment on my own large dataset.
#Nodes | #Edges
700,000 | 6,000,000

I would like to get node embeddings for this graph.

Unfortunately, I ran into a CUDA out-of-memory error even on the collab dataset in my environment.
I used a g4dn.16xlarge AWS instance (GPU: 1 × NVIDIA T4, 16 GB).

Is there a way to run PEG with less GPU memory?

Have you done any experiments with datasets of this size?
For example,

  • ogbl-ppa
  • ogbl-citation2

@dcm-nakashima
Author

@ZoomWang666 @lipan00123
I forgot to add mentions.
If you know anything about this, please let me know.

@zoom-wang112358
Collaborator

Hello,

You could try reducing the PE dimension to save GPU memory. PEG-LE+ and PEG-DW+ need more GPU memory than PEG-LE and PEG-DW, and we are planning to address this issue.

@dcm-nakashima
Author

@ZoomWang666
Thank you for your prompt reply.
I will lower hidden_channels (default=256), for example to 64 or 128.
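
To get a rough sense of how much lowering hidden_channels could help, here is a back-of-the-envelope estimate. It assumes full-batch training with float32 values and counts only one dense node-feature matrix per layer plus an int64 edge index. Autograd buffers, optimizer state, and PEG's positional encodings are ignored, so real usage will be higher. The function names are illustrative and not part of the PEG code.

```python
def activation_bytes(num_nodes: int, hidden_channels: int,
                     num_layers: int = 1, bytes_per_value: int = 4) -> int:
    """Bytes for one dense [num_nodes, hidden_channels] matrix per layer (float32 by default)."""
    return num_nodes * hidden_channels * bytes_per_value * num_layers


def edge_index_bytes(num_edges: int, bytes_per_index: int = 8) -> int:
    """Bytes for a [2, num_edges] edge index stored as int64 (COO format)."""
    return 2 * num_edges * bytes_per_index


NUM_NODES, NUM_EDGES = 700_000, 6_000_000  # graph size quoted in this issue

for hidden in (256, 128, 64):
    mib = activation_bytes(NUM_NODES, hidden) / 2**20
    print(f"hidden_channels={hidden}: {mib:.1f} MiB of activations per layer")
print(f"edge index: {edge_index_bytes(NUM_EDGES) / 2**20:.1f} MiB")
```

Under these assumptions the activation term scales linearly with hidden_channels, so going from 256 to 64 cuts it by a factor of four, while the edge index is unaffected.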

@dcm-nakashima
Author

Have you done any experiments with datasets of this size?
For example,
ogbl-ppa
ogbl-citation2

What do you think about this one?

@zoom-wang112358
Collaborator

This project is still ongoing. The ICLR paper mainly focuses on the theoretical understanding of positional encoding.

We are now working on a standard framework of PEG for large-scale networks (100M+ nodes), and it will be released later.

@dcm-nakashima
Author

Thank you for your reply.
I understand the situation.
I also understand that the work is still at a theoretical stage.

We are now working on a standard framework of PEG for large-scale networks (100M+ nodes), and the standard framework will be released later.

I am looking forward to it!

@lipan00123
Contributor

lipan00123 commented Apr 3, 2022

@dcm-nakashima Thanks for checking our work. May I ask a follow-up question? When you decreased the hidden_channels dimension and used PEG-DW or PEG-LE instead of PEG-DW+ or PEG-LE+, were you able to run the code on your dataset with 700k nodes?

I am also curious whether you can run a standard GCN on your network with your 16 GB GPU. If even a standard GCN does not fit, the current PEG pipeline may not work either. In that case, the graph partitioning/downsampling-based pipeline that we are working on would be needed.
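
As a rough illustration of the partitioning idea mentioned above (not the pipeline the authors are developing), the sketch below randomly assigns nodes to parts and keeps only the edges whose endpoints fall in the same part, so each part could be processed separately. Random assignment is the simplest possible choice and discards many edges. The function name and toy graph are hypothetical.

```python
import random
from collections import defaultdict


def random_partition(num_nodes, edges, num_parts, seed=0):
    """Assign each node to a random part and keep only intra-part edges."""
    rng = random.Random(seed)
    part_of = [rng.randrange(num_parts) for _ in range(num_nodes)]
    part_edges = defaultdict(list)
    for u, v in edges:
        if part_of[u] == part_of[v]:  # edges crossing parts are dropped
            part_edges[part_of[u]].append((u, v))
    return part_of, dict(part_edges)


# Toy ring graph standing in for the real data.
edges = [(i, (i + 1) % 12) for i in range(12)]
part_of, part_edges = random_partition(12, edges, num_parts=3)
kept = sum(len(e) for e in part_edges.values())
print(f"kept {kept} of {len(edges)} edges across {len(part_edges)} parts")
```

A structure-aware partitioner would retain far more edges than this random split; the point here is only the shape of the data each part receives.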

@dcm-nakashima
Author

@lipan00123
Thank you for your question.
I have not tried that yet, but I will.
I will share the results with you.

@dcm-nakashima
Author

@lipan00123
Since we could not run on the large dataset above, we used small sampled datasets instead.
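
One simple way to produce such a sampled dataset is a node-induced subgraph: pick a random subset of nodes, keep the edges between them, and relabel the nodes to a contiguous range. The thread does not say which sampling method was actually used, so this is only an assumed example with a hypothetical function name.

```python
import random


def sample_induced_subgraph(num_nodes, edges, sample_size, seed=0):
    """Return (kept_nodes, relabeled_edges) for a random node-induced subgraph."""
    rng = random.Random(seed)
    kept_nodes = sorted(rng.sample(range(num_nodes), sample_size))
    # Map original node ids to 0..sample_size-1 so the subgraph is self-contained.
    new_id = {old: new for new, old in enumerate(kept_nodes)}
    sub_edges = [(new_id[u], new_id[v]) for u, v in edges
                 if u in new_id and v in new_id]
    return kept_nodes, sub_edges


# Toy graph: every node i links to i+1 and i+2 (mod 20).
edges = [(i, (i + k) % 20) for i in range(20) for k in (1, 2)]
kept_nodes, sub_edges = sample_induced_subgraph(20, edges, sample_size=8)
print(len(kept_nodes), "nodes,", len(sub_edges), "edges")
```

Uniform node sampling tends to produce much sparser subgraphs than the original, which can matter for link prediction, so the result should be checked before drawing conclusions from it.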
