Add GNN training benchmark #359
@@ -0,0 +1,21 @@
- KEY:
    NAME:  global_batch_size
    REQ:   EXACTLY_ONE
    CHECK: " v['value'] > 0"

- KEY:
    NAME:  opt_name
    REQ:   EXACTLY_ONE
    CHECK: " v['value'] == 'adam' "

- KEY:
    NAME:  opt_base_learning_rate
    REQ:   EXACTLY_ONE
    CHECK: " v['value'] >= 0.0"

- KEY:
    NAME: eval_accuracy
    REQ:  AT_LEAST_ONE
    CHECK:
      - "'epoch_num' in v['metadata']"
    ATLEAST_ONE_CHECK: "v['value'] >= 0.72 and v['value'] < 1.0"

Review thread on the opt_name key:
- In the reference code, the optimizer name is not reported, but we do have a …
- We should report the optimizer name in the reference. The seed check is part of package_checker, so we don't need to include it here.
- Added the optimizer name to the reference.
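Each CHECK string above is a Python expression over the logged entry v. The following is a minimal sketch of how an EXACTLY_ONE rule could be evaluated, assuming each log entry is a dict with "key", "value" and "metadata" fields. The names RULES, run_check and check_exactly_one and the sample log are hypothetical illustrations, not the compliance checker's actual code.

```python
# Hypothetical sketch: evaluate EXACTLY_ONE rules with CHECK strings from the YAML above.
# Assumes each log entry is a dict with "key", "value" and "metadata" (not the real checker).

RULES = {
    "global_batch_size": " v['value'] > 0",
    "opt_name": " v['value'] == 'adam' ",
    "opt_base_learning_rate": " v['value'] >= 0.0",
}


def run_check(expr, entry):
    """Evaluate a CHECK expression with the log entry bound to the name v."""
    # Strip the padding spaces the YAML strings carry; these expressions need no builtins.
    return bool(eval(expr.strip(), {"__builtins__": {}}, {"v": entry}))


def check_exactly_one(key, entries):
    """EXACTLY_ONE: the key must be logged exactly once and its CHECK must pass."""
    matches = [e for e in entries if e["key"] == key]
    if len(matches) != 1:
        return False
    return run_check(RULES[key], matches[0])


log = [
    {"key": "global_batch_size", "value": 4096, "metadata": {}},
    {"key": "opt_name", "value": "adam", "metadata": {}},
    {"key": "opt_base_learning_rate", "value": 0.001, "metadata": {}},
]

for key in RULES:
    print(key, check_exactly_one(key, log))
```

Logging an optimizer other than adam, or logging global_batch_size twice, makes the corresponding rule fail under this sketch.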
@@ -0,0 +1,7 @@

- KEY:
    NAME: eval_accuracy
    REQ:  AT_LEAST_ONE
    CHECK:
      - "'epoch_num' in v['metadata']"
    ATLEAST_ONE_CHECK: "v['value'] < 1.0"
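The eval_accuracy rule pairs a CHECK with an ATLEAST_ONE_CHECK. A hypothetical sketch of one reading of that pairing, assuming the CHECK must hold for every eval_accuracy entry while the ATLEAST_ONE_CHECK only has to hold for one of them. It contrasts the stricter bound from the first file (>= 0.72) with the looser bound in this file; the function names and sample entries are illustrative only.

```python
# Hypothetical sketch of AT_LEAST_ONE semantics for eval_accuracy.
# Assumption: CHECK must hold for every matching entry, ATLEAST_ONE_CHECK for at least one.

STRICT = "v['value'] >= 0.72 and v['value'] < 1.0"  # bound from the first file
LOOSE = "v['value'] < 1.0"                           # bound from this file
PER_ENTRY = "'epoch_num' in v['metadata']"


def holds(expr, entry):
    """Evaluate a rule expression with the entry bound to v."""
    return bool(eval(expr, {"__builtins__": {}}, {"v": entry}))


def check_at_least_one(entries, at_least_one_expr):
    evals = [e for e in entries if e["key"] == "eval_accuracy"]
    if not evals:
        return False  # AT_LEAST_ONE: the key must appear at least once
    if not all(holds(PER_ENTRY, e) for e in evals):
        return False  # every entry must carry epoch_num in its metadata
    return any(holds(at_least_one_expr, e) for e in evals)


log = [
    {"key": "eval_accuracy", "value": 0.65, "metadata": {"epoch_num": 0.25}},
    {"key": "eval_accuracy", "value": 0.70, "metadata": {"epoch_num": 0.50}},
]

print(check_at_least_one(log, STRICT))  # False: no entry reaches 0.72
print(check_at_least_one(log, LOOSE))   # True: the values are below 1.0
```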
@@ -28,6 +28,7 @@
         'unet3d' : 40,
         'rnnt': 10,
         'stable_diffusion': 10,
+        'gnn': 10,
     },
     "hpc": {
         'cosmoflow': 10,
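This hunk registers gnn with a count of 10 alongside rnnt and stable_diffusion. Assuming the number is how many result files a submission must provide for the benchmark, and assuming an exact match is required, a check could look like the hypothetical sketch below. The outer "training" key is a guess (the hunk only shows the sibling "hpc" key), and RESULT_FILE_COUNTS, check_result_count and the file names are illustrative, not the package checker's code.

```python
# Hypothetical sketch: verify a benchmark has the expected number of result files.
# Mirrors a few entries from the hunk above; the "training" key and exact-match rule are assumed.

RESULT_FILE_COUNTS = {
    "training": {
        "unet3d": 40,
        "rnnt": 10,
        "stable_diffusion": 10,
        "gnn": 10,
    },
}


def check_result_count(usage, benchmark, result_files):
    """Return True if the number of result files matches the registered count."""
    expected = RESULT_FILE_COUNTS[usage].get(benchmark)
    if expected is None:
        raise KeyError(f"unknown benchmark: {benchmark}")
    return len(result_files) == expected


files = [f"result_{i}.txt" for i in range(10)]
print(check_result_count("training", "gnn", files))      # True
print(check_result_count("training", "gnn", files[:9]))  # False
```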
@@ -0,0 +1,72 @@
{

  "gnn_ref_4096":
  {
    "Benchmark": "gnn",
    "Creator": "NVIDIA",
    "When": "Reference RCPs before v4.0",
    "Platform": "1xDGX-A100 and 8xDGX-A100",
    "BS": 4096,
    "Hyperparams": {
      "opt_base_learning_rate": 0.001
    },
    "Epochs to converge": [
      0.85,0.75,0.75,0.80,0.80,0.75,
      0.75,0.85,0.75,0.75,0.80,0.80,
      0.80,0.75,0.80,0.80,0.80,0.80,
      0.80,0.85 ]
  },

  "gnn_ref_16384":
  {
    "Benchmark": "gnn",
    "Creator": "NVIDIA",
    "When": "Reference RCPs before v4.0",
    "Platform": "8xDGX-A100",
    "BS": 16384,
    "Hyperparams": {
      "opt_base_learning_rate": 0.002
    },
    "Epochs to converge": [
      0.85,0.95,0.85,0.80,0.90,0.75,
      0.80,0.90,0.90,0.85,0.90,0.85,
      0.85,0.85,0.85,0.90,0.85,0.85,
      0.85,0.90 ]
  },

  "gnn_ref_32768":
  {
    "Benchmark": "gnn",
    "Creator": "Intel",
    "When": "Reference RCPs before v4.0",
    "Platform": "16xSPR-2S",
    "BS": 32768,
    "Hyperparams": {
      "opt_base_learning_rate": 0.002
    },
    "Epochs to converge": [
      1.00,0.95,0.90,0.95,0.95,1.00,
      0.90,0.95,0.95,0.95,1.00,0.90,
      0.95,0.95,0.95,0.90,0.95,0.90,
      0.90,0.90 ]
  },

  "gnn_ref_65536":
  {
    "Benchmark": "gnn",
    "Creator": "NVIDIA",
    "When": "Reference RCPs before v4.0",
    "Platform": "32xDGX-A100",
    "BS": 65536,
    "Hyperparams": {
      "opt_base_learning_rate": 0.003
    },
    "Epochs to converge": [
      1.25,1.20,1.25,1.20,1.15,1.15,
      1.15,1.20,1.15,1.20,1.25,1.15,
      1.20,1.20,1.15,1.25,1.20,1.15,
      1.10,1.15
    ]
  }
}
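Each RCP entry above records 20 epochs-to-converge samples for one batch size and learning rate. A simplified sketch that summarizes each list by its mean and sample standard deviation; the data is copied from the JSON above, while the function name summarize is illustrative. The actual RCP checker applies its own pruning, interpolation and comparison rules, which this sketch does not reproduce.

```python
# Simplified RCP summary: mean and sample stdev of epochs to converge per batch size.
# Data copied from the "Epochs to converge" lists above; checker-specific logic omitted.
import statistics

EPOCHS_TO_CONVERGE = {
    4096: [0.85, 0.75, 0.75, 0.80, 0.80, 0.75, 0.75, 0.85, 0.75, 0.75,
           0.80, 0.80, 0.80, 0.75, 0.80, 0.80, 0.80, 0.80, 0.80, 0.85],
    16384: [0.85, 0.95, 0.85, 0.80, 0.90, 0.75, 0.80, 0.90, 0.90, 0.85,
            0.90, 0.85, 0.85, 0.85, 0.85, 0.90, 0.85, 0.85, 0.85, 0.90],
    32768: [1.00, 0.95, 0.90, 0.95, 0.95, 1.00, 0.90, 0.95, 0.95, 0.95,
            1.00, 0.90, 0.95, 0.95, 0.95, 0.90, 0.95, 0.90, 0.90, 0.90],
    65536: [1.25, 1.20, 1.25, 1.20, 1.15, 1.15, 1.15, 1.20, 1.15, 1.20,
            1.25, 1.15, 1.20, 1.20, 1.15, 1.25, 1.20, 1.15, 1.10, 1.15],
}


def summarize(samples):
    """Return (mean, sample standard deviation) of one RCP's samples."""
    return statistics.mean(samples), statistics.stdev(samples)


for bs, samples in sorted(EPOCHS_TO_CONVERGE.items()):
    mean, std = summarize(samples)
    print(f"BS={bs:>6}: n={len(samples)} mean={mean:.3f} std={std:.3f}")
```

The means come out to 0.79, 0.86, 0.94 and 1.185 epochs, so in these references larger global batch sizes need more epochs to converge.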
Review thread:
- We actually do not have this key (mllog.constants.GRADIENT_ACCUMULATION_STEPS) in the reference implementation. Should we also include it in the reference branch, or is it okay for us to remove it directly from the closed_common.yaml checker?
- Let's include gradient accumulation in the reference, as we should not be modifying the common YAML.
- Gradient accumulation is included in the reference.
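The thread settles on having the reference log gradient accumulation instead of editing closed_common.yaml. A hypothetical, self-contained sketch of both sides: emitting a ":::MLLOG"-style line for the key, then checking that it appears exactly once with a positive value. The key string gradient_accumulation_steps, the payload fields and the positive-value rule are assumptions modeled on mllog's output format; in the reference itself one would log through mllog using mllog.constants.GRADIENT_ACCUMULATION_STEPS rather than this helper.

```python
# Hypothetical sketch: emit and verify a gradient-accumulation log line.
# The ":::MLLOG" prefix, payload fields and key string are assumptions modeled on mllog output.
import json
import time

GRADIENT_ACCUMULATION_STEPS = "gradient_accumulation_steps"  # assumed key string
PREFIX = ":::MLLOG "


def format_event(key, value, metadata=None):
    """Build one point-in-time log line with a JSON payload."""
    payload = {
        "namespace": "",
        "time_ms": int(time.time() * 1000),
        "event_type": "POINT_IN_TIME",
        "key": key,
        "value": value,
        "metadata": metadata or {},
    }
    return PREFIX + json.dumps(payload)


def parse_events(lines):
    """Decode the JSON payload of every line that carries the prefix."""
    return [json.loads(line[len(PREFIX):]) for line in lines if line.startswith(PREFIX)]


def check_gradient_accumulation(lines):
    """Require exactly one gradient_accumulation_steps event with a positive value."""
    events = [e for e in parse_events(lines) if e["key"] == GRADIENT_ACCUMULATION_STEPS]
    return len(events) == 1 and events[0]["value"] > 0


log_lines = [
    format_event("global_batch_size", 4096),
    format_event(GRADIENT_ACCUMULATION_STEPS, 1),
]
print(log_lines[1])
print(check_gradient_accumulation(log_lines))
```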