- `utils/fetch_aiger.py`: Fetch the benchmark from HWMCC and preprocess it (processed data should end up in the `benchmark_folder/` folder)
  - usage: `cd utils && python fetch_aiger.py`
  - note:
    - Change `aag_dir` (dump folder), `csv_file` (running info and aiger location prefix) and `aag_list[i]` (aiger file location) in the script to fetch different benchmarks.
    - Modify `fetch_aig_from_csv` to adjust the preprocessing method (only unsat? only sat? only hard cases?).
    - Modify `simplify` to determine whether to convert `aig` to `aag`.
    - Comment out `aag_list = [aag for aag in aag_list if aag.split('/')[-1] in hard_aag_list]` if you don't want to filter out some cases.
    - Choose which filter method you want to use in `fetch_aig_from_csv()`.
- `build_dataset.py`: Build the dataset from the benchmark
  - usage: `python build_dataset.py --model-checker <model-checker> --simplification-level <simplification-level> --benchmark <benchmark> --ground_truth_folder_prefix <ground_truth_folder_prefix> --subset_range <subset_range>`
  - parameters:
    - `--model-checker`: The model checker to use (e.g. `abc`, `ic3ref`)
    - `--simplification-level`: The simplification level to use (e.g. `thorough`, `deep`, `moderate`, `slight`, `naive`)
    - `--benchmark`: The benchmark in `benchmark_folder` to use (e.g. `hwmcc2007_all`, `hwmcc2020_all`, `hwmcc2020_all_only_unsat`, `hwmcc2020_small`)
    - `--ground_truth_folder_prefix`: The prefix of the ground truth folder (e.g. `/data/guangyuh/coding_env/AIG2INV/AIG2INV_main/ground_truth/hwmcc20_abc_7200_result` or `/data/guangyuh/coding_env/AIG2INV/AIG2INV_main/clause-learning/data-collect/hwmcc07-7200-result/output/tip/`)
    - `--subset_range`: The range of data in the benchmark to generate (e.g. `1`, `23`)
  - note:
    - There exists a list named `AigCaseBlackList`, which can exclude some cases from the benchmark (e.g. use `utils/check_proc.sh` to find those hard-extraction cases and add them to the list; run `echo "$count"` or `echo "$aag_files_and_count"` after sourcing).
    - Delete the existing `dataset_xxxx/` folder before running the script.
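  - example (a sketch assembled from the parameter examples above; adjust the prefix path to your checkout):

    ```bash
    python build_dataset.py --model-checker abc --simplification-level slight \
      --benchmark hwmcc2020_small \
      --ground_truth_folder_prefix /data/guangyuh/coding_env/AIG2INV/AIG2INV_main/ground_truth/hwmcc20_abc_7200_result \
      --subset_range 1
    ```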
- Select which model to use for training (currently only `neurograph` is available):
- `train_neurograph/train.py`: Train the model
  - usage: `python train.py --task-name <task-name> --local_rank <local_rank> --dim <dim> --n_rounds <n_rounds> --epochs <epochs> --inf_dev <inf_dev> --gen_log <gen_log> --log-dir <log-dir> --model-dir <model-dir> --data-dir <data-dir> --restore <restore> --train-file <train-file> --val-file <val-file> --mode <mode> --gpu-id <gpu-id> --batch-size <batch-size> --pos-weight <pos-weight> --lr <lr>`
  - parameters:
    - `--task-name`: The task name (e.g. `neuropdr`)
    - `--local_rank`: The local rank for DDP (e.g. `-1`)
    - `--dim`: The dimension of the variable and clause embeddings (e.g. `128`)
    - `--n_rounds`: The number of rounds of message passing (e.g. `26`)
    - `--epochs`: The number of epochs (e.g. `10`)
    - `--inf_dev`: The device to use (e.g. `gpu`)
    - `--gen_log`: The log file to use (e.g. `log/data_maker.log`)
    - `--log-dir`: The log folder (e.g. `log/`)
    - `--model-dir`: The model folder (e.g. `neurograph_model/`)
    - `--data-dir`: The data folder (e.g. `dataset/`)
    - `--restore`: The model checkpoint to continue training from (e.g. `neuropdr_2021-01-06_07:56:51_last.pth.tar`)
    - `--train-file`: The training file dir (e.g. `dataset_hwmcc2020_small_abc_slight_1`)
    - `--val-file`: The validation file dir (e.g. `dataset_hwmcc2020_small_abc_slight_1`)
    - `--mode`: The mode to use (e.g. `train`, `debug`)
    - `--gpu-id`: The GPU id to use (e.g. `0`, `1`)
    - `--batch-size`: The batch size to use (e.g. `2`)
    - `--pos-weight`: The positive weight in `BCEWithLogitsLoss` (e.g. `1.0`)
    - `--lr`: The learning rate to use (e.g. `0.00001`)
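  - example (a sketch assembled from the parameter examples above; tune the hyperparameters for your own run):

    ```bash
    python train.py --task-name neuropdr --local_rank -1 --dim 128 --n_rounds 26 \
      --epochs 10 --inf_dev gpu --log-dir log/ --model-dir neurograph_model/ \
      --data-dir dataset/ --train-file dataset_hwmcc2020_small_abc_slight_1 \
      --val-file dataset_hwmcc2020_small_abc_slight_1 --mode train \
      --gpu-id 0 --batch-size 2 --pos-weight 1.0 --lr 0.00001
    ```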
- `train_gcn/train.py`: Train the GNN model; its arguments are defined in the script as:

  ```python
  parser.add_argument('--dataset', type=str, default=None, help='dataset name') # no need if load pickle
  parser.add_argument('--load-pickle', type=str, default=None, help='load pickle file name')
  parser.add_argument('--dump-pickle-name', type=str, default=None, help='dump pickle file name') # no need if load pickle
  parser.add_argument('--model-name', type=str, default=None, help='model name to save')
  ```

  - usage: `python train.py --dataset <dataset>` (optional: `--load-pickle`, `--model-name`, `--dump-pickle-name <dump-pickle-name>`, `--dual`, etc.)
  - parameters:
    - `--dataset`: The dataset to use (e.g. `/data/guangyuh/coding_env/AIG2INV/AIG2INV_main/dataset_hwmcc2007_tip_ic3ref_no_simplification_0-22/bad_cube_cex2graph/expr_to_build_graph/`)
    - `--load-pickle`: The pickle file to load (e.g. `hwmcc07_tip_ic3ref_no_simplification_0-22.pickle`)
    - `--dump-pickle-name`: The pickle file to dump (e.g. `hwmcc07_tip_ic3ref_no_simplification_0-22.pickle`)
    - `--model-name`: The model name to save (e.g. `hwmcc07_ic3ref_tip_no_simplification_0-22.pt`)
    - `--dual`: Use the dual GNN or not (one branch for the pre-defined features, one for the unsupervised embedding)
  - note:
    - Adjust `DATASET_SPLIT` if you just want to test the code.
    - Adjust `h = self.dropout(h)` in `train_gcn/GNN_Model.py` and `WEIGHT_DECAY` in `train_gcn/train.py` if you want to prevent overfitting.
    - Define how to split the dataset yourself (using `train_test_split()`).
    - Uncomment `prof.step()` in the training loop if you want to profile the code.
    - Check `threshold_finder = ThresholdFinder(val_dataloader, model, device)` and confirm which dataloader you want to use in testing.
    - Check the validation loop in the training loop and uncomment it if you need it (e.g. for early stopping or hyperparameter fine-tuning).
    - Check which node embedding method you want to employ in `graph_list = employ_graph_embedding(graph_list, args)`.
    - Comment out the default args if you want to pass them on the command line.
    - Comment out `G = G.to_undirected()` if you want to use a directed graph.
    - Uncomment `graph_list_struc_feat = employ_graph_embedding(graph_list, args)` if you want to use additional node embeddings.
    - If using the `dual` GNN, choose a GNN model whose name starts with `Dual` (e.g. `DualGCNModle`, `DualGraphSAGEModel`, etc.); any other will fail.
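  - example (a sketch based on the argument comments above: the first run builds graphs from `--dataset` and dumps a pickle, and a later run can reload that pickle instead of rebuilding):

    ```bash
    # first run: build graphs from the dataset and cache them as a pickle
    python train.py \
      --dataset /data/guangyuh/coding_env/AIG2INV/AIG2INV_main/dataset_hwmcc2007_tip_ic3ref_no_simplification_0-22/bad_cube_cex2graph/expr_to_build_graph/ \
      --dump-pickle-name hwmcc07_tip_ic3ref_no_simplification_0-22.pickle \
      --model-name hwmcc07_ic3ref_tip_no_simplification_0-22.pt

    # later runs: reuse the cached pickle (no --dataset needed)
    python train.py --load-pickle hwmcc07_tip_ic3ref_no_simplification_0-22.pickle \
      --model-name hwmcc07_ic3ref_tip_no_simplification_0-22.pt
    ```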
- `main.py`: Validate the model
  - usage: `python main.py --threshold <threshold> --selected-built-dataset <selected-built-dataset> --NN-model <NN-model> --gpu-id <gpu-id>` (optional: `--compare_with_abc`, `--re-predict`, etc.)
  - parameters:
    - `--threshold`: The threshold to use (e.g. `0.5`)
    - `--selected-built-dataset`: The built dataset (with graphs generated) to use (e.g. `dataset_hwmcc2020_small_abc_slight_1`)
    - `--NN-model`: The NN model to use (e.g. `neuropdr_2023-01-06_07:56:51_last.pth.tar`)
    - `--gpu-id`: The GPU id to use (e.g. `0`, `1`)
    - `--compare_with_abc`: Compare the result with abc (optional)
    - `--re-predict`: Re-predict the data (optional)
    - `--compare_with_ic3ref_basic_generalization`: Compare the result with ic3ref basic generalization (optional)
    - `--compare_with_nnic3_basic_generalization`: Compare the result with nnic3 basic generalization (optional)
    - `--aig-case-name`: Test a single aiger case (optional)
  - note:
    - Delete the duplicate data in the `case4comp/` folder before running the script (if it has been run before with the same arguments).
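  - example (a sketch using the parameter examples above; `--compare_with_abc` is optional):

    ```bash
    python main.py --threshold 0.5 \
      --selected-built-dataset dataset_hwmcc2020_small_abc_slight_1 \
      --NN-model neuropdr_2023-01-06_07:56:51_last.pth.tar \
      --gpu-id 0 --compare_with_abc
    ```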
- `result_analyzer`: Analyze the result
  - usage: `python result_analyzer.py --log-file <log-file>`
  - parameters:
    - `--log-file`: The log file to use (e.g. `compare_with_ic3ref.csv`)
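  - example (analyzing the comparison log produced by the previous step):

    ```bash
    python result_analyzer.py --log-file compare_with_ic3ref.csv
    ```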
- `data2dataset/cex2smt2/collect.py`: Convert aig+inv to graph
  - usage: `python collect.py --aag <aag> --generalize <generalize> --cnf <cnf> --generate_smt2 <generate_smt2> --inv-correctness-check <inv-correctness-check> --run-mode <run-mode> --model-checker <model-checker> --thorough-simplification <thorough-simplification> --deep-simplification <deep-simplification> --moderate-simplification <moderate-simplification> --slight-simplification <slight-simplification> --naive-simplification <naive-simplification> --ground-truth-folder-prefix <ground-truth-folder-prefix> --dump-folder-prefix <dump-folder-prefix>`
  - parameters:
    - `--aag`: The aag file to use (e.g. `tip2_2.aag`)
    - `--generalize`: Generalize the predecessor (e.g. `True`)
    - `--cnf`: The cnf file to use (e.g. `tip2_2.cnf`)
    - `--generate_smt2`: Generate the smt2 file (e.g. `True`)
    - `--inv-correctness-check`: Check the correctness of the invariant (e.g. `True`)
    - `--run-mode`: Normal or debug; debug mode exits after the invariant correctness check (e.g. `debug`)
    - `--model-checker`: The model checker to use (e.g. `ic3ref`)
    - `--thorough-simplification`: sympy in tr simplification + aig operator simplification during tr construction + z3 simplification + counterexample cube simplification (e.g. `False`)
    - `--deep-simplification`: sympy in tr simplification + aig operator simplification during tr construction + z3 simplification (e.g. `False`)
    - `--moderate-simplification`: aig operator simplification during tr construction + z3 simplification (e.g. `False`)
    - `--slight-simplification`: z3 simplification + ternary simulation (e.g. `False`)
    - `--naive-simplification`: only use sympy to simplify the counterexample cube (e.g. `False`)
    - `--ground-truth-folder-prefix`: The prefix of the ground truth folder (e.g. `/data/guangyuh/coding_env/AIG2INV/AIG2INV_main/clause-learning/data-collect/hwmcc07-7200-result/output/tip/`)
    - `--dump-folder-prefix`: The prefix of the dump folder (e.g. `/data/guangyuh/coding_env/AIG2INV/AIG2INV_main/dataset`)
  - note:
    - `_check_tr_correctness_after_simplification()` should be called if simplification is turned on or in debug mode.
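  - example (a sketch using the parameter examples above, enabling only the slight simplification level; whether the other simplification flags must be passed explicitly as `False` depends on the script's defaults):

    ```bash
    python collect.py --aag tip2_2.aag --generalize True --cnf tip2_2.cnf \
      --generate_smt2 True --inv-correctness-check True --run-mode debug \
      --model-checker ic3ref --slight-simplification True \
      --ground-truth-folder-prefix /data/guangyuh/coding_env/AIG2INV/AIG2INV_main/clause-learning/data-collect/hwmcc07-7200-result/output/tip/ \
      --dump-folder-prefix /data/guangyuh/coding_env/AIG2INV/AIG2INV_main/dataset
    ```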
- `tool_box.py`: Visualize the result / clean logs
  - usage: `python tool_box.py --clean_trivial_log <clean_trivial_log> --calculate_pickle_number <calculate_pickle_number> --json2mermaid <json2mermaid> --file_path <file_path>`
  - parameters:
    - `--clean_trivial_log`: Clean trivial logs (e.g. `True`)
    - `--calculate_pickle_number`: Calculate the number of pickle files (e.g. `True`)
    - `--json2mermaid`: Convert json to mermaid (e.g. `True`)
    - `--file_path`: The file path to use (e.g. `dataset_hwmcc2020_small_abc_slight_1`)
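  - example (converting json graph dumps under a built dataset to mermaid diagrams; flag values follow the examples above):

    ```bash
    python tool_box.py --json2mermaid True --file_path dataset_hwmcc2020_small_abc_slight_1
    ```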
- `data2dataset/cex2smt2/symbolic_reg_model.py`: Symbolic regression model
  - usage: `python symbolic_reg_model.py --model-file <model-file> --validate <validate> --model <model>`
  - parameters:
    - `--model-file`: The path to the model file (e.g. `symbolic_reg_model_2021-01-06_07:56:51_last.pth.tar`)
    - `--validate`: Determine whether to validate the model (e.g. `True`)
    - `--model`: Determine which model to use (e.g. `1`)
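  - example (validating a saved model, using the parameter examples above):

    ```bash
    python symbolic_reg_model.py \
      --model-file symbolic_reg_model_2021-01-06_07:56:51_last.pth.tar \
      --validate True --model 1
    ```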
- `data2dataset/cex2smt2/pytorch_auto_regression.py`: Auto-PyTorch regression model
- `data2dataset/cex2smt2/sklearn_auto_regression.py`: Auto-sklearn regression model
- `data2dataset/cex2smt2/ModelSampler.py`: SAT model uniform-distribution sampling
  - usage: `python ModelSampler.py --smt2-file <smt2-file>`
  - parameters:
    - `--smt2-file`: The path to the smt2 file (e.g. `tip2_2.smt2`)
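  - example (sampling models from the smt2 file named in the parameter example above):

    ```bash
    python ModelSampler.py --smt2-file tip2_2.smt2
    ```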
- `data2dataset/cex2smt2/ModelSample_real_ud`: Use CryptoMiniSat to perform real uniform-distribution sampling
  - note: This script is under development and is not used in the paper.
- `utils/check_proc.sh`: Check the running processes (normally used to find complex cases when `build_dataset.py` has been running for a long time)
  - usage: `source ./check_proc.sh && echo "$count"` or `source ./check_proc.sh && echo "$aag_files_and_count"`
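  - example session (a sketch; the two variables are set by the script as mentioned in the `build_dataset.py` notes above, and what exactly they report is an assumption based on their names):

    ```bash
    source ./check_proc.sh
    echo "$count"                 # assumed: number of extraction processes still running
    echo "$aag_files_and_count"   # assumed: the aag files being processed, with counts
    ```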