log_partition_dataset_creation.txt
/Users/judithbernett/miniforge3/envs/pipr/bin/python /Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py
############################ GUO DATASET ############################
3 unmatched IDs ...
14 unmatched IDs ...
0 unmatched IDs ...
0 unmatched IDs ...
/Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py:126: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
simap_dict = pd.read_csv(
Cleaning and balancing partition 0 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
sampling more negatives (2929 positives, 2070 negatives)...
Cleaning and balancing partition 1 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (682 positives, 815 negatives)...
Cleaning and balancing partition both ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (1872 positives, 2603 negatives)...
writing positive files: only partition 0: 2929 proteins...
writing negative files: only partition 0: 2929 proteins...
writing positive files: only partition 1: 682 proteins ...
writing negative files: only partition 1: 682 proteins ...
writing positive files: both partitions: 1872 proteins ...
writing negative files: both partitions: 1872 proteins ...
############################ HUANG DATASET ############################
285 unmatched IDs ...
288 unmatched IDs ...
73 unmatched IDs ...
116 unmatched IDs ...
/Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py:130: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
simap_dict = pd.read_csv(
Cleaning and balancing partition 0 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
sampling more negatives (1110 positives, 785 negatives)...
Cleaning and balancing partition 1 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (748 positives, 1232 negatives)...
Cleaning and balancing partition both ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (1485 positives, 2049 negatives)...
writing positive files: only partition 0: 1110 proteins...
writing negative files: only partition 0: 1110 proteins...
writing positive files: only partition 1: 748 proteins ...
writing negative files: only partition 1: 748 proteins ...
writing positive files: both partitions: 1485 proteins ...
writing negative files: both partitions: 1485 proteins ...
############################ RICHOUX DATASET ############################
81964 PPIs!
/Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py:187: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
simap_dict = pd.read_csv(
Cleaning and balancing partition 0 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
sampling more negatives (15053 positives, 9358 negatives)...
Cleaning and balancing partition 1 ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (6864 positives, 11399 negatives)...
Cleaning and balancing partition both ...
0 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (17500 positives, 20458 negatives)...
writing positive files for SPRINT: only partition 0: 15053 proteins ...
writing negative files: only partition 0: 15053 proteins ...
writing positive files: only partition 1: 6864 proteins ...
writing negative files: only partition 1: 6864 proteins ...
writing positive files: both partitions: 17500 proteins ...
writing negative files: both partitions: 17500 proteins ...
############################ PAN DATASET ############################
Mapping Protein IDs ...
#unmapped IDs: 812
100%|██████████| 73109/73109 [00:00<00:00, 691378.04it/s]
/Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py:249: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
simap_dict = pd.read_csv(
65171 PPIs!
Cleaning and balancing partition 0 ...
22 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
sampling more negatives (14659 positives, 9704 negatives)...
Cleaning and balancing partition 1 ...
6 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (4541 positives, 6678 negatives)...
Cleaning and balancing partition both ...
27 duplicates in positive dataset!
0 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (12279 positives, 17132 negatives)...
writing positive files for SPRINT: only partition 0: 14659 proteins ...
writing negative files: only partition 0: 14659 proteins ...
writing positive files: only partition 1: 4541 proteins ...
writing negative files: only partition 1: 4541 proteins ...
writing positive files: both partitions: 12279 proteins ...
writing negative files: both partitions: 12279 proteins ...
############################ DU DATASET ############################
100%|██████████| 65852/65852 [00:00<00:00, 2031414.52it/s]
65851 PPIs!
/Users/judithbernett/PycharmProjects/PPIs_MA/rewrite_datasets.py:297: FutureWarning: The squeeze argument has been deprecated and will be removed in a future version. Append .squeeze("columns") to the call to squeeze.
simap_dict = pd.read_csv(
Cleaning and balancing partition 0 ...
1 duplicates in positive dataset!
985 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (7959 positives, 17818 negatives)...
Cleaning and balancing partition 1 ...
0 duplicates in positive dataset!
363 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (2361 positives, 6475 negatives)...
Cleaning and balancing partition both ...
0 duplicates in positive dataset!
1162 duplicates in negative dataset!
Number of overlaps between pos and neg: 0
randomly dropping negatives (6936 positives, 21791 negatives)...
writing positive files for SPRINT: only partition 0: 7959 proteins ...
writing negative files: only partition 0: 7959 proteins ...
writing positive files: only partition 1: 2361 proteins ...
writing negative files: only partition 1: 2361 proteins ...
writing positive files: both partitions: 6936 proteins ...
writing negative files: both partitions: 6936 proteins ...
Process finished with exit code 0