Commit

Merge remote-tracking branch 'upstream/master'
HanzhangRen committed Jul 26, 2024
2 parents 462b128 + d11f2a0 commit ee5a799
Showing 7 changed files with 30 additions and 8 deletions.
7 changes: 7 additions & 0 deletions .github/workflows/checks.yaml
@@ -41,6 +41,13 @@ jobs:
       - name: Run prediction
         run: docker run --rm -v "$(pwd)/.:/data" eyra-rank:latest /data/PreFer_fake_data.csv /data/PreFer_fake_background_data.csv --output /data/predictions.csv

+      - name: Check if file exists
+        run: |
+          if [ ! -f "predictions.csv" ]; then
+            echo "Predictions file not found. Please check the logs to see what went wrong."
+            exit 1
+          fi
       - name: Build Docker scoring image
         uses: docker/build-push-action@v4
         with:
6 changes: 4 additions & 2 deletions README.md
@@ -46,8 +46,10 @@ Submit your method via the "Submit Method" task on the Next platform by providin

 ℹ️ If the check fails go to [FAQ](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#frequently-asked-questions). You might need to add dependencies as described [here](https://github.com/eyra/fertility-prediction-challenge/wiki/PreFer-Challenge-Wiki#how-to-add-or-edit-dependencies-librariespackages).

-4. On the main page of your repository, above the file list, click commits to view a list of commits, as described [here](https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/about-commits#about-commit-branches-and-tag-labels)
-5. Go to the commit that you want to submit and right click on view commit details, then click "Copy Link Address", see example below:
+4. On the main page of your repository, above the file list, click "Commits" to view a list of commits. Do NOT click "N commits ahead of". See example below:
+![](https://github.com/eyra/fertility-prediction-challenge/blob/master/images/screenshot_commits.PNG)
+
+5. Go to the commit that you want to submit and right click on "view commit details", then click "Copy Link Address", see example below:

 ![](https://github.com/eyra/fertility-prediction-challenge/blob/master/images/Copy%20link%20to%20commit.png)

Binary file modified images/Copy link to commit.png
Binary file added images/screenshot_commits.PNG
Binary file modified model.rds
Binary file not shown.
2 changes: 1 addition & 1 deletion python.Dockerfile
@@ -11,4 +11,4 @@ COPY *.py /app
 COPY *.joblib /app

 ENTRYPOINT ["conda", "run", "-n", "eyra-rank", "python", "/app/run.py"]
-CMD ["predict", "/data/fake_data.csv"]
+CMD []
23 changes: 18 additions & 5 deletions score.py
@@ -13,6 +13,17 @@
 The predictions need to be in a separate file with two columns (nomem_encr, prediction).
+Update from April 30:
+Starting from the second intermediate leaderboard, we use this updated `score.py` script.
+When calculating recall, we now take into account not only the cases when a predicted value was available (i.e., not missing) but all cases in the holdout set.
+Specifically, in the updated script, we divide the number of true positives by the total number of positive cases in the ground truth data
+(i.e., the number of people who actually had a new child), rather than by the sum of true positives and false negatives.
+This change only matters if there are missing values in predictions.
+We made this change to avoid a situation where a model makes very accurate predictions for only a small number of cases
+(where the remaining cases were not predicted because of missing values on predictor variables),
+yet gets the same result as a model that makes similar accurate predictions but for all cases.
+Commented lines of code were part of our original scoring function.
 """

 import sys
@@ -55,26 +66,28 @@ def score(prediction_path, ground_truth_path, output):
         merged_df
     )

-    # Calculate true positives, false positives, and false negatives
+    # Calculate true positives and false positives
     true_positives = len(
         merged_df[(merged_df["prediction"] == 1) & (merged_df["new_child"] == 1)]
     )
     false_positives = len(
         merged_df[(merged_df["prediction"] == 1) & (merged_df["new_child"] == 0)]
     )
-    false_negatives = len(
-        merged_df[(merged_df["prediction"] == 0) & (merged_df["new_child"] == 1)]
-    )

+    # Calculate the actual number of positive instances (N of people who actually had a new child) for calculating recall
+    n_all_positive_instances = len(merged_df[merged_df["new_child"] == 1])

     # Calculate precision, recall, and F1 score
     try:
         precision = true_positives / (true_positives + false_positives)
     except ZeroDivisionError:
         precision = 0

     try:
-        recall = true_positives / (true_positives + false_negatives)
+        recall = true_positives / n_all_positive_instances
     except ZeroDivisionError:
         recall = 0

     try:
         f1_score = 2 * (precision * recall) / (precision + recall)
     except ZeroDivisionError:
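
Note on the recall change in score.py: the effect of the new denominator can be illustrated with a small sketch. This is not the project's code. It uses plain Python tuples instead of the pandas `merged_df`, and the five rows, the `rows` name, and the two helper functions are hypothetical, chosen only to show a case where some predictions are missing.

```python
# Hypothetical holdout rows as (prediction, new_child) pairs.
# None marks a missing prediction (an assumption standing in for NaN in merged_df).
rows = [(1, 1), (None, 1), (None, 1), (0, 0), (1, 0)]


def recall_original(rows):
    """Recall as in the original script: TP / (TP + FN).

    Rows with a missing prediction match neither TP nor FN,
    so they drop out of the denominator.
    """
    true_positives = sum(1 for pred, actual in rows if pred == 1 and actual == 1)
    false_negatives = sum(1 for pred, actual in rows if pred == 0 and actual == 1)
    try:
        return true_positives / (true_positives + false_negatives)
    except ZeroDivisionError:
        return 0


def recall_updated(rows):
    """Recall as in the updated script: TP / (all actual positives).

    Every person who actually had a new child counts,
    whether or not a prediction was made for them.
    """
    true_positives = sum(1 for pred, actual in rows if pred == 1 and actual == 1)
    n_all_positive_instances = sum(1 for _, actual in rows if actual == 1)
    try:
        return true_positives / n_all_positive_instances
    except ZeroDivisionError:
        return 0


print("original recall:", recall_original(rows))
print("updated recall:", recall_updated(rows))
```

In this made-up data, three people had a new child but only one of them received a prediction. The original formula ignores the two unpredicted positives, while the updated formula counts them against the model. When no predictions are missing, the two formulas give the same value.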
