-
-
Notifications
You must be signed in to change notification settings - Fork 563
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugs of Incorrect Calculation of Baseline Hazard & Baseline Cumulative Hazard #1475
Comments
Hi @bofeng2018, thank you for the detailed issue. This is interesting. Give me a few days to try some corrections. |
Happy new year! It has been a while and just want to follow up with the issue that I reported. Please let me know if there is anything that I can help with. |
Hey @bofeng2018, I was just thinking about this issue. I haven't had time to devote to this - do you want to submit a PR with a recommend change (even a temporary one that I can replace later)? With a fix in, I can cut a new release too |
Sure. I just created a pull request. Please let me know if you see any problems. |
Hello I came accross this issue, and was wondering if it is still open and what the impact is. I would like to use lifelines for Cox PH regression, but I don't know enough about this area to be decide if this implementation is safe to use. Thank you! |
Hi, I'm finally investigating this more carefully. I've written these two tests below, but am unable to make them fail (all assertions are true). def test_baseline_prediction_with_extreme_means(self, rossi):
cph = CoxPHFitter()
cph.fit(rossi, "week", "arrest")
rossi_shifted = rossi.copy()
rossi_shifted['prio'] += 100
cph_shifted = CoxPHFitter()
cph_shifted.fit(rossi_shifted, "week", "arrest")
# make sure summary stats are equal
assert_frame_equal(cph_shifted.summary, cph.summary)
# confirm hazards are equal
assert_frame_equal(cph.baseline_hazard_, cph_shifted.baseline_hazard_)
assert_frame_equal(cph.baseline_cumulative_hazard_, cph_shifted.baseline_cumulative_hazard_)
def test_baseline_prediction_with_extreme_scaling(self, rossi):
cph = CoxPHFitter()
cph.fit(rossi, "week", "arrest")
rossi_scaled = rossi.copy()
rossi_scaled['prio'] *= 100
cph_scaled = CoxPHFitter()
cph_scaled.fit(rossi_scaled, "week", "arrest")
# make sure summary stats are equal - note that CI and coefs are unequal since we scaled params.
assert_frame_equal(cph_scaled.summary[['z', 'p']], cph.summary[['z', 'p']])
# confirm hazards are equal
assert_frame_equal(cph.baseline_hazard_, cph_scaled.baseline_hazard_)
assert_frame_equal(cph.baseline_cumulative_hazard_, cph_scaled.baseline_cumulative_hazard_) Also, when I implement your solution, the following important tests fail: https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/tests/test_estimation.py#L4573 Some others fail, too, but these are the more relevant ones as they compare against a R's Can you provide a simple example, dummy data or not, that shows the difference? |
Hi Cameron,
Sorry for the late reply. It took me a while to prepare the notebook in the
attachment. You can refer to the notebook to see solid proof of what was
wrong with the baseline (cumulative) hazard calculation.
I did not have time to go through the R library to see the other issues you
mentioned. Maybe after we reach an agreement on the issues as demonstrated
in the notebook, we can talk again about the comparison issues.
Regards,
Lizhou
--
Lizhou Nie
Senior Data Scientist
Prudential Financial
…On Sat, Dec 30, 2023 at 4:53 PM Cameron Davidson-Pilon < ***@***.***> wrote:
Hi, I'm finally investigating this more carefully. I've written these two
tests below, but am unable to make them fail (all assertions are true).
def test_baseline_prediction_with_extreme_means(self, rossi):
cph = CoxPHFitter()
cph.fit(rossi, "week", "arrest")
rossi_shifted = rossi.copy()
rossi_shifted['prio'] += 100
cph_shifted = CoxPHFitter()
cph_shifted.fit(rossi_shifted, "week", "arrest")
# make sure summary stats are equal
assert_frame_equal(cph_shifted.summary, cph.summary)
# confirm hazards are equal
assert_frame_equal(cph.baseline_hazard_, cph_shifted.baseline_hazard_)
assert_frame_equal(cph.baseline_cumulative_hazard_, cph_shifted.baseline_cumulative_hazard_)
def test_baseline_prediction_with_extreme_scaling(self, rossi):
cph = CoxPHFitter()
cph.fit(rossi, "week", "arrest")
rossi_scaled = rossi.copy()
rossi_scaled['prio'] *= 100
cph_scaled = CoxPHFitter()
cph_scaled.fit(rossi_scaled, "week", "arrest")
# make sure summary stats are equal - note that CI and coefs are unequal since we scaled params.
assert_frame_equal(cph_scaled.summary[['z', 'p']], cph.summary[['z', 'p']])
# confirm hazards are equal
assert_frame_equal(cph.baseline_hazard_, cph_scaled.baseline_hazard_)
assert_frame_equal(cph.baseline_cumulative_hazard_, cph_scaled.baseline_cumulative_hazard_)
------------------------------
Also, when I implement your solution, the following important tests fail:
https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/tests/test_estimation.py#L4594
https://github.com/CamDavidsonPilon/lifelines/blob/master/lifelines/tests/test_estimation.py#L4573
Some others fail, too, but these are the more relevant ones as they
compare against a R's survival library.
------------------------------
Can you provide a simple example, dummy data or not, that shows the
difference?
—
Reply to this email directly, view it on GitHub
<#1475 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AJCTY7PGLT4VUBRIWE23QF3YMCENNAVCNFSM6AAAAAASORQBPWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNZSGYYTGNRQGQ>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
@bofeng2018 email attachments won't show in issues, can you repost your nb in the issue thread? |
Attached is the zipped nb. Let me know if you can't open it. |
(Sorry about the spam - I keep alternating between if I am convinced I've found the problem or not - still investigating) |
Totally understandable. It also took me a while to finally confirm that the problem was because of the data standardization. Let me know if you have any questions. |
Okay, here's an (updating) summary of what I know. This may change:
![]()
![]()
¹ although our tests say |
Some more thoughts:
So I think the problem is that it's not obvious where the factor is being introduced. We either introduce it in the calculation of the baseline hazard and then avoid it in the |
The weird second graph is probably caused by super low mortality rates. Instead of dat['Health Score'] += 10, can you try dat['Health Score'] -= 10? The blue data points were based on classical textbook methods. Unless the implementation was wrong, they should be very consistent with theoretical results with large sample sizes. |
Yes, I'm confident that the problem comes from the data standardization procedure. To be more specific, covariate means were incorrectly shifted to the baseline hazard. As a simple way to confirm it, for data preprocessing, can you try to only convert the covariate variations to be 1 but not shift their locations? That applies to lines 1248-1250 in the coxph_fitter.py file, and the corresponding codes are shown below. I would be very curious to see how it changes the results.
|
Here's my edits of your notebook, I made some changes to reduce the complexity and focus on a simpler example.
I don't think this is the correct interpretation. The covariate means have to go somewhere. Either they exist in the log-partial-hazard, or they exist in baseline hazard. We (implicitly) have them exist in the log-partial-hazard. More explicitly:
If I understand correctly, the differences we see are due to what we think the cumulative base hazard is:
Sorry, I'm not following: which results should we be looking at? |
Yes, I agree we probably have disagreements on what would define a baseline. I guess I'm always following the most classical way that a Cox proportional hazard model would be formulated, which does not include steps to subtract sample means from each covariate. In this case, the baseline hazard would be calculated in the way I described. However, I also totally agree that the most important thing is to get the final hazard estimations correctly. As long as covariate sample means are always subtracted from future data as in the case of the training data, the estimated hazard functions should be correct. Overall, very inspiring discussions! Thanks for all the thought sharing! |
💯 This is forcing me to re-evaluate these very important details. We'll make some changes to make this easier (at the very least: more transparent) for users. |
Hello, I am a senior data scientist from Prudential Financial. While working on one project involving the Cox proportional hazard models, I found that the baseline hazard and baseline cumulative hazard were obviously calculated incorrectly using the CoxPHFitter module within the lifelines package. Before jumping into the bug details, I want to share the version information first. The lifelines package I was using was 0.27.0, but by the time I checked the source codes in GitHub again this morning, which belonged to the version 0.27.4 I believe, I still saw the same bugs.
The bugs originate from the fit_model function in the SemiParametricPHFitter class, which start from the line 1252 of the coxph_fitter.py file. Notice that, the standardized data are supplied to the fit_model function for further estimation. This will not be a problem for the Cox coefficient estimation (i.e., the params in the function), because they are restored to their original scales after being divided by corresponding standard deviations of the original data as in line 1399. However, no similar action has been taken for the predicted_partial_hazards calculated in line 1392. I notice that in line 1393, a matrix multiplication of the standardized data with the uncorrected Cox coefficients was used to avoid the scale issues. However, the location issues were never taken care of. As a result, it is almost like the raw data are shifted which direction and extent depend on their original mean values, and the effect of the shift is incorrectly transferred to the baseline hazard of the Cox model. Thus, it causes all the subsequent calculations of the baseline hazard and baseline cumulative hazard to be incorrect.
The fixes for the bugs are straightforward, which is basically to use the unstandardized data for baseline hazard related calculations. I have appended some codes below for a lazy fix. By adding them right at the line 1270, it should generate the correct baseline hazard and baseline cumulative hazard. However, this is definitely not ideal, since incorrect calculations are not removed but rather just replaced. If desired, I would be very happy to work with the lifelines development team to come up with a permanent and neater fix for it.
The text was updated successfully, but these errors were encountered: