
Performance Degradation Introduced in New Relic PHP Agent v10.13.0.2 #806

Open · theophileds opened this issue Dec 20, 2023 · 4 comments
Labels: bug

theophileds commented Dec 20, 2023

Description

After upgrading the New Relic PHP agent from version 10.0.0.312 to version 10.13.0.2, we observed a significant increase in CPU usage and latency, along with fluctuating php-fpm process counts. We attempted to downgrade the agent, but the older version was not compatible with PHP 8.2, so we disabled the agent entirely, after which performance improved.

Hypothesis: Hypervisor Clock Settings

When we contacted New Relic support, a connection to the hypervisor's clock settings was suggested as a possible cause. However, after switching the clock source to TSC (Time Stamp Counter), our benchmark results showed only a marginal improvement in average duration.
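
For reference, the clock source the kernel is actually using inside a given container can be verified by reading the standard Linux sysfs entry /sys/devices/system/clocksource/clocksource0/current_clocksource, which reports values such as tsc or kvm-clock. A minimal C sketch of that check, assuming a Linux guest that exposes this sysfs path:

/* Print the kernel's current clock source, e.g. "tsc" or "kvm-clock". */
#include <stdio.h>
#include <string.h>

int main(void)
{
    const char *path =
        "/sys/devices/system/clocksource/clocksource0/current_clocksource";
    char source[64] = "unknown";
    FILE *f = fopen(path, "r");

    if (f != NULL) {
        if (fgets(source, sizeof source, f) != NULL)
            source[strcspn(source, "\n")] = '\0'; /* strip trailing newline */
        fclose(f);
    }
    printf("current clocksource: %s\n", source);
    return 0;
}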

The benchmark below executed 100,000,000 iterations and was repeated one hundred times on two different containers, running on machines configured with TSC and kvm-clock respectively.

/* Measured loop: each iteration reads the wall clock via gettimeofday(). */
struct timeval end;

for (int i = 0; i < iterations; i++) {
    gettimeofday(&end, NULL);
}
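
For completeness, a self-contained version of this micro-benchmark might look like the sketch below. The outer timing via clock_gettime(CLOCK_MONOTONIC), the command-line iteration count, and the 100,000,000 default are assumptions about the harness, not the exact code used to produce the numbers reported below.

/* Hypothetical standalone harness: measures how long `iterations`
 * consecutive gettimeofday() calls take on the current clock source. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

int main(int argc, char **argv)
{
    long iterations = (argc > 1) ? atol(argv[1]) : 100000000L;
    struct timeval end;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iterations; i++) {
        gettimeofday(&end, NULL);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld gettimeofday() calls took %.6f s\n", iterations, elapsed);
    return 0;
}

Built with a plain gcc -O2 and run once per configuration, this would give one sample of the comparison described below.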

Benchmark Results:

  • TSC-based configuration: average duration 2.321919 seconds
  • kvm-clock-based configuration: average duration 2.817715 seconds

The observed result indicated a 17.56% decrease in average time when using TSC ((2.817715 - 2.321919) / 2.817715 ≈ 17.6%).

However, we acknowledge that this micro-benchmark may not accurately mirror the load pattern generated by the New Relic agent. Moreover, even when running on TSC, we did not observe any noteworthy improvement in application performance.

Feature Disabling and Version Testing

To pinpoint the source of the issue, we ran extensive tests that disabled features such as distributed tracing, code-level metrics, and application logging (see the newrelic.ini excerpt below). The performance impact persisted across multiple tests and versions.

newrelic.distributed_tracing_enabled = false
newrelic.code_level_metrics.enabled = false
newrelic.application_logging.enabled = false
newrelic.custom_events.max_samples_stored = 10000

newrelic.daemon.dont_launch = 3
newrelic.daemon.utilization.detect_aws = false
newrelic.daemon.utilization.detect_azure = false
newrelic.daemon.utilization.detect_gcp = false
newrelic.daemon.utilization.detect_pcf = false
newrelic.daemon.utilization.detect_docker = false
newrelic.daemon.app_timeout = "2m"
newrelic.browser_monitoring.auto_instrument = false
newrelic.framework = "symfony4"

newrelic.error_collector.enabled = false
newrelic.transaction_tracer.enabled = false
newrelic.transaction_tracer.detail = 0
newrelic.transaction_tracer.slow_sql = false
newrelic.transaction_events.enabled = false
newrelic.attributes.enabled = false
newrelic.custom_insights_events.enabled = false
newrelic.synthetics.enabled = false
newrelic.datastore_tracer.instance_reporting.enabled = false
newrelic.datastore_tracer.database_name_reporting.enabled = false
newrelic.application_logging.forwarding.enabled = false

Regrettably, these efforts did not result in any substantial improvement. After repeating the experiment multiple times, it became evident that enabling New Relic consistently led to a significant negative impact on performance. This observation persisted across various versions of the New Relic agent, including:

  • 10.7.0.319
  • 10.13.0.2
  • 10.14.0.3

PHP-fpm Processes and CPU Usage

As illustrated in the Grafana metrics screen captures, the tests were conducted in the following sequence with the specified configurations:

  1. New Relic fully disabled
  2. New Relic enabled (All features disabled) with TSC clock
  3. New Relic enabled (All features disabled) with kvm-clock configuration

(Screenshot: Grafana metrics from 2023-12-20 showing php-fpm processes and CPU usage for the three configurations above)

Conclusion

The upgrade to version 10.13.0.2 introduced significant performance degradation that cannot be explained by new features or clock source changes alone. The issue persists despite clock configuration adjustments and disabling agent features.

Your Environment

PHP backend applications built on Symfony, Docker image php:8.2.13-fpm
Deployed on EKS 1.24, EC2 instance type: m5.xlarge (Hypervisor Nitro)
Clock configuration tested with TSC and kvm-clock

theophileds added the bug label on Dec 20, 2023

theophileds (Author) commented

Additional Experiment with Version 10.15.0.4

Further experiments were conducted with New Relic agent version 10.15.0.4 (under the same newrelic.ini configuration), both enabled and disabled. Unfortunately, no significant improvement was observed in performance.

(Screenshots: Grafana metrics from 2023-12-26 comparing agent v10.15.0.4 enabled and disabled)

In terms of memory consumption, we observed an increase of approximately 70 MB per pod when the New Relic agent is enabled, resulting in an average of approximately 375 MB per pod. In comparison, when the agent is disabled, the memory usage averages around 305 MB per pod.


dorain47 commented Jan 1, 2024

@theophileds agree with your observation 💯
The CPU spike has reduced (slightly) for me since 10.15.0.4, but on the memory side I have been seeing higher usage over the last few newrelic agent releases.


theophileds commented Jan 25, 2024

Hello,

I have some exciting updates to share with you.

Firstly, we conducted performance tests using the latest version of the New Relic agent, v10.16.0.5, and observed a modest ~5% reduction in CPU overhead.

Additionally, after thorough performance testing, we observed a significant efficiency improvement when transitioning to Amazon EC2 C7a instances. These instances use AMD processors and, in our tests, outperformed their same-generation Intel counterparts.

Our comparison involved several machines, including c7a.xlarge (AMD), c7i.xlarge (Intel), c5.xlarge (Intel), and our current m5.xlarge. Attached are screenshots depicting the results.

(Screenshot: instance type comparison results from 2024-01-24 for c7a.xlarge, c7i.xlarge, c5.xlarge, and m5.xlarge)

The c7a.xlarge emerged as the clear top performer, showing significant performance improvements. Some of the gain can be attributed to the higher clock speed of the AMD processors (3.7 GHz per core, versus 3.5 GHz per core on c5.xlarge). However, since c7a.xlarge instances still use kvm-clock, the size of the remaining difference suggests that AMD's architecture and cache structure also influence these outcomes.


Winfle commented Sep 1, 2024

@theophileds I think the main difference between the current Intel instances and c7a is SMT: each vCPU is pinned to a physical CPU core rather than to a hardware thread.
So you get more real cores, which helps especially on heavy, CPU-intensive tasks.
