When the papertrail firehose is verbose, papertrail -f silently drops some logs #93

Open
topher200 opened this issue Sep 7, 2017 · 3 comments


topher200 commented Sep 7, 2017

Steps to reproduce:

  1. Have many log messages (we generate 10 GB of messages per day)
  2. Run papertrail -f
  3. Observe that there are occasional gaps in the messages of 1-2 seconds. For example, we'll see a message from 12:01:01, followed by a message from 12:01:03 (without any of the messages from 12:01:02).
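One way to confirm the gaps rather than spotting them by eye is to scan the timestamps of the received messages. This is only a minimal sketch: it assumes the timestamps have already been parsed into `datetime` objects, and both `find_gaps` and the one-second threshold are hypothetical choices, not anything the CLI provides. A gap only hints at dropped messages when the stream is busy enough that every second should contain at least one line.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, max_gap=timedelta(seconds=1)):
    """Return (previous, current) pairs whose spacing exceeds max_gap.

    max_gap is an assumed threshold; tune it to the real message rate.
    """
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > max_gap]


# The example from the report: 12:01:01 followed directly by 12:01:03.
seen = [
    datetime(2017, 9, 7, 12, 1, 1),
    datetime(2017, 9, 7, 12, 1, 3),
]
for before, after in find_gaps(seen):
    print(f"possible drop between {before:%H:%M:%S} and {after:%H:%M:%S}")
```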

I assume this is by design! I'm guessing that if there are a ton of messages, you didn't want to overwhelm the servers or delay the CLI with too much data.

Regardless, I'd like a realtime (or near realtime) firehose to parse. What is the best way to get that data? My ideas:

  1. Use the "archive to S3" function, but that forces a delay of 1-2 hours and is unusable for this project
  2. Manually "chunk" the data on my side, by requesting 5 minutes of data at a time (so at 12:05, I request the data for 12:00 until 12:05, etc.)
  3. ...?

Is there any way to get papertrail -f to stop dropping messages? If not, how would you develop a realtime-ish system?
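For the chunking idea (option 2), the time arithmetic could look roughly like the sketch below. `chunk_windows` is a hypothetical helper, not part of any Papertrail tooling. It produces back-to-back half-open windows so that consecutive requests neither skip nor repeat a span of time. Whether the search endpoint treats the upper bound as inclusive is not something this thread establishes, so deduplicating messages at window boundaries may still be needed.

```python
from datetime import datetime, timedelta


def chunk_windows(start, end, size=timedelta(minutes=5)):
    """Yield contiguous (lo, hi) windows that cover start..end without overlap."""
    lo = start
    while lo < end:
        hi = min(lo + size, end)  # the last window may be shorter than size
        yield lo, hi
        lo = hi


# Example: catching up from 12:00 to 12:12 in 5-minute chunks.
for lo, hi in chunk_windows(datetime(2017, 9, 7, 12, 0), datetime(2017, 9, 7, 12, 12)):
    print(f"{lo:%H:%M} until {hi:%H:%M}")
```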

@topher200
Author

I ended up going with option 2 - it's working well!

I have a cron job that runs every minute. The cron job kicks off a Python script, which calls the papertrail CLI to fetch all the log messages from the previous minute and dumps them to a file.
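For anyone who wants the general shape without reading the linked script, here is a rough sketch of that per-minute fetch. It is not the linked `run.py`. It assumes the `papertrail` CLI is on `PATH`, that it accepts `--min-time` and `--max-time` (check `papertrail --help` for your version), and that it parses a `YYYY-MM-DD HH:MM:SS` string as local time. The function names and the output path handling are made up for illustration.

```python
import subprocess
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed to be understood by the CLI


def previous_minute(now):
    """Return (start, end) of the last full minute before `now`."""
    end = now.replace(second=0, microsecond=0)
    return end - timedelta(minutes=1), end


def build_command(start, end):
    """Build the CLI invocation; the flag names are an assumption to verify."""
    return [
        "papertrail",
        "--min-time", start.strftime(TIME_FORMAT),
        "--max-time", end.strftime(TIME_FORMAT),
    ]


def fetch_previous_minute(out_path, now=None):
    """Run the CLI for the previous minute and append its output to a file."""
    start, end = previous_minute(now or datetime.now())
    result = subprocess.run(
        build_command(start, end), capture_output=True, text=True, check=True
    )
    with open(out_path, "a", encoding="utf-8") as fh:
        fh.write(result.stdout)
```

A crontab entry that runs every minute would then call a small wrapper around `fetch_previous_minute`. Aligning the window to whole minutes keeps consecutive runs from overlapping even if cron starts a few seconds late.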

One annoyance is that the CLI doesn't format messages the same way as Papertrail's S3 archiver, so we have to convert the log messages to a matching format ourselves.

Here's the python script: https://github.com/topher200/assertion-context/blob/42db16a17f2d0bb3f714e316663fd319f9a1373f/web/realtime_updater/run.py
and the cron job: https://github.com/topher200/assertion-context/blob/42db16a17f2d0bb3f714e316663fd319f9a1373f/web/realtime_updater/crontab


jareware commented Jan 6, 2019

Also seeing this behaviour. Somewhat inconvenient.

Thanks for your suggestions @topher200, sounds like option 2 is the way to go for us as well. 👍


ziemkowski commented Jun 24, 2021

We're also experiencing this issue, but manually chunking it isn't viable for tailing with the amount of logs we have (not to mention it delays our response time).

Here are the only numbered frame lines that papertrail-cli output for a 15-frame stack trace:

PHP   4. ...
PHP  11. ...
PHP  12. ...

And then a minute later, another trace has massive gaps too:

PHP Stack trace:
PHP   2. ...
PHP   4. ...
PHP   7. ...
PHP   8. ...
PHP  15. ...

The gaps in these stack traces suggest that single-line log messages are being dropped entirely as well, since lines are going missing throughout the stream.
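Numbered stack frames make a handy canary for this, because the missing lines can be counted. The sketch below is an illustration only: `missing_frames` and the regular expression are hypothetical, and they assume each frame line looks like the `PHP   4. ...` lines above and that the expected depth of the trace is known in advance.

```python
import re

# Matches numbered frame lines such as "PHP  11. ..."; the header line
# "PHP Stack trace:" does not match and is ignored.
FRAME_RE = re.compile(r"^PHP\s+(\d+)\.\s")


def missing_frames(lines, expected_depth):
    """Return the frame numbers in 1..expected_depth that never appeared."""
    seen = set()
    for line in lines:
        match = FRAME_RE.match(line)
        if match:
            seen.add(int(match.group(1)))
    return [n for n in range(1, expected_depth + 1) if n not in seen]


received = ["PHP   4. ...", "PHP  11. ...", "PHP  12. ..."]
dropped = missing_frames(received, expected_depth=15)
print(f"{len(dropped)} of 15 frames missing: {dropped}")
```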

This issue has increased pressure on the devops team to move our logs to native AWS services 😞

3 participants