[AWS-S3] Add a timestamp filter for s3 polling mode #41232
Comments
Pinging @elastic/obs-ds-hosted-services (Team:obs-ds-hosted-services)
This is a nice addition for end users. In addition, it should improve performance for object parsing and storing, since Beats will only process objects that satisfy the timestamp requirement.
@Kavindu-Dodan @kaiyan-sheng For your reference, I'm sharing the workaround the customer developed to retrieve only the new files. Please take a look and let me know whether the logic might be useful for us as well.

Overview: The script uses the Boto3 library to interact with AWS and fetch AWS logs from objects in the S3 bucket. Once the logs are fetched, they are sent over UDP to a designated port for further processing.

Logic breakdown (see the sketch below):
First, the script initializes the Boto3 client with the necessary AWS credentials, allowing it to authenticate and interact with AWS services.
It then passes the bucket name and object key asynchronously to retrieve file data based on the file suffix.
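For reference, here is a minimal, synchronous sketch of the kind of workaround described above; the bucket name, file suffix, and UDP endpoint are placeholders, and the asynchronous fetching mentioned in the breakdown is omitted for brevity:

```python
import socket

import boto3

# Hypothetical values; the customer's actual script may differ.
BUCKET = "my-log-bucket"
SUFFIX = ".json.gz"
UDP_HOST, UDP_PORT = "127.0.0.1", 9999

# Initialize the Boto3 S3 client; credentials are resolved through the
# standard AWS credential chain (environment, shared config, IAM role, ...).
s3 = boto3.client("s3")
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

# List every object in the bucket and keep only those matching the suffix.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET):
    for obj in page.get("Contents", []):
        if not obj["Key"].endswith(SUFFIX):
            continue
        # Fetch the object body and forward it over UDP for processing
        # (a real script would chunk payloads larger than a UDP datagram).
        body = s3.get_object(Bucket=BUCKET, Key=obj["Key"])["Body"].read()
        sock.sendto(body, (UDP_HOST, UDP_PORT))
```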
Hi @Kavindu-Dodan @kaiyan-sheng, could you please provide an update and share the ETA for this? It would help us communicate effectively with the customer, as they consider this a critical feature and have been following up regularly.
@anuj-elastic I am working on adding this feature through PR #41817, and I am aiming to release it with 8.18.0, which is planned for early next year. Along with that, I am planning to upgrade the integrations to support the new feature; that work is tracked in elastic/integrations#11919.
Thanks for the update @Kavindu-Dodan, it's wonderful to hear. The customer has been consistently following up on this for the past couple of months, and now I have a timeline to share with them and can set expectations accordingly.
@Kavindu-Dodan Do you have any idea whether the similar issue with the Cloudflare integration will also be addressed by the same fix?
@anuj-elastic Related to your question, please see the update here.
@anuj-elastic In addition to what @bturquet said, PR #41817 will bring configurations to address the performance considerations. Since the Cloudflare integration internally uses the S3 implementation, I hope these new configurations can also fix the referenced issue. The update of the integrations will be done through elastic/integrations#11919.
PR #41817 has been merged.
Update - We are backporting this improvement to |
Describe the enhancement:
The current S3 input without SQS notification calls the `ListObjects` API to collect all logs/objects from the given S3 bucket. There is no filter functionality, so users get both old and new logs from the bucket. It would be nice to have a `start_timestamp` config parameter that lets users specify a timestamp. Instead of ingesting all logs from the bucket, we can make the same `ListObjects` API call, filter the results using `start_timestamp`, and only store logs whose `LastModified` is >= `start_timestamp`.
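For illustration only (this is not the Beats implementation), here is a short sketch of the proposed filtering, using a placeholder bucket name and the timestamp comparison described above:

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client("s3")

# Placeholder values; start_timestamp mirrors the proposed config option.
bucket = "my-log-bucket"
start_timestamp = datetime(2024, 10, 11, tzinfo=timezone.utc)

# Same ListObjects(V2) call as today, but objects whose LastModified is
# older than start_timestamp are skipped instead of being ingested.
paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=bucket):
    for obj in page.get("Contents", []):
        if obj["LastModified"] >= start_timestamp:
            print(obj["Key"], obj["LastModified"])  # only these would be ingested
```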
Describe a specific use case for the enhancement or feature:
With the config below, we should only store logs with `LastModified` after `start_timestamp: 2024-10-11T00:00:00+00:00`.

This is what the `ListObjects` API call returns: