TraceZip

This repository collects the components used to build a prototype of TraceZip. It consists of four parts:

  • otel-auto-instrumentation-survey is a Java project that shows how we generated workloads from several widely used middleware systems.
  • otel-compressor is the online implementation of TraceZip.
  • static-compressor is the offline implementation of TraceZip. You can use it to compress CSV files.
  • train-ticket-workload shows how we generated the tracing data for the Train-Ticket benchmark.

For more information, see the README in the corresponding directory.



Note on Data Usage: Cluster-Trace-Microservices-v2022

Our research uses the Cluster-Trace-Microservices-v2022 dataset from the Alibaba Cluster Trace Program, which provides large-scale runtime traces of microservices collected from Alibaba's production clusters. Specifically, we extracted the trace data that belongs to the same interface and compressed it to streamline our analysis.

For further details on this dataset, refer to the official repository: Alibaba Cluster Data. The static-compressor compresses the CallGraph data.

We thank Alibaba Group for making this invaluable dataset available to the research community. If you have any questions about the trace data, consider filing an issue on their GitHub repository so the discussion is visible to the whole community.


TrainTicket Benchmark Spans collection

Our TraceZip compression middleware can be applied to microservice systems that use OpenTelemetry for trace collection.

In the paper, we used TrainTicket and self-made services as benchmarks to test our compression middleware. Here, we provide the span data generated by these services, which you can use to measure how well our program compresses spans.

We have removed data that might contain author information from the dataset to maintain anonymity during the review process. The total size of the dataset after extraction is approximately 16 GB.

Before sending the provided dataset to the compressor, you need to group all the span data by hostname and send each group separately. You can find the hostname in the resource attributes of the resource_spans entries in each span file.
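
The grouping step can be sketched in Python as follows. This is a minimal illustration, not the project's own tooling: it assumes each span file has already been parsed as JSON into a dict with a top-level `resource_spans` (or `resourceSpans`) list, and that the hostname is stored under the OpenTelemetry semantic-convention key `host.name`. The function names `hostname_of` and `group_by_hostname` are hypothetical; adjust the keys if your files are laid out differently.

```python
from collections import defaultdict

# Assumption: the hostname uses the OpenTelemetry semantic-convention key.
HOST_KEY = "host.name"


def hostname_of(resource_span):
    """Return the hostname of one resource_spans entry, or None if absent."""
    attributes = resource_span.get("resource", {}).get("attributes", [])
    for attribute in attributes:
        if attribute.get("key") == HOST_KEY:
            value = attribute.get("value", {})
            # Accept both snake_case and camelCase JSON field names.
            return value.get("string_value") or value.get("stringValue")
    return None


def group_by_hostname(documents):
    """Group resource_spans entries from parsed span files by hostname.

    `documents` is an iterable of dicts, one per parsed span file.
    Entries without a hostname are collected under "unknown".
    """
    groups = defaultdict(list)
    for document in documents:
        entries = (
            document.get("resource_spans")
            or document.get("resourceSpans")
            or []
        )
        for entry in entries:
            groups[hostname_of(entry) or "unknown"].append(entry)
    return dict(groups)
```

Each value in the returned dict can then be sent to the compressor as the span stream of a single host.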