bibtex for @inproceedings from Crossref #40

Open
mfenner opened this issue Feb 28, 2023 · 7 comments

Comments

@mfenner
Contributor

mfenner commented Feb 28, 2023

@jowens for the ProceedingsArticle https://doi.org/10.1145/3448016.3452841 I get this BibTeX output:

@inproceedings{https://doi.org/10.1145/3448016.3452841,
    author = {Pandey, Prashant and Conway, Alex and Durie, Joe and Bender, Michael A. and Farach-Colton, Martin and Johnson, Rob},
    booktitle = {Proceedings of the 2021 International Conference on Management of Data},
    copyright = {https://www.acm.org/publications/policies/copyright_policy#Background},
    doi = {10.1145/3448016.3452841},
    month = jun,
    publisher = {Association for Computing Machinery (ACM)},
    title = {Vector Quotient Filters},
    url = {https://dl.acm.org/doi/10.1145/3448016.3452841},
    urldate = {2021-06-09},
    year = {2021}
}

It is missing the series information and adds a copyright field, but is it otherwise what you expect?

@jowens

jowens commented Mar 1, 2023

The BibTeX title should be "Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design". If both title and subtitle fields are present, they should be joined with a colon and put into the BibTeX title field.
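The joining rule described above can be sketched in a few lines. This is only an illustration: `join_title` is a hypothetical helper, not the function commonmeta-py actually uses.

```python
def join_title(title, subtitle=None, sep=": "):
    """Join a title and an optional subtitle into one BibTeX title string.

    Hypothetical helper for illustration; not part of commonmeta-py.
    """
    title = (title or "").strip()
    subtitle = (subtitle or "").strip()
    if not subtitle:
        # No subtitle registered: the title is used unchanged.
        return title
    # Drop a trailing colon on the title so the result has exactly one.
    return title.rstrip(":").rstrip() + sep + subtitle


print(join_title(
    "Vector Quotient Filters",
    "Overcoming the Time/Space Trade-Off in Filter Design",
))
```

Stripping a trailing colon from the title is an extra assumption here, added to guard against records where the separator was already registered as part of the title.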

@jowens

jowens commented Mar 1, 2023

There are no page numbers in your output (should be pages = {1386--1399}), but that might be an upstream problem.
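For the page range format mentioned above, a small normalizer can rewrite whatever dash the source uses into BibTeX's double hyphen. This is a sketch under the assumption that the range arrives as a single string; `normalize_pages` is a hypothetical name, not a commonmeta-py function.

```python
import re

# One or more ASCII hyphens or Unicode dashes (U+2010 to U+2015),
# with optional surrounding whitespace.
_DASHES = re.compile(r"\s*[-\u2010-\u2015]+\s*")


def normalize_pages(pages):
    """Rewrite a page range so its separator is the BibTeX double hyphen."""
    return _DASHES.sub("--", pages.strip())


for raw in ("1386-1399", "1386\u20131399", "1386--1399", "1386"):
    print(normalize_pages(raw))
```

A single page number passes through unchanged, and a range that already uses `--` is left as it is.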

@jowens

jowens commented Mar 1, 2023

I'm pasting in ACM's generated BibTeX for posterity:

@inproceedings{10.1145/3448016.3452841,
author = {Pandey, Prashant and Conway, Alex and Durie, Joe and Bender, Michael A. and Farach-Colton, Martin and Johnson, Rob},
title = {Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design},
year = {2021},
isbn = {9781450383431},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3448016.3452841},
doi = {10.1145/3448016.3452841},
abstract = {Today's filters, such as quotient, cuckoo, and Morton, have a trade-off between space and speed; even when moderately full (e.g., 50%-75% full), their performance degrades nontrivially. The result is that today's systems designers are forced to choose between speed and space usage. In this paper, we present the vector quotient filter (VQF). Locally, the VQF is based on Robin Hood hashing, like the quotient filter, but uses power-of-two-choices hashing to reduce the variance of runs, and thus offers consistent, high throughput across load factors. Power-of-two-choices hashing also makes it more amenable to concurrent updates, compared to the cuckoo filter and variants. Finally, the vector quotient filter is designed to exploit SIMD instructions so that all operations have O (1) cost, independent of the size of the filter or its load factor. We show that the vector quotient filter is 2\texttimes{} faster for inserts compared to the Morton filter (a cuckoo filter variant and state-of-the-art for inserts) and has similar lookup and deletion performance as the cuckoo filter (which is fastest for queries and deletes), despite having a simpler design and implementation. The vector quotient filter has minimal performance decline at high load factors, a problem that has plagued modern filters, including quotient, cuckoo, and Morton. Furthermore, we give a thread-safe version of the vector quotient filter and show that insertion throughput scales 3\texttimes{} with four threads compared to a single thread.},
booktitle = {Proceedings of the 2021 International Conference on Management of Data},
pages = {1386–1399},
numpages = {14},
keywords = {filters, membership query, dictionary data structure},
location = {Virtual Event, China},
series = {SIGMOD '21}
}

@jowens

jowens commented Mar 1, 2023

Anyway, except for the title and pages, yes, it looks splendid!

@mfenner
Contributor Author

mfenner commented Mar 1, 2023

This is very helpful. There are three groups of issues here:

  1. Metadata not submitted to Crossref during DOI registration
  2. Metadata not exposed in the JSON REST API
  3. Metadata not parsed correctly by commonmeta-py

Comparing the submitted XML and generated JSON helps distinguish between the three categories:

commonmeta-py supports both APIs through the via property (Crossref DOIs default to the JSON REST API). In a Jupyter notebook (and soon on the command line) you can write:

from commonmeta import Metadata

# Crossref JSON REST API reader
string = '10.1145/3448016.3452841'
metadata = Metadata(string, via='crossref')
# or metadata = Metadata(string) as `crossref` is the default for a Crossref DOI
bibtex = metadata.bibtex()
print(bibtex)

# Crossref XML reader for the same DOI
string = '10.1145/3448016.3452841'
metadata = Metadata(string, via='crossref_xml')
bibtex = metadata.bibtex()
print(bibtex)

Results:

  • Page numbers are not submitted to Crossref
  • Abstract is not submitted to Crossref
  • Keywords are not submitted to Crossref
  • ISBN is not exposed in the Crossref JSON
  • Title/subtitle is not correctly processed by commonmeta-py
  • Event information (series, location) is not correctly processed by commonmeta-py

I will work on the last two, as they seem pretty straightforward. I will ask Crossref to look into exposing the ISBN in the JSON REST API and, more importantly, work with Crossref members to include page numbers in DOI metadata. Not including abstracts in DOI metadata is a known issue, but page numbers not being submitted is new to me. I have seen missing page numbers before, but thought this info was lost during XML processing.

@jowens
Copy link

jowens commented Mar 2, 2023

"soon on the command line"

That is fantastic.

This is all fantastic! You are doing amazing work here. Always excited when any other human is as dedicated to quality bibtex as I am. :)

@mfenner
Contributor Author

mfenner commented Mar 6, 2023

I published commonmeta-py 0.7.0 on PyPI. The BibTeX for your example DOI, using the crossref_xml reader because some metadata (e.g. page numbers) is missing from the Crossref JSON, looks like this:

@inproceedings{https://doi.org/10.1145/3448016.3452841,
    author = {Pandey, Prashant and Conway, Alex and Durie, Joe and Bender, Michael A. and Farach-Colton, Martin and Johnson, Rob},
    booktitle = {Proceedings of the 2021 International Conference on Management of Data},
    copyright = {https://www.acm.org/publications/policies/copyright_policy#Background},
    doi = {10.1145/3448016.3452841},
    isbn = {9781450383431},
    location = {Virtual Event China},
    month = jun,
    pages = {1386-1399},
    publisher = {Association for Computing Machinery (ACM)},
    series = {SIGMOD/PODS '21},
    title = {Vector Quotient Filters: Overcoming the Time/Space Trade-Off in Filter Design},
    url = {https://dl.acm.org/doi/10.1145/3448016.3452841},
    urldate = {2021-06-09},
    year = {2021}
}

The only things missing are the abstract and keywords, which are not submitted to Crossref. There are also some minor formatting differences.
