Elements -> cRT2 -> eSchol speed ideas #74
Low-hanging fruit: stop printing the full author diffs
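One way to avoid printing full author diffs is to log a one-line summary in their place. The sketch below is hypothetical and assumes the old and new author lists are plain lists of name strings at the point where the diff is printed. The helper name and output format are illustrative only and are not connectRT2's actual logging code. It reports how many authors were added or removed, plus a few examples of each, and ignores pure reorderings.

```python
from collections import Counter


def summarize_author_diff(old_authors, new_authors, sample_size=3):
    """Return a one-line summary of an author-list change (hypothetical helper).

    Instead of logging both full author lists, report only how many
    authors were added or removed, plus up to `sample_size` examples of each.
    Counter subtraction keeps only positive counts, so duplicates are handled,
    but a change in author *order* alone is not reported.
    """
    old_counts = Counter(old_authors)
    new_counts = Counter(new_authors)
    added = list((new_counts - old_counts).elements())
    removed = list((old_counts - new_counts).elements())
    return (
        f"authors: {len(old_authors)} -> {len(new_authors)}; "
        f"+{len(added)} {added[:sample_size]}; "
        f"-{len(removed)} {removed[:sample_size]}"
    )


if __name__ == "__main__":
    before = ["Smith, J.", "Lee, K.", "Garcia, M."]
    after = ["Smith, J.", "Garcia, M.", "Chen, L.", "Patel, R."]
    # authors: 3 -> 4; +2 ['Chen, L.', 'Patel, R.']; -1 ['Lee, K.']
    print(summarize_author_diff(before, after))
```

For publications with hundreds of authors, this keeps each log line short. If author order matters for the sync, that would need a separate check.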
eSchol API's item access query: 500-author limit
Description
Effect this has
Possible fixes
Invasiveness level
This change would be somewhat involved, in that we'd have to modify the eSchol API, and possibly connectRT2. However, it mainly affects the later, post-Elements stages of the syncing process, so it shouldn't (for example) trigger a full resync of every publication from the Elements side.
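If the 500-author cap turns out to be a hard per-query limit that can be worked around client-side, one generic technique is to split large author lists into batches of at most 500 and concatenate the results. The sketch below assumes this is possible. `fetch_batch` is a hypothetical stand-in for whatever eSchol API query is actually involved, and the code does not reflect the eSchol API's real interface.

```python
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

# Assumed per-query cap, taken from the 500-author limit described above.
MAX_AUTHORS_PER_QUERY = 500


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_all_authors(
    authors: Sequence[T],
    fetch_batch: Callable[[Sequence[T]], List[R]],
    limit: int = MAX_AUTHORS_PER_QUERY,
) -> List[R]:
    """Call the (hypothetical) `fetch_batch` once per chunk of at most
    `limit` authors and concatenate the results in the original order."""
    results: List[R] = []
    for batch in chunked(authors, limit):
        results.extend(fetch_batch(batch))
    return results


if __name__ == "__main__":
    calls = []

    def fake_fetch(batch):
        # Stand-in for a real API call; records batch sizes for inspection.
        calls.append(len(batch))
        return [name.upper() for name in batch]

    names = [f"author{i}" for i in range(1203)]
    fetch_all_authors(names, fake_fetch)
    print(calls)  # [500, 500, 203]
```

This only helps if per-batch results can be merged without losing information, such as author positions. Whether that holds for the eSchol item query would need checking.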
Journal-specific metadata fields (fpage, lpage, etc.)?
During the prod migration, while monitoring the RT2 output and comparing it against the resulting eScholarship metadata, I noticed that updates to fpage, lpage, journal, and a few other fields didn't seem to register in eScholarship. Example (connectRT2 logs):
Scope of the problem
During December 2024, there were between 200 and 1,500 of these journal-related metadata updates each day. It's unclear how many of them actually update anything.
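To estimate how many of these daily updates actually register, one option is to compare the values connectRT2 sent against the metadata eScholarship reports afterward. The sketch below is hypothetical and assumes both are available as plain dicts keyed by field name. How they would be obtained (log parsing, an API query) is left open. The field list covers only the fields named above. The "few others" would need to be added once identified.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

# Fields named in this issue; the remaining affected fields are not yet listed.
JOURNAL_FIELDS = ("fpage", "lpage", "journal")

Metadata = Dict[str, Optional[str]]


def unregistered_fields(
    sent: Metadata,
    stored: Metadata,
    fields: Sequence[str] = JOURNAL_FIELDS,
) -> List[str]:
    """Return the journal-specific fields whose sent value does not match
    what eScholarship currently stores, i.e. the update didn't register."""
    return [
        field
        for field in fields
        if field in sent and sent[field] != stored.get(field)
    ]


def tally(updates: Iterable[Tuple[Metadata, Metadata]]) -> Dict[str, int]:
    """Across (sent, stored) pairs, count updates whose fields all match
    versus updates with at least one mismatched field."""
    counts = {"registered": 0, "not_registered": 0}
    for sent, stored in updates:
        key = "not_registered" if unregistered_fields(sent, stored) else "registered"
        counts[key] += 1
    return counts


if __name__ == "__main__":
    sent = {"fpage": "101", "lpage": "115", "journal": "Nature"}
    stored = {"fpage": "101", "lpage": "110", "journal": "Nature"}
    print(unregistered_fields(sent, stored))  # ['lpage']
```

Running `tally` over one day's worth of updates would give a rough split between updates that took effect and updates that didn't.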
What
Over the last two years, the increasing traffic between Elements and eScholarship during Elements' diff sync process has occasionally required modifications to the programs involved in the syncing process.
This card collects ideas for improving the diff sync runtime, should we need them. At present, the diff sync is running at an acceptable speed, and frankly, the less we monkey around with this system, the better.
Background
This syncing process is very complex. Its runtime is affected by the Elements Relevance Scheme and Crosswalk files; connectRT's transform steps (for both input and output); the eScholarship API; and the prodigious, ever-increasing scholarly output of the UC system.
Modifications anywhere in this chain of programs can have DRAMATIC effects on the diff sync's runtime. This is especially true of the Elements Relevance Scheme and Crosswalks, which are the first step in determining whether a pub proceeds into the diff sync at all.
Historically, much of the complexity comes from layering new systems atop existing systems. For example (working from the "outside-in"):
The Syncing Process in Detail
The programs involved in this process are as follows, roughly in the order the syncing process triggers them:
For more information, see this Google Doc.