Feasibility experiment: 'Hard-coded' disease terms against bioRxiv/medRxiv #18
jvwong started this conversation in Show and tell
Replies: 2 comments
-
Notes:
-
Two more items:
A. A look at a 2-week window (August 15-28), which, as expected, starts to give at least 4/5 decent hits in a search across most terms.
B. A summary of the bio(med)Rxiv total search hits for each term, up to 4 weeks. Looks pretty linear.
-
Goal
This empirical experiment aimed to determine whether searching a set of recently posted bioRxiv and medRxiv articles with individual (fixed) query terms, in this case names of important/prevalent diseases, returns relevant results.
Approach
Search terms: Diseases
An initial experiment would require a handful of query terms referencing diseases. These 'top' diseases were based upon:
Information from these sources was used to build a set of 20 disease terms, as shown in Table I below.
*Searches used the MeSH term as a search tag (i.e. explicitly).
Data download
We are using the technology developed in this remote, launched by scripts in the prototype-disease branch. Data was retrieved from bioRxiv and medRxiv for the following date ranges:
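As a rough illustration of the download step, here is a minimal Python sketch that pages through the public bioRxiv/medRxiv "details" API for a date range. The function name and structure are illustrative assumptions, not the actual scripts in the prototype-disease branch.

```python
import requests

def fetch_preprints(server, start_date, end_date):
    """Page through the public bioRxiv/medRxiv 'details' API for a date range.

    server is 'biorxiv' or 'medrxiv'; dates are 'YYYY-MM-DD' strings.
    Illustrative sketch only; not the project's actual download script.
    """
    articles = []
    cursor = 0
    while True:
        url = f"https://api.biorxiv.org/details/{server}/{start_date}/{end_date}/{cursor}"
        payload = requests.get(url, timeout=30).json()
        batch = payload.get("collection", [])
        if not batch:
            break
        articles.extend(batch)
        cursor += len(batch)  # results are returned in pages
    return articles

# Example: the one-month window reported in Table II
# july_papers = fetch_preprints("biorxiv", "2022-07-01", "2022-07-31")
```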
Search
A search was performed in 'strict' mode, where every hit must contain all query tokens (e.g. for the query "Alzheimer Disease", each hit contains both "alzheimer" and "disease", with stemming).
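To make the intended semantics concrete, the sketch below expresses strict-mode matching as "every stemmed query token appears among the stemmed tokens of the text". The NLTK Porter stemmer and whitespace tokenization are stand-in assumptions for whatever stemming and tokenization the actual search engine applies.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stems(text):
    """Lower-case, whitespace-tokenize, and stem a piece of text."""
    return {stemmer.stem(token) for token in text.lower().split()}

def matches_strict(query, text):
    """'Strict' mode: every stemmed query token must appear in the text."""
    return stems(query) <= stems(text)

# Example from the post: the query "Alzheimer Disease" requires both tokens.
print(matches_strict("Alzheimer Disease", "a mouse model of Alzheimer disease progression"))  # True
print(matches_strict("Alzheimer Disease", "diseases of the aging brain"))                     # False
```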
For each search term we report:
Results
A month of article results (July 2022)
Table II. Raw data for MONTH
A week of article results (August 19-25, 2022)
Table III. Raw data for WEEK