papercheck

The goal of papercheck is to automatically check scientific papers for best practices.

Installation

You can install the development version of papercheck from GitHub with:

# install.packages("devtools")
devtools::install_github("scienceverse/papercheck")

You can launch an interactive Shiny app version of the code below with:

papercheck_app()

Example

library(papercheck)

Convert a PDF to GROBID XML format, then read it in as a paper object.

pdf <- demopdf() # use the path of your own PDF
grobid <- pdf2grobid(pdf)
paper <- read_grobid(grobid)

Search Text

Search the returned text. The regex pattern below searches for text that looks like statistical values (e.g., N=313 or p = 0.17).

pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.-]*\\d"
text <- search_text(paper, pattern, 
                    return = "match", 
                    perl = TRUE)

text	section	header	div	p	s	id
M = 9.12	results	Results	3	1	2	to_err_is_human.xml
M = 10.9	results	Results	3	1	2	to_err_is_human.xml
t(97.7) = 2.9	results	Results	3	1	2	to_err_is_human.xml
p = 0.005	results	Results	3	1	2	to_err_is_human.xml
M = 5.06	results	Results	3	2	1	to_err_is_human.xml
M = 4.5	results	Results	3	2	1	to_err_is_human.xml
t(97.2) = -1.96	results	Results	3	2	1	to_err_is_human.xml
p = 0.152	results	Results	3	2	1	to_err_is_human.xml
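
To see what the pattern picks up without running papercheck at all, the sketch below applies the same regex to a single string using base R only. The sentence is made up for illustration and is not taken from any demo paper.

```r
# Same pattern as above; base R only, no papercheck functions involved.
pattern <- "[a-zA-Z]\\S*\\s*(=|<)\\s*[0-9\\.-]*\\d"

# Made-up sentence for illustration (not from a real paper).
sentence <- "The sample (N=313) showed no effect, t(97.7) = 2.9, p = 0.17."

# gregexpr() locates every match; regmatches() extracts the matched substrings.
matches <- regmatches(sentence, gregexpr(pattern, sentence, perl = TRUE))[[1]]
print(matches)
# [1] "N=313"         "t(97.7) = 2.9" "p = 0.17"
```

Note that the pattern only accepts = and < as comparators, so a value reported as p > .05 would not be matched by it.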

Large Language Models

You can query the extracted text of papers with LLMs using groq.

Use search_text() first to narrow the text down to what you want to query. Below, we return the full introduction sections of the first two papers in papers (a list of paper objects, created as in the Batch Processing section below), then ask an LLM “What is the hypothesis of this study?”.

hypotheses <- search_text(papers[1:2], 
                          section = "intro", 
                          return = "section")
query <- "What is the hypothesis of this study? Answer as briefly as possible."
llm_hypo <- llm(hypotheses, query)

id	answer
eyecolor.xml	The hypothesis of this study is that humans exhibit positive sexual imprinting, where individuals choose partners with physical characteristics similar to those of their opposite-sex parent.
incest.xml	The hypothesis is that moral opposition to third-party sibling incest is greater among individuals with other-sex siblings than among individuals with same-sex siblings.

Batch Processing

The functions pdf2grobid() and read_grobid() also work on a folder of files, returning a list of XML file paths or paper objects, respectively. The functions search_text(), expand_text() and llm() also work on a list of paper objects.

# read in all the XML files in the demo directory
grobid_dir <- demodir()
papers <- read_grobid(grobid_dir)

# select sentences in the intros containing the text "previous"
previous <- search_text(papers, "previous", 
                        section = "intro", 
                        return = "sentence")

text	section	header	div	p	s	id
Royzman et al’s non-replication potentially calls into question the reliability of previously reported links between having an other-sex sibling and moral opposition to third-party sibling incest.	intro	Introduction	1	3	3	incest.xml
Previous research has shown that making cost-benefit analyses of using statistical approaches explicit can influence researchers’ attitudes.	intro	[div-01]	1	8	5	prereg.xml
When exploring difference in responses between previous experience with pre-registration, we see a clear trend where reasearchers who have pre-registered studies in their own research indicate pre-registration is more beneficial, and indicate higher a higher likelihood of pre-registering studies in the future, and higher percentage of studies for which they would consider pre-registering (see Table 2).	intro	Attitude	3	7	1	prereg.xml

Modules

Papercheck is modular, so you can add your own modules for additional checks. It comes with a set of built-in modules, and we hope people will share more.

You can see the list of built-in modules with the function below.

module_list()

all-p-values: List all p-values in the text, returning the matched text (e.g., ‘p = 0.04’) and document location in a table.
all-urls: List all the URLs in the main text
imprecise-p: List any p-values reported with insufficient precision (e.g., p < .05 or p = n.s.)
llm-summarise: Generate a 1-sentence summary for each section
marginal: List all sentences that describe an effect as ‘marginally significant’.
osf-check: List all OSF links and whether they are open, closed, or do not exist.
ref-consistency: Check if all references are cited and all citations are referenced
retractionwatch: Flag any cited papers in the RetractionWatch database
sample-size-ml: [DEMO] Classify each sentence for whether it contains sample-size information, returning only sentences with probable sample-size info.
statcheck: Check consistency of p-values and test statistics

To run a built-in module on a paper, you can reference it by name.

p <- module_run(paper, "all-p-values")

text	section	header	div	p	s	id
p = 0.005	results	Results	3	1	2	to_err_is_human.xml
p = 0.152	results	Results	3	2	1	to_err_is_human.xml
p > .05	results	Results	3	2	2	to_err_is_human.xml
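
A table like this can be post-processed with ordinary R. The sketch below is one possible approach, not part of papercheck: it rebuilds a few columns of the table above by hand as a data frame (how to pull the table out of the object returned by module_run() is not shown here), then splits each match into a comparator and a numeric value. The names p_table, comparator, value and significant are made up for this example.

```r
# Hand-built copy of some columns of the table above (an assumption for
# illustration; in practice the values would come from module_run()).
p_table <- data.frame(
  text = c("p = 0.005", "p = 0.152", "p > .05"),
  section = "results",
  id = "to_err_is_human.xml",
  stringsAsFactors = FALSE
)

# Split each match into its comparator (=, < or >) and its numeric value.
p_table$comparator <- sub("^p\\s*([=<>]).*$", "\\1", p_table$text)
p_table$value <- as.numeric(sub("^p\\s*[=<>]\\s*", "", p_table$text))

# Keep only exactly reported p-values below .05.
significant <- p_table[p_table$comparator == "=" & p_table$value < 0.05, ]
print(significant$text)
# [1] "p = 0.005"
```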

Reports

You can generate a report from any set of modules. The default set is c("imprecise-p", "marginal", "osf-check", "retractionwatch", "ref-consistency").

paper_path <- report(paper, output_format = "html")
