Welcome! biblio-reader is a literature parsing tool based on Christian Kreibich's scholar.py that compiles and analyzes publications matched by Google Scholar searches. For publications found on Google Scholar between pages 1 and 99, it can do the following:
- Compile key information from each publication (such as article title, year, authors, journal title, URL, and citations)
- Write all key information into a CSV file
- Look for trends in journal fields, publication growth over the years, publication types, journal impact, citations, and more
- Find and display author information, including relationships between each author and attributed articles
- Help users find full-text PDFs for each publication
- Analyze and categorize the full text of each PDF
- Map author affiliations on Google Maps
- Facilitate manual publication review, including assigning articles to separate reviewers and analyzing their input
- Create a sortable table displaying publications and key information about each article
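As a rough illustration of the "compile, write to CSV, look for trends" steps listed above, here is a minimal sketch. The column names, the `write_publications` and `citations_per_year` helpers, and the sample records are all hypothetical and are not taken from biblio-reader's own code.

```python
import csv
import io

# Hypothetical column names, loosely based on the fields listed above.
FIELDS = ["title", "year", "authors", "journal", "url", "citations"]


def write_publications(records, handle):
    """Write a header row plus one row per publication to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in records:
        # Missing fields become empty cells instead of raising an error.
        writer.writerow({field: record.get(field, "") for field in FIELDS})


def citations_per_year(records):
    """Sum citation counts by publication year (one simple trend summary)."""
    totals = {}
    for record in records:
        year = record.get("year")
        if year is None:
            continue
        totals[year] = totals.get(year, 0) + int(record.get("citations", 0))
    return dict(sorted(totals.items()))


# Made-up records, for illustration only.
sample = [
    {"title": "Example A", "year": 2015, "authors": "Doe J", "journal": "J1", "url": "", "citations": 4},
    {"title": "Example B", "year": 2016, "authors": "Roe R", "journal": "J2", "url": "", "citations": 7},
    {"title": "Example C", "year": 2015, "authors": "Poe E", "journal": "J1", "url": "", "citations": 1},
]

buffer = io.StringIO()
write_publications(sample, buffer)
csv_text = buffer.getvalue()
trend = citations_per_year(sample)
```

Writing to an in-memory buffer keeps the sketch self-contained; passing a handle from `open(path, "w", newline="")` would write a real file instead.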
Manager.py is the utilities manager. It supports reading and writing files in the inputs, outputs, and working directories, and it updates the main data CSV file through the update_data() method.
This is also where users enter project-specific variables, such as which publications are connected to the original work of interest, and the categories of search terms, each with the regular expressions that Google Scholar may have used to find them.
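The idea of search-term categories backed by regular expressions might look something like the sketch below. The `SEARCH_TERM_CATEGORIES` name, the category labels, and the patterns are assumptions made for illustration; the actual variables in Manager.py may be named and structured differently.

```python
import re

# Hypothetical categories and patterns; a real project would define its own.
SEARCH_TERM_CATEGORIES = {
    "dataset": re.compile(r"\bopen[- ]access\s+dataset\b", re.IGNORECASE),
    "method": re.compile(r"\bfunctional\s+connectivity\b", re.IGNORECASE),
}


def categorize(text):
    """Return the sorted names of every category whose pattern occurs in the text."""
    return sorted(
        name
        for name, pattern in SEARCH_TERM_CATEGORIES.items()
        if pattern.search(text)
    )


matches = categorize("We used an Open-Access dataset to study functional connectivity.")
no_matches = categorize("An unrelated sentence.")
```

Compiling each pattern once and keeping them in a dictionary makes it easy to add or adjust categories in one place.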
scholar.py is where the original Google Scholar results are compiled. (More in link)
See README
Directory containing all user inputs. Journal categories and attributes are included. For full-text analysis, all PDFs should go here inside a subdirectory named "pdfs".
Manual reviews and categorizations of each publication should be stored here as well, in a subdirectory named "article_review". It should contain CSV files with the specific categorization of each article, which are then analyzed by 'review_analysis.py' in the biblio_reader directory.
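One way reviewer CSV files could be combined is sketched below. This is not the logic of 'review_analysis.py'. The `article_id` and `category` column names, the majority-vote rule, and the sample data are all assumptions for illustration.

```python
import csv
import io
from collections import Counter, defaultdict


def tally_reviews(csv_texts):
    """Count, per article, how many reviewers chose each category."""
    votes = defaultdict(Counter)
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            votes[row["article_id"]][row["category"]] += 1
    return votes


def consensus(votes):
    """Pick the most common category per article; None marks a tie."""
    result = {}
    for article_id, counter in votes.items():
        ranked = counter.most_common()
        top, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result[article_id] = None if tied else top
    return result


# Made-up reviewer files, held in memory to keep the sketch self-contained.
reviewer_a = "article_id,category\n1,data use\n2,mention only\n"
reviewer_b = "article_id,category\n1,data use\n2,data use\n"
reviewer_c = "article_id,category\n1,data use\n2,mention only\n"

agreed = consensus(tally_reviews([reviewer_a, reviewer_b, reviewer_c]))
split = consensus(tally_reviews([reviewer_a, reviewer_b]))
```

Flagging ties with `None` rather than picking arbitrarily leaves disagreements visible for a follow-up review.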
All final outputs are stored here. This includes matplotlib graphs generated by scholar_reader.py, the main CSV file, and the reviewer assignments.
Holds intermediate files, including PubMed bibliographies, keyword paragraphs, and PDFs converted to TXT.
Provides support for creating a sortable, viewable HTML table based on the CSV file. In data_mg.py, the data can be filtered to show only the publications that meet criteria set by the user.
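The filter-then-render flow described above can be sketched as follows. The `filter_rows` and `render_table` helpers, the column names, and the sample data are hypothetical and do not reflect the contents of data_mg.py. Client-side sorting, which would normally need JavaScript, is left out.

```python
import csv
import html
import io


def filter_rows(rows, predicate):
    """Keep only the rows for which the user-supplied predicate is true."""
    return [row for row in rows if predicate(row)]


def render_table(rows, columns):
    """Render the rows as a plain HTML table, escaping every cell value."""
    head = "".join(f"<th>{html.escape(col)}</th>" for col in columns)
    body = "\n".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(row.get(col, '')))}</td>" for col in columns)
        + "</tr>"
        for row in rows
    )
    return f"<table>\n<thead><tr>{head}</tr></thead>\n<tbody>\n{body}\n</tbody>\n</table>"


# Made-up CSV content, for illustration only.
csv_text = "title,year\nExample A,2015\nExample B,2016\nC & D,2017\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# Example criterion: publications from 2016 onward.
recent = filter_rows(rows, lambda row: int(row["year"]) >= 2016)
table_html = render_table(recent, ["title", "year"])
```

Escaping cell values with `html.escape` keeps titles that contain characters such as `&` or `<` from breaking the generated markup.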