SciAnalyzer lets you compile an individual selection of scientific articles that were published after January 1st, 2000 and are accessible via the PubMed Central (PMC) database. Metadata and study characteristics are extracted from the XML file collection with JATSdecoder. The extracted article features are displayed in tables, bar plots, network graphs, word clouds, trend charts and interactive scatter plots.

Notes on generalizability: The PMC article collection is a large but still selective sample of scientific content in general. All analyses are based on JATSdecoder's extraction heuristics, which may produce false positive and false negative hits. Always keep these limitations in mind when drawing conclusions from your analysis.

Notes on processing performance: The larger your selection, the longer some analytical processes may take. Choose your article selection carefully before starting an analysis. Some analyses are restricted to narrow selections.

In the 'Raw data' tab, you can download the raw data of a selection of up to 20,000 articles and run your own analysis on it.


Global summary of article selection


Define a new search task and update article selection

You may add several free-text search terms or select from specific values to identify documents of interest. Text searches are not case-sensitive. After specifying your search query, you must update the data table by clicking the button below.

Note: All fields will be connected with an AND operator. Entries within a field are connected with the OR operator in standard mode.
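The combination rule in the note above can be sketched in a few lines of Python. This is only an illustration of the logic (AND across fields, OR within a field, case-insensitive matching); the function name `matches`, the field names and the sample records are assumptions and do not reflect SciAnalyzer's actual implementation.

```python
def matches(article, query):
    """Return True if the article satisfies every field of the query.

    Fields are combined with AND; the terms inside one field with OR.
    Matching is a case-insensitive substring test.
    """
    for field, terms in query.items():
        text = str(article.get(field, "")).lower()
        if not any(term.lower() in text for term in terms):
            return False  # one field without any hit fails the AND
    return True


# Hypothetical records and query, for illustration only
articles = [
    {"title": "Power analysis in ANOVA designs", "journal": "PLOS ONE"},
    {"title": "Bayesian regression for small samples", "journal": "Scientific Reports"},
    {"title": "Regression diagnostics", "journal": "PLOS ONE"},
]
query = {"title": ["anova", "regression"], "journal": ["plos one"]}

selection = [a for a in articles if matches(a, query)]
for a in selection:
    print(a["title"])
```

Here the second record is dropped: its title matches one of the title terms, but its journal does not match the journal field.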





Study characteristics

You have several options to identify studies with specific methodological features that were extracted with JATSdecoder.

Data table of article selection

The table contains the full data of your selection. Sort the data table by any column to identify articles with specific features within your article selection. You can download the full data table to run your own analyses:



            

Global statistics of article selection

Number of released articles per year


Distribution of article types


Number of articles by journal


Article processing time

The processing times are calculated from the received date, the accepted date and the first publishing date (including the date of a preprint).

Because creating the graph can be time-consuming, it is limited to selections of fewer than 2,000 articles. You must activate the checkbox to generate the graph of publishing dates.
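As a minimal sketch of the date arithmetic described above, the following Python snippet derives three durations in days from a received, an accepted and a first publishing date. The function name, the dictionary keys and the example dates are assumptions made for illustration; the exact intervals SciAnalyzer reports may differ.

```python
from datetime import date


def processing_times(received, accepted, published):
    """Return the durations (in days) between the three article dates."""
    return {
        "received_to_accepted": (accepted - received).days,
        "accepted_to_published": (published - accepted).days,
        "received_to_published": (published - received).days,
    }


# Hypothetical dates of a single article
times = processing_times(
    received=date(2021, 1, 15),
    accepted=date(2021, 4, 20),
    published=date(2021, 5, 3),
)
print(times)
```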



Study characteristics of article selection

All study characteristics were extracted with JATSdecoder's function study.character().


Distribution of estimated sample size


Distribution of extracted maximum alpha-error


Distribution of extracted test power


Distribution of number of studies per article


Distribution of extracted alpha-error correction procedure


Distribution of analytical software used


Distribution of categorized statistical method used

The table contains the absolute number of articles that use each statistical method. The categories are derived from the raw list of extracted methods (see below) with a set of regular expressions.
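The idea of mapping raw method strings to categories with regular expressions and then counting articles per category can be sketched as follows. The three patterns and the sample method lists are hypothetical and far simpler than the expression set actually used; each article is counted at most once per category.

```python
import re
from collections import Counter

# Hypothetical category patterns, for illustration only
CATEGORIES = {
    "t-test": re.compile(r"\bt[- ]tests?\b", re.IGNORECASE),
    "ANOVA": re.compile(r"\banc?ova\b|analysis of variance", re.IGNORECASE),
    "regression": re.compile(r"regression", re.IGNORECASE),
}


def categorize(methods):
    """Return the set of categories matched by any raw method string."""
    return {
        name
        for name, pattern in CATEGORIES.items()
        if any(pattern.search(m) for m in methods)
    }


# One list of raw extracted methods per article (made-up data)
articles = [
    ["paired t-test", "linear regression"],
    ["one-way ANOVA", "Student's t test"],
    ["logistic regression", "multiple regression"],
]

# Using a set per article means an article counts once per category
counts = Counter(cat for methods in articles for cat in categorize(methods))
print(counts)
```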


Distribution of uncategorized statistical method used

As the result space may be very large, you must activate the checkbox to generate the frequency table of the methods extracted by JATSdecoder's function get.method().

You may reduce the frequency table to specific methods containing one or several search terms.



Topic wordcloud

A collection of the most or least frequently extracted keywords and/or subjects is displayed as a word cloud. To make the cloud more informative and to fit it onto the canvas, the most frequently detected terms can be omitted. To reduce duplicates, conversion to lower case can be activated.

Note: If the graphic turns out small or empty, use the removal option until the graphic is readable.
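The effect of the two options described above (lower-casing and omitting the most frequent terms) on the frequency table behind a word cloud can be illustrated with a short sketch. The function name `cloud_frequencies`, its parameters and the keyword list are assumptions; the rendering itself is not shown.

```python
from collections import Counter


def cloud_frequencies(keywords, lowercase=True, omit_top=0):
    """Return (term, count) pairs sorted by frequency, without the top terms."""
    if lowercase:
        # Merge spelling variants that differ only in case
        keywords = [k.lower() for k in keywords]
    ranked = Counter(keywords).most_common()
    return ranked[omit_top:]


# Made-up keyword extractions
keywords = ["Humans", "humans", "Depression", "depression",
            "Anxiety", "humans", "Sleep"]

print(cloud_frequencies(keywords, lowercase=False))
print(cloud_frequencies(keywords, lowercase=True, omit_top=1))
```

Without lower-casing, "Humans" and "humans" are counted as separate terms. With lower-casing and `omit_top=1`, the dominant term is dropped so that the remaining terms can be drawn at a readable size.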




Country of origin and country connections

Most articles supply the authors' country information. If authors from different countries are involved in a publication, a country connection matrix is generated, which can be displayed on the world map if desired. If several authors from the same country are involved in a publication, their country is counted only once.

You may select a specific region of interest and activate the checkbox to add country connection lines to the map.
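The counting rule for country connections can be sketched as follows: each article contributes its set of distinct countries, and every unordered pair of countries within that set counts as one connection. The function name and the sample data are assumptions for illustration, not the code behind the map.

```python
from collections import Counter
from itertools import combinations


def country_connections(articles):
    """Count unordered country pairs; a country counts once per article."""
    counts = Counter()
    for author_countries in articles:
        unique = sorted(set(author_countries))  # deduplicate within article
        counts.update(combinations(unique, 2))  # all pairs of distinct countries
    return counts


# One list of author countries per article (made-up data)
articles = [
    ["Germany", "Germany", "France"],
    ["Germany", "France", "Japan"],
    ["Japan"],
]

connections = country_connections(articles)
for pair, n in connections.most_common():
    print(pair, n)
```

Sorting the countries before pairing ensures that ("France", "Germany") and ("Germany", "France") are treated as the same connection.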


Most frequent countries of origin


Most frequent country collaborations


Author involvement


Author names that are supplied as ORCIDs are converted to readable names with rorcid. A combined name search that includes the authors' names and affiliations and/or ORCIDs is recommended to generate a precise network graph without namesakes. Articles with more than 25 authors are omitted from the connection analysis because the number of connections becomes unmanageable.

Author network


Most frequent author names


Most frequent author collaborations


Trend Analysis

Select a variable and analyze changes in methodological study characteristics over time. Since study features can be multidimensional (contain more than one extraction per article), the sum of detections can exceed the number of analyzed articles, and relative frequencies can be greater than 1.
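A small worked example shows why relative frequencies can exceed 1 for multidimensional features. The sketch assumes that the relative frequency is the number of detections per year divided by the number of articles per year; the function name and the data are made up, and the exact computation in the trend chart may differ.

```python
from collections import defaultdict


def relative_frequency_by_year(articles):
    """Return detections per article for each publication year."""
    detections = defaultdict(int)
    n_articles = defaultdict(int)
    for year, extracted in articles:
        n_articles[year] += 1
        detections[year] += len(extracted)  # an article may add several
    return {year: detections[year] / n_articles[year]
            for year in sorted(n_articles)}


# (year, extracted software) per article, made-up data
articles = [
    (2019, ["R"]),
    (2019, []),
    (2020, ["R", "SPSS"]),
    (2020, ["SPSS", "Stata"]),
]

trend = relative_frequency_by_year(articles)
print(trend)
```

In 2020, two articles yield four detections, so the relative frequency is 2.0.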


Statistical results

JATSdecoder's function get.stats() is used to extract the reported statistical results within the text parts of the documents. Results reported in tables and figures are not extracted. If a result is reported in a manner that enables a recomputation of the p-value (e.g., 't(120)=1.96, p=.05'), get.stats() performs this recalculation ('recalculatedP').
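To illustrate what such a recalculation involves for the example 't(120)=1.96, p=.05', the sketch below recomputes a two-sided p-value from a t-value and its degrees of freedom using only Python's standard library. It integrates the density of Student's t distribution numerically with Simpson's rule; this is an assumption-laden illustration and not the method used by get.stats(), which is an R function.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p_from_t(t, df, steps=2000):
    """Two-sided p-value via Simpson integration of the density on [0, |t|]."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_mass = 2 * total * h / 3  # P(-|t| < T < |t|) by symmetry
    return 1 - central_mass


recalculated_p = two_sided_p_from_t(1.96, 120)
print(round(recalculated_p, 4))
```

The recalculated value is slightly above .05 and rounds to the reported p=.05, so this example result would be considered consistent at two decimals.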

Overview


Distribution of extracted results

Select a statistical measure to display its categorized distribution within the article selection. You can switch between cumulative, absolute and relative views.


Distribution of statistical measures

Select two statistical measures to investigate their interplay and find studies with interesting co-occurrences.

Graphing the recalculated and reported p-values enables a quick consistency check. Deviations may be caused by reporting or extraction errors, as well as by adjusted p-values. To facilitate the detection of inconsistencies, the graph contains lines for the expected p-values of two- and one-sided test results.
The extracted degrees of freedom can be considered a rough estimate of sample size. For F-values that are reported with df1 and df2, only df2 is extracted and displayed.
You may identify studies with large effects (Cohen's d, eta^2, r) in large samples (high degrees of freedom).
To identify studies by a single measure, select the same measure twice.
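The rule that only df2 is kept for F-values can be illustrated with a simple pattern match on a result string such as 'F(2, 57) = 4.31'. The regular expression below is a hypothetical, simplified pattern written for this example; it is not the expression JATSdecoder uses and covers only this one notation.

```python
import re

# Hypothetical pattern for results of the form "F(df1, df2) = value"
F_PATTERN = re.compile(
    r"F\s*\(\s*(\d+(?:\.\d+)?)\s*,\s*(\d+(?:\.\d+)?)\s*\)\s*=\s*(\d*\.?\d+)"
)


def df2_from_f(result):
    """Return the denominator degrees of freedom (df2), or None if absent."""
    match = F_PATTERN.search(result)
    return float(match.group(2)) if match else None


print(df2_from_f("F(2, 57) = 4.31, p = .018"))
print(df2_from_f("t(120)=1.96, p=.05"))
```

The second call returns None because the string contains no F-value with two degrees of freedom.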

- Hover over a dot to get details about the article and the result.
- Click on a dot to open the DOI link of the article in a new tab.
- You may zoom into the graph by selecting a specific area with your mouse cursor. Double-click to zoom back to full scale.


About SciAnalyzer

SciAnalyzer was built with RStudio and the shiny package. The metadata and study characteristics displayed were extracted with the JATSdecoder package.

JATSdecoder

JATSdecoder is a metadata and text extraction and manipulation tool set for the statistical programming language R. JATSdecoder facilitates text mining projects on scientific research papers by enabling an individual selection of metadata and text parts. Its function JATSdecoder() extracts metadata, sectioned text and reference list from NISO-JATS coded XML files. Its function study.character() uses the JATSdecoder() result to perform fine-tuned text extraction tasks to identify key study characteristics like statistical methods used, alpha-error, statistical results reported in text and others.

An installation and usage guide for the JATSdecoder package is available at its GitHub repository (https://github.com/ingmarboeschen/JATSdecoder).

Resources

SciAnalyzer uses various resources to extract, process and display the metadata and study characteristics.

Article data

- The full open-access PubMed Central database was bulk downloaded as NXML files from: ftp://ftp.ncbi.nlm.nih.gov/pub/pmc

R packages

Extraction of metadata and study characteristics
- Böschen, Ingmar (2021). JATSdecoder: A metadata and text extraction and manipulation tool set.
- Chamberlain, Scott (2021). rorcid: Interface to the 'Orcid.org' API.

Barplots, Trend chart
- Böschen, Ingmar (2021). graphing.

Data management
- Ooms, Jeroen (2014). The jsonlite Package: A Practical and Consistent Mapping Between JSON Data and R Objects. arXiv:1403.2805 [stat.CO]

Involved countries on worldmap
- South, Andy (2011). rworldmap: A New R package for Mapping Global Data. The R Journal Vol. 3/1 : 35-43.
- Hijmans, R. J., Williams, E., Vennes, C., & Hijmans, M. R. J. (2017). Package 'geosphere'. Spherical trigonometry, 1(7).
- Bivand, R., Rundel, C., Pebesma, E., Stuetz, R., Hufthammer, K. O., & Bivand, M. R. (2017). Package 'rgeos'. The Comprehensive R Archive Network (CRAN).
- Neuwirth, E., & Brewer, R. C. (2014). ColorBrewer palettes.
- Arel-Bundock, V., Enevoldsen, N., & Yetman, C. J. (2018). countrycode: An R package to convert country names and country codes. Journal of Open Source Software, 3(28), 848.

Author network graph
- Csárdi, Gábor & Nepusz, Tamás. (2016). The igraph software package for complex network research. InterJournal Complex Systems, 1695.
- Feinerer, I., Hornik, K., & Feinerer, M. I. (2015). Package 'tm'. Corpus, 10(1).

Interactive visualisations
- Sievert, Carson (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC.
- Lang, Dawei & Chien, Guan-tin (2018). wordcloud2: Create Word Cloud by 'htmlwidget'.


How to cite SciAnalyzer / JATSdecoder

- Böschen, I. (2024). SciAnalyzer.com. Web application. www.SciAnalyzer.com

- Böschen, I. (2023). JATSdecoder: A metadata and text extraction and manipulation tool set. R package version 1.2.0.

Articles

- Böschen, I. (2024) statcheck is flawed by design and no valid spell checker for statistical results. arXiv preprint https://arxiv.org/abs/2408.07948

- Böschen, I. (2023) Changes in methodological study characteristics in psychology between 2010-2021. PLOS ONE. https://doi.org/10.1371/journal.pone.0283353

- Böschen, I. (2023) Evaluation of the extraction of methodological study characteristics with JATSdecoder. Scientific Reports 13, 139. https://doi.org/10.1038/s41598-021-98782-3

- Böschen, I. (2021) Software review: The JATSdecoder package—extract metadata, abstract and sectioned text from NISO-JATS coded XML documents; Insights to PubMed Central’s open access database. Scientometrics. https://doi.org/10.1007/s11192-021-04162-z

- Böschen, I. (2021) Evaluation of JATSdecoder as an automated text extraction tool for statistical results in scientific reports. Scientific Reports 11, 19525. https://doi.org/10.1038/s41598-021-98782-3


Funding

JATSdecoder was developed for my dissertation project about the evolution of methodological characteristics in psychological research and is financed by a grant awarded by the Department of Research Methods and Statistics, Institute of Psychology, University Hamburg, Germany.

Contact

If you have any questions or recommendations, feel free to contact me:

Dr. Ingmar Böschen
University Hamburg
Institute of Psychology
Research Methods and Statistics
Von-Melle-Park 5
20146 Hamburg
Germany
ingmar.boeschen@uni-hamburg.de

Thanks to:
- Dennis Warnholtz for technical assistance on server and mongoDB issues
- Petra Begas for improving the sample size extraction from abstracts


Hosted by:
http://www.uni-hamburg.de
Department of Research Methods and Statistics
Dr. Ingmar Böschen
Von-Melle-Park 5
20146 Hamburg
Germany
Empowered by:
https://github.com/ingmarboeschen/JATSdecoder