
authors_clean

This function takes the output from references_read and cleans the author information.

Usage

authors_clean(references)

Arguments

references

output from references_read

Details

Author information such as addresses, emails, ORCIDs, etc. is first matched to each author record.

It then attempts to group entries that likely belong to the same author, based on shared full names, addresses, emails, ORCIDs, etc.
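Grouping records that share an identifier can be illustrated with a minimal base-R sketch. This is not refsplitr's implementation: the data frame, its column names (full_name, email, orcid) and the helper group_by_shared_ids() are hypothetical, and the sketch uses a simple union-find so that records linked through any shared, non-missing value end up in the same group.

```r
# Hypothetical author records; the column names are illustrative only
# and are not refsplitr's actual output columns.
authors <- data.frame(
  full_name = c("Smith, John A", "Smith, J", "Doe, Jane",
                "Doe, J", "Lee, Kim", "Smith, John A"),
  email = c("jsmith@uni.edu", NA, "jdoe@inst.org",
            "jdoe@inst.org", NA, NA),
  orcid = c("0000-0001-2345-6789", "0000-0001-2345-6789", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: link rows that share any non-missing value in `keys`
# using a simple union-find, then return one group number per row.
group_by_shared_ids <- function(df, keys) {
  parent <- seq_len(nrow(df))
  # Follow parent pointers up to the representative row of a group
  find <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  for (key in keys) {
    vals <- df[[key]]
    ok <- !is.na(vals) & vals != ""
    # Rows with the same identifier value are merged into one group
    for (rows in split(which(ok), vals[ok])) {
      if (length(rows) < 2) next
      root <- find(rows[1])
      for (r in rows[-1]) parent[find(r)] <- root
    }
  }
  roots <- vapply(seq_along(parent), function(i) as.integer(find(i)), integer(1))
  # Renumber groups 1, 2, 3, ... in order of first appearance
  match(roots, unique(roots))
}

authors$groupID <- group_by_shared_ids(authors, c("full_name", "email", "orcid"))
authors
```

Here "Smith, J" joins "Smith, John A" through the shared ORCID, and "Doe, J" joins "Doe, Jane" through the shared email. Because this sketch only links exact matches, name variants with no shared identifier stay in separate groups, which is the case the similarity step addresses.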

For records that are not matched this way, a Jaro-Winkler similarity score is calculated against all potentially matching author names.

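The Jaro similarity is based on the number of characters two names have in common, counting a character as shared only if it appears within a limited distance of the same character in the other name, and on how many of those shared characters are out of order (transpositions). The Winkler modification gives extra weight to names that begin with the same characters.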
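To make the metric concrete, the following is a minimal base-R sketch of the standard Jaro-Winkler formula (prefix scale p = 0.1, prefix capped at 4 characters). It is an illustration of the technique, not the code refsplitr runs; the functions jaro() and jaro_winkler() are hypothetical names. Packages such as stringdist also provide this metric (method = "jw"), but note that it reports a distance, 1 minus the similarity.

```r
# Jaro similarity: shared characters within a window, penalised by transpositions
jaro <- function(s1, s2) {
  if (nchar(s1) == 0 && nchar(s2) == 0) return(1)
  if (nchar(s1) == 0 || nchar(s2) == 0) return(0)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  n1 <- length(a)
  n2 <- length(b)
  # Characters only count as shared if they are no further apart than this
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  a_matched <- logical(n1)
  b_matched <- logical(n2)
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!b_matched[j] && a[i] == b[j]) {
        a_matched[i] <- TRUE
        b_matched[j] <- TRUE
        break
      }
    }
  }
  m <- sum(a_matched)
  if (m == 0) return(0)
  # Half the number of shared characters that appear in a different order
  t <- sum(a[a_matched] != b[b_matched]) / 2
  (m / n1 + m / n2 + (m - t) / m) / 3
}

# Winkler modification: boost the score for a common prefix (up to 4 characters)
jaro_winkler <- function(s1, s2, p = 0.1, max_prefix = 4) {
  sim <- jaro(s1, s2)
  a <- strsplit(s1, "")[[1]]
  b <- strsplit(s2, "")[[1]]
  ell <- 0
  for (k in seq_len(min(length(a), length(b), max_prefix))) {
    if (a[k] != b[k]) break
    ell <- ell + 1
  }
  sim + ell * p * (1 - sim)
}

round(jaro("MARTHA", "MARHTA"), 4)          # 0.9444
round(jaro_winkler("MARTHA", "MARHTA"), 4)  # 0.9611

# Pairwise scores for a few author names (the comparison is case sensitive,
# so names are written in a consistent case here)
author_names <- c("SMITH, JOHN A", "SMITH, J A", "DOE, JANE")
sim <- sapply(author_names, function(x)
  sapply(author_names, function(y) jaro_winkler(x, y)))
round(sim, 3)
```

The window and prefix settings above are the commonly cited defaults; how refsplitr tunes them, and what score threshold it uses to propose a match, is not covered by this sketch.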

Examples

## Load the refsplitr sample dataset "BITR" 
data(BITR) 
BITR_clean <- authors_clean(BITR)
#> 
#> Splitting author records
#> 
#> 
#> Splitting addresses
#> 
#> 
#> Matching authors
#> 
#> 
#> Pruning groupings...

## The output of authors_clean is a list with two elements,
## review and prelim, each of which can be assigned to a data frame.
BITR_review_df <- BITR_clean$review
BITR_prelim_df <- BITR_clean$prelim

## Users can save these data frames outside of R as .csv files.
## The "review_df.csv" file is then used to review the groupID or authorID
## assignments and make any necessary corrections.
## The function authors_refine is then used to merge the changes
## back into R and create a data frame for analyses.
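The save-and-reload step described in the comments can be sketched with base R's write.csv() and read.csv(). The data frame below is a hypothetical stand-in for BITR_clean$review, reusing only the groupID and authorID names mentioned above; the real review data frame has refsplitr's own set of columns, and the exact arguments of authors_refine should be taken from its help page.

```r
# Hypothetical stand-in for BITR_clean$review; the real data frame has
# refsplitr's own columns, which are not reproduced here.
review_df <- data.frame(
  authorID = 1:3,
  groupID = c(1L, 1L, 3L),
  name = c("Smith, John A", "Smith, J", "Doe, Jane"),
  stringsAsFactors = FALSE
)

# Write to a temporary file here; in practice use a path such as "review_df.csv"
path <- file.path(tempdir(), "review_df.csv")
write.csv(review_df, path, row.names = FALSE)

# ... review and correct the groupID values in a spreadsheet editor ...

# Read the corrected file back into R
review_corrected <- read.csv(path, stringsAsFactors = FALSE)

# The corrected review data frame and the prelim data frame would then be
# passed to authors_refine(); see ?authors_refine for its arguments.
```

Writing with row.names = FALSE keeps R's row names from being added as an extra column, so the file read back has the same columns as the one written out.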