jst_combine_outputs() helps you manage the many files you may end up with after running jst_import() or jst_import_zip() with more than one batch.

Usage

jst_combine_outputs(
  path,
  write_to_file = TRUE,
  out_path = NULL,
  overwrite = FALSE,
  clean_up = FALSE,
  warn = TRUE
)

Arguments

path

A path to a directory containing .csv files from jst_import() or jst_import_zip(), or a character vector of paths to the files which are to be imported.

write_to_file

Should combined data be written to a file?

out_path

The directory to write the combined files to. If no directory is supplied and write_to_file is TRUE, the combined files are written to path.

overwrite

Should files be overwritten?

clean_up

Do you want to remove the original batch files? Use with caution.

warn

Should warnings be raised if the file type cannot be determined?

Value

If write_to_file is TRUE, writes the combined files to disk. Otherwise, returns a list with all combined files.

Details

There are several reasons to split the output of jst_import() or jst_import_zip() into batches, but in the end you will probably want to combine all outputs into one file or data.frame. This function makes a few assumptions in order to combine files:

  • Files whose names differ only in a trailing dash and number belong together and will be combined into one file.

  • The names of the combined files can be determined from the original files. If you want to combine foo-1.csv and foo-2.csv, the combined file will be combined_foo.csv.

  • The directory contains only files which were imported via jst_import() or jst_import_zip(). If the directory contains other .csv files, you should supply a character vector with paths to only those files you want to import.
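The naming assumptions above can be illustrated with a few lines of base R. This is a minimal sketch, not the package's internal implementation: the file names and the regular expression are assumptions made for illustration only.

```r
# Hypothetical file names, for illustration only.
files <- c("foo-1.csv", "foo-2.csv", "bar-1.csv", "bar-2.csv", "notes.csv")

# Keep only names that end in a dash followed by digits before ".csv".
# The pattern is an assumption, not taken from the package source.
batch_pattern <- "-[0-9]+\\.csv$"
batch_files <- files[grepl(batch_pattern, files)]

# Strip the trailing "-<number>.csv" to recover the shared stem.
stem <- sub(batch_pattern, "", batch_files)

# Files with the same stem are the ones that would be combined together.
groups <- split(batch_files, stem)

# Name of the combined file for each group.
combined_names <- paste0("combined_", names(groups), ".csv")
combined_names
#> [1] "combined_bar.csv" "combined_foo.csv"
```

Here notes.csv matches no batch pattern and is left out. In a real directory that mixes batch files with other .csv files, a similar filter (for example via list.files() with full.names = TRUE) could be used to build the character vector of paths passed as path.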

Examples

# set up a temporary directory
tmp <- tempdir()

# find multiple files
file_list <- rep(jst_example("article_with_references.xml"), 2)

# convert and write to file
jst_import(file_list, "article", out_path = tmp, .f = jst_get_article,
           n_batches = 2, show_progress = FALSE)
#> Starting to import 2 file(s).
#> Processing chunk 1/2
#> Processing chunk 2/2
#> Finished importing 2 file(s) in 0.26 secs.
             
# combine outputs
jst_combine_outputs(tmp)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpJm2eCS/combined_article.csv` to disk.
#> Warning: The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
#>  Please use the `file` argument instead.
#>  The deprecated feature was likely used in the jstor package.
#>   Please report the issue at <https://github.com/ropensci/jstor/issues>.
list.files(tmp, "csv")
#> [1] "article-1.csv"        "article-2.csv"        "combined_article.csv"

if (FALSE) {
# Trying to combine the files again raises an error.
jst_combine_outputs(tmp)
}

# with overwrite = TRUE, no error is raised
jst_combine_outputs(tmp, overwrite = TRUE)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpJm2eCS/combined_article.csv` to disk.

# we can remove the original files too
jst_combine_outputs(tmp, overwrite = TRUE, clean_up = TRUE)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpJm2eCS/combined_article.csv` to disk.
#> Deleting original batches.
list.files(tmp, "csv")
#> [1] "combined_article.csv"
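
The Value section says that a list is returned when nothing is written to disk. The sketch below shows that case with write_to_file = FALSE. It assumes the jstor package is installed, and it uses a fresh subdirectory (the name jst_list_example is made up for this example) so that it does not depend on files left over from the steps above. The structure of the returned list is not documented here, so it is only inspected, not assumed.

```r
library(jstor)

# Use a fresh subdirectory so earlier batch files do not interfere.
# The directory name is hypothetical.
tmp_list <- file.path(tempdir(), "jst_list_example")
dir.create(tmp_list, showWarnings = FALSE)

# Create two batches, as in the example above.
file_list <- rep(jst_example("article_with_references.xml"), 2)
jst_import(file_list, "article", out_path = tmp_list, .f = jst_get_article,
           n_batches = 2, show_progress = FALSE)

# Combine the batches in memory instead of writing a combined file.
combined <- jst_combine_outputs(tmp_list, write_to_file = FALSE)

# Inspect the top level of the returned list.
str(combined, max.level = 1)
```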