jst_combine_outputs() helps you manage the multitude of files you might receive after running jst_import() or jst_import_zip() with more than one batch.
Usage
jst_combine_outputs(
path,
write_to_file = TRUE,
out_path = NULL,
overwrite = FALSE,
clean_up = FALSE,
warn = TRUE
)
Arguments
- path
A path to a directory containing .csv files from jst_import() or jst_import_zip(), or a vector of files which are to be imported.
- write_to_file
Should combined data be written to a file?
- out_path
The directory to which the combined files are written. If no directory is supplied and write_to_file is TRUE, the combined files are written to path.
- overwrite
Should files be overwritten?
- clean_up
Do you want to remove the original batch files? Use with caution.
- warn
Should warnings be raised if the file type cannot be determined?
Details
You might split the output of jst_import() or jst_import_zip() for several reasons, but in the end you will probably want to combine all outputs into one file or data.frame. To combine files, this function makes a few assumptions:
- Files whose names are identical apart from a trailing dash followed by a number belong together and are combined into one file.
- The names of the combined files can be derived from the original files. If you combine foo-1.csv and foo-2.csv, the combined file will be combined_foo.csv.
- The directory only contains files which were created via jst_import() or jst_import_zip(). If the directory contains other .csv files, supply a character vector with paths to only those files which you want to import.
Examples
# set up a temporary directory
tmp <- tempdir()
# find multiple files
file_list <- rep(jst_example("article_with_references.xml"), 2)
# convert and write to file
jst_import(file_list, "article", out_path = tmp, .f = jst_get_article,
n_batches = 2, show_progress = FALSE)
#> Starting to import 2 file(s).
#> Processing chunk 1/2
#> Processing chunk 2/2
#> Finished importing 2 file(s) in 0.29 secs.
# combine outputs
jst_combine_outputs(tmp)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpmbBd7M/combined_article.csv` to disk.
#> Warning: The `path` argument of `write_csv()` is deprecated as of readr 1.4.0.
#> ℹ Please use the `file` argument instead.
#> ℹ The deprecated feature was likely used in the jstor package.
#> Please report the issue at <https://github.com/ropensci/jstor/issues>.
list.files(tmp, "csv")
#> [1] "article-1.csv" "article-2.csv" "combined_article.csv"
if (FALSE) { # \dontrun{
# Trying to combine the files again raises an error.
jst_combine_outputs(tmp)
} # }
# with overwrite = TRUE, no error is raised
jst_combine_outputs(tmp, overwrite = TRUE)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpmbBd7M/combined_article.csv` to disk.
# we can remove the original files too
jst_combine_outputs(tmp, overwrite = TRUE, clean_up = TRUE)
#> Re-importing 2 batches.
#> Writing combined file `/tmp/RtmpmbBd7M/combined_article.csv` to disk.
#> Deleting original batches.
list.files(tmp, "csv")
#> [1] "combined_article.csv"
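When the directory also holds unrelated .csv files, the Details section suggests passing only the batch files as a character vector. A minimal sketch of selecting them by name, assuming the batches follow the `<type>-<number>.csv` pattern shown above; the directory `combine-demo` and the file `notes.csv` are hypothetical and created here only for illustration.

```r
# Hypothetical directory with two batch files and one unrelated .csv file
dir <- file.path(tempdir(), "combine-demo")
dir.create(dir, showWarnings = FALSE)
invisible(file.create(file.path(dir, c("article-1.csv", "article-2.csv", "notes.csv"))))

# Keep only files named like "article-<number>.csv"
batch_files <- list.files(dir, pattern = "^article-[0-9]+\\.csv$",
                          full.names = TRUE)
basename(batch_files)
#> [1] "article-1.csv" "article-2.csv"

# With real batch files from jst_import(), the selection would then be passed on:
# jst_combine_outputs(batch_files)
```

The call to jst_combine_outputs() is left commented out because the placeholder files created here are empty and are not real jst_import() output.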