streamable chunked parquet using arrow
Details
Parquet files are streamed to disk by breaking the data into chunks of `nlines` rows each, where `nlines` is the parameter passed in the initial call to `ark()`. For each `tablename`, a folder is created and the chunks are written into it as files named in the form `part-000000.parquet`.
The software inspects the folder and increments the numeric part of the file name for each subsequent chunk. This layout is intentional: it lets users take advantage of `arrow::open_dataset` later, when returning to review or analyze the data.
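The folder-and-increment scheme can be illustrated with a minimal base-R sketch. This is not arkdb's actual implementation: `next_part_path()` is a hypothetical helper, and the sketch assumes the index is a six-digit, zero-padded number and that the next index is one past the highest index already present in the folder. Empty files stand in for real parquet chunks so the example runs without the arrow package.

```r
# Hypothetical helper (not part of arkdb): return the path for the next chunk
# in a per-table folder. Assumes files are named part-NNNNNN.parquet.
next_part_path <- function(dir) {
  # Create the per-table folder on first use
  if (!dir.exists(dir)) dir.create(dir, recursive = TRUE)
  # Only consider files that match the part-NNNNNN.parquet pattern
  existing <- list.files(dir, pattern = "^part-[0-9]{6}\\.parquet$")
  next_index <- if (length(existing) == 0) {
    0L  # empty folder: start at part-000000
  } else {
    # Characters 6-11 hold the six-digit index after the "part-" prefix
    max(as.integer(substr(existing, 6, 11))) + 1L
  }
  file.path(dir, sprintf("part-%06d.parquet", next_index))
}

# Usage: simulate writing two chunks for one table
tbl_dir <- tempfile("flights")    # hypothetical table folder
p1 <- next_part_path(tbl_dir)     # .../part-000000.parquet
invisible(file.create(p1))        # stand-in for writing a real parquet chunk
p2 <- next_part_path(tbl_dir)     # .../part-000001.parquet
```

Because every chunk of a table lives in one folder with consistently named files, a folder of real parquet chunks can later be passed as a whole to `arrow::open_dataset()`, which discovers the files in the directory and treats them as a single dataset.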