
Streamable chunked Parquet using arrow




a streamable_table object (S3)


Parquet files are streamed to disk in chunks, each holding the number of rows given by the nlines parameter in the initial call to ark. A folder is created for each table name, and the chunks are written into it with names of the form part-000000.parquet. Before writing each chunk, the software inspects the folder and increments the file name accordingly. This layout is intentional: it lets users open the folder with arrow::open_dataset when they later return to review or analyze the data.
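The naming scheme described above can be sketched in base R. This is an illustration only, not the package's actual implementation: the helper next_chunk_path, the table name "flights", and the choice to derive the next index by counting existing part files are all assumptions made for the example. Empty placeholder files stand in for real Parquet chunks.

```r
# Hypothetical helper (not part of the package): returns the path that the
# next chunk for `tablename` would take inside `dir`.
next_chunk_path <- function(dir, tablename) {
  folder <- file.path(dir, tablename)
  # One folder per table; created on first use.
  dir.create(folder, recursive = TRUE, showWarnings = FALSE)
  # Assumption: the next index equals the number of chunks already present.
  existing <- list.files(folder, pattern = "^part-[0-9]{6}\\.parquet$")
  file.path(folder, sprintf("part-%06d.parquet", length(existing)))
}

# Simulate three chunks arriving for a table named "flights" (made-up name).
out <- tempfile("ark-demo-")
for (i in 1:3) {
  path <- next_chunk_path(out, "flights")
  file.create(path)  # empty placeholder; a real chunk would hold Parquet data
}

chunk_files <- list.files(file.path(out, "flights"))
print(chunk_files)
```

With real Parquet chunks in place of the placeholders, the resulting folder is the kind of directory that arrow::open_dataset accepts as a single multi-file dataset.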