Make Fake Data • charlatan

charlatan makes fake data. It is inspired by, and borrows some code from, Python’s faker (https://github.com/joke2k/faker).

Make fake data for:

person names
jobs
phone numbers
colors: names, hex, rgb
credit cards
DOIs
numbers in range and from distributions
gene sequences
geographic coordinates
emails
URIs, URLs, and their parts
IP addresses
more coming …

Possible use cases for charlatan:

Students in a classroom setting learning any task that needs a dataset.
People doing simulations or modeling who need some fake data
Generate a fake dataset of users for a database before actual users exist
Complete missing spots in a dataset
Replace sensitive real data with fake data before public release
Create a random set of colors for visualization
Generate random coordinates for a map
Get a set of randomly generated DOIs (Digital Object Identifiers) to assign to fake scholarly artifacts
Generate fake taxonomic names for a biological dataset
Get a set of fake sequences to use to test code/software that uses sequence data
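
A few of the use cases above can be sketched in a handful of lines. This is only an illustration: the `patients` data frame and its columns are made up for the example, and the calls assume the `ch_*` helpers exported by the installed version of charlatan accept `n` as shown.

```r
library("charlatan")

# A random set of colors for a visualization (hex strings)
palette <- ch_hex_color(n = 5)

# Random coordinates for a map
coords <- data.frame(lon = ch_lon(n = 5), lat = ch_lat(n = 5))

# Fake DOIs and gene sequences for testing code that handles them
dois <- ch_doi(n = 3)
seqs <- ch_gene_sequence(n = 2)

# Hypothetical dataset whose real names must not be released
patients <- data.frame(
  name = c("Real Person A", "Real Person B", "Real Person C"),
  visits = c(2, 5, 1),
  stringsAsFactors = FALSE
)

# Overwrite the sensitive column with one generated name per row
patients$name <- ch_name(n = nrow(patients))
```

Because the values are random, call `set.seed()` first if the output needs to be reproducible.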

Reasons to use charlatan:

Lightweight, with few dependencies
Relatively comprehensive types of data, with more being added
Comprehensive set of supported languages, with more being added
Useful R features such as creating entire fake data.frames

Installation

CRAN version

install.packages("charlatan")

dev version

remotes::install_github("ropensci/charlatan")

library("charlatan")
set.seed(12345)

high level function

… for all fake data operations

x <- fraudster()
x$job()
#> [1] "Corporate investment banker"
x$name()
#> [1] "Dr. Garey Hamill"
x$color_name()
#> [1] "Ivory"

locale support

ch_job(locale = "fr_FR", n = 3)
#> [1] "Tailleur de pierre" "Soigneur"           "Ingénieur"
ch_job(locale = "hr_HR", n = 3)
#> [1] "Stalni sudski vještak" "Viši muzejski pedagog" "Kozmetičar"
ch_job(locale = "uk_UA", n = 3)
#> [1] "Льотчик"  "Дипломат" "Педагог"
ch_job(locale = "zh_TW", n = 3)
#> [1] "行政人員"     "珠心算老師"   "飯店工作人員"
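
The locale can also be set once rather than on every call. The sketch below assumes that `fraudster()` and `ch_name()` accept a `locale` argument in the same way `ch_job()` does above; check the help pages of the installed version if a call fails. Output is omitted because it is random.

```r
library("charlatan")

# Create a generator bound to one locale, then reuse it
fr <- fraudster(locale = "fr_FR")
fr_job <- fr$job()
fr_name <- fr$name()

# The standalone helpers take the same argument
fr_names <- ch_name(n = 3, locale = "fr_FR")
```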

generate a dataset

ch_generate()
#> # A tibble: 10 × 3
#>    name                    job                                      phone_number
#>    <chr>                   <chr>                                    <chr>       
#>  1 Deana Mraz DDS          Printmaker                               +25(2)69696…
#>  2 Delina Kilback          Administrator, charities/voluntary orga… 04380296996 
#>  3 Mandi Bailey PhD        Systems analyst                          09381790856 
#>  4 Ms. Trista Jacobson DVM Pharmacist, hospital                     214-956-893…
#>  5 King Bartoletti         Teacher, music                           1-312-788-3…
#>  6 Dr. Ike Gerhold         Audiological scientist                   743.877.3448
#>  7 Dr. Tatyanna Blanda DVM Manufacturing systems engineer           09691101846 
#>  8 Antione Grant           Regulatory affairs officer               (406)994-27…
#>  9 Michal Gutmann          Chartered management accountant          (576)667-99…
#> 10 Ross Cartwright PhD     Video editor                             07913227887

ch_generate("job", "phone_number", n = 30)
#> # A tibble: 30 × 2
#>    job                               phone_number        
#>    <chr>                             <chr>               
#>  1 Scientist, research (medical)     +63(0)0054265468    
#>  2 Contracting civil engineer        +97(1)8445952277    
#>  3 Geneticist, molecular             167-865-4109x84457  
#>  4 Equities trader                   737.695.1498x1212   
#>  5 Interior and spatial designer     +49(7)9909862225    
#>  6 Geophysical data processor        1-884-863-2289x58137
#>  7 Ophthalmologist                   060-919-7672x6069   
#>  8 Engineer, agricultural            180-370-0811x1948   
#>  9 Dealer                            1-838-787-0534      
#> 10 Environmental health practitioner 884.224.4881        
#> # ℹ 20 more rows

job

ch_job()
#> [1] "Set designer"

ch_job(10)
#>  [1] "Actuary"                                    
#>  [2] "Public house manager"                       
#>  [3] "Orthoptist"                                 
#>  [4] "Broadcast engineer"                         
#>  [5] "Scientist, research (physical sciences)"    
#>  [6] "Nature conservation officer"                
#>  [7] "Camera operator"                            
#>  [8] "Psychologist, prison and probation services"
#>  [9] "Engineer, communications"                   
#> [10] "IT sales professional"

credit cards

ch_credit_card_provider()
#> [1] "JCB 15 digit"
ch_credit_card_provider(n = 4)
#> [1] "VISA 16 digit"               "Voyager"                    
#> [3] "JCB 15 digit"                "Diners Club / Carte Blanche"

ch_credit_card_number(n = 10)
#>  [1] "3009338214996378"    "4713530558707"       "3158362208111956356"
#>  [4] "53355347405525029"   "3720351812179086"    "3044619385256147"   
#>  [7] "3789072424345968"    "4208219491023"       "3096893682997724534"
#> [10] "4419344554874021"

ch_credit_card_security_code()
#> [1] "866"
ch_credit_card_security_code(10)
#>  [1] "351"  "462"  "439"  "1922" "497"  "879"  "998"  "368"  "280"  "337"

Documentation

All providers have documentation available through the help functions. All providers of the same locale are linked together, and every language has a generic page, for example ?`dutch-language`.

There are three vignettes: one about contributing to this project, one about what {charlatan} does, and a more in-depth one about creating realistic data.
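
A minimal sketch of how to reach this documentation from an R session, using only base R help tools. The topic names are taken from the text above and may differ between package versions.

```r
library("charlatan")

# Help page for a single function
help("ch_job", package = "charlatan")

# The generic language page mentioned above
help("dutch-language", package = "charlatan")

# List the vignettes that ship with the installed version
vignettes <- vignette(package = "charlatan")
vignettes
```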

Usage in the wild

eacton/R-Utility-Belt-ggplot2 (https://github.com/eacton/R-Utility-Belt-ggplot2/blob/836a6bd303fbfde4a334d351e0d1c63f71c4ec68/furry_dataset.R)

Contributors

Roel M. Hogervorst (https://github.com/rmhogervorst)
Scott Chamberlain (https://github.com/sckott)
Kyle Voytovich (https://github.com/kylevoyto)
Martin Pedersen (https://github.com/MartinMSPedersen)

If you would like to contribute, see CONTRIBUTING (on GitHub).

similar art

wakefield (https://github.com/trinker/wakefield)
ids (https://github.com/richfitz/ids)
rcorpora (https://github.com/gaborcsardi/rcorpora)
synthpop (https://cran.r-project.org/package=synthpop)
