faker-cli's Introduction

Faker CLI

Faker is an awesome Python library, but I often just want a simple command I can run to generate data in a variety of formats.

With Faker CLI, you can easily generate CSV, JSON, or Parquet data with fields of your choosing.

You can also utilize pre-built templates for common data formats!

Installation

pip install faker-cli

Tip

To use Parquet or Delta Lake, use pip install faker-cli[parquet] or pip install faker-cli[delta]

Usage

Once installed, you should have the fake command in your path. Run the following to see usage / help:

fake --help

By default, fake will generate a CSV output for you. You just specify the number of rows you want and the column types.

fake -n 10 pyint,user_name,date_this_year

BAM! You've got a CSV file with your data.

pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
3660,heather91,2023-02-10
9160,cameronlopez,2023-05-05
2735,candacemoore,2023-05-12
7240,zachary06,2023-01-23
9778,thomasstacey,2023-05-23
5820,kenneth36,2023-04-26
2856,michael23,2023-01-16
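
Since the output is plain CSV with a header row, it is easy to pick up downstream. Here is a minimal sketch using only the Python standard library, assuming the output above was captured into a string. The names sample and parse_rows are illustrative and not part of faker-cli.

```python
import csv
import io
from datetime import date

# First three rows of the sample output above, as if captured from `fake`.
sample = """pyint,user_name,date_this_year
8649,fward,2023-03-08
3933,zharris,2023-03-20
1469,jasonellis,2023-05-16
"""


def parse_rows(text):
    """Parse CSV text into dicts, converting the two typed columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        # The csv module yields strings only, so convert types explicitly.
        row["pyint"] = int(row["pyint"])
        row["date_this_year"] = date.fromisoformat(row["date_this_year"])
        rows.append(row)
    return rows


rows = parse_rows(sample)
print(rows[0]["user_name"])  # fward
```

The explicit conversions are the point here: CSV carries no type information, so anything that is not a string has to be converted by the reader.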

JSON

Want a JSON file? Sweet, use -f json.

fake -n 10 pyint,user_name,date_this_year -f json
{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
{"pyint": 1434, "user_name": "karen38", "date_this_year": "2023-03-02"}
{"pyint": 4922, "user_name": "duncanellen", "date_this_year": "2023-04-22"}
{"pyint": 230, "user_name": "tiffany72", "date_this_year": "2023-02-25"}
{"pyint": 7252, "user_name": "maydouglas", "date_this_year": "2023-04-01"}
{"pyint": 2716, "user_name": "sheilaflores", "date_this_year": "2023-03-20"}
{"pyint": 2827, "user_name": "parksandra", "date_this_year": "2023-04-01"}
{"pyint": 3353, "user_name": "melissaatkinson", "date_this_year": "2023-02-10"}
{"pyint": 5306, "user_name": "mark12", "date_this_year": "2023-04-16"}
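
Note that the sample output above is one JSON object per line rather than a single JSON array, so it needs to be decoded line by line. A minimal sketch, assuming the output was captured into a string (sample and parse_json_lines are illustrative names, not part of faker-cli):

```python
import json

# First two lines of the sample output above.
sample = """{"pyint": 3854, "user_name": "cchavez", "date_this_year": "2023-01-20"}
{"pyint": 2008, "user_name": "vnguyen", "date_this_year": "2023-04-03"}
"""


def parse_json_lines(text):
    """Decode each non-empty line as its own JSON document."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


records = parse_json_lines(sample)
print(records[0]["user_name"])  # cchavez
```

Passing the whole string to json.loads at once would raise a JSONDecodeError, because the second object counts as extra data after the first.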

Column Names

Default column names aren't good enough for you? Fine, use your own.

fake -n 10 pyint,user_name,date_this_year -f json -c id,awesome_name,last_attention_at
{"id": 6048, "awesome_name": "jtran", "last_attention_at": "2023-04-24"}
{"id": 4310, "awesome_name": "stacey99", "last_attention_at": "2023-04-27"}
{"id": 1839, "awesome_name": "jho", "last_attention_at": "2023-03-07"}
{"id": 236, "awesome_name": "melissamassey", "last_attention_at": "2023-04-17"}
{"id": 6599, "awesome_name": "mwells", "last_attention_at": "2023-04-25"}
{"id": 6071, "awesome_name": "wilcoxrick", "last_attention_at": "2023-01-17"}
{"id": 9646, "awesome_name": "michael92", "last_attention_at": "2023-04-22"}
{"id": 6986, "awesome_name": "ballen", "last_attention_at": "2023-01-08"}
{"id": 6892, "awesome_name": "jennifer61", "last_attention_at": "2023-01-03"}
{"id": 1967, "awesome_name": "jmendoza", "last_attention_at": "2023-01-23"}
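
In the example above, the names passed to -c line up with the column types by position. The sketch below illustrates that pairing. It is not faker-cli's actual code: resolve_headers is a hypothetical helper, and raising an error on a length mismatch is an assumption about reasonable behavior.

```python
def resolve_headers(column_types, column_names=None):
    """Return one header per column; the types double as default headers."""
    types = column_types.split(",")
    if column_names is None:
        return types
    names = column_names.split(",")
    if len(names) != len(types):
        # Assumed behavior: refuse to guess when the counts differ.
        raise ValueError(f"expected {len(types)} column names, got {len(names)}")
    return names


headers = resolve_headers(
    "pyint,user_name,date_this_year",
    "id,awesome_name,last_attention_at",
)
print(headers)  # ['id', 'awesome_name', 'last_attention_at']
```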

Provider Arguments

Some Faker providers (like pyint) take arguments. You can also specify those if you like, separated by semi-colons (because some arguments take a comma-separated string :))

fake -n 10 "pyint(1;100),credit_card_number(amex),pystr_format(?#-####)" -f json -c id,credit_card_number,license_plate

Important

When using arguments with output formats like JSON, it's best to provide column headers as well with -c.
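
To make the syntax concrete, here is a rough sketch of how a column spec like the one above could be split apart: commas separate columns (but not inside parentheses), and semicolons separate arguments. This is an illustration only, not faker-cli's parser. The names split_columns and parse_column_spec are hypothetical, and arguments are left as strings here.

```python
import re

# Hypothetical pattern: a provider name, optionally followed by "(args)".
COLUMN_RE = re.compile(r"(?P<name>[\w.]+)(?:\((?P<args>.*)\))?")


def split_columns(spec):
    """Split on commas that are not inside parentheses."""
    columns, current, depth = [], [], 0
    for ch in spec:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            columns.append("".join(current))
            current = []
        else:
            current.append(ch)
    columns.append("".join(current))
    return columns


def parse_column_spec(spec):
    """Turn 'pyint(1;100),user_name' into [(name, [args]), ...]."""
    parsed = []
    for column in split_columns(spec):
        match = COLUMN_RE.fullmatch(column.strip())
        if match is None:
            raise ValueError(f"cannot parse column: {column!r}")
        raw_args = match.group("args")
        # Semicolons separate arguments, so commas can appear inside one.
        args = raw_args.split(";") if raw_args else []
        parsed.append((match.group("name"), args))
    return parsed


print(parse_column_spec("pyint(1;100),credit_card_number(amex),pystr_format(?#-####)"))
# [('pyint', ['1', '100']), ('credit_card_number', ['amex']), ('pystr_format', ['?#-####'])]
```

This also hints at why -c is recommended with arguments: without it, the header would presumably be the full spec, parentheses and all.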

And unique values are supported as well.

fake -n 10 "unique.pyint(1;10),unique.name"
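
One thing to keep in mind: a unique column can only fill as many rows as it has distinct values, so unique.pyint(1;10) tops out at 10 rows. The sketch below shows that constraint with the standard library only. It is not how Faker or faker-cli implement uniqueness, and unique_ints is a hypothetical helper.

```python
import random


def unique_ints(count, low, high, seed=None):
    """Draw `count` distinct integers from the inclusive range [low, high]."""
    pool_size = high - low + 1
    if count > pool_size:
        raise ValueError(
            f"cannot draw {count} unique values from a pool of {pool_size}"
        )
    rng = random.Random(seed)
    # sample() picks without replacement, so no value repeats.
    return rng.sample(range(low, high + 1), count)


values = unique_ints(10, 1, 10, seed=0)
print(sorted(values))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Asking for an 11th value from the same range raises ValueError in this sketch.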

Parquet

OK, it had to happen, you can even write Parquet.

Install with the parquet module: pip install faker-cli[parquet]

fake -n 10 pyint,user_name,date_this_year -f parquet -o sample.parquet

You can even write straight to S3 🤭

fake -n 10 pyint,user_name,date_this_year -f parquet -o s3://YOUR_BUCKET/data/sample.parquet

Delta Lake

Data can also be exported as a Delta Lake table.

Install with the delta module: pip install faker-cli[delta]

fake -n 10 pyint,user_name,date_this_year -f deltalake -o sample_data

Templates

The library includes a couple of templates that make it easier to generate certain types of fake data.

Today, the only templates that exist are for S3 Access and CloudFront logs.

Want to generate 1 MILLION S3 Access logs in ~2 minutes? Now you can. (But I only show 10 below so as not to crash your terminal)

fake -t s3access -n 10

How about CloudFront? Go ahead.

fake -t cloudfront -n 10

Warning: Both of these templates are still being validated - please be cautious!

faker-cli's People

Contributors

dacort, rahulj51

faker-cli's Issues

Make delta an extra module

For lightweight usage, Delta support adds 37 MB of extra downloads. It would be nice to have an extra module so you could do pip install faker-cli[delta] or similar.

Collecting faker-cli
  Downloading faker_cli-0.3.0-py3-none-any.whl (12 kB)
Collecting click<9.0.0,>=8.1.3
  Downloading click-8.1.6-py3-none-any.whl (97 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 97.9/97.9 kB 860.9 kB/s eta 0:00:00
Collecting deltalake<0.10.0,>=0.9.0
  Downloading deltalake-0.9.0-cp37-abi3-macosx_10_9_x86_64.macosx_11_0_arm64.macosx_10_9_universal2.whl (37.2 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 37.2/37.2 MB 1.4 MB/s eta 0:00:00
Collecting faker<19.0.0,>=18.9.0
  Using cached Faker-18.13.0-py3-none-any.whl (1.7 MB)
Collecting pyarrow<13.0.0,>=12.0.0
  Downloading pyarrow-12.0.1-cp311-cp311-macosx_10_14_x86_64.whl (24.7 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24.7/24.7 MB 743.0 kB/s eta 0:00:00
Collecting python-dateutil>=2.4
  Downloading python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 247.7/247.7 kB 2.2 MB/s eta 0:00:00
Collecting numpy>=1.16.6
  Downloading numpy-1.25.2-cp311-cp311-macosx_10_9_x86_64.whl (20.8 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20.8/20.8 MB 1.2 MB/s eta 0:00:00
Collecting six>=1.5
  Downloading six-1.16.0-py2.py3-none-any.whl (11 kB)
Installing collected packages: six, numpy, click, python-dateutil, pyarrow, faker, deltalake, faker-cli
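
One common way to make a heavy dependency optional is to check for it only when the matching output format is requested, and point the user at the right extra if it is missing. The sketch below shows that general pattern. It is not faker-cli's actual code, and require_extra is a hypothetical helper.

```python
import importlib.util


def require_extra(module_name, extra):
    """Raise a helpful error if an optional dependency is not installed."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(
            f"'{module_name}' is not installed; "
            f"install it with: pip install faker-cli[{extra}]"
        )


# A standard-library module is always found, so this passes silently.
require_extra("json", "delta")
```

A CLI could call something like require_extra("deltalake", "delta") just before writing a Delta Lake table, so the base install stays small.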

Saving Parquet to S3 on M1 mac results in a segfault

A simple fake command to output parquet data to S3 results in a segfault.

❯ fake -n 1000 pyint,user_name,date_this_year -c id,awesome_name,last_attention_at -f parquet -o s3://<BUCKET>/data/sample.parquet
zsh: segmentation fault  fake -n 1000 pyint,user_name,date_this_year -c  -f parquet -o 

Some more details on the stack trace when trying to write without being authenticated:

Traceback (most recent call last):
  File "/private/tmp/faker/.venv/bin/fake", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/faker_cli/cli.py", line 84, in main
    writer.close()
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/faker_cli/writer.py", line 59, in close
    pq.write_table(self.table, self.filename)
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 3084, in write_table
    with ParquetWriter(
         ^^^^^^^^^^^^^^
  File "/private/tmp/faker/.venv/lib/python3.11/site-packages/pyarrow/parquet/core.py", line 995, in __init__
    sink = self.file_handle = filesystem.open_output_stream(
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/_fs.pyx", line 868, in pyarrow._fs.FileSystem.open_output_stream
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When initiating multiple part upload for key 'data/sample.parquet' in bucket '<BUCKET>': AWS Error ACCESS_DENIED during CreateMultipartUpload operation: Anonymous users cannot initiate multipart uploads.  Please authenticate.
zsh: segmentation fault  fake -n 1000 pyint,user_name,date_this_year -c  -f parquet -o 
