joke2k / faker

Faker is a Python package that generates fake data for you.

Home Page: https://faker.readthedocs.io

License: MIT License

Python 99.99% Makefile 0.01% Shell 0.01%
python fake testing dataset fake-data test-data test-data-generator faker faker-generator

faker's People

Contributors

bact, chrisvoncsefalvay, clarmso, confirmationbias616, crd, dependabot[bot], fcurella, george0st, grantbachman, guinslym, iamjazzar, illia-v, item4, jdufresne, joke2k, jremes-foss, kdeldycke, kity-linuxero, malefice, mdxs, metcalfetom, mondeja, pdaw, pishchalnikov, ppeemk, reverbc, sanga, vema, zeal18, zulupro

faker's Issues

Enhance random generator repeatability

The docs mention being able to call the seed() method so you can use a generated dataset as part of a unit test.

Because of the way Faker uses the random module, this use case is fragile. Any change to the data requested, or any outside use of the random module during generation, will cause the dataset to diverge.

Here is a quick script demonstrating the problem along with a couple of potential solutions:

import random
from faker import Faker
fake = Faker()

# initial run
fake.seed(1234)
print fake.name()
print fake.name()
print fake.name()

# repeated run with same data
fake.seed(1234)
print fake.name()
print fake.name()
print fake.name()

# adding new fake calls prevents us from getting the same names we had originally
fake.seed(1234)
print fake.name(), fake.email()
print fake.name(), fake.email()
print fake.name(), fake.email()

# One way is to implement a preserve/restore mechanism so that the user can get back to the previous trail of data
fake.seed(1234)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)
print fake.name()
r = random.getstate()
print fake.email()
random.setstate(r)

# A similar problem arises if the program using faker happens to use a non-instance random call during generation.
# The best way to prevent this issue is to have faker use an instance of random rather than the module version.

# If faker used an instance version of random, you could also resolve the original problem by using different faker instances
fake.seed(1234)
fake2 = Faker()
fake2.seed(1234)
print fake.name(), fake2.email()
print fake.name(), fake2.email()
print fake.name(), fake2.email()
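
The instance-based idea can be illustrated without Faker at all. The following is a minimal sketch in Python 3 using only the standard library. `NameSource`, its name list, and `three_names` are hypothetical stand-ins, not Faker code. Each object owns its own `random.Random`, so calls to the module-level `random` functions do not disturb its sequence.

```python
import random


class NameSource:
    """Hypothetical stand-in for a generator that owns its own RNG."""

    NAMES = ("Alice", "Bob", "Carol", "Dave", "Erin", "Frank")

    def __init__(self, seed=None):
        # A private Random instance instead of the shared module-level state.
        self._rng = random.Random(seed)

    def seed(self, value):
        self._rng.seed(value)

    def name(self):
        return self._rng.choice(self.NAMES)


def three_names(source, disturb=False):
    """Draw three names, optionally calling the global RNG in between."""
    result = []
    for _ in range(3):
        result.append(source.name())
        if disturb:
            random.random()  # outside use of the module-level RNG
    return result


baseline = three_names(NameSource(1234))
disturbed = three_names(NameSource(1234), disturb=True)
print(baseline == disturbed)  # True: the private RNG is unaffected
```

With this layout, two generators seeded with the same value also produce the same sequence independently of each other, which is the behaviour the last part of the script above asks for.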

Honor Environment LANG

I am currently using a wrapper for fake-factory to be able to choose the output locale, but it would be great if this became part of the fake-factory core.

This is the script i have in my path: https://gist.github.com/makefu/9101269

usage:
$ LANG=de_DE.utf-8 faker address
Davide-Kaul-Weg 175
94892 Königs Wusterhausen
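
For illustration, here is one way the LANG lookup could be done. This is a sketch under assumptions: `locale_from_env` is a hypothetical helper (not part of fake-factory or of the linked gist), and the fallback to "en_US" is an arbitrary choice.

```python
import os


def locale_from_env(environ=None, default="en_US"):
    """Derive a locale code such as 'de_DE' from the LANG variable.

    Hypothetical helper: strips the encoding ('.utf-8') and modifier
    ('@euro') parts and falls back to `default` when LANG is unset,
    'C' or 'POSIX'.
    """
    environ = os.environ if environ is None else environ
    lang = environ.get("LANG", "")
    code = lang.split(".", 1)[0].split("@", 1)[0]
    if not code or code in ("C", "POSIX"):
        return default
    return code


print(locale_from_env({"LANG": "de_DE.utf-8"}))  # de_DE
```

The returned code could then be handed to whatever creates the generator, so that an explicit command-line option would still be able to override it.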

Are images outside of the scope of this project?

Hi,

We're using this currently in our tests to generate test data. However we'd also like to use it to generate sample HTML pages (blog posts - for example). For this it would be great if faker could have an image provider (or maybe a file provider as a lower level).

Would you be averse to this idea? If not I'm more than happy to work on the provider and submit a pull request.

Cheers,
Ben

Parameter for disallowed characters?

It would be useful to have a parameter that would disallow a set of characters from a provider's output.

# Don't use outputs that have /, %, or &
fake.bs(disallowed_characters=['/', '%', '&'])

The use case I ran into was that we needed fake strings that could safely be put into URIs and therefore cannot contain /.

Thoughts on this?
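
Until such a parameter exists, the same effect can be approximated from the outside by retrying. The sketch below assumes nothing about Faker internals: `without_characters` is a hypothetical wrapper around any zero-argument callable that returns a string, and the sample strings are made up for the demonstration.

```python
import random


def without_characters(generate, disallowed, max_tries=100):
    """Call `generate()` until the result contains none of `disallowed`.

    Hypothetical wrapper; raises ValueError if no clean value turns up
    within `max_tries` attempts.
    """
    banned = set(disallowed)
    for _ in range(max_tries):
        value = generate()
        # A string is an iterable of characters, so this checks each one.
        if not banned.intersection(value):
            return value
    raise ValueError(
        "no value without %r after %d tries" % (sorted(banned), max_tries)
    )


# Stand-in generator for the example; with Faker this could be `fake.bs`.
rng = random.Random(0)
samples = ["synergize B2B/B2C portals", "50% faster R&D", "harness web services"]
result = without_characters(lambda: rng.choice(samples), ["/", "%", "&"])
print(result)
```

Retrying is simple but wasteful when most outputs contain a banned character. Filtering inside the provider would avoid that cost.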

Prepare a release

faker has so many great new changes in git, I think you guys should release all of them onto pypi soon, perhaps after pulling in the pull request with the docs.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)

I'm having problems installing Faker 0.4.2 on Python 3.4.2:

$ pip install fake-factory
Collecting fake-factory
  Using cached fake-factory-0.4.2.tar.gz
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory/setup.py", line 9, in <module>
        NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
      File "/Users/pedro.teixeira/.virtualenvs/cave/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 20, in <module>
      File "/private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory/setup.py", line 9, in <module>
        NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
      File "/Users/pedro.teixeira/.virtualenvs/cave/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 100: ordinal not in range(128)

    ----------------------------------------
    Command "python setup.py egg_info" failed with error code 1 in /private/var/folders/98/hxvgjtd93ql1s1c4695y6w2h0000gq/T/pip-build-e5pfmuys/fake-factory
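
The traceback points at the `open(...).read()` call in setup.py, which decodes with the locale's default encoding (ASCII here). Byte 0xc3 is a typical lead byte of a two-byte UTF-8 sequence, so the file is probably UTF-8. A minimal sketch of reading it with an explicit encoding follows. `read_text` is a hypothetical helper name, and the temporary file only stands in for CHANGELOG.rst.

```python
import io
import os
import tempfile


def read_text(path):
    """Read a file as UTF-8 regardless of the locale's default encoding."""
    # io.open accepts `encoding` on both Python 2 and Python 3.
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstration with a temporary file containing a non-ASCII character.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "CHANGELOG.rst")
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(u"Added de_DE provider (K\u00f6nigs Wusterhausen)\n")

NEWS = read_text(demo_path)
print(len(NEWS))
```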

Capital O missing an umlaut

Hello, I noticed in faker/Providers/De_de/internet.py in the _to_ascii method, the capital O is missing an umlaut.

It should be: ('Ö', 'Oe')

Currently:
replacements = (
    ('ä', 'ae'), ('Ä', 'Ae'),
    ('ö', 'oe'), ('O', 'Oe'),
    ('ü', 'ue'), ('Ü', 'Ue'),
    ('ß', 'ss')
)
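
With the plain 'O' in the table, any ordinary capital O would be rewritten to 'Oe' while 'Ö' would pass through unchanged. A sketch of the corrected table with a small transliteration function follows. `to_ascii` here is a standalone illustration for Python 3, not the project's actual method.

```python
REPLACEMENTS = (
    ("ä", "ae"), ("Ä", "Ae"),
    ("ö", "oe"), ("Ö", "Oe"),  # capital O now carries the umlaut
    ("ü", "ue"), ("Ü", "Ue"),
    ("ß", "ss"),
)


def to_ascii(text):
    """Transliterate German umlauts and sharp s."""
    for search, replace in REPLACEMENTS:
        text = text.replace(search, replace)
    return text


print(to_ascii("Örtliche Größe"))  # Oertliche Groesse
print(to_ascii("Otto"))            # Otto (plain O is left alone)
```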

Change project name to avoid confusion

I just received this feedback talking about the original Faker in PHP:

"I would recommend some sort of distinguishing name then. They both have the same name, that is going to be really confusing. Even something like FakerPy or something."

I think it makes sense and FakerPy is a good option.

Providers autodiscovery

Currently, every time a provider is added, we need to update the lists in __init__.

This is error-prone and it would be more sustainable if we could discover providers automatically.
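
One possible approach is to let the standard library enumerate the submodules of the providers package. The sketch below uses `pkgutil.iter_modules` and is demonstrated on the standard-library `json` package so that it runs anywhere. `discover_submodules` is a hypothetical helper, and using 'faker.providers' as the package name is an assumption.

```python
import importlib
import pkgutil


def discover_submodules(package_name):
    """Return the sorted names of the direct submodules of a package."""
    package = importlib.import_module(package_name)
    return sorted(
        info.name for info in pkgutil.iter_modules(package.__path__)
    )


# Demonstrated on a standard-library package; for Faker the argument
# would presumably be 'faker.providers' (an assumption, not verified here).
print(discover_submodules("json"))
```

Each discovered name could then be imported with `importlib.import_module` and registered, which would remove the hand-maintained lists.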

Tests fail (on Xubuntu 14.04) due to timestamp out of range issue

Running the tests fails on my Xubuntu 14.04 virtual machine (32-bit with Python 2.7.6) due to a "ValueError: timestamp out of range for platform time_t" at line 246 of faker/providers/date_time.py; see below for the output:

$ python setup.py test
running test
running egg_info
writing dependency_links to fake_factory.egg-info/dependency_links.txt
writing fake_factory.egg-info/PKG-INFO
writing top-level names to fake_factory.egg-info/top_level.txt
reading manifest file 'fake_factory.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'fake_factory.egg-info/SOURCES.txt'
running build_ext
test_add_provider_gives_priority_to_newly_added_provider (faker.tests.FactoryTestCase) ... ok
test_command (faker.tests.FactoryTestCase) ... 6588 Shasta Locks
South Tamikaville, CO 72509-4971


ok
test_documentor (faker.tests.FactoryTestCase) ... ERROR
test_format_calls_formatter_on_provider (faker.tests.FactoryTestCase) ... ok
test_format_transfers_arguments_to_formatter (faker.tests.FactoryTestCase) ... ok
test_get_formatter_returns_callable (faker.tests.FactoryTestCase) ... ok
test_get_formatter_returns_correct_formatter (faker.tests.FactoryTestCase) ... ok
test_get_formatter_throws_exception_on_incorrect_formatter (faker.tests.FactoryTestCase) ... ok
test_magic_call_calls_format (faker.tests.FactoryTestCase) ... ok
test_magic_call_calls_format_with_arguments (faker.tests.FactoryTestCase) ... ok
test_parse_returns_same_string_when_it_contains_no_curly_braces (faker.tests.FactoryTestCase) ... ok
test_parse_returns_string_with_tokens_replaced_by_formatters (faker.tests.FactoryTestCase) ... ok

======================================================================
ERROR: test_documentor (faker.tests.FactoryTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/mdxs/dev/gh/faker/faker/tests.py", line 65, in test_documentor
    print_doc()
  File "/home/mdxs/dev/gh/faker/faker/cli.py", line 77, in print_doc
    formatters = doc.get_formatters(with_args=True, with_defaults=True)
  File "/home/mdxs/dev/gh/faker/faker/documentor.py", line 28, in get_formatters
    (provider, self.get_provider_formatters(provider, **kwargs))
  File "/home/mdxs/dev/gh/faker/faker/documentor.py", line 78, in get_provider_formatters
    example = self.generator.format(name)
  File "/home/mdxs/dev/gh/faker/faker/generator.py", line 56, in format
    return self.get_formatter(formatter)(*args, **kwargs)
  File "/home/mdxs/dev/gh/faker/faker/providers/date_time.py", line 246, in date_time_ad
    return datetime.fromtimestamp(random.randint(-62135600400, int(time())))
ValueError: timestamp out of range for platform time_t

----------------------------------------------------------------------
Ran 12 tests in 0.201s

FAILED (errors=1)
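
A possible workaround is to avoid `datetime.fromtimestamp()` for values outside the platform's `time_t` range and use `timedelta` arithmetic instead. The sketch below is written for Python 3 and rests on assumptions: `datetime_from_timestamp` and `random_date_time_ad` are hypothetical names, the result is a naive UTC value rather than local time, and the lower bound is the UTC timestamp of 0001-01-01 rather than the constant shown in the traceback.

```python
import random
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# Seconds from 0001-01-01 00:00:00 to the epoch (719162 days * 86400).
MIN_TIMESTAMP = -62135596800


def datetime_from_timestamp(timestamp):
    """Convert a (possibly negative) timestamp to a naive UTC datetime.

    Uses timedelta arithmetic instead of datetime.fromtimestamp(), so the
    platform's time_t range does not matter.
    """
    return EPOCH + timedelta(seconds=timestamp)


def random_date_time_ad(rng, end_timestamp):
    """Hypothetical replacement: a random datetime from year 1 onwards."""
    return datetime_from_timestamp(rng.randint(MIN_TIMESTAMP, end_timestamp))


print(datetime_from_timestamp(MIN_TIMESTAMP))  # 0001-01-01 00:00:00
print(random_date_time_ad(random.Random(0), 1400000000))
```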

fake.date_time_this_month()

All date_time fakers of this kind generate values within the last period of the given length (e.g. the month leading up to now), not within 'this' period (i.e. the current calendar month).
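
To make the expected behaviour concrete, here is a sketch of a generator that stays inside the current calendar month. `date_time_in_current_month` is a hypothetical helper, not the provider's code, and it takes the RNG and the reference time as parameters only to keep the example deterministic.

```python
import calendar
import random
from datetime import datetime, timedelta


def date_time_in_current_month(rng, now=None):
    """Hypothetical helper: a random datetime inside the month of `now`."""
    now = datetime.now() if now is None else now
    start = datetime(now.year, now.month, 1)
    days_in_month = calendar.monthrange(now.year, now.month)[1]
    total_seconds = days_in_month * 24 * 60 * 60
    # randrange excludes the upper bound, so the result stays before
    # the first instant of the next month.
    return start + timedelta(seconds=rng.randrange(total_seconds))


fixed_now = datetime(2014, 2, 20, 12, 0)
value = date_time_in_current_month(random.Random(0), fixed_now)
print(value.year, value.month)  # 2014 2
```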

cls.random_sample

I wanted to add a method to BaseProvider that allows for sampling n unique elements.
There are situations in which I want to grab several random things, but I want those results to be unique. I just forked and added this to my own fork, but I wanted to run it by you before making a pull request.

    # in faker/providers/__init__.py, on BaseProvider
    import random  # needed at module level

    @classmethod
    def random_sample(cls, array=('a', 'b', 'c'), number=2):
        """Return `number` unique elements drawn from `array`."""
        return random.sample(array, number)

Added fake-factory to Ohloh

I've added fake-factory to ohloh.net at https://www.ohloh.net/p/fake-factory to keep some statistics on the code base and to allow contributors to claim/track their commits.

At the moment, there is no "Manager" ... The site describes the role as follows: "Who should register as a project manager? Someone who works on the project. Ideally the owner, founder, lead developer, or release manager."

So I guess either @joke2k or @fcurella should claim that role by clicking on the "Become the first manager for fake-factory" on the https://www.ohloh.net/p/fake-factory page.

US_en phone number formats

The US_en phone_number() provider includes formats that can generate invalid phone numbers (i.e. numbers which can't be parsed as standard US numbers by phonenumbers.py):

import phonenumbers
from faker import Faker
faker = Faker()
number = faker.phone_number()
phonenumbers.parse(number, 'US')

The code above raises a NumberParseException if the phone number is generated using the first format, '+##(#)##########', with an invalid country code (e.g. +08(1)111111111). One possibility is to force this format to always use a valid country code after the +. However, because other providers/localizations can already generate specific international number formats, including leading country codes, I think it would be simpler to include only valid US numbers in the US_en provider. In that case, would the easiest fix be to remove the '+##(#)##########' formats from the provider?

differentiate between male and female first names

As far as I can see, fake.first_name() can return either a male or a female first name. Do you plan to distinguish between them? For example fake.first_name(gender='male'), where the default value could be 'any'.

I ask it because I want to add support for Hungarian names. I have an up-to-date list with all the Hungarian names, put in two files: males and females. I could put them in two sets, or I could add them in one set.
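
A sketch of how the proposed signature could behave with two separate sets follows. Everything here is hypothetical: the class, the attribute names, and the short name lists are placeholders for illustration, not the real Hungarian data or the project's provider layout.

```python
import random


class PersonNames:
    """Hypothetical provider sketch; the name lists are placeholders."""

    first_names_male = ("Bence", "Levente", "Máté")
    first_names_female = ("Anna", "Hanna", "Zsófia")

    def __init__(self, rng=None):
        self._rng = rng or random.Random()

    def first_name(self, gender="any"):
        if gender == "male":
            pool = self.first_names_male
        elif gender == "female":
            pool = self.first_names_female
        elif gender == "any":
            pool = self.first_names_male + self.first_names_female
        else:
            raise ValueError("gender must be 'male', 'female' or 'any'")
        return self._rng.choice(pool)


names = PersonNames(random.Random(0))
print(names.first_name(), names.first_name(gender="female"))
```

Keeping the two sets separate loses nothing, because the 'any' case can always be built by joining them, whereas a single merged set could not be split again.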

Refactor Profile to be used with locale: how

I have an idea, but I'm not sure it is the simplest one: the current profile.py becomes something like "internal_profile.py", its methods are renamed "internal_simple_profile()" and "internal_profile()", and it is removed from the list of standard providers. Then we would have a standard profile.py that simply calls self.generator.internal_profile(). For each locale, we would then be able to add more logic, for example to customize field names and, eventually, values.

Do you think there would be a simpler way to do it?

timezone() randomly throws an exception

fake.timezone() sometimes throws an exception, possibly when a country doesn't have any timezones defined:

>>> from faker import Faker
>>> f = Faker()
>>> f.timezone()
'Africa/Mogadishu'
>>> f.timezone()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/vagrant/.python/lib/python3.3/site-packages/faker/providers/date_time.py", line 378, in timezone
    return cls.random_element(cls.countries)['timezones'].pop(0)

This is with Python 3.3 using fake-factory 0.4.0 from pypi.
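
One thing visible in the last traceback line is that `.pop(0)` removes the entry from the shared list, so repeated calls could empty a country's list even if it started with one entry. That is a reading of that single line, not a confirmed diagnosis. The sketch below shows a variant that neither mutates the data nor picks countries without timezones. The `COUNTRIES` data and the function signature are placeholders.

```python
import random

# Placeholder data in the shape suggested by the traceback line.
COUNTRIES = [
    {"name": "Somalia", "timezones": ["Africa/Mogadishu"]},
    {"name": "Example without zones", "timezones": []},
]


def timezone(rng, countries=COUNTRIES):
    """Pick a timezone without mutating the shared country data."""
    candidates = [c["timezones"] for c in countries if c["timezones"]]
    if not candidates:
        raise ValueError("no timezones defined")
    # Index instead of pop(0): the list keeps its entries for later calls.
    return rng.choice(candidates)[0]


rng = random.Random(0)
print([timezone(rng) for _ in range(3)])
```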

Pip install fails in 0.4.1

Downloading/unpacking fake-factory from https://pypi.python.org/packages/source/f/fake-factory/fake-factory-0.4.1.tar.gz#md5=27ac002a6f3a4b46d8996b5ef6ad5a7c
  Downloading fake-factory-0.4.1.tar.gz (306kB): 306kB downloaded
  Running setup.py egg_info for package fake-factory
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/Users/gkisel/.virtualenvs/faker/build/fake-factory/setup.py", line 9, in <module>
        NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
    IOError: [Errno 2] No such file or directory: '/Users/gkisel/.virtualenvs/faker/build/fake-factory/CHANGELOG.rst'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/Users/gkisel/.virtualenvs/faker/build/fake-factory/setup.py", line 9, in <module>
        NEWS = open(os.path.join(here, 'CHANGELOG.rst')).read()
    IOError: [Errno 2] No such file or directory: '/Users/gkisel/.virtualenvs/faker/build/fake-factory/CHANGELOG.rst'
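
The log shows setup.py reading CHANGELOG.rst unconditionally. Whether the file was left out of the source distribution cannot be told from the output alone, but setup.py could be made tolerant of its absence. A minimal sketch follows, with `read_optional` as a hypothetical helper and a temporary directory standing in for the build directory.

```python
import io
import os
import tempfile


def read_optional(path, default=u""):
    """Return the file's text, or `default` when the file is absent."""
    if not os.path.exists(path):
        return default
    with io.open(path, encoding="utf-8") as handle:
        return handle.read()


# Demonstration: the file does not exist, so the default is returned.
missing = os.path.join(tempfile.mkdtemp(), "CHANGELOG.rst")
NEWS = read_optional(missing)
print(repr(NEWS))
```

Shipping the file in the source distribution would address the cause, while the fallback only keeps the install from failing.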

faker coding style guide/standard?

What do you think about adding a coding style guide/standard for this project?
I can see that the style differs a lot from file to file. As a result, a lot of cleanup work is needed.

No module named faker (v 0.3)

$ pip install fake-factory==0.3
Downloading/unpacking fake-factory==0.3
  Downloading fake-factory-0.3.tar.gz (86kB): 86kB downloaded
  Running setup.py egg_info for package fake-factory

Installing collected packages: fake-factory
  Found existing installation: fake-factory 0.2
    Uninstalling fake-factory:
      Successfully uninstalled fake-factory
  Running setup.py install for fake-factory

Successfully installed fake-factory
Cleaning up...
$ python
Python 2.7.5+ (default, Jun  2 2013, 13:26:34) 
[GCC 4.7.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from faker import Factory
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named faker
>>> 

Version 0.2 works great.

Integrate with Factory Boy

Factory Boy provides easy replacement for fixtures. It allows for an easy definition of factories, various build factories, factory inheritance etc.
It has a FuzzyAttribute mechanism which would suit faker perfectly.

Only getting APO addresses with system python 2.7.7

With system python:

☄ python --version
Python 2.7.7

☄ which faker
/usr/local/bin/faker

☄ python -c "import faker; print faker.VERSION"
0.4.2

☄ faker address -r 10
PSC 7159, Box 2889
APO AP 50457

PSC 5924, Box 3842
APO AA 79576-2701

PSC 4394, Box 0547
APO AA 13834-3973

PSC 1353, Box 2874
APO AE 17295

PSC 8492, Box 6715
APO AE 89299-8347

PSC 0676, Box 5745
APO AA 45384

PSC 7082, Box 0817
APO AE 39616

PSC 9015, Box 5179
APO AP 79298

PSC 3885, Box 3107
APO AA 97447

PSC 3078, Box 3599
APO AE 16713-0587

In virtualenv:

☄ python --version
Python 3.4.1

☄ which faker
/Users/kyl/Code/Playground/faker/.venv/bin/faker

☄ python -c "import faker; print(faker.VERSION)"
0.4.2

☄ faker address -r 10
94283 Jewell Shoal Suite 192
West Cade, TN 16897-7888

93143 Runolfsdottir Summit Suite 471
Lilliamouth, KS 80170-8892

PSC 5138, Box 8808
APO AE 12600-9380

787 Rohan Drive Apt. 652
Port Ebertport, FL 84541-9565

12609 Gulgowski Club
Waelchihaven, VT 93071

Unit 6204 Box 4740
DPO AA 61620-2499

0791 Daxton Avenue
Chaneltown, TN 87248-1822

6046 Emard Camp
Lennyborough, FM 79310

83026 Kane Shore
Lake Casie, SD 63881-1429

881 Davis Walks Suite 491
McKenziehaven, TX 35051-3973

Provide random gender

I may be missing something but I don't think faker spits out random genders in the person provider. While trivial to write, I think this should still be included in faker.

Put docs onto readthedocs

I think that the current way of documenting everything on Github only doesn't scale very well. I suggest you put the docs onto readthedocs.

How about a release?

Last release was in March. Perhaps a new release would be in order to make people using pypi get it as well? Would be appreciated! Keeps the ecosystem going and all that.

Support Python 3

pip installation under Python 3 fails:

$ python --version
Python 3.3.5
$ pip install faker
Downloading/unpacking faker
  Downloading Faker-0.0.4.tar.gz
  Running setup.py (path:/home/abcde/temp/faker_test/env3/build/faker/setup.py) egg_info for package faker
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/home/abcde/temp/faker_test/env3/build/faker/setup.py", line 5, in <module>
        import faker
      File "./faker/__init__.py", line 11, in <module>
        import data
    ImportError: No module named 'data'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 17, in <module>
      File "/home/abcde/temp/faker_test/env3/build/faker/setup.py", line 5, in <module>
        import faker
      File "./faker/__init__.py", line 11, in <module>
        import data
    ImportError: No module named 'data'

----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /home/abcde/temp/faker_test/env3/build/faker
Storing debug log for failure in /home/abcde/.pip/pip.log

AttributeError: 'Generator' object has no attribute 'password'

This error occurs when attempting to use the password method on a Factory object.

Python 2.7.6 (default, Feb 26 2014, 12:07:17) 
[GCC 4.8.2 20140206 (prerelease)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from faker import Factory
>>> fake = Factory.create()
>>> fake.password()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Generator' object has no attribute 'password'

Random job provider

It would be useful to have a job provider alongside the company provider. If anyone could point me to a good list, I would work on it.

Default locale to language if no territory given.

It would be great if faker, when initialized with only a language and no territory, used a sensible default.

For example, I currently have to do the following when using something such as "en" instead of "en_US":

from faker import Factory
from faker import AVAILABLE_LOCALES

locale = 'en'
if locale not in AVAILABLE_LOCALES:
    locale = next(l for l in AVAILABLE_LOCALES if l.startswith(locale))

factory = Factory.create(locale)

This happens when using dynamic mock data in local development where django sets the locale to "en" because we do not define territories.
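
The fallback could be packaged as a small function. The sketch below is an assumption about how such a default might work, not existing faker behaviour: `resolve_locale` is a hypothetical name, the list of locales is a placeholder for AVAILABLE_LOCALES, and picking the alphabetically first match is an arbitrary but deterministic rule.

```python
def resolve_locale(requested, available, default="en_US"):
    """Hypothetical helper: map 'en' to the first matching 'en_XX' locale."""
    if requested in available:
        return requested
    # Match on 'en_' rather than 'en' so that unrelated codes that merely
    # start with the same letters are not picked up.
    prefix = requested.split("_", 1)[0] + "_"
    matches = sorted(code for code in available if code.startswith(prefix))
    return matches[0] if matches else default


available = ["de_DE", "en_GB", "en_US", "fr_FR"]  # placeholder list
print(resolve_locale("en", available))  # en_GB
```

Unlike a bare `next(...)` over a generator, this returns a default instead of raising StopIteration when nothing matches.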

Clarify using from the shell docs

In the using from shell section of the docs, I understand how to display the result of a fake. There is an example:

$ python -m faker address

However, it is not clear to me how to give a provider's name, for example 'Lorem' (should that be lowercase 'lorem'?), and display all of the provider's fakes. It would be good if there was an example provided.

Extract provider logic from provider data?

The provider data and provider logic are pretty tightly intertwined.

It'd be nice if they were separated out--then it'd be a lot easier to port some of the other provider lists out there.

For example, look at how ForgeryPy structures the data separate from the logic--ForgeryPy dictionaries are the equivalent of Faker's Providers: https://github.com/tomekwojcik/ForgeryPy/tree/master/forgery_py/dictionaries

He's got a generic loader that kicks in when a custom function isn't defined for a provider.

That project seems relatively abandoned, so it'd be nice to pull that clean functionality into this project.

It'd also probably make it easier for people to localize their providers because they just change the data files without having to think about the attached python code.
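
As a rough illustration of the separation, here is a sketch of a generic loader that turns plain data files into fakes. It is not how ForgeryPy or Faker is implemented: `DataFileProvider`, the one-entry-per-line `.txt` layout, and the `colors` example are all assumptions made for the demonstration.

```python
import io
import os
import random
import tempfile


class DataFileProvider:
    """Hypothetical provider: each '<name>.txt' file becomes a fake '<name>'."""

    def __init__(self, data_dir, rng=None):
        self._data_dir = data_dir
        self._rng = rng or random.Random()
        self._cache = {}

    def _load(self, name):
        # Read the data file once and keep its non-empty lines.
        if name not in self._cache:
            path = os.path.join(self._data_dir, name + ".txt")
            with io.open(path, encoding="utf-8") as handle:
                self._cache[name] = [
                    line.strip() for line in handle if line.strip()
                ]
        return self._cache[name]

    def fake(self, name):
        """Generic fallback used when no custom function exists for `name`."""
        return self._rng.choice(self._load(name))


# Demonstration with a throwaway data directory.
data_dir = tempfile.mkdtemp()
with io.open(os.path.join(data_dir, "colors.txt"), "w", encoding="utf-8") as handle:
    handle.write(u"red\ngreen\nblue\n")

provider = DataFileProvider(data_dir, random.Random(0))
print(provider.fake("colors"))
```

With this layout, localizing a provider would amount to supplying a different directory of data files.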
