alirezabakhtiari / german-nouns

This project forked from gambolputty/german-nouns

A list of ~98,000 German nouns and their grammatical properties compiled from WiktionaryDE as CSV file. Plus a module to look up the data and parse compound words.

License: Creative Commons Attribution Share Alike 4.0 International

Python 100.00%

German nouns

A comma-separated list of ~98 thousand German nouns and their grammatical properties (case, number, gender) as a CSV file, plus a module to look up the data and parse compound words. Compiled from the German Wiktionary (WiktionaryDE).

The list can be found here: german_nouns/nouns.csv
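
If you only need the raw data, the CSV file can be read without installing the package. The following is a minimal sketch, assuming the file is UTF-8 encoded, comma-delimited, and has a header row in its first line; the helper name `summarize_csv` is hypothetical and not part of this project. It makes no assumption about the column names and simply reports whatever the header contains.

```python
import csv
from pathlib import Path


def summarize_csv(path, sample_size=3):
    """Return (column names, row count, first rows) for a CSV file with a header row.

    Assumptions (not confirmed by the project): UTF-8 encoding,
    comma delimiter, header in the first line.
    """
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        sample = []
        count = 0
        for row in reader:
            if count < sample_size:
                sample.append(row)
            count += 1
        return reader.fieldnames, count, sample


if __name__ == "__main__":
    csv_path = Path("german_nouns/nouns.csv")
    if csv_path.exists():
        columns, total, first_rows = summarize_csv(csv_path)
        print("Columns:", columns)
        print("Rows:", total)
        for row in first_rows:
            print(row)
    else:
        print(f"{csv_path} not found; run this from the repository root.")
```

Reading the header first is a cautious choice: it lets you check which columns the file actually has before writing code that depends on them.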

If you want to look up nouns or parse compound words, install this package (for Python 3.8+) and follow the instructions below:

Installation

pip install german-nouns

Look up words

from pprint import pprint
from german_nouns.lookup import Nouns

nouns = Nouns()

# Look up a word (returns a list of matching entries)
word = nouns['Fahrrad']
pprint(word)

# Output:
[{'flexion': {'akkusativ plural': 'Fahrräder',
              'akkusativ singular': 'Fahrrad',
              'dativ plural': 'Fahrrädern',
              'dativ singular': 'Fahrrad',
              'dativ singular*': 'Fahrrade',
              'genitiv plural': 'Fahrräder',
              'genitiv singular': 'Fahrrades',
              'genitiv singular*': 'Fahrrads',
              'nominativ plural': 'Fahrräder',
              'nominativ singular': 'Fahrrad'},
  'genus': 'n',
  'lemma': 'Fahrrad',
  'pos': ['Substantiv']}]

# Parse a compound word into its component nouns
words = nouns.parse_compound('Vermögensbildung')
print(words)

# Output:
['Vermögen', 'Bildung'] # Now look up nouns['Vermögen'] etc.
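
The entries returned by a lookup are plain dictionaries, so they can be processed with ordinary Python. The sketch below works on a shortened copy of the `Fahrrad` entry shown above, so it runs without the package installed. The helper names (`with_article`, `inflected_form`, `articles_for_parts`) and the `ARTICLES` mapping are hypothetical and not part of `german_nouns`. Only the genus code `'n'` appears in the output above; the codes `'m'` and `'f'` are assumptions.

```python
# Assumed mapping from 'genus' codes to nominative singular articles.
# Only 'n' is confirmed by the output above; 'm' and 'f' are assumptions.
ARTICLES = {"m": "der", "f": "die", "n": "das"}

# Shortened copy of the entry printed by nouns['Fahrrad'] above.
SAMPLE_ENTRY = {
    "flexion": {
        "nominativ singular": "Fahrrad",
        "nominativ plural": "Fahrräder",
        "dativ plural": "Fahrrädern",
        "genitiv singular": "Fahrrades",
    },
    "genus": "n",
    "lemma": "Fahrrad",
    "pos": ["Substantiv"],
}


def with_article(entry):
    """Return the lemma prefixed with its article, or '?' for an unknown genus."""
    article = ARTICLES.get(entry.get("genus"), "?")
    return f"{article} {entry['lemma']}"


def inflected_form(entry, case, number):
    """Return one inflected form, e.g. ('dativ', 'plural'), or None if absent."""
    return entry.get("flexion", {}).get(f"{case} {number}")


def articles_for_parts(nouns, compound):
    """Split a compound and list each part with its article.

    `nouns` is any object offering parse_compound() and item lookup,
    as the Nouns instance does in the example above.
    """
    return {
        part: [with_article(entry) for entry in nouns[part]]
        for part in nouns.parse_compound(compound)
    }


print(with_article(SAMPLE_ENTRY))                       # das Fahrrad
print(inflected_form(SAMPLE_ENTRY, "dativ", "plural"))  # Fahrrädern
```

With the package installed, `articles_for_parts(Nouns(), 'Vermögensbildung')` would combine both steps from the example above. How `Nouns` behaves for words it does not know is not covered here, so handle that case according to what you observe.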

Compiling the list

To compile the list yourself, you need Python 3.8+ and Poetry installed.

1. Clone the repository and install dependencies with Poetry:

$ git clone https://github.com/gambolputty/german-nouns
$ cd german-nouns
$ poetry install

2. Compile the list of nouns from a Wiktionary XML file:

Find the latest XML dump files at https://dumps.wikimedia.org/dewiktionary/latest and download one of them. Then execute:

$ poetry run python -m german_nouns.parse_dump /path-to-xml-dump-file.xml.bz2

The CSV file will be saved here: german_nouns/nouns.csv.


License: CC BY-SA 4.0
