nkowaokwu / igbo_api

An API exposing Igbo words, definitions, and more

Home Page: https://igboapi.com

License: Apache License 2.0

JavaScript 7.54% Dockerfile 0.02% CSS 3.33% TypeScript 87.02% MDX 2.10%
igbo dictionary-api igbo-api dictionary igbo-language mongodb

igbo_api's People

Contributors

chidexebere, chimise, chriswenzy, dhaxor, ebubae, ebugo, effiti, emmo00, ezesundayeze, greatgrant, ifyndu, ijemmao, inezabonte, itsjayway, kenzdozz, mancancode, mr-talukdar, mustafaumar, namit-chandwani, ndohjapan, oemekaogala, pappyj, robertito1, rohit-rambade, samantatarun, semantic-release-bot, sir-radar, temtechie, toyin5, xaerru


igbo_api's Issues

Organise output of search response in demo site

The demo site (see #21) already has an axios request implemented to query the API and display results, but the results have not yet been organised or styled.

Note: This project is built with Gatsby and uses Tailwind utility classes in order to give a consistent look to the site.

Create Separate Usage Docs

Currently, all the documentation on how to interact with the API is in the README.

That information should be moved out into its own doc: either a USAGE.md file or a new page in the Wiki.

We could also explore automatically generating API docs with packages like apidoc

Create prevColumn enum

Currently, the plain strings left, center, and right are used throughout the buildDictionary function.

They should be placed in an enum for consistency.
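A minimal sketch of what that enum could look like (the name PrevColumn is hypothetical):

```javascript
// Hypothetical enum for the previous-column values used in buildDictionary.
// Object.freeze prevents accidental reassignment of the members.
const PrevColumn = Object.freeze({
  LEFT: 'left',
  CENTER: 'center',
  RIGHT: 'right',
});
```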

Update gatsby-dev .gitignore

The generated directories db/ and build/ need to be added to the .gitignore to prevent those files from accidentally entering this branch's codebase.
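The two entries to append to the .gitignore:

```
db/
build/
```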

How should we implement a Suggestions Feature for adding/removing/changing words?

After the initial version of the front site is complete, it would be nice to add a suggestions feature where users can request to see a change within the API.

I don't have any concrete ideas for how this feature will work. So I wanted to ask for people's opinions on what they think is the best way to capture user requests.

Here are a couple of ideas I've had so far:

  • On the front site, there would be a 'Suggestions' button where a user could input key information about what changes they want to see. When they submit that form, a new GitHub issue would be created

    I was thinking about tracking changes in GitHub so it's easier to track, but are there concerns with mixing user-requested changes with technical implementation efforts?

  • Again on the front site, instead of using GitHub issues, any requests will be sent to an email

    This approach would be nice because it would only house the issues found via the front site, but it's less accessible for future contributors to address.

  • Again on the front site, instead of issues or emails, we could make a new document in the database under a Request model

    This is my least favorite approach because it encourages human tampering with production-level data (once we've addressed the request we would have to go into the database to delete the document or update its status), but it's another thought I had

Also, a couple of more questions I had for people: how would we verify user-requested changes? What would the verification process look like? Would we want to

Implement Database Migration

Follow this guide to enable the database to be automatically migrated

Database migrations will act as a form of version control so if the database becomes invalidated or malformed, it can be rolled back

Initial Test + Continuous Integration

Write a few tests that check to see if the app is functioning properly.

After that, create a GitHub Actions workflow that creates a continuous integration pipeline that automatically runs the tests.

Use Airbnb ESLint Rules

The project currently uses eslint:recommended, which isn't strict enough: small formatting errors fall through the cracks with this configuration.

So the .eslintrc.json file should be updated to "extends": "airbnb".
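The relevant .eslintrc.json fragment (any other existing fields would stay as-is):

```
{
  "extends": "airbnb"
}
```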

Double-check that your text editor or IDE is set up correctly (it should show red error lines when a rule has been broken).

Change JSON Dictionary shape

Currently, the phrases key has an object for its value; each key on that object maps to a phrase object with definitions and examples keys.

phrases should change from an object to an array of objects.

The updated shape is already represented in the MongoDB database, but it makes sense to keep the JSON up-to-date

Before

{
  "phrases": {
    "(agụū) -gụ": {
      "definitions": [
        "be hungry"
      ],
      "examples": []
    }
  }
}

After

{
  "phrases": [
    {
      "phrase": "(agụū) -gụ",
      "definitions": [
        "be hungry"
      ],
      "examples": []
    }
  ]
}
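A sketch of a one-off conversion between the two shapes (the function name is hypothetical):

```javascript
// Convert the old object-keyed `phrases` value into an array of objects,
// moving each key into a `phrase` property on its entry.
const convertPhrases = (word) => ({
  ...word,
  phrases: Object.entries(word.phrases).map(([phrase, data]) => ({
    phrase,
    ...data,
  })),
});
```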

Hash Function for Terms

There are different ways a user can type out the same word. One user might insist on using accent marks, while another doesn't know how to type them. The API should still return the same information to the user.

For this to happen, we need a hash function that maps both normalized and non-normalized text to the same word, so the user gets the expected information.
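One way to sketch such a hash in JavaScript is Unicode NFD decomposition: both accented and unaccented spellings reduce to the same key (the function name is hypothetical):

```javascript
// Decompose accented characters (NFD), then strip the combining marks
// (U+0300–U+036F), so e.g. 'ọ' and 'o' hash to the same key.
const hashTerm = (term) =>
  term.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
```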

Normalize Text While Building Dictionaries

Currently, the raw text found in the PDF dictionaries uses accent marks that users of this API might not want to include when searching for terms.

While building dictionaries, there should be a dictionary that's built with normalized text.

Remove Prefixed A., B., C., etc.

Words that have multiple definitions will have multiple strings inside of their definitions array.

What happens often is that those strings in the definitions array are prefixed with the letters A., B., C., or any other letter to denote that there are multiple definitions.

Those letters should be removed so that the cleaned text will serve as one of the definitions for a given term.
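A sketch of the cleanup (function name hypothetical):

```javascript
// Strip a leading "A. ", "B. ", etc. from a definition string;
// definitions without such a prefix pass through unchanged.
const stripLetterPrefix = (definition) => definition.replace(/^[A-Z]\.\s*/, '');
```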

Map words that use 'see <word>'

There are numerous cases in the original PDF where words don't have any word class, definition, phrases, or examples.
Instead, it just says to see a different word.

Words in this situation shouldn't be considered standalone words in the JSON dictionary. Instead, each should be considered a variation of the word that it tells the reader to see.

[Screenshot: a dictionary entry that only refers the reader to another word (2020-10-03)]

Pagination doesn't go to the next 10 words

Instead of paginating through sets of ten words each time, the next page only introduces one new word while removing one term.
So instead of getting a set of ten new words, the frontend will only get one new word at a time.

Include variations keys in JSON data

A lot of words can have different spellings or variations.

When the project is parsing the HTML and building the JSON objects, a variations key should be placed on each word object. This will allow words to keep track of their variations.

In the Columbia PDF, variations are denoted by the use of commas in the far left column in the dictionary table. Each comma-separated term should be separated into an array. The first word of the possible variations will be the key of the word object, and the subsequent words will be placed in the variations array.

The MongoDB Word schema should also be updated to capture this new variations key. It will be an array that contains strings.

The search functionality should also check each word's variations key to see if the searched keyword matches any of a given word's variations.
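The comma-splitting rule described above could be sketched as (function name hypothetical):

```javascript
// Split a comma-separated headword cell into the main word plus variations.
const parseHeadword = (cell) => {
  const [word, ...variations] = cell.split(',').map((term) => term.trim());
  return { word, variations };
};
```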

Build out MongoDB Collections from JSON object with Mongoose

Now that this repo is able to parse the dictionary PDF and create a well-structured JSON object with terms and their information, it's time to start preparing this data to live in MongoDB.

This issue focuses on creating basic MongoDB documents and collections. Mongoose will be used to build out some basic schemas.

Schemas

Word

  • word - String
  • wordClass - String
  • definitions - Array[String]
  • phrases - Array[Phrase]
  • examples - Array[Example]

Phrase

  • phrase - String
  • parentWord - Word
  • definition - String
  • Examples - Array[Example]

Example

  • example - String
  • parentPhrase - Phrase
  • parentWord - Word
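Mongoose itself is left out so this stays dependency-free; a plain-object sketch of the parent/child relationships listed above (factory names hypothetical):

```javascript
// Factories mirroring the Word / Phrase / Example schema fields above.
const makeWord = (word, wordClass = '') => ({
  word, wordClass, definitions: [], phrases: [], examples: [],
});

const makePhrase = (phrase, parentWord) => {
  const p = { phrase, parentWord, definition: '', examples: [] };
  parentWord.phrases.push(p);
  return p;
};

const makeExample = (example, parentPhrase) => {
  const e = {
    example,
    parentPhrase,
    parentWord: parentPhrase.parentWord, // an example points back to both parents
  };
  parentPhrase.examples.push(e);
  e.parentWord.examples.push(e);
  return e;
};
```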

Search phrase definitions with English terms

Currently, the backend will search word definitions when the client provides an English term. This search should be extended to phrase definitions that belong to a particular word.

This extension of the search functionality will help provide more search results to the client.

Deploy API

Now that the API has basic functionality to search for Igbo terms using either Igbo or English, it's time to release to the world.

The tools that I'm thinking about using include:

  • Heroku - hosting and managing the Igbo Dictionary Node API service
  • MongoDB Atlas - hosting the MongoDB data

What are people's thoughts about these platforms? Is there something else we should consider using that could help down the road with scaling?

Convert Dictionary JSON Files to MongoDB

In order to make the search feature more scalable and easy to maintain, the data found in the dictionary JSON files need to be transferred into a MongoDB database.

Here are the current models that would be helpful:

  • Term
    • Word Class
    • Definitions
    • List of Phrase documents
    • List of Example documents
    • Dialect/Region
  • Phrase
    • Definition
    • List of Example documents
  • Example
    • Textual example

Note: Terms in bold are still a work in progress and might not be included in the document

Switch Express Routes

The main route api/v1/search/words should use MongoDB data while api/v1/test/words should use JSON data.

This is one step closer to consistently relying on the MongoDB data as a real site or service would.

Paginate Large Response

If there are more than 20 words that come back from the database, then the client should be able to paginate through the responses.

This will help with network response times.

Use the query param page to allow the client to specify which page of responses they want to see.
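A sketch of the slicing behind the page query param (20 per page as described above; names hypothetical):

```javascript
// Return the requested page of results; `page` is zero-indexed.
const paginate = (results, page = 0, limit = 20) =>
  results.slice(page * limit, page * limit + limit);
```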

Normalization map

Build a normalization map where each key is the normalized term and each value is an array of all the non-normalized terms.

So whenever a user searches without tonal marks, the program can find the term as a key in the map and then grab the term data for each of the words in the corresponding array.
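A sketch of building such a map (the accent-stripping step is an assumption about how normalization works here):

```javascript
// Map each normalized (tonal-mark-free) spelling to every raw term
// that normalizes to it.
const buildNormalizationMap = (terms) =>
  terms.reduce((map, term) => {
    const key = term.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
    (map[key] = map[key] || []).push(term);
    return map;
  }, {});
```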

Regex Search - Ignore Apostrophes and Dashes

The following search features should be included to help the user search easier:

  • If n obe is provided, then the expected n’obe should be returned
    • Apostrophes can be denoted with spaces
  • If bia is provided, then the expected -bia should be returned
    • Dashes denote that the term is a verb, but users shouldn't be required to include dashes to find that term
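A sketch of a regex builder covering both cases (name hypothetical; the keyword is assumed to contain no regex metacharacters):

```javascript
// Make a leading dash optional and let any space in the keyword also
// match an apostrophe, so "n obe" finds "n’obe" and "bia" finds "-bia".
const buildSearchRegex = (keyword) =>
  new RegExp(`^-?${keyword.trim().replace(/\s+/g, "[ '’]")}$`);
```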

Basic English to Igbo Search Functionality

Currently, the API allows for Igbo to English search, but to further expand what this API can do it needs to have an English to Igbo search capability.

This issue doesn't focus on implementing full-fledged English-to-Igbo search; instead, the main focus is to lay the groundwork for future related features.

Dictionary Site

Create a quick site that uses the API and displays the results.

Delete normalized dictionaries

There are two normalized JSON dictionaries ig-en_normalized.json and ig-en_normalized_expanded.json that aren't used in the project and probably won't be used in the future.

Delete these two files along with the logic that's responsible for building them when the script yarn build:dictionaries is executed.

Return Search Phrases

If a user searches for a single word that appears in the phrases section of a term, that phrase should be returned with its information.

Move A. B. Text into Definition Section

The Columbia paper has primary and secondary definitions for words using the prefixes A. and B.

parseAndBuild.js needs to move those phrases into a given word's definition property.

Restructure ig Folder

Create a new folder inside /ig that will hold all the *.html files.

Make sure that all the routes pointing to these files are updated.

Regex Search for Terms

The following search features should be included to help the user search easier:

  • If kpo is provided, then the user should get kpọ
    • Terms that have accents should be returned even if the user doesn't provide that information
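One way to sketch this: expand each base letter the user types into a character class that also covers its dotted Igbo counterpart (the letter table below is a simplified assumption):

```javascript
// Simplified mapping of base letters to their dotted Igbo forms.
const ACCENT_CLASSES = { i: '[iị]', o: '[oọ]', u: '[uụ]', n: '[nṅ]' };

// Build a regex that matches the keyword with or without dotted letters.
const accentInsensitiveRegex = (keyword) =>
  new RegExp(`^${[...keyword].map((ch) => ACCENT_CLASSES[ch] || ch).join('')}$`);
```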

Create Express Routes for MongoDB Data

Now that the project is able to move the JSON data into MongoDB, the API should start grabbing the data from MongoDB instead of the JSON files.

Create a new /GET endpoint similar to the one that exists. When the query keyword is provided, the word along with its resolved information should be included.

Add tests.

Move Accidental Examples to Definitions

Currently, the parsing script treats a new line of a definition or example as an entirely new example.

The script should instead determine whether a new line item is a continuation of a definition or example, or a completely new example, based on the difference in top pixel values.

If the script moves to a new row in the same column and the difference of tops is a clean 15px then it's a continuation of that column's cell. If the difference between tops isn't a flat integer, it's a new cell in that column.
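The heuristic above could be sketched as (names hypothetical):

```javascript
// A clean 15px difference between row tops means the new line continues
// the current cell; a fractional difference means a new cell begins.
const ROW_HEIGHT = 15;
const isContinuation = (prevTop, currentTop) =>
  currentTop - prevTop === ROW_HEIGHT;
```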

Allow searching with ids

When an id is provided in the API route for a word or phrase, then the API should return back the object with that id.

Enforce react/forbid-prop-types

According to react/forbid-prop-types, projects should stay away from PropTypes.object.

Remove all instances of PropTypes.object and provide more detailed object structures.

Also, remove the react/forbid-prop-types entry from the 'rules' section of .eslintrc.json so that the rule gets enforced.
