
numbworks / nw.ngramtextclassification


NW.NGramTextClassification is a library to perform text classification tasks on the text snippets you provide. Text Classification is a machine learning technique that calculates the similarity between the string of text you need to categorize and a collection of already categorized strings you provide to the library.
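To make the idea of similarity against already categorized strings concrete, here is a minimal, language-agnostic sketch in Python. It is not the library's C# API: the function names, the use of character trigrams, and the Sørensen-Dice coefficient are all assumptions chosen for illustration, since this page does not describe how the library tokenizes text or scores similarity.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased string."""
    normalized = text.lower()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def dice_similarity(a: Counter, b: Counter) -> float:
    """Sorensen-Dice coefficient over two n-gram multisets (0.0 to 1.0)."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    overlap = sum((a & b).values())  # multiset intersection
    return 2 * overlap / total


def predict_label(text: str, labeled_examples: list[tuple[str, str]], n: int = 3):
    """Return the label of the most similar example, or None if nothing overlaps."""
    target = char_ngrams(text, n)
    best_label, best_score = None, 0.0
    for label, example_text in labeled_examples:
        score = dice_similarity(target, char_ngrams(example_text, n))
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Returning `None` when no n-gram overlaps is a design choice of this sketch: it lets the caller distinguish "no evidence" from a weak match.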

License: MIT License

PowerShell 4.39% C# 95.61%
csharp library machine-learning ngrams text-classification

nw.ngramtextclassification's Introduction

About

I'm a Technical Product Manager with strong software development roots (C# and Python) and a passion for data pipelines, which I nurture every day.

In my daily job I manage five teams, lead the company's product strategy, enforce Agile and other best practices (clean architecture, unit test coverage > 70%, updated documentation as part of the definition of done, code readability as a priority, ...), and ensure quarterly deliveries despite limited development capacity. In addition, I do back-end development work when required. A few of the many technologies I work with every week: C#, Python, Jupyter Notebook, Pandas, Power BI, ETL, databases, REST APIs, AWS.

In my off-work time I actively develop and maintain open-source software packages (C# and Python) under my numbworks brand. Last but not least, to improve my craft, I have studied 276 technical books in the past nine years.

A software developer profile is defined by the software he develops and by the continuous learning activities he performs. This is my portfolio and you'll find information about both aspects of my off-work journey as a back-end developer.

Contact: Email | Twitter

Development

Libraries (C#)

Intended for other developers who want to integrate my software in their own software.

| Repository | Effort | Quick Links | License | Tests | NuGet | Last Update |
|---|---|---|---|---|---|---|
| NW.UnivariateForecasting | 208 h | | MIT | codecoverage_library.svg | 4.2.0 | 2024-02-15 |
| NW.NGramTextClassification | 207 h | | MIT | codecoverage.svg | 4.2.0 | 2024-02-14 |
| NW.MarkdownTables | 21 h | | MIT | codecoverage_library.svg | 3.0.0 | 2024-01-21 |

CLI Applications (C#)

Intended for data analysts who want to use my libraries through a command-line interface.

| Repository | Quick Links | License | Tests | Binaries |
|---|---|---|---|---|
| NW.UnivariateForecastingClient | Documentation | MIT | codecoverage_client.svg | 4.2.0 |
| NW.NGramTextClassificationClient | Documentation | MIT | codecoverage_client.svg | 4.2.0 |

Shared Libraries (C#)

Pieces of logic shared among my libraries. These might not be feature-rich enough to be useful for the general audience.

| Repository | Effort | Quick Links | License | Tests | NuGet | Last Update |
|---|---|---|---|---|---|---|
| NW.Shared.Files | 5 h | | MIT | codecoverage_library.svg | 1.0.0 | 2024-02-11 |
| NW.Shared.Serialization | 4 h | | MIT | codecoverage_library.svg | 1.0.0 | 2024-02-13 |
| NW.Shared.Validation | 3 h | | MIT | codecoverage_library.svg | 1.0.0 | 2024-02-10 |

Jupyter Notebooks (Python)

Intended to showcase my approach to solving specific data analysis problems:

| Repository | Effort | Quick Links | License | Tests | Version | Last Update |
|---|---|---|---|---|---|---|
| nwreadinglist | 95 h | | MIT | codecoverage.svg | 3.3.0 | 2024-05-21 |
| nwtimetracking | 73 h | | MIT | codecoverage.svg | 3.3.0 | 2024-05-21 |

Shared Packages (Python)

Pieces of logic shared among my libraries. These might not be feature-rich enough to be useful for the general audience.

| Repository | Effort | Quick Links | License | Tests | Version | Last Update |
|---|---|---|---|---|---|---|
| nwshared | 9 h | | MIT | codecoverage.svg | 1.1.0 | 2024-05-20 |

Other Projects

| Repository | Type | Quick Links | License | Last Update |
|---|---|---|---|---|
| i3_eink_config | Configuration File | README | MIT | 2020-12-22 |

Related Pages

Continuous Learning

The following table summarizes how many technical books I have studied since I started a continuous learning path (2016):

| Years | Books | Pages | TotalSpend | LastUpdate |
|---|---|---|---|---|
| 9 | 279 | 75882 | $7747.15 | 2024-06-25 |

Related Pages

Self-Improvement Status

Areas of Expertise

Data, Databases, Development, Software Usability, Clean Software Architecture, Clean Code, Design Patterns, (Parametric) Unit Testing, OOP, Dependency Injection, Single Responsibility Principle, Console Applications, Services, CLIs, CI/CD, (Web) Scraping, REST APIs, Business Analysis, ETL Process, Star Schema, Data Warehouses, Prompt Engineering.

Tech Stack

C#, .NET Core, .NET Standard, NUnit, SQL Server, Power BI, TeamCity, Azure DevOps, Excel, VBA, PowerShell, Ubuntu Server, Docker, MariaDB, NuGet Packages, SQLite, Proxmox VE, Jupyter Notebooks, Python 3.x, Pandas, ollama.

Incoming

Python Packages, Data Science, Machine Learning, PostgreSQL, Time-Series Analysis, Sensor Data, AWS Architecture.

nw.ngramtextclassification's People

Contributors

numbworks


nw.ngramtextclassification's Issues

String index bug

When I invoke the .PredictLabel method with some inputs, a call to Substring() throws an unhandled exception up the stack.

   at System.String.Substring(Int32 startIndex, Int32 length)
   at NW.NGramTextClassification.TextClassifierComponents.<>c.<.cctor>b__31_1(String text, UInt32 length)
   at NW.NGramTextClassification.TextClassifier.PredictLabel(String text, ITokenizationStrategy strategy, INGramTokenizerRuleSet ruleSet, List`1 labeledExamples)
   at NW.NGramTextClassification.TextClassifier.PredictLabel(String text, List`1 labeledExamples)
   at ****.InferReporterType(String input) in ****:line 100
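The issue does not say which inputs trigger the exception. The trace only shows a lambda taking `(String text, UInt32 length)` that calls `Substring`. One plausible cause, stated here purely as an assumption, is a requested length that exceeds the length of the text, which .NET's `String.Substring(startIndex, length)` rejects with an out-of-range exception. The Python sketch below mimics that strict behaviour and shows a clamped alternative; both function names are hypothetical and not part of the library.

```python
def dotnet_like_substring(text: str, start: int, length: int) -> str:
    """Mimic the strict bounds checks of .NET String.Substring(startIndex, length)."""
    if start < 0 or length < 0 or start + length > len(text):
        raise IndexError("start and length must refer to a location within the string")
    return text[start:start + length]


def safe_truncate(text: str, length: int) -> str:
    """Return at most `length` leading characters, never reading past the end."""
    # Clamping the requested length is the guard that a strict substring call needs.
    return dotnet_like_substring(text, 0, min(length, len(text)))
```

With the clamp in place, inputs shorter than the requested length are returned unchanged instead of raising.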

[v3.6.0] Performance improvement: the tokenization process for the provided labeled examples should happen only once

As a developer, I want the tokenization of the provided labeled examples to happen only once, regardless of the number of text snippets, so that performance improves drastically.

Before:

...
[2022-11-03 21:40:42:714] The provided LabeledExample objects have been thru the tokenization process.
[2022-11-03 21:40:42:714] At least one LabeledExample object failed to be tokenized.
...
[2022-11-03 21:40:53:839] The provided LabeledExample objects have been thru the tokenization process.
[2022-11-03 21:40:53:839] At least one LabeledExample object failed to be tokenized.
...
[2022-11-03 21:41:04:812] The provided LabeledExample objects have been thru the tokenization process.
[2022-11-03 21:41:04:812] At least one LabeledExample object failed to be tokenized.
...
[2022-11-03 21:41:15:656] The provided LabeledExample objects have been thru the tokenization process.
[2022-11-03 21:41:15:656] At least one LabeledExample object failed to be tokenized.
...

After:

...
[2022-11-03 21:40:42:714] The provided LabeledExample objects have been thru the tokenization process.
[2022-11-03 21:40:53:839] At least one LabeledExample object failed to be tokenized.
[2022-11-03 21:41:04:812] Attempting to save the provided 'TextClassifierSession' object as: C:\ngramtc_session_20221104121705802.json.
[2022-11-03 21:41:15:656] The provided 'TextClassifierSession' object has been successfully saved.
...
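The change described in this issue amounts to hoisting the tokenization of the labeled examples out of the per-snippet loop. The Python sketch below illustrates that pattern only. It is not the library's C# implementation: all names are hypothetical, and character trigrams with a raw overlap count stand in for whatever tokenization and similarity the library really uses.

```python
from collections import Counter


def tokenize(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of a lowercased string."""
    normalized = text.lower()
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def tokenize_examples(labeled_examples, n: int = 3):
    """Tokenize every (label, text) example exactly once, up front."""
    return [(label, tokenize(text, n)) for label, text in labeled_examples]


def predict_label(text: str, tokenized_examples, n: int = 3):
    """Pick the label whose pre-tokenized example shares the most n-grams."""
    target = tokenize(text, n)
    best_label, best_overlap = None, 0
    for label, tokens in tokenized_examples:
        overlap = sum((target & tokens).values())
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label


def predict_many(snippets, labeled_examples, n: int = 3):
    """Classify many snippets while tokenizing the examples a single time."""
    tokenized = tokenize_examples(labeled_examples, n)  # hoisted out of the loop
    return [predict_label(snippet, tokenized, n) for snippet in snippets]
```

In this sketch the cost of tokenizing the examples no longer grows with the number of snippets; only the snippet itself is tokenized on each iteration.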

[v3.5.0] Add "LoadTextSnippets" functionality

As a developer, I want to add the possibility to load a collection of text snippets from a JSON file, so that the library can classify snippets of texts taken from an external source.
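As a rough illustration of loading snippets from an external JSON file, the Python sketch below reads a file and validates its content. The file layout (a top-level JSON array of non-empty strings) and the function name are assumptions made for this example; the issue does not specify the schema the library expects.

```python
import json
from pathlib import Path


def load_text_snippets(path) -> list[str]:
    """Load text snippets from a JSON file holding an array of strings (assumed layout)."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of text snippets")
    snippets = []
    for index, item in enumerate(data):
        # Reject anything that could not be classified later.
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"item {index} is not a non-empty string")
        snippets.append(item)
    return snippets
```

Validating at load time keeps malformed input from surfacing later as a harder-to-diagnose classification error.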

[v3.6.0] Add "CleanLabeledExamples" functionality

As a developer, I want to add support for the "CleanLabeledExamples" functionality to LabeledExampleManager and TextClassifier, so that I can prepare the ground for the introduction of this option in the client.

Add data to the model while the app is running and use it

I am trying to build a talking bot using this library.

I must say that the library is incredibly beautiful and well put together.

But I would like to know how I can add data to the model while the app is running, and save it at runtime.

[v3.0.0] Add support for ASCII banner

As a developer, I want to add support for the ASCII banner, so that I can prepare the ground for the introduction of a CLI application as client.
