trellixvulnteam / tmtt_project_g7c5

Proposal for restructuring the code of the TMTT project

License: MIT License

Shell 0.01% JavaScript 2.70% Python 95.91% C 0.74% Common Lisp 0.03% Fortran 0.02% PowerShell 0.01% CSS 0.20% TeX 0.25% Makefile 0.01% HTML 0.11% Smarty 0.01% Xonsh 0.01% Cython 0.01%

tmtt_project_g7c5's Introduction

ttmt_project

TTMT is a text mining project to automate the classification of technical files.

Description

A longer description of your project goes here...

Note

This project has been set up using PyScaffold 3.2.3. For details and usage information on PyScaffold see https://pyscaffold.org/.

TTMT project structure

# package/subpackage of functions

General - Config:

Check the path of a file/dictionary
Check access to the JAVA file
Database access
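
The three checks above could be sketched as follows. This is only an illustration: the function names are hypothetical, the Java check assumes the goal is to find a `java` executable on the PATH, and SQLite stands in for the database, which the README does not name.

```python
import os
import shutil
import sqlite3


def check_path(path):
    """Return True if `path` is an existing file that can be read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


def check_java(executable="java"):
    """Return the full path of the Java executable, or None if it is not on PATH."""
    return shutil.which(executable)


def open_database(db_path=":memory:"):
    """Open a SQLite connection (assumption: SQLite is only a stand-in here)."""
    return sqlite3.connect(db_path)
```

Calling `check_path("my_dictionary.txt")` before any processing lets the program stop early with a clear message instead of failing halfway through an analysis.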

Extract the text:

Detect the file extension
PDF extraction (test with the different extractors)
Text extraction
Page count extraction
Metadata/PDF info extraction (if present)

# failure reading an encrypted PDF
# failure on an image-only PDF
{ input: file | output: raw text
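
A minimal sketch of the extension-based dispatch described above, using only the standard library. The names `ExtractionError`, `EXTRACTORS` and `extract_text` are hypothetical, and the PDF branch is left as a placeholder because the README does not say which PDF extractors are being compared.

```python
import os


class ExtractionError(Exception):
    """Raised when no text can be extracted (e.g. encrypted or image-only PDF)."""


def read_plain_text(path):
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def read_pdf(path):
    # Placeholder: no PDF library is wired in, since the README names none.
    raise ExtractionError("no PDF extractor configured for " + path)


# Maps a lower-case file extension to the function that extracts its text.
EXTRACTORS = {".txt": read_plain_text, ".pdf": read_pdf}


def extract_text(path):
    """Pick an extractor from the file extension and return the raw text."""
    extension = os.path.splitext(path)[1].lower()
    try:
        extractor = EXTRACTORS[extension]
    except KeyError:
        raise ExtractionError("unsupported extension: " + repr(extension)) from None
    text = extractor(path)
    if not text.strip():
        # An image-only PDF would typically end up here with an empty string.
        raise ExtractionError("no text found in " + path)
    return text
```

Routing both failure cases through a single exception type keeps the caller simple: it can log the failure and move on to the next file.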

Text analysis:

Tokenization:
Stopwords
Lemmatization
Stemming
# Bigram/Trigram > graph of words
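
As an illustration of the tokenization step, here is a small standard-library sketch. The stopword list is a toy example, the function names are hypothetical, and lemmatization and stemming are left out because they depend on a language-specific library that the README does not name.

```python
import re

# Toy stopword list for the example only; a real list would be much longer.
STOPWORDS = {"the", "of", "a", "is", "to", "and"}


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop the tokens that appear in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def ngrams(tokens, n):
    """Return the consecutive n-token tuples (n=2 for bigrams, n=3 for trigrams)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

For example, `ngrams(remove_stopwords(tokenize("The pump is a part of the engine")), 2)` gives `[("pump", "part"), ("part", "engine")]`, the kind of adjacent pairs a graph of words could use as edges.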

Term/document frequencies > TF
Normalization > IDF
Cosine similarity

{ input: raw text | output: word dictionary
# term frequency: number of occurrences of the term in the text / maximum number of occurrences across all texts > TF-IDF
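
The weighting described above might look like the sketch below. It rests on one reading of the note: each raw count is divided by the largest count found for any term in any text. It also uses the common `log(N / df)` form for IDF, which the README does not specify. All function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(documents):
    """documents: list of token lists. Returns one {term: tf} dict per document."""
    counts = [Counter(tokens) for tokens in documents]
    # Largest raw count over all terms and all texts (assumed normalization).
    max_count = max((max(c.values()) for c in counts if c), default=1)
    return [{term: n / max_count for term, n in c.items()} for c in counts]


def inverse_document_frequencies(documents):
    """Return {term: log(N / number of documents containing the term)}."""
    n_docs = len(documents)
    doc_freq = Counter(term for tokens in documents for term in set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(documents):
    """Return one {term: tf * idf} vector per document."""
    idf = inverse_document_frequencies(documents)
    return [{t: tf * idf[t] for t, tf in doc.items()}
            for doc in term_frequencies(documents)]


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Storing each vector as a dict keeps only the terms that actually occur in a document, which suits short technical texts with large vocabularies.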

Dictionary management ::
Read a dictionary
Write a dictionary

# very common/uncommon/specialized words
{ input :
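
The README does not describe the dictionary file format. The sketch below assumes the simplest option, a UTF-8 text file with one word per line, and uses hypothetical function names.

```python
def write_dictionary(path, words):
    """Write the words to `path`, one per line, sorted and without duplicates."""
    with open(path, "w", encoding="utf-8") as handle:
        for word in sorted(set(words)):
            handle.write(word + "\n")


def read_dictionary(path):
    """Read a one-word-per-line file into a set, ignoring blank lines."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}
```

Returning a set makes the later membership tests (is this word in the dictionary?) fast regardless of dictionary size.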

Log + FDL management ::
Create CSV (FDL)
Write (FDL)
Read CSV (logs)
Write (log)

{ input: logs | output: CSV logs
{ input: score/analysis data | output: CSV FDL
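
Creating, writing and reading these CSV files can be sketched with Python's `csv` module. The helper names and the column names in the usage note are hypothetical, since the README does not list the log or FDL columns.

```python
import csv
import os


def append_row(csv_path, fieldnames, row):
    """Append one row, writing the header first if the file is new or empty."""
    is_new = not os.path.exists(csv_path) or os.path.getsize(csv_path) == 0
    with open(csv_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if is_new:
            writer.writeheader()
        writer.writerow(row)


def read_rows(csv_path):
    """Return every row of the CSV file as a dict keyed by column name."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))
```

A log entry could then be written with `append_row("logs.csv", ["timestamp", "level", "message"], {...})`. The same two helpers would work for the FDL file with a different column list.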

Score ::
Score % dictionary
Classifier

# Documentation/Law/Regulation/Standards
{ input: word dictionary | output: dictionary score % dictionaries
{ input: word dictionary | output: category
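
One possible reading of the two steps above: the score is the percentage of a document's word occurrences found in a category dictionary, and the classifier picks the category with the highest score. This is a sketch under that assumption, not the project's actual scoring rule, and the function names are hypothetical.

```python
def dictionary_score(word_counts, dictionary):
    """Percentage of word occurrences in `word_counts` that belong to `dictionary`."""
    total = sum(word_counts.values())
    if total == 0:
        return 0.0
    matched = sum(n for word, n in word_counts.items() if word in dictionary)
    return 100.0 * matched / total


def classify(word_counts, dictionaries):
    """Return (best category, {category: score}) for a non-empty dict of dictionaries."""
    scores = {name: dictionary_score(word_counts, words)
              for name, words in dictionaries.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

With `dictionaries = {"Law": {"article", "decree"}, "Standards": {"iso"}}`, a document counted as `{"article": 2, "decree": 1, "pump": 1}` scores 75.0 for "Law" and 0.0 for "Standards", so it is classified as "Law".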

Database/API management ::
Read
Write
JSON conversion

# JSON format
# "searchable" raw text/metadata
{ input: processed file | output: record
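
The conversion of a processed file into a JSON record could be sketched as below. The field names (`path`, `text`, `metadata`, `category`) and the naive substring search are assumptions made for the example; the README does not define the record schema or the storage backend.

```python
import json


def to_record(path, raw_text, metadata, category):
    """Bundle the results for one processed file into a plain dict."""
    return {"path": path, "text": raw_text, "metadata": metadata, "category": category}


def to_json(record):
    """Serialize a record, keeping accented characters readable."""
    return json.dumps(record, ensure_ascii=False, sort_keys=True)


def from_json(payload):
    """Parse a JSON string back into a record dict."""
    return json.loads(payload)


def search(records, term):
    """Return the records whose raw text or metadata values contain `term`."""
    term = term.lower()
    return [
        record for record in records
        if term in record["text"].lower()
        or any(term in str(value).lower() for value in record["metadata"].values())
    ]
```

Keeping the raw text and metadata in the record itself is what makes them "searchable" without reopening the original file.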

Graphical interface :: MainWindow
