dotnet / machinelearning-samples


Samples for ML.NET, an open source and cross-platform machine learning framework for .NET.

Home Page: https://dot.net/ml

License: MIT License

PowerShell 100.00%
machine-learning algorithms dotnet csharp ml

machinelearning-samples's Introduction

Note: We'd love to hear your thoughts about MLOps. Let us know in this survey.

ML.NET Samples

ML.NET is a cross-platform open-source machine learning framework that makes machine learning accessible to .NET developers.

In this GitHub repo, we provide samples that help you get started with ML.NET and show how to infuse ML into existing and new .NET apps.

Note: Please open issues related to the ML.NET framework in the Machine Learning repository. Create an issue in this repo only if you face problems with the samples in this repository.

There are two types of samples/apps in the repo:

  • Getting Started : ML.NET code focused samples for each ML task or area, usually implemented as simple console apps.

  • End-to-End apps : End-user sample web and desktop apps infused with Machine Learning models based on ML.NET.

The official ML.NET samples are divided into multiple categories depending on the scenario and machine learning problem/task, accessible through the following tables:

Binary classification

  • Sentiment Analysis: Getting started (C#, F#)
  • Spam Detection: Getting started (C#, F#)
  • Credit Card Fraud Detection (Binary Classification): Getting started (C#, F#)
  • Heart Disease Prediction: Getting started (C#)

Multi-class classification

  • Issues Classification: End-to-end app (C#, F#)
  • Iris Flowers Classification: Getting started (C#, F#)
  • MNIST: Getting started (C#)

Recommendation

  • Product Recommendation: Getting started (C#)
  • Movie Recommender (Matrix Factorization): Getting started (C#)
  • Movie Recommender (Field Aware Factorization Machines): End-to-end app (C#)

Regression

  • Price Prediction: Getting started (C#, F#)
  • Sales Forecasting (Regression): End-to-end app (C#)
  • Demand Prediction: Getting started (C#, F#)

Time Series Forecasting

  • Sales Forecasting (Time Series): End-to-end app (C#)

Anomaly Detection

  • Sales Spike Detection: Getting started (C#), End-to-end app (C#)
  • Power Anomaly Detection: Getting started (C#)
  • Credit Card Fraud Detection (Anomaly Detection): Getting started (C#)

Clustering

  • Customer Segmentation: Getting started (C#, F#)
  • IRIS Flowers Clustering: Getting started (C#, F#)

Ranking

  • Rank Search Engine Results: Getting started (C#)

Computer Vision

  • Image Classification Training (High-Level API): Getting started (C#, F#)
  • Image Classification Predictions (Pretrained TensorFlow model scoring): Getting started (C#, F#), End-to-end app (C#)
  • Image Classification Training (TensorFlow Featurizer Estimator): Getting started (C#, F#)
  • Object Detection (ONNX model scoring): Getting started (C#), End-to-end app (C#)

Cross Cutting Scenarios

  • Scalable Model on WebAPI: End-to-end app (C#)
  • Scalable Model on Razor web app: End-to-end app (C#)
  • Scalable Model on Azure Functions: End-to-end app (C#)
  • Scalable Model on Blazor web app: End-to-end app (C#)
  • Large Datasets: Getting started (C#)
  • Loading data with DatabaseLoader: Getting started (C#)
  • Loading data with LoadFromEnumerable: Getting started (C#)
  • Model Explainability: End-to-end app (C#)
  • Export to ONNX: End-to-end app (C#)

Automate ML.NET models generation (Preview state)

The previous samples show you how to use the ML.NET API 1.0 (GA since May 2019).

However, we're also working on simplifying ML.NET usage with additional technologies that automate the creation of the model for you. You don't need to write the training code yourself; you only need to provide your datasets. The "best" model and the code for running it are generated for you.

These additional technologies for automating model generation are in PREVIEW state and currently support only Binary Classification, Multiclass Classification and Regression. In upcoming versions we'll support additional ML tasks such as Recommendation, Anomaly Detection and Clustering.

CLI samples: (Preview state)

The ML.NET CLI (command-line interface) is a tool you can run from any command prompt (Windows, Mac or Linux) to generate good-quality ML.NET models based on the training datasets you provide. It also generates sample C# code to run/score that model, plus the C# code that was used to create/train it, so you can see which algorithm and settings it uses.

CLI (Command Line Interface) samples
Binary Classification sample
MultiClass Classification sample
Regression sample

AutoML API samples: (Preview state)

THESE SAMPLES USE THE 0.1.x VERSION OF THE AUTOML API. WHILE THESE APIS STILL WORK IN VERSION 0.2.x WE RECOMMEND USING THE NEW APIS INTRODUCED IN 0.2.x AND LATER. FOR 0.2.x SAMPLES, SEE ML.NET 2.0 Samples.

The ML.NET AutoML API is a set of libraries packaged as a NuGet package that you can use from your .NET code. AutoML removes the task of manually selecting algorithms and hyperparameters: it generates many combinations of algorithms and hyperparameters and finds high-quality models for you.

AutoML API samples
Binary Classification sample
MultiClass Classification sample
Ranking sample
Regression sample
Advanced experiment sample

Additional ML.NET Community Samples

In addition to the ML.NET samples provided by Microsoft, we're also highlighting samples created by the community, showcased on this separate page: ML.NET Community Samples

Those Community Samples are maintained by their owners, not by Microsoft. If you have created a cool ML.NET sample, please add its info to this REQUEST issue and we'll eventually publish it on that page.

Translations of Samples:

Learn more

See ML.NET Guide for detailed information on tutorials, ML basics, etc.

API reference

Check out the ML.NET API Reference to see the breadth of APIs available.

Contributing

We welcome contributions! Please review our contribution guide.

Community

Please join our community on Gitter: https://gitter.im/dotnet/mlnet

This project has adopted the code of conduct defined by the Contributor Covenant to clarify expected behavior in our community. For more information, see the .NET Foundation Code of Conduct.

License

ML.NET Samples are licensed under the MIT license.

machinelearning-samples's People

Contributors

arwinneil, asthana86, bamurtaugh, cesardelatorre, chris-lauren, cmendible, colbylwilliams, daholste, drake7707, dsyme, eerhardt, feiyun0112, forki, franperezlopez, justinormont, jwood803, kevmal, kunjee17, luisquintanilla, mariuszwojcik, natke, nicolehaugen, nmzivkovic, oliag, prathyusha12345, rustd, shmoradims, stephentoub, terrajobst, youssef1313

machinelearning-samples's Issues

StopWordRemover in conjunction with FeaturizeText

Hi

Looking through the text-related samples (such as SentimentAnalysis), I observed that none of them uses a stop word remover to preprocess the text before transforming and training.

May I request samples on implementing a stop word remover for text analysis?
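
As a starting point for this request, here is a minimal sketch assuming ML.NET 1.x (the Microsoft.ML NuGet package). It passes a stop word remover to FeaturizeText through its options object. The `ReviewText` class and the column names "Text", "Tokens" and "Features" are illustrative assumptions, not taken from the SentimentAnalysis sample.

```csharp
using Microsoft.ML;
using Microsoft.ML.Transforms.Text;

// Hypothetical input row; the real samples define their own data classes.
public class ReviewText
{
    public string Text { get; set; }
}

public static class StopWordsSketch
{
    // Builds a FeaturizeText estimator configured to drop English stop words.
    public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
    {
        var options = new TextFeaturizingEstimator.Options
        {
            // Also emit the processed tokens so the effect can be inspected.
            OutputTokensColumnName = "Tokens",
            // Use the built-in English stop word list.
            StopWordsRemoverOptions = new StopWordsRemovingEstimator.Options
            {
                Language = TextFeaturizingEstimator.Language.English
            }
        };

        // "Features" is the output column, "Text" the input column.
        return mlContext.Transforms.Text.FeaturizeText("Features", options, "Text");
    }

    // Fits and applies the pipeline on a tiny in-memory dataset.
    public static IDataView Demo(MLContext mlContext)
    {
        var data = mlContext.Data.LoadFromEnumerable(new[]
        {
            new ReviewText { Text = "This is a great movie" },
            new ReviewText { Text = "That was not a good film" }
        });
        return BuildPipeline(mlContext).Fit(data).Transform(data);
    }
}
```

A trainer can then be appended to the returned estimator in the usual way. Whether stop word removal improves a given model depends on the dataset, so it is worth comparing metrics with and without it.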

How to get a predicted group list output?

Is there a way to output grouped items using a prediction algorithm and clustering?
For example, I have a list of 50 employees and I want to put them into groups of 10 by predicting which employees are best suited to work together, and then output those 5 groups of 10 employees each.

The features used to make the prediction will be:

  • experience, from 1 to 10 years
  • easy to work with (yes or no)
  • visual learner vs. non-visual learner
  • whether the employee is a quick learner or not

Thank you in advance :)

Review samples for correct data science approach and ML.NET API usage

Status

Folder | Sample | Data Science Review | API Review
C#\getting_started | BinaryClassification_CreditCardFraudDetection | OK (#96) | OK (v0.8)
C#\getting_started | BinaryClassification_SentimentAnalysis | OK | OK (v0.8)
C#\getting_started | Clustering_CustomerSegmentation | OK (#95) | OK (v0.8)
C#\getting_started | Clustering_Iris | OK (#109) | OK (v0.8)
C#\getting_started | MulticlassClassification_Iris | OK | OK (v0.8)
C#\getting_started | Regression_BikeSharingDemand | OK | OK (v0.8)
C#\getting_started | Regression_TaxiFarePrediction | OK (#95) | OK (v0.8)
C#\getting_started | DeepLearning_ImageClassification_TensorFlow | Pixel data preprocessing needed or not? Also, info/GitHub page for the used Inception model should be included | OK (v0.8) #155
C#\getting_started | DeepLearning_TensorFlowEstimator | Same as above | Still 0.7
C#\getting_started | MatrixFactorization_MovieRecommendation | OK | Still 0.7
C#\end-to-end-apps | MulticlassClassification-GitHubLabeler | OK (#96) | OK (v0.8)
C#\end-to-end-apps | Recommendation-MovieRecommender | OK | Still 0.7
C#\end-to-end-apps | Regression-SalesForecast | OK | OK (v0.8)
C#\getting_started | AnomalyDetection-Sales | OK | OK (v0.11)

DS Review

BinaryClassification_CreditCardFraudDetection

  • Data preprocessing: OK (MeanVar normalization)
  • Feature engineering: Not needed (input data are PCA dimensions)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: Ok
    • Accuracy: 99.9%
    • Auc: 97.5%
    • F1Score: 77.7%

BinaryClassification_SentimentAnalysis

  • Data preprocessing: Not needed
  • Feature engineering: Ok (Text -> Feature vector using TextTransform)
  • Learner: Ok
  • Training: Ok (single training on training data)
  • Scoring: Ok
  • Metrics: Ok
    • Accuracy: 72%
    • Auc: 97%
    • F1Score: 78%

Clustering_CustomerSegmentation

  • Data preprocessing: Ok (Join and pivot tables using Linq)
  • Feature engineering: Ok (Counting offers per customers)
  • Learner: Ok
  • Training: Why PcaEstimator seed is 42 (magic number)?
  • Scoring: Ok
  • Metrics: Ok
    • AvgMinScore: 2.3
    • Dbi: 2.5

Clustering_Iris

  • Data loading: Ok (Can load all the numeric values as vector instead of individually and then concatenating them. But keeping it as is for educational purposes.)
  • Data preprocessing: Ok (not needed)
  • Feature engineering: Ok (not needed)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics:
    • AvgMinScore: 0.564
    • DBI: 0.955

MulticlassClassification_Iris

  • Data loading: Ok (Can load all the numeric values as vector instead of individually and then concatenating them. But keeping it as is for educational purposes.)
  • Data preprocessing: Ok (not needed)
  • Feature engineering: Ok (not needed)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: Ok (Accuracy is 1 because of small test set)

Regression_BikeSharingDemand

Ok.

Regression_TaxiFarePrediction

  • Data loading: Ok.
  • Data preprocessing: Ok (not needed)
  • Feature engineering: Ok (Categorical transform for text columns)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: LossFn needs to be removed from outputted metrics. It's the same as L2 b/c no custom loss function is defined.
    • R2 Score: 0.7
    • RMS loss: 5.97
    • Absolute loss: .99

MatrixFactorization_MovieRecommendation

MF using MFTrainer. Evaluation done as regressions.

MulticlassClassification-GitHubLabeler

  • Data loading: Ok.
  • Data preprocessing: Ok (not needed)
  • Feature engineering: Ok (Categorical transform for text columns)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: Ok
    • MicroAccuracy (Avg): 71.8%
    • MacroAccuracy (Avg): 50.3%
    • LogLoss: 1.076
    • LogLossReduction: 56.2

Regression-SalesForecast (eShopDashboardML)

  • Data loading: Ok.
  • Data preprocessing: Ok (not needed)
  • Feature engineering: Ok (Categorical transform for text columns)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: Ok
    • Product model:
      • L1 Loss: 96.5
      • L2 Loss: 74493.7
      • RMS: 96.5
      • R-squared: 56.6%
    • Country model:
      • L1 Loss: 0.446
      • L2 Loss: 0.386
      • RMS: 0.446
      • R-squared: 45.3%

AnomalyDetection-Sales

  • Data loading: Ok.
  • Data preprocessing: Ok (not needed)
  • Learner: Ok
  • Training: Ok
  • Scoring: Ok
  • Metrics: No evaluation is available for the time series spike detection and change point detection algorithms.

Is there any guide to choose the Algorithms

As a newcomer to machine learning, I found that there are multiple algorithms for the same scenario. For regression, for example, there are SDCA Regression, FastTreeTweedie Regression, FastTree, and so on.

How do I choose which algorithm to use?

Or should we train the model with all the available algorithms and choose the best one according to the evaluation results?

Is there any doc that describes the algorithms, their advantages and disadvantages, and when to use them?

F# samples problems in Visual Studio 2017 15.7.6

On opening samples\fsharp\getting-started\GettingStarted.sln a message box is shown:

Internal error: Critical capabilities changes were detected without any change
made to the project 'Regression_TaxiFarePrediction.fsproj', and forces the
project to be reloaded.  Potential capabilities involved: 'FSharp'.

After that the solution and its projects are loaded with the following oddities:

  • In the solution explorer projects miss their Dependencies.
  • Building gives errors like:
    1>------ Build started: Project: Regression_TaxiFarePrediction, Configuration: Debug Any CPU ------
    1>C:\Program Files\dotnet\sdk\2.1.302\Sdks\Microsoft.NET.Sdk\targets\Microsoft.PackageDependencyResolution.targets(198,5):
    error : Assets file 'C:\-\ML\machinelearning-samples\samples\fsharp\getting-started\Regression_TaxiFarePrediction\obj\project.assets.json' not found.
    Run a NuGet package restore to generate this file.
    1>Done building project "Regression_TaxiFarePrediction.fsproj" -- FAILED.

Restoring and building from the command line using dotnet works fine.

Adding F# samples

I'm looking at adding some initial F# samples

Should the pivot be
samples\CSharp\getting-started\BinaryClassification_SentimentAnalysis
samples\FSharp\getting-started\BinaryClassification_SentimentAnalysis
or
samples\getting-started\BinaryClassification_SentimentAnalysis\CSharp
samples\getting-started\BinaryClassification_SentimentAnalysis\FSharp

I don't mind either way. The former is probably better

  • PRO: you choose a language, then you get the sample set

  • PRO: if community provide samples they are likely to be in one language

  • CON: it means the sample sets are more likely to diverge over time

  • PRO: if samples are missing for one language it doesn't really matter, you don't really notice

In the GitHubLabeler project, TextLoader should be created with `useHeader: true`

pipeline.Add(new TextLoader(DataPath).CreateFrom<GitHubIssue>());

should be:

pipeline.Add(new TextLoader(DataPath).CreateFrom<GitHubIssue>(useHeader: true));

With useHeader=false, the first row of the corefx-issues-train.tsv data file is read as data, so its Area header value is treated as an additional category and the total number of categories becomes 23 instead of 22.

Visual Studio 2017 (15.7) build error

Opening GettingStarted.sln in Visual Studio 2017 (15.7) and building it reports the following error: error NU1100: Unable to resolve 'Microsoft.ML (>= 0.2.0)' for '.NETCoreApp,Version=v2.0'.
Running dotnet restore produces the same error:

F:\GitHub\machinelearning-samples\samples\getting-started>dotnet restore
Restoring packages for F:\GitHub\machinelearning-samples\samples\getting-started\Clustering_Iris\Clustering_Iris.csproj...
Restoring packages for F:\GitHub\machinelearning-samples\samples\getting-started\BinaryClassification_SentimentAnalysis\BinaryClassification_SentimentAnalysis.csproj...
Restoring packages for F:\GitHub\machinelearning-samples\samples\getting-started\MulticlassClassification_Iris\MulticlassClassification_Iris.csproj...
Restoring packages for F:\GitHub\machinelearning-samples\samples\getting-started\Regression_TaxiFarePrediction\Regression_TaxiFarePrediction.csproj...
F:\GitHub\machinelearning-samples\samples\getting-started\Regression_TaxiFarePrediction\Regression_TaxiFarePrediction.csproj : error NU1100: Unable to resolve 'Microsoft.ML (>= 0.2.0)' for '.NETCoreApp,Version=v2.0'. [F:\GitHub\machinelearning-samples\samples\getting-started\GettingStarted.sln]
F:\GitHub\machinelearning-samples\samples\getting-started\MulticlassClassification_Iris\MulticlassClassification_Iris.csproj : error NU1100: Unable to resolve 'Microsoft.ML (>= 0.2.0)' for '.NETCoreApp,Version=v2.0'. [F:\GitHub\machinelearning-samples\samples\getting-started\GettingStarted.sln]
F:\GitHub\machinelearning-samples\samples\getting-started\BinaryClassification_SentimentAnalysis\BinaryClassification_SentimentAnalysis.csproj : error NU1100: Unable to resolve 'Microsoft.ML (>= 0.2.0)' for '.NETCoreApp,Version=v2.0'. [F:\GitHub\machinelearning-samples\samples\getting-started\GettingStarted.sln]
F:\GitHub\machinelearning-samples\samples\getting-started\Clustering_Iris\Clustering_Iris.csproj : error NU1100: Unable to resolve 'Microsoft.ML (>= 0.2.0)' for '.NETCoreApp,Version=v2.0'. [F:\GitHub\machinelearning-samples\samples\getting-started\GettingStarted.sln]
Restore failed in 175.21 ms for F:\GitHub\machinelearning-samples\samples\getting-started\MulticlassClassification_Iris\MulticlassClassification_Iris.csproj.
Restore failed in 175.21 ms for F:\GitHub\machinelearning-samples\samples\getting-started\Clustering_Iris\Clustering_Iris.csproj.
Restore failed in 175.21 ms for F:\GitHub\machinelearning-samples\samples\getting-started\Regression_TaxiFarePrediction\Regression_TaxiFarePrediction.csproj.
Restore failed in 175.21 ms for F:\GitHub\machinelearning-samples\samples\getting-started\BinaryClassification_SentimentAnalysis\BinaryClassification_SentimentAnalysis.csproj.

StochasticDualCoordinateAscentClassifier issue on x86 (32-bit)

When I use StochasticDualCoordinateAscentClassifier in an x86 (32-bit) process, it throws the following exception:

'A call to PInvoke function 'Microsoft.ML.CpuMath!Microsoft.ML.Runtime.Internal.CpuMath.Thunk::SumSqU' has unbalanced the stack. This is likely because the managed PInvoke signature does not match the unmanaged target signature. Check that the calling convention and parameters of the PInvoke signature match the target unmanaged signature.'

When I run it as 64-bit, it works.

Unfortunately, I need to integrate it with a 32-bit application.

Any advice, please?

Here is a sample of my code:

    pipeline.Add(CollectionDataSource.Create(noramalTagsTrainingData));

    pipeline.Add(new Dictionarizer("Label"));

    // Transform any text feature to numeric values
    pipeline.Add(new CategoricalOneHotVectorizer(
        "fontColor",
        "tagText",
        "firstWord"));

    pipeline.Add(new ColumnConcatenator(
        "Features",
        "fontSize",
        "isBold",
        "isItalic",
        "isUnderLine",
        "containsDot",
        "containsQuestionMark",
        "isAllCaps",
        "tagText",
        "firstWord"));

    pipeline.Add(new StochasticDualCoordinateAscentClassifier
    {
        MaxIterations = 100,
        NumThreads = 7,
        LossFunction = new SmoothedHingeLossSDCAClassificationLossFunction()
    });

    pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

    // The pipeline is trained on the dataset that has been loaded and transformed.
    var model = pipeline.Train<NormalTagsModelFeatures, NormalTagsPrediction>();

It throws the exception at the last line.

Confidence Interval for CV metrics should be calculated using t-distribution quantiles

In samples\csharp\common\ConsoleHelpers.cs , the 95% confidence interval upper bound for CV metrics is calculated using 1.96, which is the 97.5% quantile of the standard normal distribution. We should instead use the 97.5% quantile of the Student's t distribution with (n - 1) degrees of freedom, where n is the number of folds, because we use the sample standard deviation.

        public static double CalculateConfidenceInterval95(IEnumerable<double> values)
        {
            double confidenceInterval95 = 1.96 * CalculateStandardDeviation(values) / Math.Sqrt(values.Count());
            return confidenceInterval95;
        }

Moreover, the standard deviation calculation should use the formula for the sample instead of the population, i.e. use (n - 1) instead of n in the denominator.

        public static double CalculateStandardDeviation (IEnumerable<double> values)
        {
            double average = values.Average();
            double sumOfSquaresOfDifferences = values.Select(val => (val - average) * (val - average)).Sum();
            double standardDeviation = Math.Sqrt(sumOfSquaresOfDifferences / values.Count());
            return standardDeviation;
        }

@shmoradims @CESARDELATORRE @justinormont can we use either Accord.NET or Math.NET for the t-distribution inverse CDF function? Any potential licensing issues with either? Accord.NET is GNU Lesser GPL 2.1, Math.NET Numerics is MIT/X11.
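
One way to sidestep the licensing question is to avoid a dependency altogether. The sketch below is an illustration, not the repository's code. It uses the sample standard deviation (n - 1 in the denominator) and a small hard-coded table of 97.5% Student's t quantiles for 1 to 10 degrees of freedom, which covers cross-validation with 2 to 11 folds. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CrossValidationStats
{
    // 97.5% quantiles of Student's t distribution for df = 1..10
    // (standard table values, rounded to three decimals).
    private static readonly double[] T975 =
    {
        12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228
    };

    // Sample standard deviation: divides by (n - 1) instead of n.
    public static double SampleStandardDeviation(IReadOnlyCollection<double> values)
    {
        if (values.Count < 2)
            throw new ArgumentException("At least two values are required.", nameof(values));

        double average = values.Average();
        double sumOfSquares = values.Sum(v => (v - average) * (v - average));
        return Math.Sqrt(sumOfSquares / (values.Count - 1));
    }

    // Half-width of the two-sided 95% confidence interval for the mean,
    // using the t quantile with (n - 1) degrees of freedom.
    public static double ConfidenceInterval95(IReadOnlyCollection<double> values)
    {
        int degreesOfFreedom = values.Count - 1;
        if (degreesOfFreedom < 1 || degreesOfFreedom > T975.Length)
            throw new ArgumentOutOfRangeException(nameof(values),
                "This sketch only tabulates 2 to 11 values (1 to 10 degrees of freedom).");

        return T975[degreesOfFreedom - 1] * SampleStandardDeviation(values) / Math.Sqrt(values.Count);
    }
}
```

For five fold accuracies of 0.70, 0.72, 0.74, 0.76 and 0.78, the sample standard deviation is about 0.0316 and the half-width is about 0.039, compared with about 0.025 from the current code (1.96 with the population standard deviation). A library-backed inverse CDF would still be needed for arbitrary fold counts.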

Translate into Chinese

I plan to translate machinelearning-samples into Chinese and introduce ML.NET to Chinese programmers.

I created a repo and translated the first README.md.

Can I create a PR to add a link on the original English page, or to upload README.zh-cn.md?

creditcard.csv data loading error in VB.NET sample

In CreditCardFraudDetection.Trainer project (VB.NET version), under input folder the creditcard.csv file is generated using the unzip feature in code. Works fine.
In ConsoleHelpers.vb file, line 100 we get an error "Parsing failed with an exception: Stream reading encountered exception". Detailed Inner Exception as below.

Could not open file 'D:\Source\Repos\machinelearning-samples\samples\visualbasic\getting-started\BinaryClassification_CreditCardFraudDetection\CreditCardFraudDetection.Trainer\bin\Debug\netcoreapp2.1......\assets'. Error is: Access to the path 'D:\Source\Repos\machinelearning-samples\samples\visualbasic\getting-started\BinaryClassification_CreditCardFraudDetection\CreditCardFraudDetection.Trainer\assets' is denied.

I'm not a local admin in this PC. creditcard.csv is created from the given zip file by the program running in my user id. I gave full control for the entire root folder. But still no use. How to get rid of "System.UnauthorizedAccessException"? I understand this is an OS issue not a program issue. But the program has created this file and the program itself is unable to read it. Please help.

Sample Requirement for Field Aware Factorization Machines

The Movie Recommender description says that Field Aware Factorization Machines can take more features such as UserId, ProductId, Ratings, Product Description, Product Price, etc., but the demo project only analyzes the user ID and movie ID.

Is there a complete demo that uses more features in a recommendation scenario?

Namespaces in Samples do not exist

The following Namespaces for the C# samples do not exist.
Where can I find them?

using Microsoft.ML.Core.Data;
using Microsoft.ML.Transforms.Categorical;
using static Microsoft.ML.Transforms.Normalizers.NormalizingEstimator;

TaxiFarePrediction sample gets Error MSB3491: file path/name are too long

Severity Code Description Project File Line Suppression State
Error MSB3491 Could not write lines to file "obj\Debug\netcoreapp2.0\Regression_TaxiFarePrediction.csproj.FileListAbsolute.txt". The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters. Regression_TaxiFarePrediction D:\Program Files (x86)\Microsoft Visual Studio\2017\Professional\MSBuild\15.0\Bin\Microsoft.Common.CurrentVersion.targets 5070

F# samples need to be migrated to ML.NET v0.7 (minor migration from v0.6)

We released v0.7 today (Nov. 6th), and there are a few minor breaking changes compared to v0.6, mostly in the naming of some Estimators.
For now, I'm setting a special "Directory.Build.props" file targeting v0.6 for the F# samples until they are migrated to v0.7.

Most of the C# samples are already migrated to v0.7, so F# contributors can check them for naming, etc.

[Discussion] Profusion of subfolders and approaches for samples structure

https://github.com/dotnet/machinelearning-samples/tree/master/samples
has three subfolders, each with few examples.

Sorting of the folders is alphabetical, not in order of difficulty.

Suggestion: Flatten this hierarchy as follows.

101_BinaryClassification_SentimentAnalysis
102_Clustering_Iris
103_MulticlassClassification_Iris
201_Regression_TaxiFarePrediction
202_BinaryClasification_Titanic
203_Regression_BikeSharingDemands
301_eShopDashboardML
302_GitHubLabeler

[BUG] CreditCard-Fraud-Detection - If “creditcard.csv” already exists, error when trying to create it from the compressed/zip file

If “creditcard.csv” already exists, error when trying to create it from the compressed/zip file in the following code.

    public static void UnZipDataSet(string zipDataSet, string destinationFile)
    {
        if (!File.Exists(destinationFile))
        {
            var destinationDirectory = Path.GetDirectoryName(destinationFile);
            ZipFile.ExtractToDirectory(zipDataSet, $"{destinationDirectory}");
        }
    }

Weird: the file does exist, but the API says it does not. This needs investigation.
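
Until the root cause is found, a more defensive variant might look like the sketch below. It makes no claim about why File.Exists returns false here. It only resolves both paths to absolute form, so the existence check and the extraction refer to the same location, and it uses the `overwriteFiles` overload of `ZipFile.ExtractToDirectory` so that an existing file no longer causes an exception. That overload is available on .NET Core 2.0 and later, but not on .NET Framework. The class name `DataSetHelper` is hypothetical.

```csharp
using System.IO;
using System.IO.Compression;

public static class DataSetHelper
{
    public static void UnZipDataSet(string zipDataSet, string destinationFile)
    {
        // Resolve relative paths against the current directory once, up front.
        string fullDestination = Path.GetFullPath(destinationFile);
        if (File.Exists(fullDestination))
        {
            return; // Already extracted; nothing to do.
        }

        string destinationDirectory = Path.GetDirectoryName(fullDestination);
        Directory.CreateDirectory(destinationDirectory);

        // overwriteFiles: true avoids an IOException when some archive entries
        // already exist in the destination directory.
        ZipFile.ExtractToDirectory(Path.GetFullPath(zipDataSet), destinationDirectory, overwriteFiles: true);
    }
}
```

Note that the check only helps if the archive entry really has the same name as `destinationFile`. If the names differ, the file is extracted again on every run, which is now harmless but wasteful.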

3 F# samples fail due to unhandled exceptions

Hopefully I'm missing something obvious; please point me in the right direction.
Three F# samples fail with an unhandled System.ArgumentOutOfRangeException.

Namely,

BinaryClassification_SentimentAnalysis

It works fine.

Clustering_Iris, MulticlassClassification_Iris

They fail:

> dotnet run
=============== Training model ===============
Automatically adding a MinMax normalization transform, use 'norm=Warn' or 'norm=No' to turn this behavior off.

Unhandled Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException:
 Training feature column 'Features' must be a known-size vector of R4, but has type: Vec<R8, 4>.
Parameter name: data

Regression_TaxiFarePrediction

It fails:

> dotnet run
=============== Training model ===============

Unhandled Exception: System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.ArgumentOutOfRangeException:
 Column 'Features' has invalid source types: All source columns must have the same type. Source types: 'Vec<R4, 3>, Vec<R4, 7>, R8, R8, Vec<R4, 6>'.
Parameter name: Column

Move at least F# samples to Paket

Paket is the default package manager for almost every F# project, apart from a few Xamarin ones.

Rather than debating which one is better, I'd just like to propose Paket for at least the F# samples, purely based on its popularity within the community.

If the C# samples or ML.NET would like to add Paket support, I am very happy to help with that. But for now, it would be great if at least the F# samples in this repo moved to Paket.

cc/ @dsyme @CESARDELATORRE @mariuszwojcik

ml 0.6

please convert these samples to the new 0.6 api

Database example

Purpose

There are requests for ML.Net to support database integration to retrieve data. While we do not have database support, we do provide a way to create an IDataView from an IEnumerable, which provides an open-ended way to integrate data into an ML.Net pipeline.

Therefore I am proposing that we provide an example of how to integrate a database as a datasource into an ML.Net pipeline using ML.Net's interface to create an IDataView from an IEnumerable. A working example would give users a starting point for their own integration. Having an example will allow us to understand the performance and limitations with using a database as a datasource.
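
The core of the proposed example could be as small as the sketch below, assuming ML.NET 1.x, where the method is `MLContext.Data.LoadFromEnumerable`. The `HouseRow` class and the `QueryRows` method are hypothetical placeholders. `QueryRows` returns in-memory rows here, where a real example would enumerate the results of a database query.

```csharp
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical row type; a real example would mirror a database table or entity.
public class HouseRow
{
    public float Size { get; set; }
    public float Price { get; set; }
}

public static class DatabaseSourceSketch
{
    // Stand-in for a database query. The iterator runs again on each
    // enumeration, which matters because ML.NET may read the data
    // more than once.
    public static IEnumerable<HouseRow> QueryRows()
    {
        yield return new HouseRow { Size = 1.1f, Price = 1.2f };
        yield return new HouseRow { Size = 1.9f, Price = 2.3f };
        yield return new HouseRow { Size = 2.8f, Price = 3.0f };
    }

    // Wraps the enumerable in an IDataView without copying it into memory first.
    public static IDataView LoadRows(MLContext mlContext)
    {
        return mlContext.Data.LoadFromEnumerable(QueryRows());
    }
}
```

Because the resulting IDataView is lazy, the query behind the enumerable should return the same rows each time it is enumerated. That is one of the performance and correctness trade-offs a full example could measure.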

Entity Framework

Rather than provide a database-specific implementation (like ODBC or SQL), this example will use the Entity Framework. The Entity Framework covers a number of databases and is cross-platform, as it is supported in .NET Core. It also provides a common framework that .NET developers will most likely be familiar with.

Entity Framework will also give configuration options that we can experiment with to determine what provides the best performance when using ML.Net. An example of this is eager loading vs lazy loading.

More information about the Entity Framework can be found here:
https://docs.microsoft.com/en-us/ef/

References

Here are the issues from ML.Net requesting/asking about database support:
dotnet/machinelearning#1130
dotnet/machinelearning#107
dotnet/machinelearning#96

How to use Feature selection

I want to use the FeatureSelectorByMutualInformation feature selection. Can you please provide a sample code snippet showing how to use it with a dataset? I could not find one in the samples provided.
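
A minimal sketch of what such a snippet might look like, assuming ML.NET 1.x, where mutual-information feature selection is exposed as `SelectFeaturesBasedOnMutualInformation` in the feature selection catalog. The `LabeledVector` class, the column names and the helper method are illustrative assumptions.

```csharp
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical input row: a Boolean label and a fixed-size feature vector.
public class LabeledVector
{
    public bool Label { get; set; }

    [VectorType(3)]
    public float[] Features { get; set; }
}

public static class FeatureSelectionSketch
{
    // Keeps the slots of "Features" that share the most mutual information
    // with "Label" and writes them to "SelectedFeatures".
    public static IDataView SelectTopFeatures(MLContext mlContext, IDataView data, int slotsToKeep)
    {
        var pipeline = mlContext.Transforms.FeatureSelection.SelectFeaturesBasedOnMutualInformation(
            outputColumnName: "SelectedFeatures",
            inputColumnName: "Features",
            labelColumnName: "Label",
            slotsInOutput: slotsToKeep);

        return pipeline.Fit(data).Transform(data);
    }
}
```

The input can come from any loader, for example `mlContext.Data.LoadFromEnumerable` over a list of `LabeledVector` rows. The `slotsInOutput` argument is an upper target rather than a guarantee, so it is worth checking the size of the output column on real data.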

Error when trying to build solution: Multiple assemblies with equivalent identity have been imported

Hi! I'm unable to build the solution successfully, here's what I did:

  • downloaded the repo in zip format
  • unzipped to a local folder
  • opened the solution with Visual Studio 2017 (v15.5.2; .NET Framework 4.7.03056; Windows 10; 64 bit)
  • built solution

Immediately errors appeared. Here's the first one:

Severity Code Description Project File Line Suppression State
Error CS1703 Multiple assemblies with equivalent identity have been imported: 'C:\Users\hello.nuget\packages\microsoft.netcore.app\2.0.0\ref\netcoreapp2.0\System.Reflection.Emit.Lightweight.dll' and 'C:\Users\hello.nuget\packages\system.reflection.emit.lightweight\4.3.0\ref\netstandard1.0\System.Reflection.Emit.Lightweight.dll'. Remove one of the duplicate references. BinaryClassification_SentimentAnalysis C:\Projects\machinelearning-samples-master\samples\getting-started\BinaryClassification_SentimentAnalysis\CSC 1 Active

Please advise, thank you!

TensorFlow demo does not compile

Hi Team

The TensorFlow demo does not compile because it requires the ML.NET 0.6 preview version, which is not available on NuGet (even if I select the 'Include prerelease' option).

Please have a look at it.

Thanks

Add VB code examples

I want to help VB developers add the "machine learning" capability to their programs. So, I'll translate existing C# .NET Core console examples into VB.

Exception in Product-Recommendation-Matrix Factorization sample

Exception in Product Recommendation-Matrix Factorization
This sample needs to be fixed. It is currently throwing the following exception when running:

System.FormatException
HResult=0x80131537
Message=Parsing failed with an exception: Stream reading encountered exception
Source=Microsoft.ML.Data
StackTrace:
at Microsoft.ML.Runtime.Data.TextLoader.Cursor.d__33.MoveNext()
at Microsoft.ML.Runtime.Data.TextLoader.Cursor.MoveNextCore()
at Microsoft.ML.Runtime.Data.RootCursorBase.MoveNext()
at Microsoft.ML.Runtime.Recommender.Internal.SafeTrainingAndModelBuffer.ConstructLabeledNodesFrom(IChannel ch, ICursor cursor, ValueGetter1 labGetter, ValueGetter1 rowGetter, ValueGetter1 colGetter, Int32 rowCount, Int32 colCount)
at Microsoft.ML.Runtime.Recommender.Internal.SafeTrainingAndModelBuffer.Train(IChannel ch, Int32 rowCount, Int32 colCount, ICursor cursor, ValueGetter1 labGetter, ValueGetter1 rowGetter, ValueGetter1 colGetter)
at Microsoft.ML.Trainers.MatrixFactorizationTrainer.TrainCore(IChannel ch, RoleMappedData data, RoleMappedData validData)
at Microsoft.ML.Trainers.MatrixFactorizationTrainer.Train(IDataView trainData, IDataView validationData)
at ProductRecommender.Program.Main(String[] args) in D:\GitRepos\machinelearning-samples-master\samples\csharp\getting-started\MatrixFactorization_ProductRecommendation\ProductRecommender\Program.cs:line 60

Inner Exception 1:
FormatException: Stream reading encountered exception

Inner Exception 2:
IOException: Could not open file './Data/Amazon0302.txt'. Error is: Could not find a part of the path 'D:\GitRepos\machinelearning-samples-master\samples\csharp\getting-started\MatrixFactorization_ProductRecommendation\ProductRecommender\bin\Debug\netcoreapp2.1\Data\Amazon0302.txt'.

Inner Exception 3:
DirectoryNotFoundException: Could not find a part of the path 'XXXXXXXX\samples\csharp\getting-started\MatrixFactorization_ProductRecommendation\ProductRecommender\bin\Debug\netcoreapp2.1\Data\Amazon0302.txt'.

It also needs to be updated from version 0.8.0-preview-27128-6 to version 0.8.
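The innermost exceptions show the relative path './Data/Amazon0302.txt' being resolved under the build output folder, where no Data directory exists. The sketch below is one way to surface that earlier with a clearer message. It is not the fix applied to the sample, and the DataPathHelper class is hypothetical. It resolves the path against the application's base directory and fails fast before the loader is called.

```csharp
using System;
using System.IO;

public static class DataPathHelper
{
    // Resolves a data file relative to the application's base directory
    // (the build output folder for a console app) and verifies it exists.
    public static string ResolveDataPath(string relativePath)
    {
        string fullPath = Path.GetFullPath(
            Path.Combine(AppContext.BaseDirectory, relativePath));

        if (!File.Exists(fullPath))
        {
            throw new FileNotFoundException(
                $"Data file not found at '{fullPath}'. Check that the dataset exists " +
                "and is copied to the output directory.",
                fullPath);
        }

        return fullPath;
    }
}
```

A sample could call DataPathHelper.ResolveDataPath(Path.Combine("Data", "Amazon0302.txt")) before building the loader, so a missing dataset is reported as a plain FileNotFoundException instead of a nested FormatException from the cursor.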
