Comments (4)
Hi, you can train directly on multi-dimensional NumPy data, as explained in the documentation: https://ydf.readthedocs.io/en/latest/tutorial/multidimensional_feature
The super short version (with random data) is:
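import numpy as np
import ydf
num_examples = 10000
num_rows = 20  # Number of values in the multi-dimensional feature.
# One feature named "features" that holds a vector of num_rows values per example.
train_data = np.random.uniform(size=(num_examples, num_rows))
train_label = np.random.randint(0, 2, size=(num_examples))
train_ds = {"features": train_data, "label": train_label}
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)
# Predict on a single new example with the same feature dimension.
test_data = {"features": np.random.uniform(size=(1, num_rows))}
model.predict(test_data)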
from yggdrasil-decision-forests.
Hi,
Thanks for the tip.
I tried what you suggested, but the predicted values look like random numbers between 0.0 and 1.0 and are not useful at all.
OK, here is the test. Extract the files (train.npy, test.npy) from the attached zip file:
import numpy as np
import ydf
# Training features come from the attached archive.
train_data = np.load('train.npy')
# Labels: random 0/1 values, one per training example.
train_label = np.random.randint(0, 2, size=(train_data.shape[0]))
print(train_data.shape)
train_ds = {"features": train_data, "label": train_label}
model = ydf.GradientBoostedTreesLearner(label="label").train(train_ds)
# Predict on the test features from the same archive.
test_data = {"features": np.load('test.npy')}
predictions = model.predict(test_data)
print(predictions)
For the same data, TensorFlow's predictions are 99% correct, but ydf's predictions look random to me. Am I missing something here?
ydf.zip
This notebook shows how to train a model on this dataset and make predictions with a Random Forest and a Gradient Boosted Trees model. The notebook also runs a cross-validation to evaluate the quality of predictions on this small dataset.
The model self-evaluation (`model.describe()`; out-of-bag accuracy of 53%) and the cross-validation (`learner.cross_validation(train_ds)`; accuracy=50%, AUC=51%) show that the input features are virtually uncorrelated with the labels.
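As a rough illustration of what "not correlated with the labels" can look like, here is a minimal NumPy-only sketch. It is not the analysis from the notebook, which relies on out-of-bag evaluation and cross-validation; it swaps in a plain per-feature Pearson correlation as a crude screening step. The random data, the seed, and the helper name `max_abs_feature_label_correlation` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # Fixed seed (assumption) for repeatable numbers.
num_examples, num_dims = 2000, 20
features = rng.uniform(size=(num_examples, num_dims))


def max_abs_feature_label_correlation(features, labels):
    """Largest absolute Pearson correlation between any feature column and the labels."""
    correlations = [
        np.corrcoef(features[:, i], labels)[0, 1]
        for i in range(features.shape[1])
    ]
    return float(np.max(np.abs(correlations)))


# Case 1: labels drawn independently of the features.
random_labels = rng.integers(0, 2, size=num_examples)
# Case 2: labels fully determined by the first feature column.
informative_labels = (features[:, 0] > 0.5).astype(int)

random_score = max_abs_feature_label_correlation(features, random_labels)
informative_score = max_abs_feature_label_correlation(features, informative_labels)

print(f"random labels:      max |corr| = {random_score:.3f}")
print(f"informative labels: max |corr| = {informative_score:.3f}")
```

A low score does not prove that no model can learn the labels, since decision trees can pick up non-linear effects that a linear correlation misses. Evaluation on held-out data remains the more reliable check.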
You mention that "TensorFlow's predictions are 99% correct". Are you sure you are using the same dataset? If so, are you sure you are not evaluating on the training dataset?
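To see why evaluating on the training dataset can be misleading, here is a minimal sketch under stated assumptions. It uses neither ydf nor TensorFlow: a 1-nearest-neighbor classifier written in NumPy stands in for any model flexible enough to memorize its training examples. The data is random, the seed is arbitrary, and `predict_1nn` is a hypothetical helper defined only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # Fixed seed (assumption) for repeatable numbers.
num_train, num_test, num_dims = 500, 500, 20

# Features and labels are drawn independently, so there is nothing to learn.
train_x = rng.uniform(size=(num_train, num_dims))
train_y = rng.integers(0, 2, size=num_train)
test_x = rng.uniform(size=(num_test, num_dims))
test_y = rng.integers(0, 2, size=num_test)


def predict_1nn(query_x):
    """Return the label of the closest training example for each query row."""
    # Squared Euclidean distance between every query and every training example.
    squared_dist = ((query_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    return train_y[squared_dist.argmin(axis=1)]


# Each training example is its own nearest neighbor, so the labels are memorized.
train_accuracy = float((predict_1nn(train_x) == train_y).mean())
# On unseen examples the memorized labels carry no information.
test_accuracy = float((predict_1nn(test_x) == test_y).mean())

print(f"accuracy on training data: {train_accuracy:.2f}")
print(f"accuracy on held-out data: {test_accuracy:.2f}")
```

The accuracy on the training data is perfect even though the labels are pure noise, while the accuracy on held-out data stays close to chance. A very high score measured on the training set therefore says little about how well a model generalizes.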