aleximmer / run-dmc
Solution repository of DMC-16 finalist team
Different classifiers should probably be allowed to use different feature sets.
Method that takes a DataFrame as input and returns the purity of each column and the column's unique elements.
Get a CSV file containing already processed data
Collect either in the wiki or in Redmine.
Currently we only use basic features based on date. Group A or B is working on more connected features. We should try implementing a few ourselves and see how they perform.
Depending on Cor(returnQuantity, x), x might be fitted better as a polynomial feature. That especially holds if we use strictly linear models.
See if regression makes sense.
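The point about polynomial features can be illustrated with a small sketch. It uses only the standard library and made-up toy numbers (not project data); the helper names `pearson` and `polynomial_terms` are hypothetical. It shows a feature whose linear correlation with the target is zero while its square correlates perfectly.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def polynomial_terms(values, degree=2):
    """Return one row per value: [x, x**2, ..., x**degree]."""
    if degree < 1:
        raise ValueError("degree must be at least 1")
    return [[x ** d for d in range(1, degree + 1)] for x in values]


# Toy data (an assumption for illustration): the target depends on x quadratically.
x = [-2, -1, 0, 1, 2]
target = [4, 1, 0, 1, 4]

rows = polynomial_terms(x, degree=2)
x_squared = [row[1] for row in rows]

linear_corr = pearson(x, target)             # no linear relationship
squared_corr = pearson(x_squared, target)    # strong relationship after squaring
```

A strictly linear model would not pick up the relationship from `x` alone, but it could from the added squared column.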
prevents memory overflow
How well do features perform? Find measures to give other groups feedback.
Currently we have:
{'100', '24', '25', '26', '27', '28', '29', '30', '31', '32', '33', '34', '36', '38', '40', '42', '44', '75', '80', '85', '90', '95', 'A', 'I', 'L', 'M', 'S', 'XL', 'XS'}
to predict splits the tree will make.
scikit-learn offers many ensemble methods out of the box. We should try to put them into our architecture.
customerID and articleID can probably be binned by entropy. This is equivalent to forming customer or article clusters.
Should they be dropped? What would the reason for keeping them be?
We're currently only looking at the overall return rate for a customer/size/color/...
Let's have a look at how likely it is that a product is returned if the customer has already returned a product before, or if a product with this color/size was returned before, and so on. Buzzwords: Markov chain/Markov model.
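One way to get a first impression of this idea is to compare the return rate of rows where the customer has returned something before with the rate of rows where they have not. The sketch below is a simple two-state conditional estimate, not a full Markov model. The function names and the toy order list are assumptions for illustration, and the rows are assumed to be in chronological order.

```python
def prior_return_flags(orders):
    """For each (customer_id, returned) row, in chronological order,
    report whether that customer had already returned something earlier."""
    customers_with_return = set()
    flags = []
    for customer_id, returned in orders:
        flags.append(customer_id in customers_with_return)
        if returned:
            customers_with_return.add(customer_id)
    return flags


def conditional_return_rates(orders):
    """Return {has_prior_return: return rate}; rate is None if no rows exist."""
    counts = {True: [0, 0], False: [0, 0]}  # [number of returns, number of rows]
    for (_, returned), flag in zip(orders, prior_return_flags(orders)):
        counts[flag][0] += int(returned)
        counts[flag][1] += 1
    return {
        flag: (returns / total if total else None)
        for flag, (returns, total) in counts.items()
    }


# Toy data (hypothetical): (customerID, was the item returned?)
orders = [
    ("a", True), ("a", True), ("b", False),
    ("a", True), ("b", True), ("b", True),
]
rates = conditional_return_rates(orders)
```

The same pattern could be applied per color or size by keying the set on that attribute instead of the customer.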
yyyy-mm-dd to weekdays, months etc.
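A minimal sketch of such a conversion, using only the standard library. The function name `date_features` and the chosen output keys are assumptions; `weekday` follows Python's convention (Monday is 0, Sunday is 6).

```python
from datetime import datetime


def date_features(date_string):
    """Split a 'yyyy-mm-dd' string into simple calendar features."""
    d = datetime.strptime(date_string, "%Y-%m-%d").date()
    return {
        "weekday": d.weekday(),          # 0 = Monday ... 6 = Sunday
        "month": d.month,
        "day": d.day,
        "is_weekend": d.weekday() >= 5,  # Saturday or Sunday
    }


features = date_features("2016-01-01")
```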
Then use rules to predict specific label.
Currently we work on randomized samples. This makes sense only to a certain extent. It is possible for the same order to be split across the test and training sets, which should be impossible with respect to the final task.
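A sketch of a split that keeps every order on one side only. It assumes each row is a dict and that the order identifier is stored under a key such as `orderID` (the key name is an assumption, as is the function name `split_by_order`).

```python
import random


def split_by_order(rows, order_key="orderID", test_fraction=0.2, seed=0):
    """Split rows into train and test so that no order appears in both."""
    order_ids = sorted({row[order_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(order_ids)
    n_test = max(1, round(len(order_ids) * test_fraction))
    test_ids = set(order_ids[:n_test])
    train = [row for row in rows if row[order_key] not in test_ids]
    test = [row for row in rows if row[order_key] in test_ids]
    return train, test


# Toy rows (hypothetical): five orders with two positions each.
rows = [{"orderID": f"o{i}", "position": p} for i in range(5) for p in range(2)]
train, test = split_by_order(rows, test_fraction=0.4)
```

If scikit-learn is used for splitting anyway, `GroupShuffleSplit` from `sklearn.model_selection` provides the same kind of group-aware split.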
Please have a look at our season feature again
def date_to_season(date):
    if date.month <= 3 and date.day <= 22:
        return 1
    if date.month <= 6 and date.day <= 22:
        return 2
    if date.month <= 9 and date.day <= 22:
        return 3
    if date.month <= 12 and date.day <= 22:
        return 4
    return 1
and consider cases like month=12 & day=23
-> assigned to season 1
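The conditions above test month and day independently, so any date with a day after the 22nd fails all four checks and falls through to the final `return 1` (for example, May 25 is assigned to season 1). A possible fix is to compare `(month, day)` tuples against the season boundaries. The sketch below assumes the thresholds in the snippet are meant as the last day of each season and that December 23 to December 31 should wrap around to season 1; the name `date_to_season_fixed` is hypothetical.

```python
from datetime import date

# Last (month, day) of each season, taken from the thresholds in the snippet.
SEASON_ENDS = [((3, 22), 1), ((6, 22), 2), ((9, 22), 3), ((12, 22), 4)]


def date_to_season_fixed(d):
    """Map a date to a season number using (month, day) tuple comparison."""
    for season_end, season in SEASON_ENDS:
        if (d.month, d.day) <= season_end:
            return season
    return 1  # Dec 23 to Dec 31 wraps around to season 1
```

With this version, `date(2016, 5, 25)` falls into season 2, while `date(2016, 12, 23)` is still assigned to season 1, which is intended under the wrap-around assumption.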
When trying to run process.py, I receive the error
AttributeError: 'DataFrame' object has no attribute 'orderDate'
cleansing.py:34 - df.orderDate = pd.to_datetime(df.orderDate)
Use imputation or other transformations
Create the new columns and finally append all of them to the frame. This is easily done using a pool.
It turns out that color codes in the target dataset often fill gaps or are noisy. It would thus be helpful to define bins and learn on those, which can easily be done using entropy. Further, ideally we would have many prefix bins, because we assume that, e.g., colors starting with 1 are different from colors starting with 2.
Binning? Imputing? Dependent on articleID?
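The prefix part of this idea can be sketched as follows. This covers only prefix binning, not the entropy-based binning, and it assumes color codes are numeric or numeric strings; the function name `color_prefix_bin`, the `"unknown"` fallback, and the sample codes are hypothetical.

```python
def color_prefix_bin(color_code, prefix_length=1):
    """Bin a color code by its leading characters, e.g. 1493 -> '1'."""
    text = str(color_code).strip()
    if not text:
        return "unknown"  # fallback for missing codes
    return text[:prefix_length]


# Toy codes (hypothetical): bins learned from training data ...
train_codes = [1001, 1493, 2112]
known_bins = {color_prefix_bin(code) for code in train_codes}

# ... still cover a code that only appears in the target data.
unseen_code = 1777
unseen_bin = color_prefix_bin(unseen_code)
```

A code that never occurs in the training data can still land in a bin the model has seen, which is the motivation for learning on bins rather than raw codes.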
Merge CSVs containing new features into our main dataframe.