mlf's Introduction

弥勒佛 (mlf)

Making big-data models easy for everyone!

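Existing machine learning frameworks and packages have several problems:

  • Cannot handle big data: most training frameworks written in Python, Matlab, and R are suited to small sample sets and are not optimized for big data.
  • Hard to integrate into real production systems: standalone programs cannot be embedded as a library in a larger program.
  • Limited to a single kind of model: a package often solves only one type of problem (for example, supervised or unsupervised).
  • Hard to extend: extensibility was not considered in the design, so adding new models and components is difficult.
  • Low code quality: the code follows no consistent conventions and is hard to read and maintain.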

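The 弥勒佛 project was created to solve the problems above. Its framework design meets the following requirements:

  • Handles big data: it scales up as your business grows; whether your data set has 1K or 1B samples, you can use 弥勒佛.
  • Built for production: both model training and model use can be integrated into production systems as a library or a service.
  • Rich set of models: it is easy to try different models and to switch between supervised, unsupervised, and online learning models.
  • Highly extensible: it is easy to add new models, experiment with them, and quickly integrate them into production systems.
  • Highly readable: the code follows consistent conventions, with comments and documentation as thorough as possible, which makes it suitable for beginners learning big-data modeling.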

Install/Update

go get -u github.com/huichen/mlf

Features

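The 弥勒佛 framework addresses the problem types below. Parenthesized entries with a date are not yet implemented; the date is the planned implementation time.

  • Supervised learning: max entropy classifier, decision tree based models (2014 Q1)
  • Unsupervised learning: clustering (k-means, 2014 Q1)
  • Online learning: online stochastic gradient descent
  • Neural networks (2014 Q2/3)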
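
To make the online learning entry concrete, here is a minimal sketch of online stochastic gradient descent for a binary logistic model (the two-class case of a max entropy classifier). It does not use or reflect mlf's actual API: the `Example` type and the `Predict`, `Update`, and `Train` functions are hypothetical names chosen for illustration, and the learning rate and toy data are arbitrary.

```go
package main

import (
	"fmt"
	"math"
)

// Example is one labeled training instance (hypothetical type, not part of mlf).
type Example struct {
	Features []float64
	Label    float64 // 0 or 1
}

// sigmoid maps a raw score to a probability in (0, 1).
func sigmoid(z float64) float64 {
	return 1.0 / (1.0 + math.Exp(-z))
}

// Predict returns P(label = 1 | features) under the current weights.
func Predict(weights, features []float64) float64 {
	z := 0.0
	for i, x := range features {
		z += weights[i] * x
	}
	return sigmoid(z)
}

// Update performs one online SGD step on a single example:
// w <- w - learningRate * (prediction - label) * x.
func Update(weights []float64, ex Example, learningRate float64) {
	err := Predict(weights, ex.Features) - ex.Label
	for i, x := range ex.Features {
		weights[i] -= learningRate * err * x
	}
}

// Train streams over the examples one at a time for the given number of passes.
func Train(examples []Example, dim, passes int, learningRate float64) []float64 {
	weights := make([]float64, dim)
	for p := 0; p < passes; p++ {
		for _, ex := range examples {
			Update(weights, ex, learningRate)
		}
	}
	return weights
}

func main() {
	// Toy data: the first feature is a constant bias term.
	data := []Example{
		{[]float64{1, 0.0}, 0},
		{[]float64{1, 1.0}, 0},
		{[]float64{1, 3.0}, 1},
		{[]float64{1, 4.0}, 1},
	}
	w := Train(data, 2, 1000, 0.1)
	fmt.Printf("P(y=1 | x=0.5) = %.3f\n", Predict(w, []float64{1, 0.5}))
	fmt.Printf("P(y=1 | x=3.5) = %.3f\n", Predict(w, []float64{1, 3.5}))
}
```

Because each update touches only one example, the same `Update` function could be fed from a stream instead of a slice, which is what makes this style a fit for data sets that do not fit in memory.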

The project implements the following components

Other

mlf's People

Contributors

huichen

mlf's Issues

Have you considered first implementing the RDD from Apache Spark, and then building distributed machine learning algorithms on top of it?

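I have written two machine learning algorithms with Spark, a Naive Bayes classifier and a random forest. Apache Spark's RDD is a distributed framework that is easier to use than MapReduce or MPI, and it is also very expressive.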

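It looks like you currently plan to implement distributed machine learning algorithms directly with Go channels? If so, that is much like using MPI directly: it is too low-level, and the resulting code will be verbose.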
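
For readers unfamiliar with the channel-based style being discussed, the following is a rough sketch of what it can look like for a simple aggregation: the data is split into shards, each shard is summed in its own goroutine, and the partial results are sent back over a channel. This is an illustration only, not code from mlf; `partialSum` and `ParallelSum` are hypothetical names, and everything runs in a single process rather than across machines.

```go
package main

import "fmt"

// partialSum adds up one shard and sends the result on out.
func partialSum(shard []float64, out chan<- float64) {
	s := 0.0
	for _, v := range shard {
		s += v
	}
	out <- s
}

// ParallelSum splits values into at most numWorkers shards, sums each shard
// in its own goroutine, and combines the partial results from a channel.
func ParallelSum(values []float64, numWorkers int) float64 {
	if numWorkers < 1 {
		numWorkers = 1
	}
	out := make(chan float64, numWorkers)
	// Ceiling division so that every element lands in some shard.
	shardSize := (len(values) + numWorkers - 1) / numWorkers
	launched := 0
	for start := 0; start < len(values); start += shardSize {
		end := start + shardSize
		if end > len(values) {
			end = len(values)
		}
		go partialSum(values[start:end], out)
		launched++
	}
	// Fan-in: receive exactly one partial result per launched worker.
	total := 0.0
	for i := 0; i < launched; i++ {
		total += <-out
	}
	return total
}

func main() {
	values := []float64{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	fmt.Println(ParallelSum(values, 3))
}
```

The sharding, worker bookkeeping, and fan-in loop are all written by hand here, which is the kind of boilerplate the comment describes as low-level.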

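Convenience: Spark > Hadoop > MPI. Code conciseness: Spark > Hadoop > MPI. Restrictiveness: MapReduce > Spark > MPI. MPI gives the most freedom and is also the most tedious to write, so every distributed computing framework seems to be a trade-off between abstraction and performance.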

Only with higher-level abstractions can machine learning algorithms be written more concisely.
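
As a rough illustration of that point, the sketch below wraps the goroutine bookkeeping behind two generic helpers, so that the calling code reads as a map followed by a reduce. It is not an RDD: there is no lineage, fault tolerance, or distribution across machines, and `Dataset`, `Map`, and `Reduce` are hypothetical names invented for this example (it assumes Go 1.18 or later for generics).

```go
package main

import (
	"fmt"
	"sync"
)

// Dataset is a toy partitioned collection (hypothetical; in-memory, single process).
type Dataset[T any] struct {
	Partitions [][]T
}

// Map applies f to every element, processing each partition in its own goroutine.
func Map[T, U any](d Dataset[T], f func(T) U) Dataset[U] {
	out := make([][]U, len(d.Partitions))
	var wg sync.WaitGroup
	for i, part := range d.Partitions {
		wg.Add(1)
		go func(i int, part []T) {
			defer wg.Done()
			mapped := make([]U, len(part))
			for j, v := range part {
				mapped[j] = f(v)
			}
			out[i] = mapped // each goroutine writes only its own slot
		}(i, part)
	}
	wg.Wait()
	return Dataset[U]{Partitions: out}
}

// Reduce folds all elements sequentially with combine, starting from zero.
func Reduce[T any](d Dataset[T], zero T, combine func(T, T) T) T {
	acc := zero
	for _, part := range d.Partitions {
		for _, v := range part {
			acc = combine(acc, v)
		}
	}
	return acc
}

func main() {
	d := Dataset[float64]{Partitions: [][]float64{{1, 2}, {3, 4, 5}}}
	squares := Map(d, func(x float64) float64 { return x * x })
	sum := Reduce(squares, 0, func(a, b float64) float64 { return a + b })
	fmt.Println(sum)
}
```

The caller in `main` never touches a channel or a wait group; that separation between the algorithm and the parallel plumbing is the benefit being argued for.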

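By the way, Apache Tez is also a DAG computing framework (very similar to Spark; both are essentially DAG-based), so you may want to look at it too. There is also MSRA's Dryad, among others. Personally, I think Spark's RDD strikes a better balance, and it has a very mature implementation, namely Spark itself.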

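As for implementing Spark's RDD in Go, there is a project called Gopark. It is not very active, but you could contact its author and join forces; at its current level of activity, that project will not be finished anytime soon.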
