xxxnell / flex


124.0 9.0 14.0 42.53 MB

Probabilistic deep learning for data streams.

License: MIT License

Scala 76.27% Python 23.42% Shell 0.23% HTML 0.08%

scala functional-programming probability probability-density-function probability-distribution statistics data-stream

flex's People

Contributors

fossabot hungnguyengoc leifwickland ipoemi kailuowang visenger oksktank rheehot nsho77 taekyulee segomin sungkmi horace-velmont

flex's Issues

`probability` diverges in some cases

The result of the featured sample code in README.md diverges: it returns 0.9724554061259115.

Experiment with large-scale `ConcatSmoothingPs`

When Sketch estimates the density, the resulting KL-divergence is too low because the boundary is not processed properly. Therefore, to smooth the edges, we use a large-scale ConcatSmoothingPs and then re-examine the KL-divergence when performing deepUpdate.

Configuration parameters are duplicated in several places

Configuration parameters are duplicated. For example, boundaryRatio of EqualizedIcdfSamplingConf duplicates boundaryCorrection of SketchConfB.

How about adding CODE_OF_CONDUCT.md to flip?

I think open-source projects should have a code of conduct. How about adding CODE_OF_CONDUCT.md to flip?
You could add CODE_OF_CONDUCT.md via the link below 😄

Publish to the Maven Central Repository

Deploy to the Maven Central Repository.

FlatMap performance enhancement

FlatMap is too slow.

sbt task to execute all experiments

So far I have had to call runMain to run each implemented experiment. However, as the number of experiments grows, running them one by one is no longer practical. Therefore, an sbt task that executes all experiments is needed.

IntelliJ didn't recognize the root package object import

import flip._ didn't work in the IntelliJ syntax highlighter (compilation works fine, though).

The purpose of `concat` of `RangePlot` is unclear

The purpose of concat of RangePlot is unclear. This function seems to decompose more than two primitives.

Too many garbage samplings after flatMap

Currently, 40% of samplings after flatMap are garbage. We have to reduce this by about 10%.

Ambiguous implicit values for `DistArthmeticSyntax`

A custom configuration produces ambiguous implicit values for DistArthmeticSyntax. See flip.experiment.BasicBimodalDistExp.
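One common way to let a custom configuration override a default without ambiguity is to place the default in the companion object (the implicit scope of the type): the compiler consults the lexical scope first and falls back to the companion only when nothing is found there. The sketch below is a hypothetical illustration of that pattern; `ToyConf`, `cmapSizeOf`, `UsesDefault` and `UsesCustom` are made-up names, not flip's actual types, and this is not a confirmed fix for the issue.

```scala
// Hypothetical, simplified stand-in for a configuration type (not flip's SketchConf).
final case class ToyConf(cmapSize: Int)

object ToyConf {
  // The default lives in the companion object, i.e. in the implicit scope of ToyConf.
  // It is only consulted when no ToyConf is visible in the lexical scope,
  // so a user-defined conf never competes with it.
  implicit val default: ToyConf = ToyConf(cmapSize = 20)
}

// An operation that needs a configuration, like arithmetic syntax over distributions.
def cmapSizeOf(implicit conf: ToyConf): Int = conf.cmapSize

object UsesDefault {
  def size: Int = cmapSizeOf // resolved from the companion object
}

object UsesCustom {
  implicit val custom: ToyConf = ToyConf(cmapSize = 50)
  def size: Int = cmapSizeOf // the lexically visible `custom` is found first, no ambiguity
}
```

By contrast, putting both the default and the custom value in the lexical scope (for example, one via a wildcard import and one as a local val with a different name) is a typical way to end up with an "ambiguous implicit values" error.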

Improve KL-divergence accuracy

When calculating the KL-divergence, the boundary region vanishes, so its contribution is not included in the result. Therefore, when the number of samples is too small (< 100) or the boundary ratio is too high (> 0.01), the numerically computed KL-divergence is inaccurate.
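The effect of a vanishing boundary can be seen on a discrete KL-divergence, D(P || Q) = Σ p_i ln(p_i / q_i). The sketch below compares the full sum with one where the first and last bins are dropped; the bin values are made-up numbers for illustration, and `dropBoundary` is a hypothetical helper, not part of flip.

```scala
// Discrete KL divergence D(P || Q) = sum_i p_i * ln(p_i / q_i) over matching bins.
// Bins with p_i == 0 contribute 0; a bin with p_i > 0 and q_i == 0 makes it infinite.
def klDivergence(p: Seq[Double], q: Seq[Double]): Double = {
  require(p.size == q.size, "p and q must have the same number of bins")
  p.zip(q).map {
    case (pi, _) if pi == 0.0 => 0.0
    case (_, qi) if qi == 0.0 => Double.PositiveInfinity
    case (pi, qi)             => pi * math.log(pi / qi)
  }.sum
}

// Hypothetical helper: drops the first and last bin to imitate a calculation
// in which the boundary region "vanishes".
def dropBoundary(xs: Seq[Double]): Seq[Double] = xs.drop(1).dropRight(1)

// Illustrative bins: P puts 20% of its mass in the two boundary bins.
val p = Seq(0.1, 0.2, 0.4, 0.2, 0.1)
val q = Seq(0.01, 0.24, 0.5, 0.24, 0.01)

val full      = klDivergence(p, q)                             // about 0.298
val truncated = klDivergence(dropBoundary(p), dropBoundary(q)) // about -0.162
```

Without the boundary bins the remaining values no longer sum to 1, so the truncated result is not even a valid divergence (here it is negative); the larger the share of mass near the boundary, the larger this error.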

`size` of `IcdfSamplingConf` != `size` of `CmapConf`

Change the structure type of Sketch: List → NonEmptyList

Sketch operations use too many Options to handle a Sketch with an empty structure.
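A non-empty list removes those Options because operations such as `head` or `reduceLeft` are total. The sketch below uses a minimal hand-written `NonEmptyList` (cats provides `cats.data.NonEmptyList` with the same head/tail shape) and a hypothetical `ToySketch`; it is an illustration of the idea, not flip's implementation.

```scala
// Minimal non-empty list: a mandatory head plus a possibly empty tail.
final case class NonEmptyList[A](head: A, tail: List[A]) {
  def toList: List[A] = head :: tail
  def map[B](f: A => B): NonEmptyList[B] = NonEmptyList(f(head), tail.map(f))
  // Total operations: no Option is needed because emptiness is impossible.
  def last: A = tail.lastOption.getOrElse(head)
  def reduceLeft(f: (A, A) => A): A = tail.foldLeft(head)(f)
}

object NonEmptyList {
  // The only remaining Option sits at the boundary where an arbitrary List enters.
  def fromList[A](as: List[A]): Option[NonEmptyList[A]] = as match {
    case h :: t => Some(NonEmptyList(h, t))
    case Nil    => None
  }
}

// Hypothetical sketch whose structures are stored newest first.
final case class ToySketch(structures: NonEmptyList[Map[Double, Double]]) {
  // With a List this would be `structures.headOption`, an Option.
  def youngest: Map[Double, Double] = structures.head
}
```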

Interoperability with Java

To support Java, this project needs an interface and syntactic sugar written in Java.

Remove `measure`

Abstract and separate `sampling`

I have now packaged the sampling algorithm independently to separate the sampling methods. However, the legacy code is tightly coupled, so it has to be replaced with the new one.

See cmapForEqualSpaceCumCorr of EqualSpaceCdfUpdate.

Sketch.fastPdf sometimes generates NaN

Sketch.fastPdf returns (_, NaN) in some cases.
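One frequent way a piecewise-linear density estimate yields NaN is interpolating across a zero-width interval, where (x - x1) / (x2 - x1) becomes 0.0 / 0.0. Whether this is the cause in fastPdf is an assumption; the functions below are hypothetical and only show the failure mode and one possible guard.

```scala
// Linear interpolation of y at x between (x1, y1) and (x2, y2).
// If x1 == x2 and x == x1, this evaluates 0.0 / 0.0 = NaN.
def interpolateNaive(x1: Double, y1: Double, x2: Double, y2: Double, x: Double): Double =
  y1 + (y2 - y1) * (x - x1) / (x2 - x1)

// Guarded version (hypothetical): a degenerate interval falls back to
// the mean of its two endpoint values instead of dividing by zero.
def interpolate(x1: Double, y1: Double, x2: Double, y2: Double, x: Double): Double =
  if (x2 == x1) (y1 + y2) / 2
  else y1 + (y2 - y1) * (x - x1) / (x2 - x1)
```

Averaging the endpoints is only one choice; another is to merge duplicate sampling points before interpolating at all.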

KLD takes conf2 only (no conf1)

KLD uses conf1 (the configuration of the first distribution) and conf2 at the same time, but it takes only conf2 as a parameter.

No criterion for the parameter of `fastSampling` of `Sketch`

The fastSampling function of Sketch uses a parameter (called ratio) to define the sampling points. However, there is no rule for setting its value, so it should either be configurable or be determined automatically.

Plot with Measurable

Currently, a plot contains primitive records only. However, in some cases a plot with a measurable range, i.e. RangeM, would be useful.

There is some error between `interpolationPdf` and `fastPdf`

There is a discrepancy of about 10-15% between interpolationPdf and fastPdf. See the fastPdf part of AdaPerSketchSpec.

`sample` of `Sketch` returns boundary values occasionally

sample of Sketch returns boundary values (e.g. List(..., -1.3346329812349141E307, 7.927339694866348E307, ..., 4.2420349412446703E307, ..., -1.1082857601763558E308...)).

Apply sbt-release

Apply sbt-release (https://github.com/sbt/sbt-release) to simplify version management.

Update sbt version: 0.13.x → 1.1.x

`icdfPlot` in `updateCmap` of `EqualSpaceCdfUpdate` doesn't return infinity at 0 and 1

In theory, the inverse CDF (quantile function) of a distribution with unbounded support returns ±∞ at 0 and 1. However, due to limitations in the way Sketch treats boundaries, it only returns large finite values there.

For now, we artificially remove the two boundary values, but this function needs a more sophisticated way of building a new Cmap.
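The behaviour described above can be reproduced with any closed-form quantile function. The sketch below uses the standard logistic distribution purely as an example (the issue does not name a distribution) and shows the current workaround: taking dividers only at interior probabilities so every value is finite. `interiorDividers` is a hypothetical helper, not flip's API.

```scala
// Quantile function (inverse CDF) of the standard logistic distribution: ln(p / (1 - p)).
// Its support is unbounded, so it returns -Infinity at p = 0 and +Infinity at p = 1.
def logisticIcdf(p: Double): Double = math.log(p / (1.0 - p))

// Dividers at equally spaced probabilities 1/n, 2/n, ..., (n-1)/n.
// Skipping p = 0 and p = 1 mirrors removing the two boundary values,
// so every divider is finite.
def interiorDividers(n: Int, icdf: Double => Double): List[Double] =
  (1 until n).map(i => icdf(i.toDouble / n)).toList
```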

Apply a linting tool

This open-source project should apply a linting tool such as scalafix (https://github.com/scalacenter/scalafix).

`bind` returns NaN

bind returns NaN for this configuration:

    val samplingNo = 50

    implicit val conf: SketchConf = SketchConf(
      startThreshold = 50,
      thresholdPeriod = 100,
      boundaryCorr = 0.1,
      decayFactor = 0,
      queueSize = 30,
      cmapSize = samplingNo,
      cmapNo = 5,
      cmapStart = Some(-10d),
      cmapEnd = Some(10d),
      counterSize = samplingNo
    )

For more detail, see the code.

`map` and `flatMap` generate NaN when Sketch contains nothing

map and flatMap generate NaN when a Sketch contains nothing (Sketch.empty).

So:

add a unit test
fix this bug
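A typical source of NaN for an empty structure is normalizing by a total of zero. Whether that is what happens inside map and flatMap is an assumption; the hypothetical functions below only show the failure mode and a guard, and the test assertions are the kind of unit test the issue asks for.

```scala
// Normalizes bin counts into probabilities. For an empty sketch the total is 0.0,
// so the naive division yields 0.0 / 0.0 = NaN in every bin.
def normalizeNaive(counts: List[Double]): List[Double] = {
  val total = counts.sum
  counts.map(_ / total)
}

// Guarded version (hypothetical): no mass means no probabilities, instead of NaNs.
def normalize(counts: List[Double]): List[Double] = {
  val total = counts.sum
  if (total <= 0.0) Nil else counts.map(_ / total)
}
```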

Remove `AdaptiveSketch` and merge it into `Sketch`

Apply scalafmt

Implement `SelectiveSketch`

Implement SelectiveSketch, which performs deepUpdate only when there is a discrepancy between the temporarily collected sample data and the distribution recorded by Sketch.
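A selective update needs a cheap test for "discrepancy". The sketch below bins the buffered samples, compares the result with the recorded bin probabilities, and triggers deepUpdate only above a threshold. Total variation distance is a swapped-in choice for simplicity (the issue does not specify a metric), and all names are hypothetical.

```scala
// Fraction of samples in each bin [edges(i), edges(i + 1)); samples outside are ignored.
def histogram(samples: Seq[Double], edges: Vector[Double]): Vector[Double] = {
  require(edges.size >= 2, "need at least two edges to form a bin")
  val counts = edges.sliding(2).map { w =>
    samples.count(x => x >= w(0) && x < w(1)).toDouble
  }.toVector
  val total = counts.sum
  if (total == 0.0) counts else counts.map(_ / total)
}

// Total variation distance between two probability vectors over the same bins.
def totalVariation(p: Vector[Double], q: Vector[Double]): Double =
  0.5 * p.zip(q).map { case (a, b) => math.abs(a - b) }.sum

// Hypothetical decision rule: run the expensive deepUpdate only when the buffered
// samples disagree with the recorded distribution by more than `threshold`.
def shouldDeepUpdate(buffer: Seq[Double], recorded: Vector[Double],
                     edges: Vector[Double], threshold: Double): Boolean =
  totalVariation(histogram(buffer, edges), recorded) > threshold
```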

HCounter is a subcategory of Counter

HCounter is a subcategory of Counter, but they reside in different packages.

Should sampling return Option?

Currently, sampling of SamplingDist returns an Option of DensityPlot to handle a Sketch with an empty structure. However, it could return DensityPlot.empty instead of None.

Memoize `icdf`

Computing icdf is an expensive operation. Therefore, cache icdf to improve performance.
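In Scala, a `lazy val` is the simplest memo: it is computed once on first access and reused afterwards. The class below is hypothetical; a plain sort stands in for the expensive icdf construction, and the empirical quantile is only for illustration.

```scala
// Hypothetical distribution whose icdf table is expensive to build.
final class CachedIcdfDist(points: Vector[Double]) {
  var builds = 0 // counts how often the expensive step actually runs (for illustration)

  // Computed at most once, on first use, then reused by every icdf call.
  private lazy val sortedPoints: Vector[Double] = {
    builds += 1
    points.sortWith(_ < _) // stand-in for the expensive icdf construction
  }

  // Empirical quantile: the smallest sample whose rank covers probability p.
  def icdf(p: Double): Double = {
    require(p > 0.0 && p <= 1.0, "p must be in (0, 1]")
    val idx = math.ceil(p * sortedPoints.size).toInt - 1
    sortedPoints(idx)
  }
}
```

Since a Sketch changes on every update, a cache like this would have to live in an immutable instance (or be invalidated on update) to stay correct.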

Add various sampling methods

Flip seems able to compose various sampling methods such as MCMC or Gibbs sampling.

Implement `sample` for Sketch

Currently, sample of Sketch isn't implemented. Implement it using inverse transform sampling.
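Inverse transform sampling relies on one fact: if U ~ Uniform(0, 1) and F is a CDF, then F⁻¹(U) has distribution F. The sketch below uses the exponential distribution only because its icdf has a closed form, F⁻¹(u) = -ln(1 - u) / rate; for a Sketch, its own icdf would take that place. The function names are hypothetical.

```scala
import scala.util.Random

// Closed-form quantile function of the exponential distribution with the given rate.
def exponentialIcdf(rate: Double)(u: Double): Double = -math.log(1.0 - u) / rate

// Draws n samples by pushing uniform variates through the icdf.
// Random.nextDouble() returns values in [0, 1), so u = 1 (where this icdf is
// +Infinity) is never drawn.
def inverseTransformSample(icdf: Double => Double, n: Int, rng: Random): List[Double] =
  List.fill(n)(icdf(rng.nextDouble()))
```

With rate 2.0 the sample mean should approach 1 / rate = 0.5 as n grows.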

Modularize `smoothing`

Smoothing operations are used in several places; their use in UpdateCmap and DeepUpdate is especially important.

As part of refactoring the smoothing operation, it should be possible to apply several smoothing methods dynamically.

The mixing factor appears to depend on the size of the buffer and the number (or ratio) of Cmaps.

Support for-comprehensions

Currently, only Sketch has working monad operations, so a for-comprehension that includes any Dist other than Sketch does not work properly.
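A for-comprehension desugars into `flatMap` and `map`, so it works for every subtype when both are defined once on the base type. The hierarchy below is a hypothetical, sampling-based stand-in (`Dist`, `Point`, `UniformDist` are illustrative names), not flip's Dist.

```scala
import scala.util.Random

// Hypothetical minimal Dist hierarchy: map and flatMap live on the base trait,
// so any two subtypes can be combined in a for-comprehension.
trait Dist[A] { self =>
  def sample(rng: Random): A

  def map[B](f: A => B): Dist[B] = new Dist[B] {
    def sample(rng: Random): B = f(self.sample(rng))
  }

  def flatMap[B](f: A => Dist[B]): Dist[B] = new Dist[B] {
    def sample(rng: Random): B = f(self.sample(rng)).sample(rng)
  }
}

// Two unrelated subtypes standing in for "Sketch" and "a Dist other than Sketch".
final case class Point(value: Double) extends Dist[Double] {
  def sample(rng: Random): Double = value
}
final case class UniformDist(lo: Double, hi: Double) extends Dist[Double] {
  def sample(rng: Random): Double = lo + (hi - lo) * rng.nextDouble()
}

// Desugars to Point(10.0).flatMap(offset => UniformDist(0.0, 1.0).map(x => offset + x)).
val shifted: Dist[Double] = for {
  offset <- Point(10.0)
  x      <- UniformDist(0.0, 1.0)
} yield offset + x
```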

Update Scala version: 2.12.1 → 2.12.4

Execute all experiment code from the CLI

The sbt experiment command in the root project should execute all experiments (cf. the flex.experiment package). However, currently only one experiment is executed (selected by arg0). Therefore, the experiment command must run all experiments when it is called without an argument. See Tasks.
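The argument handling described above fits in a small dispatcher: no argument runs everything, one argument runs only the named experiment. The registry below is hypothetical; `ExpA` and `ExpB` are placeholders for the objects in the flex.experiment package, and each experiment simply returns a string so the behaviour can be checked.

```scala
// Hypothetical experiment registry; real entries would call the experiment objects.
object ExperimentRunner {
  val experiments: List[(String, () => String)] = List(
    "ExpA" -> (() => "ExpA done"),
    "ExpB" -> (() => "ExpB done")
  )

  // No argument: run every experiment. One argument: run only the experiment with that name.
  def run(args: List[String]): List[String] = args match {
    case Nil       => experiments.map { case (_, exp) => exp() }
    case name :: _ => experiments.collect { case (`name`, exp) => exp() }
  }

  def main(args: Array[String]): Unit = run(args.toList).foreach(println)
}
```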

Apply ND4J

ND4J (N-Dimensional Arrays for Java) is a scientific computing library for the JVM. It is meant to be used in production environments, which means its routines are designed to run fast with minimal RAM requirements.
It would be better to replace the current array computations with ND4J.

RecurSketch doesn't update its count when only narrowUpdate is called

Currently, RecurSketch overrides only update, and its count is updated only when update is called. Therefore, RecurSketch doesn't update its count when only narrowUpdate is called.

narrowUpdate of RecurSketch must be overridden so that it also updates the count.
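One way to keep the count consistent is to do the counting in the operation that both entry points pass through. In the hypothetical, heavily simplified classes below (`BaseSketch`, `CountingRecurSketch` are illustrative, not flip's classes), update delegates to narrowUpdate, so overriding narrowUpdate alone covers both paths without double counting.

```scala
// Hypothetical, simplified stand-in for the base sketch.
class BaseSketch {
  protected var records: Vector[Double] = Vector.empty

  // Cheap update: only records the new data.
  def narrowUpdate(xs: Seq[Double]): Unit = records ++= xs

  // Full update: delegates to narrowUpdate (a real sketch might also rebuild structures).
  def update(xs: Seq[Double]): Unit = narrowUpdate(xs)

  def size: Int = records.size
}

class CountingRecurSketch extends BaseSketch {
  var count: Int = 0 // number of samples seen

  // Counting lives in narrowUpdate, which both update and narrowUpdate reach,
  // so calling either one keeps `count` in sync.
  override def narrowUpdate(xs: Seq[Double]): Unit = {
    super.narrowUpdate(xs)
    count += xs.size
  }
}
```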