
Comments (6)

dwhjames commented on July 3, 2024

I can add that to #32, or others can open PRs against that branch.

from framian.

dwhjames commented on July 3, 2024

There seems to be a design choice in the implementation of filterValues.

  1. treat the series as though it is dense, so filter out all non values as well as the values that don’t satisfy the predicate.

def filterValues(p: V => Boolean): Series[K, V] = {
  val b = new SeriesBuilder[K, V]
  b.sizeHint(this.size)
  cfor(0)(_ < index.size, _ + 1) { i =>
    val ix = index.indexAt(i)
    if (column.isValueAt(ix)) {
      val v = column.valueAt(ix)
      if (p(v)) {
        b += (index.keyAt(i), Value(v))
      }
    }
  }
  b.result()
}

  2. treat the series as though it is sparse, so preserve all cells other than the values that don’t satisfy the predicate.

def filterValues(p: V => Boolean): Series[K, V] = {
  val b = new SeriesBuilder[K, V]
  b.sizeHint(this.size)
  cfor(0)(_ < index.size, _ + 1) { i =>
    val ix = index.indexAt(i)
    if (column.isValueAt(ix)) {
      val v = column.valueAt(ix)
      if (p(v)) {
        b += (index.keyAt(i), Value(v))
      }
    } else {
      b += (index.keyAt(i), column.nonValueAt(ix))
    }
  }
  b.result()
}

Option 1 does seem somewhat consistent with other methods with ‘value(s)’ in the name, but I don’t know if the same thing should apply here. Maybe we want both? Maybe these should be called filterByKeys, filterByCells, filterByValues, and filterDenseByValues?
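
To make the difference between the two options concrete, here is a minimal sketch on a simplified model. `Cell`, `Value`, `NA`, `MiniSeries`, and both method names are hypothetical stand-ins defined only for this example; they are not Framian's actual types or API.

```scala
// Simplified stand-ins for illustration only; not Framian's real types.
sealed trait Cell[+A]
final case class Value[+A](get: A) extends Cell[A]
case object NA extends Cell[Nothing] // a non-value ("not available") cell

final case class MiniSeries[K, V](entries: Vector[(K, Cell[V])]) {

  // Option 1: dense semantics. Non-value cells are dropped along with
  // values that fail the predicate.
  def filterValuesDense(p: V => Boolean): MiniSeries[K, V] =
    MiniSeries(entries.filter {
      case (_, Value(v)) => p(v)
      case _             => false
    })

  // Option 2: sparse semantics. Only values that fail the predicate are
  // dropped; non-value cells are preserved.
  def filterValuesSparse(p: V => Boolean): MiniSeries[K, V] =
    MiniSeries(entries.filter {
      case (_, Value(v)) => p(v)
      case _             => true
    })
}

val s = MiniSeries[String, Int](Vector("a" -> Value(1), "b" -> NA, "c" -> Value(4)))
val dense = s.filterValuesDense(_ > 2)   // keeps only "c"
val sparse = s.filterValuesSparse(_ > 2) // keeps "b" (NA) and "c"
```

The two results differ only in whether the `NA` cell at key "b" survives, which is exactly the design choice in question.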


mrvisser commented on July 3, 2024

Unless there are a lot of existing cases where we would want to filter out values while retaining sparse cells, my preference would be to continue to let filterByValues implicitly strip the Series of all non-value cells, similar to series.values. Filtering by values while maintaining sparse cells could be easily implemented with filterByCells.

If there are sufficient cases to filter a series down by its keys, then I'd be inclined to merge filterByKeys and filterByCells together into something like filterEntries:

def filterEntries(p: (K, Cell[V]) => Boolean): Series[K, V]
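
As a rough sketch of how a single `filterEntries` could subsume the narrower filters, the example below expresses each of them as a one-line wrapper. It assumes a simplified model: `Cell`, `Value`, `NA`, and `MiniSeries` are hypothetical stand-ins, not Framian's real types, and only the `filterEntries` signature comes from the comment above.

```scala
// Hypothetical stand-ins, not Framian's real types.
sealed trait Cell[+A]
final case class Value[+A](get: A) extends Cell[A]
case object NA extends Cell[Nothing]

final case class MiniSeries[K, V](entries: Vector[(K, Cell[V])]) {
  // The general method: the predicate sees both the key and the cell.
  def filterEntries(p: (K, Cell[V]) => Boolean): MiniSeries[K, V] =
    MiniSeries(entries.filter { case (k, c) => p(k, c) })

  // The narrower filters, each expressed in terms of filterEntries.
  def filterByKeys(p: K => Boolean): MiniSeries[K, V] =
    filterEntries((k, _) => p(k))

  def filterByCells(p: Cell[V] => Boolean): MiniSeries[K, V] =
    filterEntries((_, c) => p(c))

  // Dense semantics: a non-value cell never satisfies the predicate.
  def filterByValues(p: V => Boolean): MiniSeries[K, V] =
    filterEntries((_, c) => c match {
      case Value(v) => p(v)
      case _        => false
    })
}

val s = MiniSeries[Int, String](Vector(1 -> Value("x"), 2 -> NA, 3 -> Value("y")))
val byKeys = s.filterByKeys(_ >= 2)
val byCells = s.filterByCells(_ == NA)
val byValues = s.filterByValues(_ == "y")
```

Whether the wrappers are worth keeping then comes down to convenience and performance rather than expressiveness.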


mrvisser commented on July 3, 2024

OTOH, if there are not many use-cases to filter a Series while implicitly removing non-value cells, then I'd be in favour of not having filterValues at all, and just sticking with one filter method: filterEntries.


dwhjames commented on July 3, 2024

I think you’ve convinced me that filterByValues should act as though the series is dense, and that filterEntries is probably also worth having. However, I think we definitely want to keep filterByKeys, filterByCells, and filterByValues. The reason is that all of these can have specialized implementations that are more efficient than filterEntries.

One major optimization we can make to all of these filter methods is to take advantage of the fact that our traversal is in index order when building the output series. At the moment, SeriesBuilder is only designed for the general case, so it assumes nothing about the order in which key–cell pairs are appended to the builder. This means that SeriesBuilder produces a Series with an unordered Index (UnorderedIndex). Instead, all of these filter methods should produce series backed by ordered indexes. That has the nice consequence that filtering any series, whether its index is ordered or unordered, will always produce an efficient index.
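
The idea can be sketched on a toy model. This assumes an index that stores its keys sorted alongside the column row each key points to, and calls it "ordered" when those rows are simply 0, 1, 2, and so on. `MiniIndex` and `filterInIndexOrder` are hypothetical names for this example, not Framian's Index or SeriesBuilder.

```scala
// Hypothetical model, not Framian's real Index: keys are stored sorted, and
// rows(i) is the position in the column that holds the value for keys(i).
// The index counts as "ordered" when rows is exactly 0, 1, 2, ...
final case class MiniIndex[K](keys: Vector[K], rows: Vector[Int]) {
  def isOrdered: Boolean = rows == rows.indices.toVector
}

// Filter by traversing in index (key) order and appending survivors to a
// fresh column. Each survivor lands at the next output row, so the new
// index is ordered even when the input index was not.
def filterInIndexOrder[K, V](index: MiniIndex[K], column: Vector[V])(
    p: V => Boolean): (MiniIndex[K], Vector[V]) = {
  val keys = Vector.newBuilder[K]
  val values = Vector.newBuilder[V]
  var i = 0
  while (i < index.keys.size) {
    val v = column(index.rows(i))
    if (p(v)) { keys += index.keys(i); values += v }
    i += 1
  }
  val outKeys = keys.result()
  (MiniIndex(outKeys, outKeys.indices.toVector), values.result())
}

// "a" lives at row 2, "b" at row 0, "c" at row 1: an unordered index.
val unordered = MiniIndex(Vector("a", "b", "c"), Vector(2, 0, 1))
val result = filterInIndexOrder(unordered, Vector(20, 30, 10))(_ >= 20)
```

Because survivors are appended in key order, the builder in this sketch never has to sort or track a permutation.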

filterValues can do even better. If the underlying input column is dense, then filterValues never needs to box primitives to retrieve them from the column, and an even more efficient version of SeriesBuilder can be used that never needs to deal with cells and can produce a DenseColumn as the output. So filtering any series with filterValues yields a series with an ordered index and a dense column.
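
A minimal sketch of the dense fast path, under the assumption that a dense column of doubles can be modelled as a plain `Array[Double]`. `filterDenseValues` is a hypothetical helper for illustration, not part of Framian; it uses the standard library's `ArrayBuilder.ofDouble` to accumulate primitives without wrapping them in cells.

```scala
import scala.collection.mutable.ArrayBuilder

// Hypothetical sketch, not Framian's API: a dense column of doubles is
// modelled as a plain Array[Double], with keys(i) labelling values(i).
def filterDenseValues[K](keys: Vector[K], values: Array[Double])(
    p: Double => Boolean): (Vector[K], Array[Double]) = {
  val outKeys = Vector.newBuilder[K]
  val outValues = new ArrayBuilder.ofDouble // accumulates primitive doubles
  var i = 0
  while (i < values.length) {
    val v = values(i) // read directly as a primitive; no Cell wrapper involved
    if (p(v)) { outKeys += keys(i); outValues += v }
    i += 1
  }
  (outKeys.result(), outValues.result())
}

val kept = filterDenseValues(Vector("a", "b", "c"), Array(1.5, -2.0, 3.0))(_ > 0)
```

The output here is again a primitive array, which corresponds to the dense output column described above.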


dwhjames commented on July 3, 2024

I’ve completed this in #32

