Comments (6)
I can add that to #32, or others can open PRs against that branch.
from framian.
There seems to be a design choice in the implementation of filterValues.
    // Option 1: skip non-value cells, so the result holds only values that satisfy p.
    def filterValues(p: V => Boolean): Series[K, V] = {
      val b = new SeriesBuilder[K, V]
      b.sizeHint(this.size)
      cfor(0)(_ < index.size, _ + 1) { i =>
        val ix = index.indexAt(i)
        if (column.isValueAt(ix)) {
          val v = column.valueAt(ix)
          if (p(v)) {
            b += (index.keyAt(i), Value(v))
          }
        }
      }
      b.result()
    }
    // Option 2: keep non-value cells as they are and drop only values that fail p.
    def filterValues(p: V => Boolean): Series[K, V] = {
      val b = new SeriesBuilder[K, V]
      b.sizeHint(this.size)
      cfor(0)(_ < index.size, _ + 1) { i =>
        val ix = index.indexAt(i)
        if (column.isValueAt(ix)) {
          val v = column.valueAt(ix)
          if (p(v)) {
            b += (index.keyAt(i), Value(v))
          }
        } else {
          b += (index.keyAt(i), column.nonValueAt(ix))
        }
      }
      b.result()
    }
1. Treat the series as though it is dense, so filter out all non-values as well as the values that don’t satisfy the predicate.
2. Treat the series as though it is sparse, so preserve all cells other than the values that don’t satisfy the predicate.
Option 1 does seem somewhat consistent with other methods with ‘value(s)’ in the name, but I don’t know if the same thing should apply here. Maybe we want both? Maybe these should be called filterByKeys, filterByCells, filterByValues, and filterDenseByValues?
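To make the difference concrete, here is a minimal sketch of the two behaviours on a toy model. The Cell, NA, NM and Value types below are simplified stand-ins written for this example, and Entries, filterValuesDense and filterValuesSparse are hypothetical names; none of this is framian's actual code.

```scala
// Simplified stand-ins for illustration only; these are NOT framian's types.
sealed trait Cell[+A]
case object NA extends Cell[Nothing]               // not available
case object NM extends Cell[Nothing]               // not meaningful
final case class Value[+A](get: A) extends Cell[A]

object FilterSemantics {
  // A toy "series": key-cell pairs in traversal order.
  type Entries[K, V] = Vector[(K, Cell[V])]

  // Option 1 (dense): non-values are dropped along with values that fail p.
  def filterValuesDense[K, V](s: Entries[K, V])(p: V => Boolean): Entries[K, V] =
    s.filter {
      case (_, Value(v)) => p(v)
      case _             => false
    }

  // Option 2 (sparse): non-values are preserved; only values that fail p are dropped.
  def filterValuesSparse[K, V](s: Entries[K, V])(p: V => Boolean): Entries[K, V] =
    s.filter {
      case (_, Value(v)) => p(v)
      case _             => true
    }
}
```

With the entries a -> Value(1), b -> NA, c -> Value(4), d -> NM and the predicate _ > 2, the dense variant keeps only c, while the sparse variant keeps b, c and d.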
Unless there are a lot of existing cases where we would want to filter out values while retaining sparse cells, my preference would be to continue to let filterByValues implicitly strip the Series of all non-value cells, similar to series.values. Filtering by values while maintaining sparse cells could be easily implemented with filterByCells.
If there are sufficient cases to filter a series down by its keys, then I'd be inclined to merge filterByKeys and filterByCells together into something like filterEntries:
    def filterEntries(p: (K, Cell[V]) => Boolean): Series[K, V]
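As a rough sketch of how the narrower filters could be expressed through a single filterEntries, assuming a toy series that is just a vector of key-cell pairs. ToySeries and the Cell stand-ins are hypothetical and written only for this example; they are not framian's types.

```scala
// Simplified stand-ins for illustration only; not framian's real types.
sealed trait Cell[+A]
case object NA extends Cell[Nothing]
case object NM extends Cell[Nothing]
final case class Value[+A](get: A) extends Cell[A]

// Hypothetical toy series: key-cell pairs kept in traversal order.
final case class ToySeries[K, V](entries: Vector[(K, Cell[V])]) {

  // The single general-purpose filter with the signature proposed above.
  def filterEntries(p: (K, Cell[V]) => Boolean): ToySeries[K, V] =
    ToySeries(entries.filter { case (k, c) => p(k, c) })

  // The narrower filters fall out as special cases.
  def filterByKeys(p: K => Boolean): ToySeries[K, V] =
    filterEntries((k, _) => p(k))

  def filterByCells(p: Cell[V] => Boolean): ToySeries[K, V] =
    filterEntries((_, c) => p(c))

  // Dense semantics: a non-value never satisfies the predicate.
  def filterByValues(p: V => Boolean): ToySeries[K, V] =
    filterEntries {
      case (_, Value(v)) => p(v)
      case _             => false
    }
}
```

This only shows that the general form subsumes the others; it says nothing about whether a specialized implementation of each would be faster.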
OTOH, if there are not many use-cases to filter a Series while implicitly removing non-value cells, then I'd be in favour of not having filterValues at all, and just sticking with one filter method: filterEntries.
I think you’ve convinced me that filterByValues should act as though the series is dense, and that filterEntries is probably also worth having. However, I think we definitely want to keep filterByKeys, filterByCells, and filterByValues. The reason is that all of these can have more specialized implementations that are more efficient than filterEntries.
One major optimization we can make to all of these filter methods is to take advantage of the fact that our traversal is in index order when building the output series. At the moment, SeriesBuilder is only designed for the general case, so it assumes nothing about the order in which key–cell pairs are appended to the builder. This means that SeriesBuilder produces a Series with an unordered Index (UnorderedIndex). Instead, all of these filter methods should produce series backed by ordered indexes. That has the nice consequence that filtering any series, whether its index is ordered or unordered, will always produce an efficient index.
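The ordering argument can be illustrated with a small sketch: if keys are visited in sorted order and survivors are appended in that same order, the output keys are already sorted and need no re-sorting. OrderedFilter, filterInOrder and isSorted are hypothetical names, and the pair of parallel vectors is only a toy stand-in for an index, not framian's Index.

```scala
// Hypothetical toy index: keys in sorted order, each paired with a row number.
// This is not framian's Index; it only illustrates the ordering argument.
object OrderedFilter {

  // Walk the keys in index order and keep the entries whose row passes `keep`.
  // Survivors are appended in traversal order, so sortedness is preserved.
  def filterInOrder[K](keys: Vector[K], rows: Vector[Int])(keep: Int => Boolean): (Vector[K], Vector[Int]) = {
    val kb = Vector.newBuilder[K]
    val rb = Vector.newBuilder[Int]
    var i = 0
    while (i < keys.length) {
      if (keep(rows(i))) {
        kb += keys(i)
        rb += rows(i)
      }
      i += 1
    }
    (kb.result(), rb.result())
  }

  // True when each key is <= its successor.
  def isSorted[K](keys: Vector[K])(implicit ord: Ordering[K]): Boolean =
    keys.zip(keys.drop(1)).forall { case (a, b) => ord.lteq(a, b) }
}
```

The row numbers in the output may still be in arbitrary order; only the keys are guaranteed to stay sorted, which is the property an ordered index would rely on.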
filterValues can do even better. If the underlying input column is dense, then it never needs to box primitives to retrieve them from the column, and an even more efficient version of SeriesBuilder can be used that never needs to deal with cells and can produce a DenseColumn as the output. So filtering any series with filterValues will yield a series with an ordered index and a dense column.
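A rough sketch of what such a dense fast path might look like, assuming the column is a plain Array[Double] and the keys are Strings. DenseFilter and filterDense are hypothetical names, and the arrays stand in for a dense column and an ordered index; this is not framian's SeriesBuilder or DenseColumn.

```scala
// Hypothetical sketch of a dense fast path; names are illustrative, not framian's API.
object DenseFilter {

  // keys are in index order; rows(i) is the position of keys(i) in `values`.
  // Values are read and written as primitive doubles, with no Cell wrappers.
  def filterDense(keys: Array[String], rows: Array[Int], values: Array[Double])(
      p: Double => Boolean): (Array[String], Array[Double]) = {
    val outKeys = new Array[String](keys.length)
    val outVals = new Array[Double](keys.length)
    var i = 0 // read position in the input
    var j = 0 // write position in the output
    while (i < keys.length) {
      val v = values(rows(i))
      if (p(v)) {
        outKeys(j) = keys(i)
        outVals(j) = v
        j += 1
      }
      i += 1
    }
    // Trim both outputs to the number of entries actually kept.
    (java.util.Arrays.copyOf(outKeys, j), java.util.Arrays.copyOf(outVals, j))
  }
}
```

The output values are laid out in key order, so the result is both ordered and dense under these assumptions.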
I’ve completed this in #32