Comments (6)
I have two variations on the same approach. Both produce a frame with the original keys. However:
- The first has to drop any keys from the "that" frame, since its key type A is not necessarily Row.
- The second includes keys from both frames, but to do so it requires every frame to use the same Row and Col types.
Not sure how to have my cake and eat it too here:
// Original version that only has keys from "this" frame:
def joinBy1[A: Order: ClassTag](by: Cols[Col, A])(that: Frame[A, Col]): Frame[Row, Col] = {
  val reIdx = reindex(by)
  val thatReIdx = that.reindex(by)
  val joined = reIdx.merge(thatReIdx)(Merge.Outer)
  val (k, v) = joined.rowIndex.flatMap { case (r, i) =>
    reIdx.rowIndex.getAll(r).indices.map(k => (rowIndex.keyAt(k), i))
  }.unzip
  joined.withRowIndex(Index.apply(k, v))
}
// Version where types match and keys from both are in result:
def joinBy2(by: Cols[Col, Row])(that: Frame[Row, Col]): Frame[Row, Col] = {
  val reIdx = reindex(by)
  val thatReIdx = that.reindex(by)
  val joined = reIdx.merge(thatReIdx)(Merge.Outer)
  val (k, v) = joined.rowIndex.flatMap { case (r, i) =>
    reIdx.rowIndex.getAll(r).indices.map(k => (rowIndex.keyAt(k), i)) ++
      thatReIdx.rowIndex.getAll(r).indices.map(k => (that.rowIndex.keyAt(k), i))
  }.unzip
  joined.withRowIndex(Index.apply(k, v))
}
Thoughts?
from framian.
I prefer joinBy1. It would be nice if we had the option of Inner or Left (Right and Outer reduce to Inner and Left, respectively, because we drop the right keys). Perhaps innerJoinBy(...)(...) and leftJoinBy(...)(...)? What do you think?
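To make the Inner/Left point concrete, here is a minimal sketch using plain Scala collections rather than Framian frames. Everything in it (`JoinBySketch`, `Joined`, and the tuple layout for rows) is a hypothetical stand-in, not Framian's API. It only models the situation where result rows are keyed by the left frame's keys: a row that exists only on the right has no left key to attach to, so it cannot appear in either result.

```scala
// Hypothetical model, not Framian's API.
// Left rows are (rowKey, joinValue, payload); right rows are (joinValue, payload).
object JoinBySketch {
  final case class Joined[K, L, R](key: K, left: L, right: Option[R])

  // Inner: keep only left rows with at least one match on the right.
  def innerJoinBy[K, J, L, R](left: Seq[(K, J, L)], right: Seq[(J, R)]): Seq[Joined[K, L, R]] =
    left.flatMap { case (k, j, l) =>
      right.filter(_._1 == j).map { case (_, r) => Joined(k, l, Option(r)) }
    }

  // Left: keep every left row; unmatched rows carry no right-hand payload.
  def leftJoinBy[K, J, L, R](left: Seq[(K, J, L)], right: Seq[(J, R)]): Seq[Joined[K, L, R]] =
    left.flatMap { case (k, j, l) =>
      val matches = right.filter(_._1 == j).map(_._2)
      if (matches.isEmpty) Seq(Joined(k, l, Option.empty[R]))
      else matches.map(r => Joined(k, l, Option(r)))
    }
}
```

With left rows `("r1", 1, "a")` and `("r2", 2, "b")` and right rows `(1, "x")` and `(3, "z")`, the right-only join value 3 shows up in neither result, which is why a Right or Outer variant would add nothing once the right keys are dropped.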
from framian.
I like it. But does it make sense to be consistent with the other methods and take the Join as a param? Even though Right = Inner and Outer = Left, they are all valid arguments, right?
from framian.
My only issue is that while they are technically valid, they would be confusing. I'd rather remove that point of confusion. However, I am not against making a sub-trait of Join that only works for Left/Inner and Right/Inner, so something like this:
sealed trait Join {
  val leftOuter: Boolean
  val rightOuter: Boolean
}
sealed trait LeftBiasedJoin extends Join {
  val rightOuter: Boolean = false
}
sealed trait RightBiasedJoin extends Join {
  val leftOuter: Boolean = false
}
object Join {
  case object Inner extends LeftBiasedJoin with RightBiasedJoin
  case object Left extends LeftBiasedJoin { val leftOuter = true }
  case object Right extends RightBiasedJoin { val rightOuter = true }
  case object Outer extends Join { val leftOuter = true; val rightOuter = true }
}
And then we have:
def joinBy[A: Order: ClassTag](by: Cols[Col, A])(that: Frame[_, Col])(join: LeftBiasedJoin): Frame[Row, Col]
Of course, the major issue here is we break binary compatibility, so it would have to be in the 0.4.0 release.
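As a rough illustration of what the sub-trait buys us, the sketch below repeats the proposed hierarchy inside a standalone object and adds a hypothetical helper, `keepsUnmatchedLeftRows`, that accepts only a `LeftBiasedJoin`. The wrapper object `JoinSketch` and the helper are assumptions made for the example and are not part of Framian. The point is that the confusing arguments are rejected at compile time instead of being silently treated as Inner or Left.

```scala
// Standalone sketch of the proposed hierarchy; JoinSketch and
// keepsUnmatchedLeftRows are hypothetical names, not Framian's API.
object JoinSketch {
  sealed trait Join {
    val leftOuter: Boolean
    val rightOuter: Boolean
  }
  sealed trait LeftBiasedJoin extends Join {
    val rightOuter: Boolean = false
  }
  sealed trait RightBiasedJoin extends Join {
    val leftOuter: Boolean = false
  }
  object Join {
    case object Inner extends LeftBiasedJoin with RightBiasedJoin
    case object Left extends LeftBiasedJoin { val leftOuter = true }
    case object Right extends RightBiasedJoin { val rightOuter = true }
    case object Outer extends Join { val leftOuter = true; val rightOuter = true }
  }

  // Only Inner and Left type-check here, mirroring the (join: LeftBiasedJoin) parameter.
  def keepsUnmatchedLeftRows(join: LeftBiasedJoin): Boolean = join.leftOuter

  val inner: Boolean = keepsUnmatchedLeftRows(Join.Inner)
  val left: Boolean  = keepsUnmatchedLeftRows(Join.Left)
  // keepsUnmatchedLeftRows(Join.Right) // does not compile: Right is not a LeftBiasedJoin
  // keepsUnmatchedLeftRows(Join.Outer) // does not compile: Outer is not a LeftBiasedJoin
}
```

Callers that already pass Inner or Left would keep compiling against such a signature, while Right and Outer become type errors.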
from framian.
Probably not a hard breaking change to handle when released. I have this now:
def joinBy[A: Order: ClassTag](by: Cols[Col, A])(that: Frame[A, Col])(join: LeftBiasedJoin): Frame[Row, Col] = {
  val reIdx = reindex(by)
  val joined = reIdx.join(that.reindex(by))(join)
  joined.withRowIndex(
    joined.rowIndex.flatMap { case (r, i) =>
      reIdx.rowIndex.getAll(r).indices.map(k => (rowIndex.keyAt(k), i))
    }.unzip match {
      case (k, v) => Index.ordered(k, v)
    }
  )
}
I can clean it up and add the PR if you like.
from framian.
Please - just make the rows parameter on that a wildcard (Frame[_, Col]), since I don't think you meant to put the A from Cols[Col, A] there ;)
from framian.