d3 / d3-array
Array manipulation, ordering, searching, summarizing, etc.
Home Page: https://d3js.org/d3-array
License: ISC License
What if d3.nest returned an empty nest object, on which you can define some keys (which would nest existing values, if any), and then you added objects to the nest object and they were automatically slotted into the correct position? Perhaps you could remove objects, too. It’s not clear how nest.rollup would work in this context, though.
It would be useful to have a "blur" function to approximate kernel density estimation.
This function would accept the output from histogram, conceptually looking something like:
blur(histogram(data))
Original discussion in d3/d3-contour#7 (comment)
The implementation could be similar to blurX or blurY in d3-contour.
This article may be relevant: Convolve n Square Pulses to Gaussian.
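A minimal sketch of what such a blur might look like, assuming it operates on the bin counts of the histogram output (the function name, the radius parameter, and the fixed three passes are illustrative assumptions, not an actual API):
// Repeatedly applying a box blur of the given radius to the bin counts
// approximates a Gaussian kernel, yielding a rough density estimate.
function blur(bins, radius) {
  var counts = bins.map(function(bin) { return bin.length; });
  for (var pass = 0; pass < 3; ++pass) {
    counts = counts.map(function(_, i) {
      var sum = 0, n = 0;
      for (var j = Math.max(0, i - radius); j <= Math.min(counts.length - 1, i + radius); ++j) {
        sum += counts[j], ++n;
      }
      return sum / n;
    });
  }
  return counts; // one smoothed value per bin
}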
Related #48. This would be a breaking change, though, presumably.
While the new range() function is improved, it still produces an extra element for some fractional step sizes.
range(0, 1, 1/49).length;
50
Also see d3/d3#2524
A fix may have to take into account limited floating point precision, in a similar way to pull request d3/d3#2526.
The problem is
(1-0)/(1/49)
49.00000000000001
which Math.ceil() rounds up to 50.
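One possible direction, sketched with a hypothetical tolerance (not necessarily the fix adopted upstream): subtract a tiny epsilon before rounding up, so a count that overshoots an integer by floating-point error falls back to that integer.
function rangeLength(start, stop, step) {
  var n = (stop - start) / step,
      epsilon = 1e-10; // hypothetical tolerance
  return Math.max(0, Math.ceil(n - epsilon)) | 0;
}

rangeLength(0, 1, 1 / 49); // 49 rather than 50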
What can I do to control the minimum height of a bar when it is less than 1px?
bar.append("rect")
.attr("y", 1)
.attr("height", function(d) { console.log("===",y(d.x1) - y(d.x0) - 1); return y(d.x1) - y(d.x0) - 1; })
.attr("width", function(d) { return height - x(d.length); });
How can I keep a bar whose height is less than 1px from being hidden, and have it displayed at less than 1px instead? Mine is a horizontal bar chart.
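One hedged workaround (a guess at the intent, since the full scales aren't shown): SVG does not render rects with zero or negative size, so the "- 1" padding can make small bars disappear entirely; clamping the computed size keeps every bar at least 1px.
bar.append("rect")
    .attr("y", 1)
    .attr("height", function(d) { return Math.max(1, y(d.x1) - y(d.x0) - 1); })
    .attr("width", function(d) { return Math.max(1, height - x(d.length)); });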
Formerly known as multimap, as proposed here:
I made https://github.com/interactivethings/d3-comparator a while back and want to use it again. It uses the old d3 global extension mechanism though, so I'd need to update it. I wondered if I should update it to work properly with npm/rollup, decouple it from d3 entirely or if I should make a PR to this library? Do you think it would be a good fit for d3-array?
Related d3/d3-scale#81, the fact that d3.tickStep can return a floating point number can cause cascading problems in computing nice domains and ticks.
But I suspect there’s an easy fix, because the tick step is always a power of ten, optionally multiplied by 2 or 5. If the power of ten is nonnegative, the existing behavior is fine; but if it’s negative, we return the inverse tick step instead, which is likewise guaranteed to be an integer. Let’s call this the tick “increment” (or perhaps the tick “interval”). We can introduce d3.tickIncrement and deprecate d3.tickStep.
So, if the tick step is 0.05, then the tick increment would be -20. Here’s the implementation, which now requires that start ≤ stop:
var e10 = Math.sqrt(50),
    e5 = Math.sqrt(10),
    e2 = Math.sqrt(2);

function tickIncrement(start, stop, count) {
  var step = (stop - start) / Math.max(0, count),
      power = Math.floor(Math.log(step) / Math.LN10),
      error = step / Math.pow(10, power);
  return power >= 0
      ? (error >= e10 ? 10 : error >= e5 ? 5 : error >= e2 ? 2 : 1) * Math.pow(10, power)
      : -Math.pow(10, -power) / (error >= e10 ? 10 : error >= e5 ? 5 : error >= e2 ? 2 : 1);
}
Note that this is guaranteed to return an integer, because any positive power of ten is divisible by 1, 2, 5 and 10.
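For example, under these definitions (the count arguments are chosen purely for illustration):
tickIncrement(5.8, 6.2, 8); // -20, i.e. ticks every 1/20 = 0.05
tickIncrement(0, 100, 10);  // 10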
To use it to nice the domain, the scale would do something like:
var step = tickIncrement(start, stop, count);

if (step >= 0) {
  start = Math.floor(start / step) * step;
  stop = Math.ceil(stop / step) * step;
} else {
  start = Math.ceil(start * step) / step;
  stop = Math.floor(stop * step) / step;
}
Which in the case of d3/d3-scale#81 produces the result of [5.8, 6.2]. 👍
You’d need something similar in d3.ticks (ignoring descending intervals):
function ticks(start, stop, count) {
  var step = tickIncrement(start, stop, count);
  return step >= 0 ? range(
    Math.ceil(start / step) * step,
    Math.floor(stop / step) * step + step / 2, // inclusive
    step
  ) : range(
    Math.floor(start * step) / step,
    (2 * Math.ceil(stop * step) - 1) / (2 * step), // inclusive
    1 / -step
  );
}
Which results in [5.8, 5.85, 5.8999999999999995, 5.95…6.05, 6.1, 6.1499999999999995, 6.2], which seems reasonable.
Hello!
When working with multiple series of data, it's a common task to calculate the extent of extents, and currently I see no clear way of doing this.
I propose changing d3.extent, d3.min, etc. to treat arrays differently (e.g. flatten a passed array), so that code like
d3.extent(data, d => d3.extent(d))
yields a single pair of values.
Or maybe, if changing d3.extent and co. seems confusing, it might be good to add a second accessor parameter to d3.merge, so that it looks like
d3.extent(d3.merge(data, d => d3.extent(d)))
Because this
d3.extent(d3.merge(data.map(d => d3.extent(d))))
looks terrible to me, and if you want to calculate the extent of an extent of extents, it starts to look even worse.
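A hedged sketch of a helper along these lines (flatExtent is a hypothetical name, not a proposed API):
function flatExtent(series, accessor) {
  var extents = series.map(function(s) { return d3.extent(s, accessor); });
  return [
    d3.min(extents, function(e) { return e[0]; }),
    d3.max(extents, function(e) { return e[1]; })
  ];
}

flatExtent([[1, 5], [3, 9]]); // [1, 9]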
Related d3/d3#2567.
Proposed:
function pair(a, b) {
  return [a, b];
}

d3.cross = function(a, b, f) {
  var na = a.length, nb = b.length, c = new Array(na * nb), ia, ib, ic, va;
  if (f == null) f = pair;
  for (ia = ic = 0; ia < na; ++ia) for (va = a[ia], ib = 0; ib < nb; ++ib, ++ic) c[ic] = f(va, b[ib]);
  return c;
};
For example, given the following CSV data:
Year,Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,J-D,D-N,DJF,MAM,JJA,SON
1880,-.30,-.21,-.18,-.27,-.14,-.29,-.24,-.08,-.17,-.16,-.19,-.22,-.20,***,***,-.20,-.20,-.17
1881,-.10,-.14,.01,-.03,-.04,-.28,-.07,-.03,-.09,-.20,-.26,-.16,-.12,-.12,-.15,-.02,-.13,-.19
1882,.09,.08,.01,-.20,-.18,-.25,-.11,.03,-.01,-.23,-.21,-.25,-.10,-.09,.00,-.12,-.11,-.15
1883,-.34,-.42,-.18,-.25,-.26,-.13,-.09,-.14,-.19,-.12,-.21,-.19,-.21,-.22,-.34,-.23,-.12,-.18
1884,-.18,-.13,-.36,-.36,-.32,-.38,-.35,-.27,-.24,-.22,-.30,-.30,-.28,-.28,-.17,-.35,-.33,-.25
1885,-.66,-.30,-.24,-.45,-.42,-.50,-.29,-.27,-.19,-.20,-.22,-.07,-.32,-.33,-.42,-.37,-.35,-.20
1886,-.43,-.46,-.41,-.29,-.27,-.39,-.16,-.31,-.19,-.25,-.26,-.25,-.31,-.29,-.32,-.33,-.29,-.23
1887,-.66,-.48,-.32,-.37,-.33,-.21,-.19,-.28,-.19,-.32,-.25,-.38,-.33,-.32,-.46,-.34,-.22,-.26
…
You could say:
d3.csv("temperatures.csv")
.then(data => d3.cross(data.columns.slice(1, 13), data, (month, d) => ({
date: d.Year + "-" + month,
temperature: d[month]
})))
The nest operator is currently using rollup.call(nest, array) but rollup(array) should be sufficient. I don’t see a good reason to set the nest instance as this. Related:
Imagine you’re joining a TSV file to a GeoJSON feature collection. A typical way of doing that might be to create a Map and then use array.forEach:
var map = new Map(rates.map(d => [d.id, +d.rate]));
collection.features.forEach(f => f.properties.rate = map.get(f.id));
It’d be neat if there was a simple way to join two arrays of objects and invoke a function for each joined row.
Option 1:
d3.join(collection.features, rates, (a, b) => a.properties.rate = +b.rate);
This doesn’t really work because it would assume that d => d.id is always the key function, and in practice you’d want to be able to specify key functions for both the left and the right arrays. I suppose you could require calling array.map on your arrays before passing them to d3.join, but that makes it increasingly less useful than just using a Map as above.
I think we should avoid too many unnamed arguments to a single function especially with optionals, so the following Option 2 probably isn’t a good idea:
d3.join(collection.features, a => a.id, rates, b => b.id, (a, b) => a.properties.rate = +b.rate);
A verbose option 3, a bit like d3.nest:
d3.join()
.leftKey(a => a.id)
.rightKey(b => b.id)
.reduce((a, b) => a.properties.rate = +b.rate)
(rates, collection.features);
An enhancement of option 3 with a convenience for setting the left and right key to the same function:
d3.join()
.key(d => d.id)
.reduce((a, b) => a.properties.rate = +b.rate)
(rates, collection.features);
But what would join.key with no arguments return?
A further or alternative enhancement of option 3 to specify the left and right key to the constructor:
d3.join(d => d.id)
.reduce((a, b) => a.properties.rate = +b.rate)
(rates, collection.features);
Slightly icky problem here is the default case. Unlike d3.nest, there’s a reasonable default join, but to use it requires extra parens:
d3.join()(rates, collection.features);
Option 4 is immutable closures like d3-interpolate’s interpolate.gamma. These are nice because then you don’t need extra parens in the default case:
d3.join(rates, collection.features);
With a custom reducer:
d3.join.reduce((a, b) => a.properties.rate = +b.rate)(collection.features, rates);
With a custom key and reducer (everything is named!):
d3.join
.key(d => d.id)
.reduce((a, b) => a.properties.rate = +b.rate)
(collection.features, rates)
With this approach join.key can easily take two functions if you wanted separate keys for left and right. (You could have separate join.leftKey and join.rightKey, but I don’t think it’s necessary.) You can’t call join.key as an accessor as you can in option 3 so there’s no issue with what sort of return value makes sense—it always constructs a new join operator.
Also there’s the question of what join(A, B) should return. Nothing? Maybe an array of results returned by the reducer, similar to d3.cross? With the same default reducer of (a, b) => [a, b]?
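For comparison, a minimal function-only sketch of the Map-based approach (hypothetical, not one of the options above): index the right array by key and return the reducer's results, like d3.cross.
function join(left, right, key, reduce) {
  var index = new Map(right.map(function(d) { return [key(d), d]; }));
  return left.map(function(a) { return reduce(a, index.get(key(a))); });
}

// Roughly equivalent to the TSV/GeoJSON example above:
join(collection.features, rates, function(d) { return d.id; },
    function(a, b) { return a.properties.rate = +b.rate; });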
I am trying to create a histogram with a specific number of bins which should all have the same width (i.e. the domain should be uniformly divided):
I started by using x.ticks() (Option 1):
const binCount = 5;
const data = [1, 2, 3, 4, 4.7];
const [min, max] = d3.extent(data);
const x = d3.scaleLinear().domain([min, max]);
// Option 1: Tick-based thresholds
const histogram1 = d3.histogram()
.domain(x.domain())
.thresholds(x.ticks(binCount));
const bins1 = histogram1(data);
console.log("Option 1 bin widths: " + bins1.map(b => (b.x1 - b.x0)));
// 1,1,1,0.7000000000000002
The last bin is narrower than all other bins (to be expected based on how ticks works).
For histogram.thresholds([count]), the docs state that:
If a count is specified instead of an array of thresholds, then the domain will be uniformly divided into approximately count bins
This is not what I observe (Option 2):
// Option 2: Count-based thresholds
const histogram2 = d3.histogram()
.domain(x.domain())
.thresholds(binCount);
const bins2 = histogram2(data);
console.log("Option 2 bin widths: " + bins2.map(b => (b.x1 - b.x0)));
// 1,1,1,0.7000000000000002
Q: What's the proper invocation of histogram.thresholds([count])?
I currently use a manual array of thresholds (Option 3):
// Option 3: Range-based thresholds
const thresholds = d3.range(min, max, (max - min) / binCount);
const histogram3 = d3.histogram()
.domain(x.domain())
.thresholds(thresholds);
const bins3 = histogram3(data);
console.log("Option 3 bin widths: " + bins3.map(b => (b.x1 - b.x0)));
// 0.74,0.74,0.7399999999999998,0.7400000000000002,0.7400000000000002
This works, but seems overly complex for such a simple use case...
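For what it's worth, a hedged variant of Option 3 that passes only the interior thresholds, so that exactly binCount uniform bins divide [min, max] (uniformThresholds and histogram4 are just illustrative names):
// binCount - 1 interior thresholds => binCount uniform bins over [min, max].
const uniformThresholds = d3.range(1, binCount).map(i => min + (max - min) * i / binCount);
const histogram4 = d3.histogram()
    .domain([min, max])
    .thresholds(uniformThresholds);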
My speculation on the reason for creating array.js, given that array.map and array.slice are available, is to allow code like the following, found in histogram.js:
slice.call(_)
Is it because array.slice could not run as slice.call(_)? If so, what exactly does this .call do for slice? Where can I learn more about this kind of usage?
Thanks
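For what it's worth, a small illustration of the idiom (my own reading, not an authoritative answer): slice here is Array.prototype.slice, and .call invokes it with an arbitrary array-like object as this, which is how an arguments object, a NodeList, or a plain object with a length property gets copied into a real Array.
var slice = Array.prototype.slice;

function toArray() {
  return slice.call(arguments); // arguments is array-like, not a real Array
}

toArray(1, 2, 3); // [1, 2, 3]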
Related d3/d3#1091, it might be nice if there were an easy way to count the number of elements matched at each level of the hierarchy. Yes, things like the cluster layout do that for you already, but it’d be nice to do that with a simple nest operator, too.
This feels slightly related to the nest.rollup method, too. Like, instead of replacing the set of nested values with the return value of the rollup function, I just want to decorate the object (say by assigning a count value). But another big difference is that rollup only operates on arrays of siblings, and nest.count should be a recursive operation on the entire tree.
So… it’s almost like you want tree visit methods on the returned nest object. Which makes me wonder if there should be a nest.tree method instead of nest.map, and then have some useful methods on the returned tree instance.
Hi Mike - This example of yours: https://bl.ocks.org/mbostock/4062045 I think used to display the names listed in the <title> tag. Can't seem to get these to display with CSS, and I don't see anything in the JS that is keeping it from appearing - what do you suggest?
For computing the sample skewness of a sample of values.
It’d be nice to have.
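A hedged sketch of what such a function might compute, using the biased (population) form of sample skewness; the exact definition, accessor support and NaN handling would need to match the rest of d3-array:
function skewness(values) {
  var mean = d3.mean(values),
      m2 = d3.mean(values, function(v) { return Math.pow(v - mean, 2); }),
      m3 = d3.mean(values, function(v) { return Math.pow(v - mean, 3); });
  return m3 / Math.pow(m2, 3 / 2);
}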
It might be nice to take the linear ticks implementation from d3-scale and put it here in d3-array. Then the histogram could take the suggested threshold count and compute nice rounded thresholds, rather than simply dividing the domain uniformly.
(This would be similar to how R automatically uses pretty when the histogram breaks are specified as a count hint.)
d3.ticks(0, 320, 23);
// > Array (33)
// [0, 10, 20, 30, 40, 50, 60, 70, 80, 90,
// 100, 110, 120, 130, 140, 150, 160, 170, 180, 190,
// 200, 210, 220, 230, 240, 250, 260, 270, 280, 290,
// 300, 310, 320]
I would prefer that the requested tick count never be exceeded. This would prevent axis ticks from overlapping when the tick count is calculated from the displayed tick size.
I was looking at the d3-array functions' source code and saw a repeating pattern like:
if (valueof == null) {
  while (++i < n) { // Find the first comparable value.
    if ((value = values[i]) != null && value >= value) {
      min = max = value;
      while (++i < n) { // Compare the remaining values.
        if ((value = values[i]) != null) {
          if (min > value) min = value;
          if (max < value) max = value;
        }
      }
    }
  }
} else {
  while (++i < n) { // Find the first comparable value.
    if ((value = valueof(values[i], i, values)) != null && value >= value) {
      min = max = value;
      while (++i < n) { // Compare the remaining values.
        if ((value = valueof(values[i], i, values)) != null) {
          if (min > value) min = value;
          if (max < value) max = value;
        }
      }
    }
  }
}
Instead, use this:
...
if (valueof == null) valueof = function(x){ return x;}
...
I guess removing the block duplication is cleaner.
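For concreteness, a hedged sketch of what the deduplicated version might look like (the library presumably keeps the two branches to avoid an extra function call per element, so this is a readability trade-off rather than a drop-in fix):
function extent(values, valueof) {
  if (valueof == null) valueof = function(x) { return x; };
  var n = values.length, i = -1, value, min, max;
  while (++i < n) { // Find the first comparable value.
    if ((value = valueof(values[i], i, values)) != null && value >= value) {
      min = max = value;
      while (++i < n) { // Compare the remaining values.
        if ((value = valueof(values[i], i, values)) != null) {
          if (min > value) min = value;
          if (max < value) max = value;
        }
      }
    }
  }
  return [min, max];
}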
Like d3/d3-collection#5 but for histograms.
The callback function for Map.forEach gets called with 3 arguments (value, key, map, in that order) for each item in the collection:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map/forEach
The d3.map callback gets called with key then value (so backwards).
I understand that you can't rely on browser support for native JavaScript collections (Map and Set) but I think you should mimic their apis as much as possible.
I'm happy to send in a PR with the change if you think such a change should be made.
Top-level calls are hard to tree-shake because tree-shakers/minifiers are often "afraid" of removing them due to the potential side effects of those calls.
Culprits:
Line 4 in 36edab3
Lines 1 to 3 in 36edab3
Possible solution:
Add #__PURE__ annotations before those calls to reassure UglifyJS that those calls can be dropped if their results stay unused, either manually or with the help of https://github.com/Andarist/babel-plugin-annotate-pure-calls
Intent to implement:
yes; I would only need to know whether you want to tackle this one.
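For illustration, this is roughly what such an annotation looks like (the identifier here is an example, not necessarily one of the culprit lines):
// UglifyJS/terser can drop this call entirely if the result is never used,
// because the annotation promises the call has no side effects.
var ascendingBisect = /*#__PURE__*/ bisector(ascending);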
Seems like this repo could be a good place for it.
I use the code in https://bl.ocks.org/mbostock/3048450 with this data
data = [10, 11, 8, 17, 2, 17, 6, 5, 3, 14, 16, 2, 1, 238, 5, 96, 5, 23, 2, 1, 17, 72, 9, 3, 63, 16, 10, 2, 10, 6, 39, 2, 1, 12, 4, 4, 9, 10, 9, 14, 8, 2, 76, 3, 15, 23, 18, 6, 6, 37, 13, 25, 25, 3, 20, 10];
The binning is correct with the original code. But when I remove the line specifying the thresholds, .thresholds(x.ticks(20)), the biggest element (238) isn't plotted correctly. I used the latest version of d3 v4, exactly as in the link above.
For computing the sample kurtosis of a sample of values.
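As with skewness above, a hedged sketch using the biased (population) form of excess kurtosis; the precise definition would need to be settled before adding it:
function kurtosis(values) {
  var mean = d3.mean(values),
      m2 = d3.mean(values, function(v) { return Math.pow(v - mean, 2); }),
      m4 = d3.mean(values, function(v) { return Math.pow(v - mean, 4); });
  return m4 / (m2 * m2) - 3; // 0 for a normal distribution
}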
We have transpose(matrix) which transposes the rows and columns of a matrix (an array of arrays). But you might want similar functionality to transpose an object whose values are arrays into an array whose elements are objects.
function transpose(object) {
  var m = 0;
  for (var k in object) m = Math.max(m, object[k].length);
  for (var i = -1, transpose = new Array(m); ++i < m;) {
    var o = transpose[i] = {};
    for (var k in object) {
      o[k] = object[k][i];
    }
  }
  return transpose;
}
Where:
transpose({
year: [2001, 2002],
value: [1, 2]
})
Returns:
[
{year: 2001, value: 1},
{year: 2002, value: 2},
]
Should we try to overload d3.transpose to do both?
If so, how? The above implementation sort-of works for array-of-array input such as transpose([[0, 1, 2], [3, 4, 5]]), but returns an array of objects rather than an array of arrays. You could use Array.isArray to test whether the input is an Array and branch the behavior accordingly.
If not, what name should this new method have? transposeObject?
Also, if transpose(object) given an object whose values are arrays returns an array of objects, then transpose(array) given an array whose elements are objects should return an object whose values are arrays.
Maybe this is too magical.
This is the description:
https://github.com/d3/d3-array#shuffle
# d3.shuffle(array[, lo[, hi]])
Randomizes the order of the specified array using the Fisher–Yates shuffle.
I think it's more often needed to know what args are accepted and how they affect the return value rather than what algorithm is used under the hood.
I am learning d3-array by reading both the docs and the source. The documentation for histogram.value is very detailed but still quite vague for me to grasp. The source code helps make more sense of the docs, but I am still not sure I understand the logic properly.
Here is what I understand about histogram.value(value): it sets the value accessor to value, using identity or another function; if the argument _ is not a function, it is wrapped by constant(_), in which _ stands for the argument; but in the part of the code that reads constant(_), histogram, I don't understand why histogram is there, nor does constant() seem to do anything meaningful to the argument _.
I don't understand why there is a histogram inside (value = typeof _ === "function" ? _ : constant(_), histogram). Could you explain a little more? Could you make another, simpler example that sets histogram.value, histogram.domain and histogram.thresholds without using scaleLinear?
By the way, is constant(_), histogram a use of the comma operator? If it is, what does this line of code mean?
Thanks
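A small self-contained illustration of the two pieces being asked about (my own sketch of the pattern, not the d3-array source): constant(_) wraps a plain value in a function so the accessor can always be called, and the comma operator evaluates the assignment and then yields histogram, so the setter returns the generator for method chaining.
function constant(x) {
  return function() { return x; };
}

function generator() {
  var value = function(d) { return d; }; // identity accessor by default

  function histogram(data) {
    return data.map(value); // stand-in for the real binning logic
  }

  histogram.value = function(_) {
    // With an argument: set the accessor and return histogram (comma operator),
    // which is what allows chaining. Without an argument: return the accessor.
    return arguments.length ? (value = typeof _ === "function" ? _ : constant(_), histogram) : value;
  };

  return histogram;
}

var h = generator().value(function(d) { return d.x; }); // chaining works because the setter returns histogram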
When asking for 4-ish points between 0 and 24, I expected to get 0, 5, 10, 15, 20, 25, but ticks stops at 20:
var a = require('d3-array')
// undefined
a.ticks(0, 24, 4)
// [ 0, 5, 10, 15, 20 ]
I believe this is because when calculating the range's stopping point, you use floor(24/step) * step + step/2, which in this case is 22.5 and won't include 24.
Replacing that floor with ceil does the trick in this case, but breaks a couple of your tests… I'll continue poking it and see if I can work out the correct logic.
Hi Mike,
I found myself choosing an approach to learn d3 by understanding its source code. I started with the functions in d3-array, and so far it is going smoothly.
I noticed number.js is used to create mean.js, but it is not included as d3.number. I read the code and it seems Number() can do the same. So, I wonder why it was necessary to write number.js here.
Thanks
I ran npm install after cloning this repo, and a test failure occurred:
# quantile(array, p) coerces values to numbers
ok 529 should be equal
ok 530 should be equal
ok 531 should be equal
ok 532 should be equal
not ok 533 should be equal
---
operator: equal
expected: 1309582800000
actual: 1309579200000
at: Test.<anonymous> (/Users/geekplux/Dropbox/project/github/d3-array/test/quantile-test.js:35:8)
...
ok 534 should be equal
Is this related to the time zone? 👀
I just fixed it by changing:
test.equal(arrays.quantile(dates, 1 / 2), +new Date(2011, 6, 2, 13));
to
test.equal(arrays.quantile(dates, 1 / 2), +new Date(2011, 6, 2, 12));
Let's say you have a
let x = d3.scaleTime()
and you want to use it on a histogram:
d3.histogram()
.value(d=> d.date)
.domain(x.domain())
.thresholds(x.ticks(d3.timeYear, 1))
In range.js, line 6:
range = new Array(n);
while (++i < n) {
  range[i] = start + i * step;
}
I think it should be:
range = new Array(n + 1);
while (++i < (n + 1)) {
  range[i] = start + i * step;
}
for example:
start = 65, stop = 85, step = 5
n = Math.max(0, Math.ceil((stop - start) / step)) | 0 = 4
if range = new Array(n), it returns range = [65, 70, 75, 80];
however, range = [65, 70, 75, 80, 85] may be needed.
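Note that range is documented as half-open (the stop value is excluded, like Python's range), so switching to n + 1 would change existing behavior; if an inclusive sequence is wanted, one workaround is to nudge the stop just past the last desired value:
d3.range(65, 85 + 5 / 2, 5); // [65, 70, 75, 80, 85]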
I wrote a Stack Overflow question about this, which documents the behaviour well. At the request of an answerer, I've created this issue to try and understand this behaviour better (or alternately register it as a bug if it is indeed such).
tl;dr — d3.histogram merges the last two bins when the last value coincides with the upper threshold. However, if you set histogram.domain() to [extentMin - 1, extentMax + 1], the problem seems to dissipate.
Example code (courtesy of Gerardo Furtado):
var data = d3.range(100);
const histogram = d3.histogram()
.value(d => d)
.thresholds(data);
var bins = histogram(data);
console.log(bins);
The last bin contains both 98 and 99, whereas the other bins only contain one value.
This isn't the case with the other thresholds:
var data = d3.range(100);
const histogram = d3.histogram()
.value(d => d)
.thresholds(d3.thresholdFreedmanDiaconis(data, d3.min(data), d3.max(data)));
var bins = histogram(data);
console.log(bins)
var data = d3.range(100);
const histogram = d3.histogram()
.value(d => d)
.thresholds(d3.thresholdScott(data, d3.min(data), d3.max(data)));
var bins = histogram(data);
console.log(bins)
var data = d3.range(100);
const histogram = d3.histogram()
.value(d => d)
.thresholds(d3.thresholdSturges(data, d3.min(data), d3.max(data)));
var bins = histogram(data);
console.log(bins)
Any idea what's going on with this?
Thanks!
It would be nice if median supported string values as well.
> d3.median([1,2,3])
2
> d3.median(['a', 'b', 'c'])
undefined
Would it make sense to support this in the same method, or should this be a different method, since d3.median is designed for numbers?
CC @jakevdp
You can map(map) but you can’t set(set), which seems bad.
I tend to do something like this, but it’s pretty inefficient and requires integer weights:
var values = d3.merge(samples.map(s => d3.range(weight(s)).map(() => value(s))));
A related question is how to expose weighted quantiles as a scale.
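A hedged sketch of a direct weighted quantile (no repetition of values, non-integer weights allowed, though unlike d3.quantile it does not interpolate between adjacent values):
function quantileWeighted(samples, p, value, weight) {
  var sorted = samples.slice().sort(function(a, b) { return value(a) - value(b); }),
      total = d3.sum(sorted, weight),
      target = p * total,
      cumulative = 0;
  for (var i = 0; i < sorted.length; ++i) {
    cumulative += weight(sorted[i]);
    if (cumulative >= target) return value(sorted[i]);
  }
}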
Related d3/d3#1964.
Currently, min and max return the minimum and maximum value from an array of elements, using an optional value accessor.
In some cases, it might be nice to either return the minimum or maximum index of a given element, or the element itself. For example:
var array = [{foo: 42}, {foo: 91}];
min(array, function(d) { return d.foo; }); // 42
minIndex(array, function(d) { return d.foo; }); // 0
minElement(array, function(d) { return d.foo; }); // {foo: 42}
Protovis supported functionality similar to this with pv.min.index and pv.max.index. Perhaps it’d be sufficient to support just the minimum and maximum index, since that could be used to extract the element.
Hello,
I am trying to show a bar chart histogram with two different sets of data, displayed with two different series of bars. I have used this to calculate the bins and data that fits into those bins:
var max = d3.max(data);
var min = d3.min(data);
var x = d3
.scaleLinear()
.domain([min, max])
.range([0, 400]);
var histogram = d3
.histogram()
.domain(x.domain())
.thresholds(x.ticks(10));
return histogram(data);
Using this, calculating the second set of data gives me separate bins. However I would like the second set of data to be in the same bins as the first. Is this possible and if so, how would I go about it? I am having a rough time finding anything for specifying the exact bins I would like to use on the second set of data.
(I didn't realize I posted this in d3-array. Let me know if I should post somewhere else.)
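A hedged sketch of one way to get both series into the same bins: compute the domain and thresholds once over the combined data, and reuse the same histogram generator for each series (dataA and dataB are hypothetical names for the two sets):
var all = dataA.concat(dataB);

var x = d3.scaleLinear()
    .domain(d3.extent(all))
    .range([0, 400]);

var histogram = d3.histogram()
    .domain(x.domain())
    .thresholds(x.ticks(10));

var binsA = histogram(dataA);
var binsB = histogram(dataB); // same bin boundaries as binsA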