Hello community,
here is the log from the commit of package python-dask for openSUSE:Factory checked in at 2018-09-11 17:17:52
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-dask (Old)
and /work/SRC/openSUSE:Factory/.python-dask.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-dask"
Tue Sep 11 17:17:52 2018 rev:7 rq:634440 version:0.19.1
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-dask/python-dask.changes 2018-09-04 22:56:24.821050827 +0200
+++ /work/SRC/openSUSE:Factory/.python-dask.new/python-dask.changes 2018-09-11 17:17:59.183346691 +0200
@@ -1,0 +2,31 @@
+Sat Sep 8 04:33:17 UTC 2018 - Arun Persaud
+
+- update to version 0.19.1:
+ * Array
+ + Don't enforce dtype if result has no dtype (:pr:`3928`) Matthew
+ Rocklin
+ + Fix NumPy issubtype deprecation warning (:pr:`3939`) Bruce Merry
+ + Fix arg reduction tokens to be unique with different arguments
+ (:pr:`3955`) Tobias de Jong
+ + Coerce numpy integers to ints in slicing code (:pr:`3944`) Yu
+ Feng
+ + Linalg.norm ndim along axis partial fix (:pr:`3933`) Tobias de
+ Jong
+ * Dataframe
+ + Deterministic DataFrame.set_index (:pr:`3867`) George Sakkis
+ + Fix divisions in read_parquet when dealing with filters #3831
+ #3930 (:pr:`3923`) (:pr:`3931`) @andrethrill
+ + Fixing returning type in categorical.as_known (:pr:`3888`)
+ Sriharsha Hatwar
+ + Fix DataFrame.assign for callables (:pr:`3919`) Tom Augspurger
+ + Include partitions with no width in repartition (:pr:`3941`)
+ Matthew Rocklin
+ + Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`)
+ Matthew Rocklin
+ * Documentation
+ + DOC: Add hint on how to render task graphs horizontally
+ (:pr:`3922`) Uwe Korn
+ + Add try-now button to main landing page (:pr:`3924`) Matthew
+ Rocklin
+
+-------------------------------------------------------------------
Old:
----
dask-0.19.0.tar.gz
New:
----
dask-0.19.1.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-dask.spec ++++++
--- /var/tmp/diff_new_pack.klQLhF/_old 2018-09-11 17:17:59.943345525 +0200
+++ /var/tmp/diff_new_pack.klQLhF/_new 2018-09-11 17:17:59.947345519 +0200
@@ -22,7 +22,7 @@
# python(2/3)-distributed has a dependency loop with python(2/3)-dask
%bcond_with test_distributed
Name: python-dask
-Version: 0.19.0
+Version: 0.19.1
Release: 0
Summary: Minimal task scheduling abstraction
License: BSD-3-Clause
++++++ dask-0.19.0.tar.gz -> dask-0.19.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/PKG-INFO new/dask-0.19.1/PKG-INFO
--- old/dask-0.19.0/PKG-INFO 2018-08-30 18:41:43.000000000 +0200
+++ new/dask-0.19.1/PKG-INFO 2018-09-06 14:15:04.000000000 +0200
@@ -1,11 +1,12 @@
-Metadata-Version: 2.1
+Metadata-Version: 1.2
Name: dask
-Version: 0.19.0
+Version: 0.19.1
Summary: Parallel PyData with Task Scheduling
Home-page: http://github.com/dask/dask/
-Maintainer: Matthew Rocklin
-Maintainer-email: mrocklin@gmail.com
+Author: Matthew Rocklin
+Author-email: mrocklin@gmail.com
License: BSD
+Description-Content-Type: UNKNOWN
Description: Dask
====
@@ -44,9 +45,3 @@
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
-Provides-Extra: complete
-Provides-Extra: bag
-Provides-Extra: array
-Provides-Extra: delayed
-Provides-Extra: distributed
-Provides-Extra: dataframe
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/_version.py new/dask-0.19.1/dask/_version.py
--- old/dask-0.19.0/dask/_version.py 2018-08-30 18:41:43.000000000 +0200
+++ new/dask-0.19.1/dask/_version.py 2018-09-06 14:15:04.000000000 +0200
@@ -11,8 +11,8 @@
{
"dirty": false,
"error": null,
- "full-revisionid": "546760c5ced47ace30bca21fb125b7258c56035c",
- "version": "0.19.0"
+ "full-revisionid": "40b5d7b07c9db16e7cbd70be1bc8738ce94fe32c",
+ "version": "0.19.1"
}
''' # END VERSION_JSON
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/core.py new/dask-0.19.1/dask/array/core.py
--- old/dask-0.19.0/dask/array/core.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/core.py 2018-09-06 13:45:35.000000000 +0200
@@ -3552,7 +3552,7 @@
function = kwargs.pop('enforce_dtype_function')
result = function(*args, **kwargs)
- if dtype != result.dtype and dtype != object:
+ if hasattr(result, 'dtype') and dtype != result.dtype and dtype != object:
if not np.can_cast(result, dtype, casting='same_kind'):
raise ValueError("Inferred dtype from function %r was %r "
"but got %r, which can't be cast using "
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/creation.py new/dask-0.19.1/dask/array/creation.py
--- old/dask-0.19.0/dask/array/creation.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/creation.py 2018-09-06 13:45:35.000000000 +0200
@@ -943,7 +943,7 @@
result = result.map_blocks(
wrapped_pad_func,
- token="pad",
+ name="pad",
dtype=result.dtype,
pad_func=mode,
iaxis_pad_width=pad_width[d],
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/linalg.py new/dask-0.19.1/dask/array/linalg.py
--- old/dask-0.19.0/dask/array/linalg.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/linalg.py 2018-09-06 13:45:35.000000000 +0200
@@ -111,6 +111,7 @@
" 2. Have only one column of blocks\n\n"
"Note: This function (tsqr) supports QR decomposition in the case of\n"
"tall-and-skinny matrices (single column chunk/block; see qr)"
+ "Current shape: {},\nCurrent chunksize: {}".format(data.shape, data.chunksize)
)
token = '-' + tokenize(data, compute_svd)
@@ -1081,9 +1082,6 @@
@wraps(np.linalg.norm)
def norm(x, ord=None, axis=None, keepdims=False):
- if x.ndim > 2:
- raise ValueError("Improper number of dimensions to norm.")
-
if axis is None:
axis = tuple(range(x.ndim))
elif isinstance(axis, Number):
@@ -1091,6 +1089,9 @@
else:
axis = tuple(axis)
+ if len(axis) > 2:
+ raise ValueError("Improper number of dimensions to norm.")
+
if ord == "fro":
ord = None
if len(axis) == 1:
@@ -1104,6 +1105,8 @@
elif ord == "nuc":
if len(axis) == 1:
raise ValueError("Invalid norm order for vectors.")
+ if x.ndim > 2:
+ raise NotImplementedError("SVD based norm not implemented for ndim > 2")
r = svd(x)[1][None].sum(keepdims=keepdims)
elif ord == np.inf:
@@ -1111,29 +1114,41 @@
if len(axis) == 1:
r = r.max(axis=axis, keepdims=keepdims)
else:
- r = r.sum(axis=axis[1], keepdims=keepdims).max(keepdims=keepdims)
+ r = r.sum(axis=axis[1], keepdims=True).max(axis=axis[0], keepdims=True)
+ if keepdims is False:
+ r = r.squeeze(axis=axis)
elif ord == -np.inf:
r = abs(r)
if len(axis) == 1:
r = r.min(axis=axis, keepdims=keepdims)
else:
- r = r.sum(axis=axis[1], keepdims=keepdims).min(keepdims=keepdims)
+ r = r.sum(axis=axis[1], keepdims=True).min(axis=axis[0], keepdims=True)
+ if keepdims is False:
+ r = r.squeeze(axis=axis)
elif ord == 0:
if len(axis) == 2:
raise ValueError("Invalid norm order for matrices.")
- r = (r != 0).astype(r.dtype).sum(axis=0, keepdims=keepdims)
+ r = (r != 0).astype(r.dtype).sum(axis=axis, keepdims=keepdims)
elif ord == 1:
r = abs(r)
if len(axis) == 1:
r = r.sum(axis=axis, keepdims=keepdims)
else:
- r = r.sum(axis=axis[0], keepdims=keepdims).max(keepdims=keepdims)
+ r = r.sum(axis=axis[0], keepdims=True).max(axis=axis[1], keepdims=True)
+ if keepdims is False:
+ r = r.squeeze(axis=axis)
elif len(axis) == 2 and ord == -1:
- r = abs(r).sum(axis=axis[0], keepdims=keepdims).min(keepdims=keepdims)
+ r = abs(r).sum(axis=axis[0], keepdims=True).min(axis=axis[1], keepdims=True)
+ if keepdims is False:
+ r = r.squeeze(axis=axis)
elif len(axis) == 2 and ord == 2:
+ if x.ndim > 2:
+ raise NotImplementedError("SVD based norm not implemented for ndim > 2")
r = svd(x)[1][None].max(keepdims=keepdims)
elif len(axis) == 2 and ord == -2:
+ if x.ndim > 2:
+ raise NotImplementedError("SVD based norm not implemented for ndim > 2")
r = svd(x)[1][None].min(keepdims=keepdims)
else:
if len(axis) == 2:
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/reductions.py new/dask-0.19.1/dask/array/reductions.py
--- old/dask-0.19.0/dask/array/reductions.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/reductions.py 2018-09-06 13:45:35.000000000 +0200
@@ -614,7 +614,8 @@
"got '{0}'".format(axis))
# Map chunk across all blocks
- name = 'arg-reduce-chunk-{0}'.format(tokenize(chunk, axis))
+ name = 'arg-reduce-{0}'.format(tokenize(axis, x, chunk,
+ combine, split_every))
old = x.name
keys = list(product(*map(range, x.numblocks)))
offsets = list(product(*(accumulate(operator.add, bd[:-1], 0)
@@ -714,7 +715,8 @@
m = x.map_blocks(func, axis=axis, dtype=dtype)
- name = '%s-axis=%d-%s' % (func.__name__, axis, tokenize(x, dtype))
+ name = '{0}-{1}'.format(func.__name__, tokenize(func, axis, binop,
+ ident, x, dtype))
n = x.numblocks[axis]
full = slice(None, None, None)
slc = (full,) * axis + (slice(-1, None),) + (full,) * (x.ndim - axis - 1)
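Note on the reductions.py hunks above: the 0.19.0 arg-reduction name hashed only `chunk` and `axis`, so two different input arrays reduced with the same function and axis could end up with the same task name. The following stdlib-only sketch illustrates that collision and how adding the remaining arguments to the token avoids it. `toy_tokenize` is a hypothetical stand-in for dask's `tokenize` (it simply hashes the `repr` of its arguments), and the string arguments are placeholders for the real array and function objects.

```python
import hashlib


def toy_tokenize(*args):
    """Hypothetical stand-in for dask's tokenize: hash the repr of all arguments."""
    return hashlib.md5(repr(args).encode("utf-8")).hexdigest()


def arg_reduce_name_old(x_name, axis, chunk, combine, split_every):
    # Shape of the 0.19.0 name: the input array is not part of the token.
    return "arg-reduce-chunk-{0}".format(toy_tokenize(chunk, axis))


def arg_reduce_name_new(x_name, axis, chunk, combine, split_every):
    # Shape of the 0.19.1 name: every argument contributes to the token.
    return "arg-reduce-{0}".format(
        toy_tokenize(axis, x_name, chunk, combine, split_every))


# Two different inputs, same reduction and axis.
old_a = arg_reduce_name_old("a", 0, "argmax_chunk", "argmax_combine", None)
old_b = arg_reduce_name_old("a+1", 0, "argmax_chunk", "argmax_combine", None)
new_a = arg_reduce_name_new("a", 0, "argmax_chunk", "argmax_combine", None)
new_b = arg_reduce_name_new("a+1", 0, "argmax_chunk", "argmax_combine", None)

print(old_a == old_b)  # the old scheme cannot tell the inputs apart
print(new_a == new_b)  # the new scheme produces distinct names
```

This is the property that `test_regres_3940` in the test_reductions.py diff checks against the real library.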
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/slicing.py new/dask-0.19.1/dask/array/slicing.py
--- old/dask-0.19.0/dask/array/slicing.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/slicing.py 2018-09-06 13:45:35.000000000 +0200
@@ -69,7 +69,7 @@
return np.asanyarray(nonzero)
elif np.issubdtype(index_array.dtype, np.integer):
return index_array
- elif np.issubdtype(index_array.dtype, float):
+ elif np.issubdtype(index_array.dtype, np.floating):
int_index = index_array.astype(np.intp)
if np.allclose(index_array, int_index):
return int_index
@@ -391,7 +391,7 @@
ind = index - chunk_boundaries[i - 1]
else:
ind = index
- return {i: ind}
+ return {int(i): int(ind)}
assert isinstance(index, slice)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_array_core.py new/dask-0.19.1/dask/array/tests/test_array_core.py
--- old/dask-0.19.0/dask/array/tests/test_array_core.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_array_core.py 2018-09-06 13:45:35.000000000 +0200
@@ -3644,3 +3644,8 @@
da.argmax(Y, axis=0).compute()
assert not record
+
+
+def test_3925():
+ x = da.from_array(np.array(['a', 'b', 'c'], dtype=object), chunks=-1)
+ assert (x[0] == x[0]).compute(scheduler='sync')
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_linalg.py new/dask-0.19.1/dask/array/tests/test_linalg.py
--- old/dask-0.19.0/dask/array/tests/test_linalg.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_linalg.py 2018-09-06 13:45:35.000000000 +0200
@@ -659,10 +659,6 @@
[(5,), (2,), 0],
[(5,), (2,), (0,)],
[(5, 6), (2, 2), None],
- [(5, 6), (2, 2), 0],
- [(5, 6), (2, 2), 1],
- [(5, 6), (2, 2), (0, 1)],
- [(5, 6), (2, 2), (1, 0)],
])
@pytest.mark.parametrize("norm", [
None,
@@ -685,6 +681,40 @@
assert_eq(a_r, d_r)
+@pytest.mark.slow
+@pytest.mark.parametrize("shape, chunks", [
+ [(5,), (2,)],
+ [(5, 3), (2, 2)],
+ [(4, 5, 3), (2, 2, 2)],
+ [(4, 5, 2, 3), (2, 2, 2, 2)],
+ [(2, 5, 2, 4, 3), (2, 2, 2, 2, 2)],
+])
+@pytest.mark.parametrize("norm", [
+ None,
+ 1,
+ -1,
+ np.inf,
+ -np.inf,
+])
+@pytest.mark.parametrize("keepdims", [
+ False,
+ True,
+])
+def test_norm_any_slice(shape, chunks, norm, keepdims):
+ a = np.random.random(shape)
+ d = da.from_array(a, chunks=chunks)
+
+ for firstaxis in range(len(shape)):
+ for secondaxis in range(len(shape)):
+ if firstaxis != secondaxis:
+ axis = (firstaxis, secondaxis)
+ else:
+ axis = firstaxis
+ a_r = np.linalg.norm(a, ord=norm, axis=axis, keepdims=keepdims)
+ d_r = da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
+ assert_eq(a_r, d_r)
+
+
@pytest.mark.parametrize("shape, chunks, axis", [
[(5,), (2,), None],
[(5,), (2,), 0],
@@ -730,9 +760,30 @@
# Need one chunk on last dimension for svd.
if norm == "nuc" or norm == 2 or norm == -2:
- d = d.rechunk((d.chunks[0], d.shape[1]))
+ d = d.rechunk({-1: -1})
a_r = np.linalg.norm(a, ord=norm, axis=axis, keepdims=keepdims)
d_r = da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
assert_eq(a_r, d_r)
+
+
+@pytest.mark.parametrize("shape, chunks, axis", [
+ [(3, 2, 4), (2, 2, 2), (1, 2)],
+ [(2, 3, 4, 5), (2, 2, 2, 2), (-1, -2)],
+])
+@pytest.mark.parametrize("norm", [
+ "nuc",
+ 2,
+ -2
+])
+@pytest.mark.parametrize("keepdims", [
+ False,
+ True,
+])
+def test_norm_implemented_errors(shape, chunks, axis, norm, keepdims):
+ a = np.random.random(shape)
+ d = da.from_array(a, chunks=chunks)
+ if len(shape) > 2 and len(axis) == 2:
+ with pytest.raises(NotImplementedError):
+ da.linalg.norm(d, ord=norm, axis=axis, keepdims=keepdims)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_optimization.py new/dask-0.19.1/dask/array/tests/test_optimization.py
--- old/dask-0.19.0/dask/array/tests/test_optimization.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_optimization.py 2018-09-06 13:45:35.000000000 +0200
@@ -273,3 +273,15 @@
assert dask.get(a, y.__dask_keys__()) == dask.get(b, y.__dask_keys__())
assert len(a) < len(b)
+
+
+def test_gh3937():
+ # test for github issue #3937
+ x = da.from_array([1, 2, 3.], (2,))
+ x = da.concatenate((x, [x[-1]]))
+ y = x.rechunk((2,))
+ # This will produce Integral type indices that are not ints (np.int64), failing
+ # the optimizer
+ y = da.coarsen(np.sum, y, {0: 2})
+ # How to trigger the optimizer explicitly?
+ y.compute()
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/array/tests/test_reductions.py new/dask-0.19.1/dask/array/tests/test_reductions.py
--- old/dask-0.19.0/dask/array/tests/test_reductions.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/array/tests/test_reductions.py 2018-09-06 13:45:35.000000000 +0200
@@ -6,7 +6,7 @@
import dask.array as da
from dask.array.utils import assert_eq as _assert_eq, same_keys
from dask.core import get_deps
-from dask.context import set_options
+import dask.config as config
def assert_eq(a, b):
@@ -139,7 +139,7 @@
assert_eq(dfunc(a, 0), func(x, 0))
assert_eq(dfunc(a, 1), func(x, 1))
assert_eq(dfunc(a, 2), func(x, 2))
- with set_options(split_every=2):
+ with config.set(split_every=2):
assert_eq(dfunc(a), func(x))
assert_eq(dfunc(a, 0), func(x, 0))
assert_eq(dfunc(a, 1), func(x, 1))
@@ -368,7 +368,7 @@
def test_tree_reduce_set_options():
x = da.from_array(np.arange(242).reshape((11, 22)), chunks=(3, 4))
- with set_options(split_every={0: 2, 1: 3}):
+ with config.set(split_every={0: 2, 1: 3}):
assert_max_deps(x.sum(), 2 * 3)
assert_max_deps(x.sum(axis=0), 2)
@@ -487,3 +487,14 @@
da.topk(a, 5, axis=1, split_every=2))
assert_eq(a.argtopk(5, axis=1, split_every=2),
da.argtopk(a, 5, axis=1, split_every=2))
+
+
+@pytest.mark.parametrize('func', [da.cumsum, da.cumprod,
+ da.argmin, da.argmax,
+ da.min, da.max,
+ da.nansum, da.nanmax])
+def test_regres_3940(func):
+ a = da.ones((5,2), chunks=(2,2))
+ assert func(a).name != func(a + 1).name
+ assert func(a, axis=0).name != func(a).name
+ assert func(a, axis=0).name != func(a, axis=1).name
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/categorical.py new/dask-0.19.1/dask/dataframe/categorical.py
--- old/dask-0.19.0/dask/dataframe/categorical.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/categorical.py 2018-09-06 13:45:35.000000000 +0200
@@ -184,7 +184,7 @@
Keywords to pass on to the call to `compute`.
"""
if self.known:
- return self
+ return self._series
categories = self._property_map('categories').unique().compute(**kwargs)
return self.set_categories(categories.values)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/core.py new/dask-0.19.1/dask/dataframe/core.py
--- old/dask-0.19.0/dask/dataframe/core.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/core.py 2018-09-06 13:45:35.000000000 +0200
@@ -2527,6 +2527,9 @@
pd.compat.isidentifier(c)))
return list(o)
+ def _ipython_key_completions_(self):
+ return self.columns.tolist()
+
@property
def ndim(self):
""" Return dimensionality """
@@ -2678,6 +2681,9 @@
callable(v) or pd.api.types.is_scalar(v)):
raise TypeError("Column assignment doesn't support type "
"{0}".format(type(v).__name__))
+ if callable(v):
+ kwargs[k] = v(self)
+
pairs = list(sum(kwargs.items(), ()))
# Figure out columns of the output
@@ -4078,8 +4084,9 @@
else:
d[(out1, k)] = (methods.boundary_slice, (name, i - 1), low, b[j], False)
low = b[j]
+ if len(a) == i + 1 or a[i] < a[i + 1]:
+ j += 1
i += 1
- j += 1
c.append(low)
k += 1
@@ -4113,7 +4120,7 @@
while c[i] < b[j]:
tmp.append((out1, i))
i += 1
- if last_elem and c[i] == b[-1] and (b[-1] != b[-2] or j == len(b) - 1) and i < k:
+ while last_elem and c[i] == b[-1] and (b[-1] != b[-2] or j == len(b) - 1) and i < k:
# append if last split is not included
tmp.append((out1, i))
i += 1
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/io/parquet.py new/dask-0.19.1/dask/dataframe/io/parquet.py
--- old/dask-0.19.0/dask/dataframe/io/parquet.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/io/parquet.py 2018-09-06 13:45:35.000000000 +0200
@@ -285,12 +285,15 @@
if index_names and infer_divisions is not False:
index_name = meta.index.name
- minmax = fastparquet.api.sorted_partitioned_columns(pf)
+ try:
+ # is https://github.com/dask/fastparquet/pull/371 available in
+ # current fastparquet installation?
+ minmax = fastparquet.api.sorted_partitioned_columns(pf, filters)
+ except TypeError:
+ minmax = fastparquet.api.sorted_partitioned_columns(pf)
if index_name in minmax:
- divisions = (list(minmax[index_name]['min']) +
- [minmax[index_name]['max'][-1]])
- divisions = [divisions[i] for i, rg in enumerate(pf.row_groups)
- if rg in rgs] + [divisions[-1]]
+ divisions = minmax[index_name]
+ divisions = divisions['min'] + [divisions['max'][-1]]
else:
if infer_divisions is True:
raise ValueError(
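Note on the parquet.py hunk above: it probes for a newer fastparquet signature by calling with the extra `filters` argument first and falling back to the one-argument call on `TypeError`. A minimal sketch of that pattern follows. The two `sorted_columns_*` functions and their return values are hypothetical stand-ins, not fastparquet's API.

```python
def sorted_columns_old(pf):
    """Hypothetical older API: accepts only the file object."""
    return {"index": {"min": [0, 10], "max": [9, 19]}}


def sorted_columns_new(pf, filters=None):
    """Hypothetical newer API: can take filters into account."""
    if filters:
        # Pretend the filter keeps only the second row group.
        return {"index": {"min": [10], "max": [19]}}
    return {"index": {"min": [0, 10], "max": [9, 19]}}


def get_minmax(func, pf, filters):
    """Try the two-argument call first; fall back if it is not supported."""
    try:
        return func(pf, filters)
    except TypeError:
        return func(pf)


def divisions_from(minmax, index_name):
    """Divisions are the per-row-group minima plus the last maximum."""
    stats = minmax[index_name]
    return stats["min"] + [stats["max"][-1]]


filters = [("a", "==", "b")]
print(divisions_from(get_minmax(sorted_columns_old, None, filters), "index"))
print(divisions_from(get_minmax(sorted_columns_new, None, filters), "index"))
```

One caveat of this pattern: a `TypeError` raised inside the callee for an unrelated reason also triggers the fallback, so it can mask real errors.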
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/io/tests/test_parquet.py new/dask-0.19.1/dask/dataframe/io/tests/test_parquet.py
--- old/dask-0.19.0/dask/dataframe/io/tests/test_parquet.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/io/tests/test_parquet.py 2018-09-06 13:45:35.000000000 +0200
@@ -819,6 +819,52 @@
assert len(ddf2) > 0
+def test_divisions_read_with_filters(tmpdir):
+ check_fastparquet()
+ tmpdir = str(tmpdir)
+ #generate dataframe
+ size = 100
+ categoricals = []
+ for value in ['a', 'b', 'c', 'd']:
+ categoricals += [value] * int(size / 4)
+ df = pd.DataFrame({'a': categoricals,
+ 'b': np.random.random(size=size),
+ 'c': np.random.randint(1, 5, size=size)})
+ d = dd.from_pandas(df, npartitions=4)
+ #save it
+ d.to_parquet(tmpdir, partition_on=['a'], engine='fastparquet')
+ #read it
+ out = dd.read_parquet(tmpdir,
+ engine='fastparquet',
+ filters=[('a', '==', 'b')])
+ #test it
+ expected_divisions = (25, 49)
+ assert out.divisions == expected_divisions
+
+
+def test_divisions_are_known_read_with_filters(tmpdir):
+ check_fastparquet()
+ tmpdir = str(tmpdir)
+ #generate dataframe
+ df = pd.DataFrame({'unique': [0, 0, 1, 1, 2, 2, 3, 3],
+ 'id': ['id1', 'id2',
+ 'id1', 'id2',
+ 'id1', 'id2',
+ 'id1', 'id2']},
+ index=[0, 0, 1, 1, 2, 2, 3, 3])
+ d = dd.from_pandas(df, npartitions=2)
+ #save it
+ d.to_parquet(tmpdir, partition_on=['id'], engine='fastparquet')
+ #read it
+ out = dd.read_parquet(tmpdir,
+ engine='fastparquet',
+ filters=[('id', '==', 'id1')])
+ #test it
+ assert out.known_divisions
+ expected_divisions = (0, 2, 3)
+ assert out.divisions == expected_divisions
+
+
def test_read_from_fastparquet_parquetfile(tmpdir):
check_fastparquet()
fn = str(tmpdir)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/partitionquantiles.py new/dask-0.19.1/dask/dataframe/partitionquantiles.py
--- old/dask-0.19.0/dask/dataframe/partitionquantiles.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/partitionquantiles.py 2018-09-06 13:45:35.000000000 +0200
@@ -436,7 +436,7 @@
qs = np.linspace(0, 1, npartitions + 1)
token = tokenize(df, qs, upsample)
if random_state is None:
- random_state = hash(token) % np.iinfo(np.int32).max
+ random_state = int(token, 16) % np.iinfo(np.int32).max
state_data = random_state_data(df.npartitions, random_state)
df_keys = df.__dask_keys__()
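Note on the partitionquantiles.py hunk above: Python's built-in `hash` of a string is salted per interpreter process unless `PYTHONHASHSEED` is fixed, so a seed derived from `hash(token)` can differ between processes. Parsing the token as a hexadecimal number gives the same seed everywhere. The sketch below assumes the token is a hex digest; `toy_token` is a hypothetical stand-in for dask's `tokenize`, and `INT32_MAX` stands in for `np.iinfo(np.int32).max`.

```python
import hashlib

INT32_MAX = 2**31 - 1  # same value as np.iinfo(np.int32).max


def toy_token(*args):
    """Hypothetical stand-in for dask's tokenize: a hex digest of the arguments."""
    return hashlib.md5(repr(args).encode("utf-8")).hexdigest()


def seed_from_token(token):
    """Derive a seed from a hex token without relying on the salted built-in hash."""
    return int(token, 16) % INT32_MAX


token = toy_token("df", 4, True)
# The same token always maps to the same seed, in any process.
print(seed_from_token(token) == seed_from_token(toy_token("df", 4, True)))
```

`test_set_index_consistent_divisions` in the test_shuffle.py diff exercises the same property with spawned worker processes.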
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/shuffle.py new/dask-0.19.1/dask/dataframe/shuffle.py
--- old/dask-0.19.0/dask/dataframe/shuffle.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/shuffle.py 2018-09-06 13:45:35.000000000 +0200
@@ -456,12 +456,9 @@
c = ind._values
typ = np.min_scalar_type(npartitions * 2)
- npartitions, k, stage = [np.array(x, dtype=np.min_scalar_type(x))[()]
- for x in [npartitions, k, stage]]
-
c = np.mod(c, npartitions).astype(typ, copy=False)
- c = np.floor_divide(c, k ** stage, out=c)
- c = np.mod(c, k, out=c)
+ np.floor_divide(c, k ** stage, out=c)
+ np.mod(c, k, out=c)
indexer, locations = groupsort_indexer(c.astype(np.int64), k)
df2 = df.take(indexer)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_categorical.py new/dask-0.19.1/dask/dataframe/tests/test_categorical.py
--- old/dask-0.19.0/dask/dataframe/tests/test_categorical.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_categorical.py 2018-09-06 13:45:35.000000000 +0200
@@ -271,6 +271,14 @@
assert_eq(left, pd.Index(right) if isinstance(right, np.ndarray) else right)
+def test_return_type_known_categories():
+ df = pd.DataFrame({"A": ['a', 'b', 'c']})
+ df['A'] = df['A'].astype('category')
+ dask_df = dd.from_pandas(df, 2)
+ ret_type = dask_df.A.cat.as_known()
+ assert isinstance(ret_type, dd.core.Series)
+
+
class TestCategoricalAccessor:
@pytest.mark.parametrize('series', cat_series)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_dataframe.py new/dask-0.19.1/dask/dataframe/tests/test_dataframe.py
--- old/dask-0.19.0/dask/dataframe/tests/test_dataframe.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_dataframe.py 2018-09-06 13:45:35.000000000 +0200
@@ -925,6 +925,13 @@
d.assign(foo=d_unknown.a)
+def test_assign_callable():
+ df = dd.from_pandas(pd.DataFrame({"A": range(10)}), npartitions=2)
+ a = df.assign(B=df.A.shift())
+ b = df.assign(B=lambda x: x.A.shift())
+ assert_eq(a, b)
+
+
def test_map():
assert_eq(d.a.map(lambda x: x + 1), full.a.map(lambda x: x + 1))
lk = dict((v, v + 1) for v in full.a.values)
@@ -2718,6 +2725,16 @@
assert_eq(df[cols], ddf[cols])
+def test_ipython_completion():
+ df = pd.DataFrame({'a': [1], 'b': [2]})
+ ddf = dd.from_pandas(df, npartitions=1)
+
+ completions = ddf._ipython_key_completions_()
+ assert 'a' in completions
+ assert 'b' in completions
+ assert 'c' not in completions
+
+
def test_diff():
df = pd.DataFrame(np.random.randn(100, 5), columns=list('abcde'))
ddf = dd.from_pandas(df, 5)
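Note on `test_assign_callable` above and the matching core.py hunk (`if callable(v): kwargs[k] = v(self)`): callables passed to `assign` are evaluated against the frame before the new columns are built. A rough stdlib-only sketch of that resolution step follows. It uses a plain dict of lists as a hypothetical frame, and `resolve_assign_kwargs` is an illustrative helper, not part of dask.

```python
def resolve_assign_kwargs(frame, **kwargs):
    """Replace each callable value with the result of calling it on frame."""
    resolved = {}
    for key, value in kwargs.items():
        # Callables are evaluated eagerly against the frame; other values pass through.
        resolved[key] = value(frame) if callable(value) else value
    return resolved


frame = {"A": [1, 2, 3]}
print(resolve_assign_kwargs(frame, B=lambda f: [a * 2 for a in f["A"]], C=0))
```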
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_multi.py new/dask-0.19.1/dask/dataframe/tests/test_multi.py
--- old/dask-0.19.0/dask/dataframe/tests/test_multi.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_multi.py 2018-09-06 13:45:35.000000000 +0200
@@ -1308,3 +1308,29 @@
joined = ddf2.join(ddf2, rsuffix='r')
assert joined.divisions == (1, 1)
joined.compute()
+
+
+def test_repartition_repeated_divisions():
+ df = pd.DataFrame({'x': [0, 0, 0, 0]})
+ ddf = dd.from_pandas(df, npartitions=2).set_index('x')
+
+ ddf2 = ddf.repartition(divisions=(0, 0), force=True)
+ assert_eq(ddf2, df.set_index('x'))
+
+
+def test_multi_duplicate_divisions():
+ df1 = pd.DataFrame({'x': [0, 0, 0, 0]})
+ df2 = pd.DataFrame({'x': [0]})
+
+ ddf1 = dd.from_pandas(df1, npartitions=2).set_index('x')
+ ddf2 = dd.from_pandas(df2, npartitions=1).set_index('x')
+ assert ddf1.npartitions == 2
+ assert len(ddf1) == len(df1)
+
+ r1 = ddf1.merge(ddf2, how='left', left_index=True, right_index=True)
+
+ sf1 = df1.set_index('x')
+ sf2 = df2.set_index('x')
+ r2 = sf1.merge(sf2, how='left', left_index=True, right_index=True)
+
+ assert_eq(r1, r2)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask/dataframe/tests/test_shuffle.py new/dask-0.19.1/dask/dataframe/tests/test_shuffle.py
--- old/dask-0.19.0/dask/dataframe/tests/test_shuffle.py 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/dask/dataframe/tests/test_shuffle.py 2018-09-06 13:45:35.000000000 +0200
@@ -1,9 +1,11 @@
import os
+import sys
import pandas as pd
import pytest
import pickle
import numpy as np
import string
+import multiprocessing as mp
from copy import copy
import pandas.util.testing as tm
@@ -358,6 +360,28 @@
ddf.set_index('y', divisions=['a', 'b', 'd', 'c'], sorted=True)
+@pytest.mark.slow
+@pytest.mark.skipif(sys.version_info < (3, 4),
+ reason="multiprocessing spawn only after Py3.4")
+def test_set_index_consistent_divisions():
+ # See https://github.com/dask/dask/issues/3867
+ df = pd.DataFrame({'x': np.random.random(100),
+ 'y': np.random.random(100) // 0.2},
+ index=np.random.random(100))
+ ddf = dd.from_pandas(df, npartitions=4)
+ ddf = ddf.clear_divisions()
+
+ ctx = mp.get_context('spawn')
+ pool = ctx.Pool(processes=8)
+ results = [pool.apply_async(_set_index, (ddf, 'x')) for _ in range(100)]
+ divisions_set = set(result.get() for result in results)
+ assert len(divisions_set) == 1
+
+
+def _set_index(df, *args, **kwargs):
+ return df.set_index(*args, **kwargs).divisions
+
+
@pytest.mark.parametrize('shuffle', ['disk', 'tasks'])
def test_set_index_reduces_partitions_small(shuffle):
df = pd.DataFrame({'x': np.random.random(100)})
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/dask.egg-info/PKG-INFO new/dask-0.19.1/dask.egg-info/PKG-INFO
--- old/dask-0.19.0/dask.egg-info/PKG-INFO 2018-08-30 18:41:43.000000000 +0200
+++ new/dask-0.19.1/dask.egg-info/PKG-INFO 2018-09-06 14:15:04.000000000 +0200
@@ -1,11 +1,12 @@
-Metadata-Version: 2.1
+Metadata-Version: 1.2
Name: dask
-Version: 0.19.0
+Version: 0.19.1
Summary: Parallel PyData with Task Scheduling
Home-page: http://github.com/dask/dask/
-Maintainer: Matthew Rocklin
-Maintainer-email: mrocklin@gmail.com
+Author: Matthew Rocklin
+Author-email: mrocklin@gmail.com
License: BSD
+Description-Content-Type: UNKNOWN
Description: Dask
====
@@ -44,9 +45,3 @@
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=2.7, !=3.0.*, !=3.1.*, !=3.2.*, !=3.3.*, !=3.4.*
-Provides-Extra: complete
-Provides-Extra: bag
-Provides-Extra: array
-Provides-Extra: delayed
-Provides-Extra: distributed
-Provides-Extra: dataframe
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/_static/main-page.css new/dask-0.19.1/docs/source/_static/main-page.css
--- old/dask-0.19.0/docs/source/_static/main-page.css 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/_static/main-page.css 2018-09-06 13:45:35.000000000 +0200
@@ -22,10 +22,10 @@
border-radius: 0.3rem;
}
.navbar li:hover {
- background-color: #ECB172;
+ background-color: #FDA061;
}
.navbar li .nav-link{
- color: #ECB172;
+ color: #FDA061;
}
.navbar li:hover .nav-link{
color: #212529;
@@ -36,11 +36,11 @@
}
.dropdown-item {
- color: #ECB172;
+ color: #FDA061;
}
.dropdown-item:hover {
- background-color: #ECB172D0;
+ background-color: #FDA061D0;
}
.hero {
@@ -56,15 +56,26 @@
.outline-dask {
- color: #ECB172;
+ color: #FDA061;
background-color: transparent;
- border-color: #ECB172;
+ border-color: #FDA061;
}
+
.outline-dask:hover {
color: #212529;
- background-color: #ECB172;
- border-color: #ECB172;
+ background-color: #FDA061;
+ border-color: #FDA061;
+}
+
+.solid-dask {
+ color: #212529;
+ background-color: #FDA061;
+}
+
+.solid-dask:hover {
+ color: #212529;
+ background-color: #EC9050;
}
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/changelog.rst new/dask-0.19.1/docs/source/changelog.rst
--- old/dask-0.19.0/docs/source/changelog.rst 2018-08-30 18:39:37.000000000 +0200
+++ new/dask-0.19.1/docs/source/changelog.rst 2018-09-06 14:12:47.000000000 +0200
@@ -1,7 +1,7 @@
Changelog
=========
-0.19.1 / YYYY-MM-DD
+0.19.2 / YYYY-MM-DD
-------------------
Array
@@ -25,6 +25,35 @@
-
+0.19.1 / 2018-09-06
+-------------------
+
+Array
++++++
+
+- Don't enforce dtype if result has no dtype (:pr:`3928`) `Matthew Rocklin`_
+- Fix NumPy issubtype deprecation warning (:pr:`3939`) `Bruce Merry`_
+- Fix arg reduction tokens to be unique with different arguments (:pr:`3955`) `Tobias de Jong`_
+- Coerce numpy integers to ints in slicing code (:pr:`3944`) `Yu Feng`_
+- Linalg.norm ndim along axis partial fix (:pr:`3933`) `Tobias de Jong`_
+
+Dataframe
++++++++++
+
+- Deterministic DataFrame.set_index (:pr:`3867`) `George Sakkis`_
+- Fix divisions in read_parquet when dealing with filters #3831 #3930 (:pr:`3923`) (:pr:`3931`) `@andrethrill`_
+- Fixing returning type in categorical.as_known (:pr:`3888`) `Sriharsha Hatwar`_
+- Fix DataFrame.assign for callables (:pr:`3919`) `Tom Augspurger`_
+- Include partitions with no width in repartition (:pr:`3941`) `Matthew Rocklin`_
+- Don't constrict stage/k dtype in dataframe shuffle (:pr:`3942`) `Matthew Rocklin`_
+
+Documentation
++++++++++++++
+
+- DOC: Add hint on how to render task graphs horizontally (:pr:`3922`) `Uwe Korn`_
+- Add try-now button to main landing page (:pr:`3924`) `Matthew Rocklin`_
+
+
0.19.0 / 2018-08-29
-------------------
@@ -32,7 +61,7 @@
+++++
- Fix argtopk split_every bug (:pr:`3810`) `Guido Imperiale`_
-- Ensure result computing dask.array.isnull(`) always gives a numpy array (:pr:`3825`) `Stephan Hoyer`_
+- Ensure result computing dask.array.isnull() always gives a numpy array (:pr:`3825`) `Stephan Hoyer`_
- Support concatenate for scipy.sparse in dask array (:pr:`3836`) `Matthew Rocklin`_
- Fix argtopk on 32-bit systems. (:pr:`3823`) `Elliott Sales de Andrade`_
- Normalize keys in rechunk (:pr:`3820`) `Matthew Rocklin`_
@@ -1366,3 +1395,8 @@
.. _`Hans Moritz Günther`: https://github.com/hamogu
.. _`@rtobar`: https://github.com/rtobar
.. _`Julia Signell`: https://github.com/jsignell
+.. _`Sriharsha Hatwar`: https://github.com/Sriharsha-hatwar
+.. _`Bruce Merry`: https://github.com/bmerry
+.. _`Joe Hamman`: https://github.com/jhamman
+.. _`Robert Sare`: https://github.com/rmsare
+.. _`Jeremy Chan`: https://github.com/convexset
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/graphviz.rst new/dask-0.19.1/docs/source/graphviz.rst
--- old/dask-0.19.0/docs/source/graphviz.rst 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/graphviz.rst 2018-09-06 13:45:35.000000000 +0200
@@ -18,6 +18,10 @@
except that rather than computing the result,
they produce an image of the task graph.
+By default the task graph is rendered from top to bottom.
+In the case that you prefer to visualize it from left to right, pass
+``rankdir="LR"`` as a keyword argument to ``.visualize``.
+
.. code-block:: python
import dask.array as da
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-0.19.0/docs/source/index.html new/dask-0.19.1/docs/source/index.html
--- old/dask-0.19.0/docs/source/index.html 2018-08-30 18:28:02.000000000 +0200
+++ new/dask-0.19.1/docs/source/index.html 2018-09-06 13:45:35.000000000 +0200
@@ -67,6 +67,7 @@
enabling performance at scale for the tools you love
</p>
<a class="btn outline-dask btn-lg" href="docs.html">Learn More</a>
+ <a class="btn solid-dask btn-lg" href="https://mybinder.org/v2/gh/dask/dask-examples/master" role="button">Try Now »</a>
</div>
<div class="product-device box-shadow d-none d-md-block"></div>
<div class="product-device product-device-2 box-shadow d-none d-md-block"></div>