Hello community,
here is the log from the commit of package python-dask for openSUSE:Factory checked in at 2019-11-30 10:39:24
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/python-dask (Old)
and /work/SRC/openSUSE:Factory/.python-dask.new.26869 (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "python-dask"
Sat Nov 30 10:39:24 2019 rev:24 rq:751835 version:2.8.1
Changes:
--------
--- /work/SRC/openSUSE:Factory/python-dask/python-dask.changes 2019-11-17 19:23:28.898857695 +0100
+++ /work/SRC/openSUSE:Factory/.python-dask.new.26869/python-dask.changes 2019-11-30 10:40:48.708147466 +0100
@@ -1,0 +2,29 @@
+Sun Nov 24 17:35:04 UTC 2019 - Arun Persaud
+
+- update to version 2.8.1:
+ * Array
+ + Use auto rechunking in da.rechunk if no value given (:pr:`5605`)
+ Matthew Rocklin
+ * Core
+ + Add simple action to activate GH actions (:pr:`5619`) James
+ Bourbeau
+ * DataFrame
+ + Fix "file_path_0" bug in aggregate_row_groups (:pr:`5627`)
+ Richard J Zamora
+ + Add chunksize argument to read_parquet (:pr:`5607`) Richard J
+ Zamora
+ + Change test_repartition_npartitions to support arch64
+ architecture (:pr:`5620`) ossdev07
+ + Categories lost after groupby + agg (:pr:`5423`) Oliver Hofkens
+ + Fixed relative path issue with parquet metadata file
+ (:pr:`5608`) Nuno Gomes Silva
+ + Enable gpu-backed covariance/correlation in dataframes
+ (:pr:`5597`) Richard J Zamora
+ * Documentation
+ + Fix institutional faq and unknown doc warnings (:pr:`5616`)
+ James Bourbeau
+ + Add doc for some utils (:pr:`5609`) Tom Augspurger
+ + Removes html_extra_path (:pr:`5614`) James Bourbeau
+ + Fixed See Also reference (:pr:`5612`) Tom Augspurger
+
+-------------------------------------------------------------------
Old:
----
dask-2.8.0.tar.gz
New:
----
dask-2.8.1.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ python-dask.spec ++++++
--- /var/tmp/diff_new_pack.cjxRzs/_old 2019-11-30 10:40:49.172147411 +0100
+++ /var/tmp/diff_new_pack.cjxRzs/_new 2019-11-30 10:40:49.172147411 +0100
@@ -1,7 +1,7 @@
#
# spec file for package python-dask
#
-# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2019 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -27,7 +27,7 @@
%endif
%define skip_python2 1
Name: python-dask%{psuffix}
-Version: 2.8.0
+Version: 2.8.1
Release: 0
Summary: Minimal task scheduling abstraction
License: BSD-3-Clause
++++++ dask-2.8.0.tar.gz -> dask-2.8.1.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/PKG-INFO new/dask-2.8.1/PKG-INFO
--- old/dask-2.8.0/PKG-INFO 2019-11-14 23:57:18.000000000 +0100
+++ new/dask-2.8.1/PKG-INFO 2019-11-23 05:31:55.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: dask
-Version: 2.8.0
+Version: 2.8.1
Summary: Parallel PyData with Task Scheduling
Home-page: https://github.com/dask/dask/
Maintainer: Matthew Rocklin
@@ -43,10 +43,10 @@
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6
-Provides-Extra: complete
Provides-Extra: array
-Provides-Extra: diagnostics
-Provides-Extra: dataframe
Provides-Extra: bag
-Provides-Extra: delayed
+Provides-Extra: dataframe
Provides-Extra: distributed
+Provides-Extra: diagnostics
+Provides-Extra: delayed
+Provides-Extra: complete
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/_version.py new/dask-2.8.1/dask/_version.py
--- old/dask-2.8.0/dask/_version.py 2019-11-14 23:57:18.000000000 +0100
+++ new/dask-2.8.1/dask/_version.py 2019-11-23 05:31:55.000000000 +0100
@@ -11,8 +11,8 @@
{
"dirty": false,
"error": null,
- "full-revisionid": "539d1e27a8ccce01de5f3d49f1748057c27552f2",
- "version": "2.8.0"
+ "full-revisionid": "eee9b78da60c24897e1df984f01dd9f36245fcb1",
+ "version": "2.8.1"
}
''' # END VERSION_JSON
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/array/core.py new/dask-2.8.1/dask/array/core.py
--- old/dask-2.8.0/dask/array/core.py 2019-11-13 21:17:45.000000000 +0100
+++ new/dask-2.8.1/dask/array/core.py 2019-11-23 05:10:40.000000000 +0100
@@ -2164,7 +2164,7 @@
return squeeze(self, axis)
- def rechunk(self, chunks, threshold=None, block_size_limit=None):
+ def rechunk(self, chunks="auto", threshold=None, block_size_limit=None):
""" See da.rechunk for docstring """
from . import rechunk # avoid circular import
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/array/linalg.py new/dask-2.8.1/dask/array/linalg.py
--- old/dask-2.8.0/dask/array/linalg.py 2019-11-05 22:48:29.000000000 +0100
+++ new/dask-2.8.1/dask/array/linalg.py 2019-11-20 16:56:28.000000000 +0100
@@ -96,9 +96,12 @@
See Also
--------
- dask.array.linalg.qr - Powered by this algorithm
- dask.array.linalg.svd - Powered by this algorithm
- dask.array.linalg.sfqr - Variant for short-and-fat arrays
+ dask.array.linalg.qr
+ Powered by this algorithm
+ dask.array.linalg.svd
+ Powered by this algorithm
+ dask.array.linalg.sfqr
+ Variant for short-and-fat arrays
"""
nr, nc = len(data.chunks[0]), len(data.chunks[1])
cr_max, cc = max(data.chunks[0]), data.chunks[1][0]
@@ -519,8 +522,10 @@
See Also
--------
- dask.array.linalg.qr - Main user API that uses this function
- dask.array.linalg.tsqr - Variant for tall-and-skinny case
+ dask.array.linalg.qr
+ Main user API that uses this function
+ dask.array.linalg.tsqr
+ Variant for tall-and-skinny case
"""
nr, nc = len(data.chunks[0]), len(data.chunks[1])
cr, cc = data.chunks[0][0], data.chunks[1][0]
@@ -742,21 +747,22 @@
"""
Compute the qr factorization of a matrix.
- Examples
- --------
-
- >>> q, r = da.linalg.qr(x) # doctest: +SKIP
+ Parameters
+ ----------
+ a : Array
Returns
-------
-
q: Array, orthonormal
r: Array, upper-triangular
- See Also
+ Examples
--------
+ >>> q, r = da.linalg.qr(x) # doctest: +SKIP
- np.linalg.qr: Equivalent NumPy Operation
+ See Also
+ --------
+ numpy.linalg.qr: Equivalent NumPy Operation
dask.array.linalg.tsqr: Implementation for tall-and-skinny arrays
dask.array.linalg.sfqr: Implementation for short-and-fat arrays
"""
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/array/rechunk.py new/dask-2.8.1/dask/array/rechunk.py
--- old/dask-2.8.0/dask/array/rechunk.py 2019-10-11 05:14:07.000000000 +0200
+++ new/dask-2.8.1/dask/array/rechunk.py 2019-11-20 16:14:05.000000000 +0100
@@ -181,7 +181,7 @@
return cross
-def rechunk(x, chunks, threshold=None, block_size_limit=None):
+def rechunk(x, chunks="auto", threshold=None, block_size_limit=None):
"""
Convert blocks in dask array x for new chunks.
@@ -189,13 +189,14 @@
----------
x: dask array
Array to be rechunked.
- chunks: int, tuple or dict
+ chunks: int, tuple, dict or str, optional
The new block dimensions to create. -1 indicates the full size of the
- corresponding dimension.
- threshold: int
+ corresponding dimension. Default is "auto" which automatically
+ determines chunk sizes.
+ threshold: int, optional
The graph growth factor under which we don't bother introducing an
intermediate step.
- block_size_limit: int
+ block_size_limit: int, optional
The maximum block size (in bytes) we want to produce
Defaults to the configuration value ``array.chunk-size``
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/array/tests/test_array_core.py new/dask-2.8.1/dask/array/tests/test_array_core.py
--- old/dask-2.8.0/dask/array/tests/test_array_core.py 2019-11-13 21:17:45.000000000 +0100
+++ new/dask-2.8.1/dask/array/tests/test_array_core.py 2019-11-20 16:14:05.000000000 +0100
@@ -4224,3 +4224,10 @@
y[:3, :]
y.compute_chunk_sizes()
y[:3, :]
+
+
+def test_rechunk_auto():
+ x = da.ones(10, chunks=(1,))
+ y = x.rechunk()
+
+ assert y.npartitions == 1
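
Together with the core.py and rechunk.py changes above, this test makes "auto" the default for chunks, so rechunk() can be called with no arguments. A minimal sketch, with sizes chosen only for illustration:

    import dask.array as da

    # With chunks defaulting to "auto", rechunk() picks chunk sizes from the
    # array.chunk-size configuration value instead of requiring an explicit value.
    x = da.ones((10000, 10000), chunks=(100, 100))
    y = x.rechunk()            # same as x.rechunk("auto")
    print(y.chunks)
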
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/core.py new/dask-2.8.1/dask/dataframe/core.py
--- old/dask-2.8.0/dask/dataframe/core.py 2019-11-13 18:07:07.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/core.py 2019-11-23 05:10:40.000000000 +0100
@@ -43,7 +43,7 @@
skip_doctest,
)
from ..array.core import Array, normalize_arg
-from ..array.utils import empty_like_safe
+from ..array.utils import empty_like_safe, zeros_like_safe
from ..blockwise import blockwise, Blockwise
from ..base import DaskMethodsMixin, tokenize, dont_optimize, is_dask_collection
from ..delayed import delayed, Delayed, unpack_collections
@@ -5125,9 +5125,9 @@
"""Chunk part of a covariance or correlation computation
"""
shape = (df.shape[1], df.shape[1])
- sums = np.zeros(shape)
- counts = np.zeros(shape)
df = df.astype("float64", copy=False)
+ sums = zeros_like_safe(df.values, shape=shape)
+ counts = zeros_like_safe(df.values, shape=shape)
for idx, col in enumerate(df):
mask = df.iloc[:, idx].notnull()
sums[idx] = df[mask].sum().values
@@ -5138,27 +5138,34 @@
with warnings.catch_warnings(record=True):
warnings.simplefilter("always")
mu = (sums / counts).T
- m = np.zeros(shape)
+ m = zeros_like_safe(df.values, shape=shape)
mask = df.isnull().values
for idx, x in enumerate(df):
- # Use .values to get the ndarray for the ufunc.
- mu_discrepancy = np.subtract.outer(df.iloc[:, idx].values, mu[idx]) ** 2
+ # Avoid using ufunc.outer (not supported by cupy)
+ mu_discrepancy = (
+ np.subtract(df.iloc[:, idx].values[:, None], mu[idx][None, :]) ** 2
+ )
mu_discrepancy[mask] = np.nan
m[idx] = np.nansum(mu_discrepancy, axis=0)
m = m.T
dtype.append(("m", m.dtype))
- out = np.empty(counts.shape, dtype=dtype)
- out["sum"] = sums
- out["count"] = counts
- out["cov"] = cov * (counts - 1)
+ out = {"sum": sums, "count": counts, "cov": cov * (counts - 1)}
if corr:
out["m"] = m
return out
-def cov_corr_combine(data, corr=False):
- data = np.concatenate(data).reshape((len(data),) + data[0].shape)
+def cov_corr_combine(data_in, corr=False):
+
+ data = {"sum": None, "count": None, "cov": None}
+ if corr:
+ data["m"] = None
+
+ for k in data.keys():
+ data[k] = [d[k] for d in data_in]
+ data[k] = np.concatenate(data[k]).reshape((len(data[k]),) + data[k][0].shape)
+
sums = np.nan_to_num(data["sum"])
counts = data["count"]
@@ -5175,10 +5182,7 @@
(n1 * n2) / (n1 + n2) * (d * d.transpose((0, 2, 1))), 0
) + np.nansum(data["cov"], 0)
- out = np.empty(C.shape, dtype=data.dtype)
- out["sum"] = cum_sums[-1]
- out["count"] = cum_counts[-1]
- out["cov"] = C
+ out = {"sum": cum_sums[-1], "count": cum_counts[-1], "cov": C}
if corr:
nobs = np.where(cum_counts[-1], cum_counts[-1], np.nan)
@@ -5202,7 +5206,7 @@
with np.errstate(invalid="ignore", divide="ignore"):
mat = C / den
if scalar:
- return mat[0, 1]
+ return float(mat[0, 1])
return pd.DataFrame(mat, columns=cols, index=cols)
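
The cov/corr chunk and combine steps above now allocate their intermediates with zeros_like_safe and avoid ufunc.outer, so the same code path can also operate on cupy-backed data. A small CPU-side sketch; the GPU path additionally assumes cudf/cupy are installed, which is not shown here:

    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 1.0, 4.0, 3.0]})
    ddf = dd.from_pandas(pdf, npartitions=2)

    # Covariance/correlation matrices are computed partition-wise and combined.
    print(ddf.cov().compute())
    print(ddf.corr().compute())
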
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/io/parquet/arrow.py new/dask-2.8.1/dask/dataframe/io/parquet/arrow.py
--- old/dask-2.8.0/dask/dataframe/io/parquet/arrow.py 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/io/parquet/arrow.py 2019-11-23 05:12:29.000000000 +0100
@@ -77,8 +77,7 @@
)
else:
base, fns = _analyze_paths(paths, fs)
- relpaths = [path.replace(base, "").lstrip("/") for path in paths]
- if "_metadata" in relpaths:
+ if "_metadata" in fns:
# We have a _metadata file, lets use it
dataset = pq.ParquetDataset(
base + fs.sep + "_metadata",
@@ -91,36 +90,32 @@
# Will need to pass a list of paths to read_partition
dataset = pq.ParquetDataset(paths[0], filesystem=fs, **dataset_kwargs)
parts = [base + fs.sep + fn for fn in fns]
- else:
- if fs.isdir(paths[0]):
- # This is a directory, check for _metadata, then _common_metadata
- allpaths = fs.glob(paths[0] + fs.sep + "*")
- base, fns = _analyze_paths(allpaths, fs)
- relpaths = [path.replace(base, "").lstrip("/") for path in allpaths]
- if "_metadata" in relpaths and "validate_schema" not in dataset_kwargs:
- dataset_kwargs["validate_schema"] = False
- if "_metadata" in relpaths or gather_statistics is not False:
- # Let arrow do its thing (use _metadata or scan files)
+ elif fs.isdir(paths[0]):
+ # This is a directory, check for _metadata, then _common_metadata
+ allpaths = fs.glob(paths[0] + fs.sep + "*")
+ base, fns = _analyze_paths(allpaths, fs)
+ if "_metadata" in fns and "validate_schema" not in dataset_kwargs:
+ dataset_kwargs["validate_schema"] = False
+ if "_metadata" in fns or gather_statistics is not False:
+ # Let arrow do its thing (use _metadata or scan files)
+ dataset = pq.ParquetDataset(
+ paths, filesystem=fs, filters=filters, **dataset_kwargs
+ )
+ else:
+ # Use _common_metadata file if it is available.
+ # Otherwise, just use 0th file
+ if "_common_metadata" in fns:
dataset = pq.ParquetDataset(
- paths, filesystem=fs, filters=filters, **dataset_kwargs
+ base + fs.sep + "_common_metadata", filesystem=fs, **dataset_kwargs
)
else:
- # Use _common_metadata file if it is available.
- # Otherwise, just use 0th file
- if "_common_metadata" in relpaths:
- dataset = pq.ParquetDataset(
- base + fs.sep + "_common_metadata",
- filesystem=fs,
- **dataset_kwargs,
- )
- else:
- dataset = pq.ParquetDataset(
- allpaths[0], filesystem=fs, **dataset_kwargs
- )
- parts = [base + fs.sep + fn for fn in fns]
- else:
- # There is only one file to read
- dataset = pq.ParquetDataset(paths, filesystem=fs, **dataset_kwargs)
+ dataset = pq.ParquetDataset(
+ allpaths[0], filesystem=fs, **dataset_kwargs
+ )
+ parts = [base + fs.sep + fn for fn in fns]
+ else:
+ # There is only one file to read
+ dataset = pq.ParquetDataset(paths, filesystem=fs, **dataset_kwargs)
return parts, dataset
@@ -268,6 +263,7 @@
}
)
s["columns"].append(d)
+ s["total_byte_size"] = row_group.total_byte_size
stats.append(s)
else:
stats = None
@@ -294,10 +290,15 @@
if split_row_groups and row_groups_per_piece:
# TODO: This block can be removed after ARROW-2801
parts = []
+ rg_tot = 0
for i, piece in enumerate(pieces):
num_row_groups = row_groups_per_piece[i]
for rg in range(num_row_groups):
parts.append((piece.path, rg, piece.partition_keys))
+ # Setting file_path here, because it may be
+ # missing from the row-group/column-chunk stats
+ stats[rg_tot]["file_path_0"] = piece.path
+ rg_tot += 1
else:
parts = [
(piece.path, piece.row_group, piece.partition_keys)
@@ -326,12 +327,14 @@
)
else:
# `piece` contains (path, row_group, partition_keys)
+ (path, row_group, partition_keys) = piece
piece = pq.ParquetDatasetPiece(
- piece[0],
- row_group=piece[1],
- partition_keys=piece[2],
+ path,
+ row_group=row_group,
+ partition_keys=partition_keys,
open_file_func=partial(fs.open, mode="rb"),
)
+
df = piece.read(
columns=columns,
partitions=partitions,
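
The rewritten path handling above keys _metadata/_common_metadata detection off the analyzed file names (fns) rather than string-stripped relative paths, which is the fix for the relative-path issue in PR 5608. A sketch mirroring the updated test; it writes a small illustrative dataset to ./files in the working directory:

    import os
    import dask.datasets
    import dask.dataframe as dd

    # Write a directory dataset (including _metadata), then read it back
    # through a path containing a relative ".." segment, as in the regression test.
    ddf = dask.datasets.timeseries(start="2000-01-01", end="2000-01-02", freq="1h")
    ddf.to_parquet("files", engine="pyarrow")
    path = os.path.join("files", "..", "files")
    print(dd.read_parquet(path, engine="pyarrow").head())
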
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/io/parquet/core.py new/dask-2.8.1/dask/dataframe/io/parquet/core.py
--- old/dask-2.8.0/dask/dataframe/io/parquet/core.py 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/io/parquet/core.py 2019-11-23 05:14:40.000000000 +0100
@@ -8,8 +8,9 @@
from ...core import DataFrame, new_dd_object
from ....base import tokenize
-from ....utils import import_required, natural_sort_key
+from ....utils import import_required, natural_sort_key, parse_bytes
from collections.abc import Mapping
+from ...methods import concat
try:
@@ -62,16 +63,18 @@
raise KeyError(key)
part = self.parts[i]
+ if not isinstance(part, list):
+ part = [part]
return (
read_parquet_part,
self.engine.read_partition,
self.fs,
self.meta,
- part["piece"],
+ [p["piece"] for p in part],
self.columns,
self.index,
- toolz.merge(part["kwargs"], self.kwargs or {}),
+ toolz.merge(part[0]["kwargs"], self.kwargs or {}),
)
def __len__(self):
@@ -92,6 +95,7 @@
engine="auto",
gather_statistics=None,
split_row_groups=True,
+ chunksize=None,
**kwargs
):
"""
@@ -144,6 +148,10 @@
to parquet-file row-groups (when enough row-group metadata is
available). Otherwise, partitions correspond to distinct files.
Only the "pyarrow" engine currently supports this argument.
+ chunksize : int, str
+ The target task partition size. If set, consecutive row-groups
+ from the same file will be aggregated into the same output
+ partition until the aggregate size reaches this value.
**kwargs: dict (of dicts)
Passthrough key-word arguments for read backend.
The top-level keys correspond to the appropriate operation type, and
@@ -218,116 +226,16 @@
if meta.index.name is not None:
index = meta.index.name
- ignore_index_column_intersection = False
- if columns is None:
- # User didn't specify columns, so ignore any intersection
- # of auto-detected values with the index (if necessary)
- ignore_index_column_intersection = True
- columns = [c for c in meta.columns]
-
- if not set(columns).issubset(set(meta.columns)):
- raise ValueError(
- "The following columns were not found in the dataset %s\n"
- "The following columns were found %s"
- % (set(columns) - set(meta.columns), meta.columns)
- )
-
# Parse dataset statistics from metadata (if available)
- index_in_columns = False
- if statistics:
- result = list(
- zip(
- *[
- (part, stats)
- for part, stats in zip(parts, statistics)
- if stats["num-rows"] > 0
- ]
- )
- )
- parts, statistics = result or [[], []]
- if filters:
- parts, statistics = apply_filters(parts, statistics, filters)
-
- out = sorted_columns(statistics)
-
- if index and isinstance(index, str):
- index = [index]
- if index and out:
- # Only one valid column
- out = [o for o in out if o["name"] in index]
- if index is not False and len(out) == 1:
- # Use only sorted column with statistics as the index
- divisions = out[0]["divisions"]
- if index is None:
- index_in_columns = True
- index = [out[0]["name"]]
- elif index != [out[0]["name"]]:
- raise ValueError("Specified index is invalid.\nindex: {}".format(index))
- elif index is not False and len(out) > 1:
- if any(o["name"] == "index" for o in out):
- # Use sorted column named "index" as the index
- [o] = [o for o in out if o["name"] == "index"]
- divisions = o["divisions"]
- if index is None:
- index = [o["name"]]
- index_in_columns = True
- elif index != [o["name"]]:
- raise ValueError(
- "Specified index is invalid.\nindex: {}".format(index)
- )
- else:
- # Multiple sorted columns found, cannot autodetect the index
- warnings.warn(
- "Multiple sorted columns found %s, cannot\n "
- "autodetect index. Will continue without an index.\n"
- "To pick an index column, use the index= keyword; to \n"
- "silence this warning use index=False."
- "" % [o["name"] for o in out],
- RuntimeWarning,
- )
- index = False
- divisions = [None] * (len(parts) + 1)
- else:
- divisions = [None] * (len(parts) + 1)
- else:
- divisions = [None] * (len(parts) + 1)
-
- if index:
- if isinstance(index, str):
- index = [index]
- if isinstance(columns, str):
- columns = [columns]
-
- if ignore_index_column_intersection:
- columns = [col for col in columns if col not in index]
- if set(index).intersection(columns):
- if auto_index_allowed:
- raise ValueError(
- "Specified index and column arguments must not intersect"
- " (set index=False or remove the detected index from columns).\n"
- "index: {} | column: {}".format(index, columns)
- )
- else:
- raise ValueError(
- "Specified index and column arguments must not intersect.\n"
- "index: {} | column: {}".format(index, columns)
- )
-
- # Leaving index as a column in `meta`, because the index
- # will be reset below (in case the index was detected after
- # meta was created)
- if index_in_columns:
- meta = meta[columns + index]
- else:
- meta = meta[columns]
-
- else:
- meta = meta[list(columns)]
+ parts, divisions, index, index_in_columns = process_statistics(
+ parts, statistics, filters, index, chunksize
+ )
- def _merge_kwargs(x, y):
- z = x.copy()
- z.update(y)
- return z
+ # Account for index and columns arguments.
+ # Modify `meta` dataframe accordingly
+ meta, index, columns = set_index_columns(
+ meta, index, columns, index_in_columns, auto_index_allowed
+ )
subgraph = ParquetSubgraph(name, engine, fs, meta, columns, index, parts, kwargs)
@@ -347,7 +255,12 @@
""" Read a part of a parquet dataset
This function is used by `read_parquet`."""
- df = func(fs, part, columns, index, **kwargs)
+ if isinstance(part, list):
+ dfs = [func(fs, rg, columns.copy(), index, **kwargs) for rg in part]
+ df = concat(dfs, axis=0)
+ else:
+ df = func(fs, part, columns, index, **kwargs)
+
if meta.columns.name:
df.columns.name = meta.columns.name
columns = columns or []
@@ -684,4 +597,166 @@
return parts, statistics
+def process_statistics(parts, statistics, filters, index, chunksize):
+ """Process row-group column statistics in metadata
+ Used in read_parquet.
+ """
+ index_in_columns = False
+ if statistics:
+ result = list(
+ zip(
+ *[
+ (part, stats)
+ for part, stats in zip(parts, statistics)
+ if stats["num-rows"] > 0
+ ]
+ )
+ )
+ parts, statistics = result or [[], []]
+ if filters:
+ parts, statistics = apply_filters(parts, statistics, filters)
+
+ # Aggregate parts/statistics if we are splitting by row-group
+ if chunksize:
+ parts, statistics = aggregate_row_groups(parts, statistics, chunksize)
+
+ out = sorted_columns(statistics)
+
+ if index and isinstance(index, str):
+ index = [index]
+ if index and out:
+ # Only one valid column
+ out = [o for o in out if o["name"] in index]
+ if index is not False and len(out) == 1:
+ # Use only sorted column with statistics as the index
+ divisions = out[0]["divisions"]
+ if index is None:
+ index_in_columns = True
+ index = [out[0]["name"]]
+ elif index != [out[0]["name"]]:
+ raise ValueError("Specified index is invalid.\nindex: {}".format(index))
+ elif index is not False and len(out) > 1:
+ if any(o["name"] == "index" for o in out):
+ # Use sorted column named "index" as the index
+ [o] = [o for o in out if o["name"] == "index"]
+ divisions = o["divisions"]
+ if index is None:
+ index = [o["name"]]
+ index_in_columns = True
+ elif index != [o["name"]]:
+ raise ValueError(
+ "Specified index is invalid.\nindex: {}".format(index)
+ )
+ else:
+ # Multiple sorted columns found, cannot autodetect the index
+ warnings.warn(
+ "Multiple sorted columns found %s, cannot\n "
+ "autodetect index. Will continue without an index.\n"
+ "To pick an index column, use the index= keyword; to \n"
+ "silence this warning use index=False."
+ "" % [o["name"] for o in out],
+ RuntimeWarning,
+ )
+ index = False
+ divisions = [None] * (len(parts) + 1)
+ else:
+ divisions = [None] * (len(parts) + 1)
+ else:
+ divisions = [None] * (len(parts) + 1)
+
+ return parts, divisions, index, index_in_columns
+
+
+def set_index_columns(meta, index, columns, index_in_columns, auto_index_allowed):
+ """Handle index/column arguments, and modify `meta`
+ Used in read_parquet.
+ """
+ ignore_index_column_intersection = False
+ if columns is None:
+ # User didn't specify columns, so ignore any intersection
+ # of auto-detected values with the index (if necessary)
+ ignore_index_column_intersection = True
+ columns = [c for c in meta.columns]
+
+ if not set(columns).issubset(set(meta.columns)):
+ raise ValueError(
+ "The following columns were not found in the dataset %s\n"
+ "The following columns were found %s"
+ % (set(columns) - set(meta.columns), meta.columns)
+ )
+
+ if index:
+ if isinstance(index, str):
+ index = [index]
+ if isinstance(columns, str):
+ columns = [columns]
+
+ if ignore_index_column_intersection:
+ columns = [col for col in columns if col not in index]
+ if set(index).intersection(columns):
+ if auto_index_allowed:
+ raise ValueError(
+ "Specified index and column arguments must not intersect"
+ " (set index=False or remove the detected index from columns).\n"
+ "index: {} | column: {}".format(index, columns)
+ )
+ else:
+ raise ValueError(
+ "Specified index and column arguments must not intersect.\n"
+ "index: {} | column: {}".format(index, columns)
+ )
+
+ # Leaving index as a column in `meta`, because the index
+ # will be reset below (in case the index was detected after
+ # meta was created)
+ if index_in_columns:
+ meta = meta[columns + index]
+ else:
+ meta = meta[columns]
+
+ else:
+ meta = meta[list(columns)]
+
+ return meta, index, columns
+
+
+def aggregate_row_groups(parts, stats, chunksize):
+ if not stats[0].get("file_path_0", None):
+ return parts, stats
+
+ parts_agg = []
+ stats_agg = []
+ chunksize = parse_bytes(chunksize)
+ next_part, next_stat = [parts[0].copy()], stats[0].copy()
+ for i in range(1, len(parts)):
+ stat, part = stats[i], parts[i]
+ if (stat["file_path_0"] == next_stat["file_path_0"]) and (
+ (next_stat["total_byte_size"] + stat["total_byte_size"]) <= chunksize
+ ):
+ # Update part list
+ next_part.append(part)
+
+ # Update Statistics
+ next_stat["total_byte_size"] += stat["total_byte_size"]
+ next_stat["num-rows"] += stat["num-rows"]
+ for col, col_add in zip(next_stat["columns"], stat["columns"]):
+ if col["name"] != col_add["name"]:
+ raise ValueError("Columns are different!!")
+ if "null_count" in col:
+ col["null_count"] += col_add["null_count"]
+ if "min" in col:
+ col["min"] = min(col["min"], col_add["min"])
+ if "max" in col:
+ col["max"] = max(col["max"], col_add["max"])
+ else:
+ parts_agg.append(next_part)
+ stats_agg.append(next_stat)
+ next_part, next_stat = [part.copy()], stat.copy()
+
+ parts_agg.append(next_part)
+ stats_agg.append(next_stat)
+
+ return parts_agg, stats_agg
+
+
DataFrame.to_parquet.__doc__ = to_parquet.__doc__
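
With process_statistics and aggregate_row_groups in place, read_parquet can coalesce consecutive row-groups from the same file into one output partition. A hedged usage sketch; the glob path is a placeholder for an existing dataset:

    import dask.dataframe as dd

    # chunksize accepts an int (bytes) or a string parsed by dask.utils.parse_bytes.
    # Row-group statistics are required, hence gather_statistics/split_row_groups.
    ddf = dd.read_parquet(
        "data/*.parquet",          # placeholder path, assumed to exist
        engine="pyarrow",
        split_row_groups=True,
        gather_statistics=True,
        chunksize="64 MiB",
    )
    print(ddf.npartitions)
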
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/io/parquet/fastparquet.py new/dask-2.8.1/dask/dataframe/io/parquet/fastparquet.py
--- old/dask-2.8.0/dask/dataframe/io/parquet/fastparquet.py 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/io/parquet/fastparquet.py 2019-11-23 05:12:29.000000000 +0100
@@ -91,17 +91,16 @@
fast_metadata = True
if len(paths) > 1:
base, fns = _analyze_paths(paths, fs)
- relpaths = [path.replace(base, "").lstrip("/") for path in paths]
if gather_statistics is not False:
# This scans all the files, allowing index/divisions
# and filtering
pf = ParquetFile(
paths, open_with=fs.open, sep=fs.sep, **kwargs.get("file", {})
)
- if "_metadata" not in relpaths:
+ if "_metadata" not in fns:
fast_metadata = False
else:
- if "_metadata" in relpaths:
+ if "_metadata" in fns:
# We have a _metadata file, lets use it
pf = ParquetFile(
base + fs.sep + "_metadata",
@@ -117,49 +116,45 @@
pf.file_scheme = scheme
pf.cats = _paths_to_cats(fns, scheme)
parts = paths.copy()
- else:
- if fs.isdir(paths[0]):
- # This is a directory, check for _metadata, then _common_metadata
- paths = fs.glob(paths[0] + fs.sep + "*")
- base, fns = _analyze_paths(paths, fs)
- relpaths = [path.replace(base, "").lstrip("/") for path in paths]
- if "_metadata" in relpaths:
- # Using _metadata file (best-case scenario)
+ elif fs.isdir(paths[0]):
+ # This is a directory, check for _metadata, then _common_metadata
+ paths = fs.glob(paths[0] + fs.sep + "*")
+ base, fns = _analyze_paths(paths, fs)
+ if "_metadata" in fns:
+ # Using _metadata file (best-case scenario)
+ pf = ParquetFile(
+ base + fs.sep + "_metadata",
+ open_with=fs.open,
+ sep=fs.sep,
+ **kwargs.get("file", {})
+ )
+ if gather_statistics is None:
+ gather_statistics = True
+
+ elif gather_statistics is not False:
+ # Scan every file
+ pf = ParquetFile(paths, open_with=fs.open, **kwargs.get("file", {}))
+ fast_metadata = False
+ else:
+ # Use _common_metadata file if it is available.
+ # Otherwise, just use 0th file
+ if "_common_metadata" in fns:
pf = ParquetFile(
- base + fs.sep + "_metadata",
+ base + fs.sep + "_common_metadata",
open_with=fs.open,
- sep=fs.sep,
**kwargs.get("file", {})
)
- if gather_statistics is None:
- gather_statistics = True
-
- elif gather_statistics is not False:
- # Scan every file
- pf = ParquetFile(paths, open_with=fs.open, **kwargs.get("file", {}))
- fast_metadata = False
else:
- # Use _common_metadata file if it is available.
- # Otherwise, just use 0th file
- if "_common_metadata" in relpaths:
- pf = ParquetFile(
- base + fs.sep + "_common_metadata",
- open_with=fs.open,
- **kwargs.get("file", {})
- )
- else:
- pf = ParquetFile(
- paths[0], open_with=fs.open, **kwargs.get("file", {})
- )
- scheme = get_file_scheme(fns)
- pf.file_scheme = scheme
- pf.cats = _paths_to_cats(fns, scheme)
- parts = paths.copy()
- else:
- # There is only one file to read
- pf = ParquetFile(
- paths[0], open_with=fs.open, sep=fs.sep, **kwargs.get("file", {})
- )
+ pf = ParquetFile(paths[0], open_with=fs.open, **kwargs.get("file", {}))
+ scheme = get_file_scheme(fns)
+ pf.file_scheme = scheme
+ pf.cats = _paths_to_cats(fns, scheme)
+ parts = paths.copy()
+ else:
+ # There is only one file to read
+ pf = ParquetFile(
+ paths[0], open_with=fs.open, sep=fs.sep, **kwargs.get("file", {})
+ )
return parts, pf, gather_statistics, fast_metadata
@@ -296,6 +291,8 @@
s["columns"].append(d)
# Need this to filter out partitioned-on categorical columns
s["filter"] = fastparquet.api.filter_out_cats(row_group, filters)
+ s["total_byte_size"] = row_group.total_byte_size
+ s["file_path_0"] = row_group.columns[0].file_path # 0th column only
stats.append(s)
else:
@@ -345,20 +342,18 @@
pf.file_scheme = scheme
pf.cats = _paths_to_cats(fns, scheme)
pf.fn = base
- df = pf.to_pandas(columns, categories, index=index)
+ return pf.to_pandas(columns, categories, index=index)
else:
if isinstance(pf, tuple):
pf = _determine_pf_parts(fs, pf[0], pf[1], **kwargs)[1]
pf._dtypes = lambda *args: pf.dtypes # ugly patch, could be fixed
pf.fmd.row_groups = None
- piece = pf.row_groups[piece]
+ rg_piece = pf.row_groups[piece]
pf.fmd.key_value_metadata = None
- df = pf.read_row_group_file(
- piece, columns, categories, index=index, **kwargs.get("read", {})
+ return pf.read_row_group_file(
+ rg_piece, columns, categories, index=index, **kwargs.get("read", {})
)
- return df
-
@staticmethod
def initialize_write(
df,
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/io/tests/test_parquet.py new/dask-2.8.1/dask/dataframe/io/tests/test_parquet.py
--- old/dask-2.8.0/dask/dataframe/io/tests/test_parquet.py 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/io/tests/test_parquet.py 2019-11-23 05:14:40.000000000 +0100
@@ -14,7 +14,7 @@
from dask.dataframe.io.parquet.utils import _parse_pandas_metadata
from dask.dataframe.optimize import optimize_read_parquet_getitem
from dask.dataframe.io.parquet.core import ParquetSubgraph
-from dask.utils import natural_sort_key
+from dask.utils import natural_sort_key, parse_bytes
try:
import fastparquet
@@ -1915,7 +1915,10 @@
def test_timeseries_nulls_in_schema(tmpdir, engine):
- tmp_path = str(tmpdir)
+ # GH#5608: relative path failing _metadata/_common_metadata detection.
+ tmp_path = str(tmpdir.mkdir("files"))
+ tmp_path = os.path.join(tmp_path, "../", "files")
+
ddf2 = (
dask.datasets.timeseries(start="2000-01-01", end="2000-01-03", freq="1h")
.reset_index()
@@ -2079,7 +2082,11 @@
)
ddf3 = dd.read_parquet(
- tmp, engine="pyarrow", gather_statistics=True, split_row_groups=True
+ tmp,
+ engine="pyarrow",
+ gather_statistics=True,
+ split_row_groups=True,
+ chunksize=1,
)
assert ddf3.npartitions == 4
@@ -2093,7 +2100,11 @@
)
ddf3 = dd.read_parquet(
- tmp, engine="pyarrow", gather_statistics=True, split_row_groups=True
+ tmp,
+ engine="pyarrow",
+ gather_statistics=True,
+ split_row_groups=True,
+ chunksize=1,
)
assert ddf3.npartitions == 12
@@ -2101,3 +2112,80 @@
tmp, engine="pyarrow", gather_statistics=True, split_row_groups=False
)
assert ddf3.npartitions == 4
+
+
+@pytest.mark.parametrize("metadata", [True, False])
+@pytest.mark.parametrize("chunksize", [None, 1024, 4096, "1MiB"])
+def test_chunksize(tmpdir, chunksize, engine, metadata):
+ check_pyarrow() # Need pyarrow for write phase in this test
+
+ nparts = 2
+ df_size = 100
+ row_group_size = 5
+ row_group_byte_size = 451 # Empirically measured
+
+ df = pd.DataFrame(
+ {
+ "a": np.random.choice(["apple", "banana", "carrot"], size=df_size),
+ "b": np.random.random(size=df_size),
+ "c": np.random.randint(1, 5, size=df_size),
+ "index": np.arange(0, df_size),
+ }
+ ).set_index("index")
+
+ ddf1 = dd.from_pandas(df, npartitions=nparts)
+ ddf1.to_parquet(
+ str(tmpdir),
+ engine="pyarrow",
+ row_group_size=row_group_size,
+ write_metadata_file=metadata,
+ )
+
+ if metadata:
+ path = str(tmpdir)
+ else:
+ dirname = str(tmpdir)
+ files = os.listdir(dirname)
+ assert "_metadata" not in files
+ path = os.path.join(dirname, "*.parquet")
+
+ ddf2 = dd.read_parquet(
+ path,
+ engine=engine,
+ chunksize=chunksize,
+ split_row_groups=True,
+ gather_statistics=True,
+ index="index",
+ )
+
+ assert_eq(ddf1, ddf2, check_divisions=False)
+
+ num_row_groups = df_size // row_group_size
+ if not chunksize:
+ assert ddf2.npartitions == num_row_groups
+ else:
+ # Check that we are really aggregating
+ df_byte_size = row_group_byte_size * num_row_groups
+ expected = df_byte_size // parse_bytes(chunksize)
+ remainder = (df_byte_size % parse_bytes(chunksize)) > 0
+ expected += int(remainder) * nparts
+ assert ddf2.npartitions == max(nparts, expected)
+
+
+@write_read_engines()
+def test_roundtrip_pandas_chunksize(tmpdir, write_engine, read_engine):
+ path = str(tmpdir.join("test.parquet"))
+ pdf = df.copy()
+ pdf.index.name = "index"
+ pdf.to_parquet(path, engine=write_engine)
+
+ ddf_read = dd.read_parquet(
+ path,
+ engine=read_engine,
+ chunksize="10 kiB",
+ gather_statistics=True,
+ split_row_groups=True,
+ index="index",
+ )
+
+ assert_eq(pdf, ddf_read)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/tests/test_dataframe.py new/dask-2.8.1/dask/dataframe/tests/test_dataframe.py
--- old/dask-2.8.0/dask/dataframe/tests/test_dataframe.py 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/tests/test_dataframe.py 2019-11-22 04:35:27.000000000 +0100
@@ -1734,7 +1734,7 @@
def test_repartition_npartitions(use_index, n, k, dtype, transform):
df = pd.DataFrame(
{"x": [1, 2, 3, 4, 5, 6] * 10, "y": list("abdabd") * 10},
- index=pd.Series([10, 20, 30, 40, 50, 60] * 10, dtype=dtype),
+ index=pd.Series([1, 2, 3, 4, 5, 6] * 10, dtype=dtype),
)
df = transform(df)
a = dd.from_pandas(df, npartitions=n, sort=use_index)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask/dataframe/tests/test_groupby.py new/dask-2.8.1/dask/dataframe/tests/test_groupby.py
--- old/dask-2.8.0/dask/dataframe/tests/test_groupby.py 2019-11-13 18:07:11.000000000 +0100
+++ new/dask-2.8.1/dask/dataframe/tests/test_groupby.py 2019-11-22 04:35:27.000000000 +0100
@@ -2189,6 +2189,48 @@
)
+@pytest.mark.parametrize(
+ "grouping,agg",
+ [
+ (
+ lambda df: df.drop(columns="category_2").groupby("category_1"),
+ lambda grp: grp.mean(),
+ ),
+ (
+ lambda df: df.drop(columns="category_2").groupby("category_1"),
+ lambda grp: grp.agg("mean"),
+ ),
+ (lambda df: df.groupby(["category_1", "category_2"]), lambda grp: grp.mean()),
+ pytest.param(
+ lambda df: df.groupby(["category_1", "category_2"]),
+ lambda grp: grp.agg("mean"),
+ marks=pytest.mark.xfail(
+ not dask.dataframe.utils.PANDAS_GT_100,
+ reason=(
+ "Should work starting from pandas 1.0.0: "
+ "https://github.com/dask/dask/pull/5423"
+ ),
+ ),
+ ),
+ ],
+)
+def test_groupby_aggregate_categoricals(grouping, agg):
+ pdf = pd.DataFrame(
+ {
+ "category_1": pd.Categorical(list("AABBCC")),
+ "category_2": pd.Categorical(list("ABCABC")),
+ "value": np.random.uniform(size=6),
+ }
+ )
+ ddf = dd.from_pandas(pdf, 2)
+
+ # DataFrameGroupBy
+ assert_eq(agg(grouping(pdf)), agg(grouping(ddf)))
+
+ # SeriesGroupBy
+ assert_eq(agg(grouping(pdf)["value"]), agg(grouping(ddf)["value"]))
+
+
@pytest.mark.xfail(reason="dropna kwarg not supported in pandas groupby.")
@pytest.mark.parametrize("dropna", [False, True])
def test_groupby_dropna_pandas(dropna):
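
The new test exercises the PR 5423 fix, which keeps categorical dtypes through groupby + agg. A reduced sketch along the same lines:

    import numpy as np
    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame(
        {
            "category_1": pd.Categorical(list("AABBCC")),
            "value": np.random.uniform(size=6),
        }
    )
    ddf = dd.from_pandas(pdf, npartitions=2)

    # The grouping column keeps its categorical dtype in the aggregated result.
    result = ddf.groupby("category_1").agg("mean").compute()
    print(result.index.dtype)
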
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/dask.egg-info/PKG-INFO new/dask-2.8.1/dask.egg-info/PKG-INFO
--- old/dask-2.8.0/dask.egg-info/PKG-INFO 2019-11-14 23:57:18.000000000 +0100
+++ new/dask-2.8.1/dask.egg-info/PKG-INFO 2019-11-23 05:31:54.000000000 +0100
@@ -1,6 +1,6 @@
Metadata-Version: 2.1
Name: dask
-Version: 2.8.0
+Version: 2.8.1
Summary: Parallel PyData with Task Scheduling
Home-page: https://github.com/dask/dask/
Maintainer: Matthew Rocklin
@@ -43,10 +43,10 @@
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Python: >=3.6
-Provides-Extra: complete
Provides-Extra: array
-Provides-Extra: diagnostics
-Provides-Extra: dataframe
Provides-Extra: bag
-Provides-Extra: delayed
+Provides-Extra: dataframe
Provides-Extra: distributed
+Provides-Extra: diagnostics
+Provides-Extra: delayed
+Provides-Extra: complete
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/api.rst new/dask-2.8.1/docs/source/api.rst
--- old/dask-2.8.0/docs/source/api.rst 2019-10-11 05:14:07.000000000 +0200
+++ new/dask-2.8.1/docs/source/api.rst 2019-11-21 03:32:04.000000000 +0100
@@ -52,7 +52,27 @@
.. autofunction:: persist
.. autofunction:: visualize
-Finally, Dask has a few helpers for generating demo datasets
+Datasets
+--------
-.. autofunction:: datasets.make_people
-.. autofunction:: datasets.timeseries
+Dask has a few helpers for generating demo datasets
+
+.. currentmodule:: dask.datasets
+
+.. autofunction:: make_people
+.. autofunction:: timeseries
+
+.. _api.utilities:
+
+Utilities
+---------
+
+Dask has some public utility methods. These are primarily used for parsing
+configuration values.
+
+.. currentmodule:: dask.utils
+
+.. autofunction:: format_bytes
+.. autofunction:: format_time
+.. autofunction:: parse_bytes
+.. autofunction:: parse_timedelta
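
The utilities documented above are small parsing/formatting helpers; a brief sketch of their behavior, assuming dask 2.8.1 (exact formatted strings may vary by version):

    from dask.utils import format_bytes, format_time, parse_bytes, parse_timedelta

    print(parse_bytes("128 MiB"))    # 134217728
    print(parse_timedelta("10s"))    # 10 (seconds)
    print(format_bytes(2 ** 30))     # human-readable size string
    print(format_time(0.0123))       # human-readable duration string
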
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/changelog.rst new/dask-2.8.1/docs/source/changelog.rst
--- old/dask-2.8.0/docs/source/changelog.rst 2019-11-14 23:55:24.000000000 +0100
+++ new/dask-2.8.1/docs/source/changelog.rst 2019-11-23 05:27:27.000000000 +0100
@@ -1,6 +1,34 @@
Changelog
=========
+2.8.1 / 2019-11-22
+------------------
+
+Array
++++++
+- Use auto rechunking in ``da.rechunk`` if no value given (:pr:`5605`) `Matthew Rocklin`_
+
+Core
+++++
+- Add simple action to activate GH actions (:pr:`5619`) `James Bourbeau`_
+
+DataFrame
++++++++++
+- Fix "file_path_0" bug in ``aggregate_row_groups`` (:pr:`5627`) `Richard J Zamora`_
+- Add ``chunksize`` argument to ``read_parquet`` (:pr:`5607`) `Richard J Zamora`_
+- Change ``test_repartition_npartitions`` to support arch64 architecture (:pr:`5620`) `ossdev07`_
+- Categories lost after groupby + agg (:pr:`5423`) `Oliver Hofkens`_
+- Fixed relative path issue with parquet metadata file (:pr:`5608`) `Nuno Gomes Silva`_
+- Enable gpu-backed covariance/correlation in dataframes (:pr:`5597`) `Richard J Zamora`_
+
+Documentation
++++++++++++++
+- Fix institutional faq and unknown doc warnings (:pr:`5616`) `James Bourbeau`_
+- Add doc for some utils (:pr:`5609`) `Tom Augspurger`_
+- Removes ``html_extra_path`` (:pr:`5614`) `James Bourbeau`_
+- Fixed See Also reference (:pr:`5612`) `Tom Augspurger`_
+
+
2.8.0 / 2019-11-14
------------------
@@ -2700,3 +2728,5 @@
.. _`Prithvi MK`: https://github.com/pmk21
.. _`Eric Dill`: https://github.com/ericdill
.. _`Gina Helfrich`: https://github.com/Dr-G
+.. _`ossdev07`: https://github.com/ossdev07
+.. _`Nuno Gomes Silva`: https://github.com/mgsnuno
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/conf.py new/dask-2.8.1/docs/source/conf.py
--- old/dask-2.8.0/docs/source/conf.py 2019-10-11 05:14:07.000000000 +0200
+++ new/dask-2.8.1/docs/source/conf.py 2019-11-20 18:05:56.000000000 +0100
@@ -48,7 +48,6 @@
# The master toctree document.
master_doc = "index"
-html_extra_path = ["index.html"]
# General information about the project.
project = u"Dask"
@@ -359,10 +358,9 @@
"""
html_context = {
- 'css_files': [
- '_static/theme_overrides.css', # override wide tables in RTD theme
- ],
- }
+ "css_files": ["_static/theme_overrides.css"] # override wide tables in RTD theme
+}
+
def copy_legacy_redirects(app, docname):
if app.builder.name == "html":
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/configuration.rst new/dask-2.8.1/docs/source/configuration.rst
--- old/dask-2.8.0/docs/source/configuration.rst 2019-10-11 05:14:07.000000000 +0200
+++ new/dask-2.8.1/docs/source/configuration.rst 2019-11-21 03:32:04.000000000 +0100
@@ -66,6 +66,8 @@
For example, ``dask.config.get('num_workers')`` is equivalent to
``dask.config.get('num-workers')``.
+Values like ``"128 MiB"`` and ``"10s"`` are parsed using the functions in
+:ref:`api.utilities`.
Specify Configuration
---------------------
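
The added sentence points at the same utilities: string values in the configuration are stored as given and parsed (for example with parse_bytes) by the code that consumes them. A small sketch:

    import dask
    from dask.utils import parse_bytes

    # The config stores the string as given; consumers such as rechunk("auto")
    # parse it when sizing chunks.
    with dask.config.set({"array.chunk-size": "64 MiB"}):
        value = dask.config.get("array.chunk-size")
        print(value, parse_bytes(value))   # 64 MiB 67108864
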
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/dataframe-create.rst new/dask-2.8.1/docs/source/dataframe-create.rst
--- old/dask-2.8.0/docs/source/dataframe-create.rst 2019-10-11 02:10:15.000000000 +0200
+++ new/dask-2.8.1/docs/source/dataframe-create.rst 2019-11-21 03:32:04.000000000 +0100
@@ -7,7 +7,7 @@
File System (HDFS), and Amazon's S3 (excepting HDF, which is only available on
POSIX like file systems).
-See the :doc:`Overview section <dataframe-overview>` for an in depth
+See the :doc:`DataFrame overview page <dataframe>` for an in depth
discussion of ``dask.dataframe`` scope, use, and limitations.
API
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/institutional-faq.rst new/dask-2.8.1/docs/source/institutional-faq.rst
--- old/dask-2.8.0/docs/source/institutional-faq.rst 2019-11-05 22:48:30.000000000 +0100
+++ new/dask-2.8.1/docs/source/institutional-faq.rst 2019-11-21 03:32:04.000000000 +0100
@@ -154,10 +154,10 @@
SLURM, PBS, LSF, Torque, Condor, or other job batch queuing systems, then
users can launch Dask on these systems today using either:
- - `Dask Jobqueue https://jobqueue.dask.org`_ , which uses typical
+ - `Dask Jobqueue https://jobqueue.dask.org`_ , which uses typical
``qsub``, ``sbatch``, ``bsub`` or other submission tools in interactive
settings.
- - `Dask MPI https://mpi.dask.org`_ which uses MPI for deployment in
+ - `Dask MPI https://mpi.dask.org`_ which uses MPI for deployment in
batch settings
For more information see :doc:`setup/hpc`
@@ -167,10 +167,10 @@
all of which provide hosted Kubernetes as a service. People today use Dask
on Kubernetes using either of the following:
- - **Helm**: an easy way to stand up a long-running Dask cluster and
+ - **Helm**: an easy way to stand up a long-running Dask cluster and
Jupyter notebook
- - **Dask-Kubernetes**: for native Kubernetes integration for fast moving
+ - **Dask-Kubernetes**: for native Kubernetes integration for fast moving
or ephemeral deployments.
For more information see :doc:`setup/kubernetes`
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/dask-2.8.0/docs/source/setup/custom-startup.rst new/dask-2.8.1/docs/source/setup/custom-startup.rst
--- old/dask-2.8.0/docs/source/setup/custom-startup.rst 2019-10-11 05:14:07.000000000 +0200
+++ new/dask-2.8.1/docs/source/setup/custom-startup.rst 2019-11-21 03:32:04.000000000 +0100
@@ -28,7 +28,8 @@
As an example, consider the following file that creates a
-:doc:`scheduler plugin <plugins>` and registers it with the scheduler
+`scheduler plugin https://distributed.dask.org/en/latest/plugins.html`_
+and registers it with the scheduler
.. code-block:: python