Hello community,
here is the log from the commit of package urlwatch for openSUSE:Factory checked in at 2018-09-04 22:57:50
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Comparing /work/SRC/openSUSE:Factory/urlwatch (Old)
and /work/SRC/openSUSE:Factory/.urlwatch.new (New)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Package is "urlwatch"
Tue Sep 4 22:57:50 2018 rev:12 rq:632968 version:2.14
Changes:
--------
--- /work/SRC/openSUSE:Factory/urlwatch/urlwatch.changes 2018-06-08 23:16:24.210031361 +0200
+++ /work/SRC/openSUSE:Factory/.urlwatch.new/urlwatch.changes 2018-09-04 22:58:07.189400169 +0200
@@ -1,0 +2,14 @@
+Tue Sep 4 06:34:45 UTC 2018 - mvetter@suse.com
+
+- Update to 2.14:
+ * Added filter to pretty-print JSON data: format-json (by Niko Böckerman, PR#250)
+  * Added listing of active Telegram chats using --telegram-chats (with fixes by Georg Pichler, PR#270)
+ * Added support for HTTP ETag header in URL jobs and If-None-Match (by Karol Babioch, PR#256)
+  * Added support for filtering HTML using XPath expressions, with lxml (PR#274, Fixes #226)
+ * Added install_dependencies to setup.py commands for easy installing of dependencies
+ * Added ignore_connection_errors per-job configuration option (by Karol Babioch, PR#261)
+  * Improved code (HTTP status codes, by Karol Babioch, PR#258)
+ * Improved documentation for setting up Telegram chat bots
+ * Allow multiple chats for Telegram reporting (by Georg Pichler, PR#271)
+
+-------------------------------------------------------------------
Old:
----
urlwatch-2.13.tar.gz
New:
----
urlwatch-2.14.tar.gz
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Other differences:
------------------
++++++ urlwatch.spec ++++++
--- /var/tmp/diff_new_pack.wj9slx/_old 2018-09-04 22:58:07.541401369 +0200
+++ /var/tmp/diff_new_pack.wj9slx/_new 2018-09-04 22:58:07.541401369 +0200
@@ -17,7 +17,7 @@
Name: urlwatch
-Version: 2.13
+Version: 2.14
Release: 0
Summary: A tool for monitoring webpages for updates
License: BSD-3-Clause
++++++ urlwatch-2.13.tar.gz -> urlwatch-2.14.tar.gz ++++++
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/.travis.yml new/urlwatch-2.14/.travis.yml
--- old/urlwatch-2.13/.travis.yml 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/.travis.yml 2018-08-30 10:36:16.000000000 +0200
@@ -4,5 +4,5 @@
- "3.5"
- "3.6"
install:
- - pip install pyyaml minidb requests keyring pycodestyle appdirs
+ - python setup.py install_dependencies
script: nosetests -v
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/CHANGELOG.md new/urlwatch-2.14/CHANGELOG.md
--- old/urlwatch-2.13/CHANGELOG.md 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/CHANGELOG.md 2018-08-30 10:36:16.000000000 +0200
@@ -4,6 +4,22 @@
The format mostly follows [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
+## [2.14] -- 2018-08-30
+
+### Added
+- Filter to pretty-print JSON data: `format-json` (by Niko Böckerman, PR#250)
+- List active Telegram chats using `--telegram-chats` (with fixes by Georg Pichler, PR#270)
+- Support for HTTP `ETag` header in URL jobs and `If-None-Match` (by Karol Babioch, PR#256)
+- Support for filtering HTML using XPath expressions, with `lxml` (PR#274, Fixes #226)
+- `install_dependencies` command in `setup.py` for easy installation of dependencies
+- `ignore_connection_errors` per-job configuration option (by Karol Babioch, PR#261)
+
+### Changed
+- Improved code (HTTP status codes, by Karol Babioch, PR#258)
+- Improved documentation for setting up Telegram chat bots
+- Allow multiple chats for Telegram reporting (by Georg Pichler, PR#271)
+
+
## [2.13] -- 2018-06-03
### Added
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/README.md new/urlwatch-2.14/README.md
--- old/urlwatch-2.13/README.md 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/README.md 2018-08-30 10:36:16.000000000 +0200
@@ -26,28 +26,19 @@
* [requests](http://python-requests.org/)
* [keyring](https://github.com/jaraco/keyring/)
* [appdirs](https://github.com/ActiveState/appdirs)
- * [chump](https://github.com/karanlyons/chump/) (for Pushover support)
- * [pushbullet.py](https://github.com/randomchars/pushbullet.py) (for Pushbullet support)
+ * [lxml](https://lxml.de)
The dependencies can be installed with (add `--user` to install to `$HOME`):
-`python3 -m pip install pyyaml minidb requests keyring appdirs`
+`python3 -m pip install pyyaml minidb requests keyring appdirs lxml`
-For optional pushover support the chump package is required:
-`python3 -m pip install chump`
+Optional dependencies (install via `python3 -m pip install <packagename>`):
-For optional pushbullet support the pushbullet.py package is required:
-
-`python3 -m pip install pushbullet.py`
-
-For optional support for the "browser" job kind, Requests-HTML is needed:
-
-`python3 -m pip install requests-html`
-
-For unit tests, you also need to install pycodestyle:
-
-`python3 -m pip install pycodestyle`
+ * Pushover reporter: [chump](https://github.com/karanlyons/chump/)
+ * Pushbullet reporter: [pushbullet.py](https://github.com/randomchars/pushbullet.py)
+ * "browser" job kind: [requests-html](https://html.python-requests.org)
+ * Unit testing: [pycodestyle](http://pycodestyle.pycqa.org/en/latest/)
MIGRATION FROM URLWATCH 1.x
@@ -144,6 +135,30 @@
`brew install wdiff` on macOS). Coloring is supported for `wdiff`-style
output, but potentially not for other diff tools.
+To filter based on an [XPath](https://www.w3.org/TR/1999/REC-xpath-19991116/)
+expression, you can use the `xpath` filter like so (see Microsoft's
+[XPath Examples](https://msdn.microsoft.com/en-us/library/ms256086(v=vs.110).aspx)
+page for some other examples):
+
+```yaml
+url: https://example.net/
+filter: xpath://body
+```
+
+This filters only the `<body>` element of the HTML document, stripping
+out everything else.
+
+In some cases, it might be useful to ignore (temporary) network errors to
+avoid notifications being sent. While there is a `display.error` config
+option (defaulting to `True`) to control reporting of errors globally, to
+ignore network errors for specific jobs only, you can use the
+`ignore_connection_errors` key in the job list configuration file:
+
+```yaml
+url: https://example.com/
+ignore_connection_errors: true
+```
+
PUSHOVER
--------
@@ -168,6 +183,7 @@
Telegram notifications are configured using the Telegram Bot API.
For this, you'll need a Bot API token and a chat id (see https://core.telegram.org/bots).
Sample configuration:
+
```yaml
telegram:
bot_token: '999999999:3tOhy2CuZE0pTaCtszRfKpnagOG8IQbP5gf' # your bot api token
@@ -175,6 +191,28 @@
enabled: true
```
+To set up Telegram, from your Telegram app, chat up BotFather (New Message,
+Search, "BotFather"), then say `/newbot` and follow the instructions.
+Eventually it will tell you the bot token (in the form seen above,
+`<number>:<random string>`) - add this to your config file.
+
+You can then click on the link of your bot, which will send the message `/start`.
+At this point, you can use the command `urlwatch --telegram-chats` to list the
+private chats the bot is involved with. This is the chat ID that you need to put
+into the config file as `chat_id`. You may add multiple chat IDs as a YAML list:
+```yaml
+telegram:
+ bot_token: '999999999:3tOhy2CuZE0pTaCtszRfKpnagOG8IQbP5gf' # your bot api token
+ chat_id:
+ - '11111111'
+ - '22222222'
+ enabled: true
+```
+
+Don't forget to also enable the reporter.
+
+
+
BROWSER
-------
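As a standalone illustration of the `xpath` filter documented above, here is a minimal sketch of the same parse-match-serialize steps done directly with lxml (the HTML snippet is made up; the calls mirror the `XPathFilter` implementation further down in this diff):

```python
import io

from lxml import etree

# Hypothetical page content; a real job would fetch this over HTTP.
html = '<html><head><title>t</title></head><body><p>Hello</p></body></html>'

# Parse as HTML, evaluate the XPath expression, and re-serialize the matched
# elements, which is roughly what `filter: xpath://body` does to the page.
tree = etree.parse(io.StringIO(html), etree.HTMLParser())
print('\n'.join(etree.tostring(element, pretty_print=True, method='html',
                               encoding='unicode')
                for element in tree.xpath('//body')))
```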
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/__init__.py new/urlwatch-2.14/lib/urlwatch/__init__.py
--- old/urlwatch-2.13/lib/urlwatch/__init__.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/__init__.py 2018-08-30 10:36:16.000000000 +0200
@@ -12,5 +12,5 @@
__author__ = 'Thomas Perl <m@thp.io>'
__license__ = 'BSD'
__url__ = 'https://thp.io/2008/urlwatch/'
-__version__ = '2.13'
+__version__ = '2.14'
__user_agent__ = '%s/%s (+https://thp.io/2008/urlwatch/info.html)' % (pkgname, __version__)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/command.py new/urlwatch-2.14/lib/urlwatch/command.py
--- old/urlwatch-2.13/lib/urlwatch/command.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/command.py 2018-08-30 10:36:16.000000000 +0200
@@ -33,6 +33,7 @@
import os
import shutil
import sys
+import requests
from .filters import FilterBase
from .handler import JobState
@@ -175,6 +176,41 @@
if self.urlwatch_config.edit_config:
sys.exit(self.urlwatcher.config_storage.edit())
+ def check_telegram_chats(self):
+ if self.urlwatch_config.telegram_chats:
+ config = self.urlwatcher.config_storage.config['report'].get('telegram', None)
+ if not config:
+ print('You need to configure telegram in your config first (see README.md)')
+ sys.exit(1)
+
+ bot_token = config.get('bot_token', None)
+ if not bot_token:
+ print('You need to set up your bot token first (see README.md)')
+ sys.exit(1)
+
+ info = requests.get('https://api.telegram.org/bot{}/getMe'.format(bot_token)).json()
+
+ chats = {}
+ for chat_info in requests.get('https://api.telegram.org/bot{}/getUpdates'.format(bot_token)).json()['result']:
+ chat = chat_info['message']['chat']
+ if chat['type'] == 'private':
+ chats[str(chat['id'])] = ' '.join((chat['first_name'], chat['last_name'])) if 'last_name' in chat else chat['first_name']
+
+ if not chats:
+ print('No chats found. Say hello to your bot at https://t.me/{}'.format(info['result']['username']))
+ sys.exit(1)
+
+ headers = ('Chat ID', 'Name')
+ maxchat = max(len(headers[0]), max((len(k) for k, v in chats.items()), default=0))
+ maxname = max(len(headers[1]), max((len(v) for k, v in chats.items()), default=0))
+ fmt = '%-' + str(maxchat) + 's %s'
+ print(fmt % headers)
+ print(fmt % ('-' * maxchat, '-' * maxname))
+ for k, v in sorted(chats.items(), key=lambda kv: kv[1]):
+ print(fmt % (k, v))
+ print('\nChat up your bot here: https://t.me/{}'.format(info['result']['username']))
+ sys.exit(0)
+
def check_smtp_login(self):
if self.urlwatch_config.smtp_login:
config = self.urlwatcher.config_storage.config['report']['email']
@@ -222,6 +258,7 @@
def run(self):
self.check_edit_config()
self.check_smtp_login()
+ self.check_telegram_chats()
self.handle_actions()
self.urlwatcher.run_jobs()
self.urlwatcher.close()
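For reference, this is roughly the payload shape that the `getUpdates` call in `check_telegram_chats()` walks through; the values below are invented, but the structure follows the Telegram Bot API documentation (https://core.telegram.org/bots/api):

```python
# Invented sample of a decoded getUpdates response; only 'private' chats
# are collected by check_telegram_chats().
updates = {
    'ok': True,
    'result': [
        {'update_id': 1,
         'message': {'message_id': 10,
                     'chat': {'id': 11111111, 'type': 'private',
                              'first_name': 'Alice'}}},
    ],
}

for chat_info in updates['result']:
    chat = chat_info['message']['chat']
    if chat['type'] == 'private':
        print(chat['id'], chat['first_name'])
```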
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/config.py new/urlwatch-2.14/lib/urlwatch/config.py
--- old/urlwatch-2.13/lib/urlwatch/config.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/config.py 2018-08-30 10:36:16.000000000 +0200
@@ -89,6 +89,7 @@
group = parser.add_argument_group('Authentication')
group.add_argument('--smtp-login', action='store_true', help='Enter password for SMTP (store in keyring)')
+ group.add_argument('--telegram-chats', action='store_true', help='List telegram chats the bot is joined to')
group = parser.add_argument_group('job list management')
group.add_argument('--list', action='store_true', help='list jobs')
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/filters.py new/urlwatch-2.14/lib/urlwatch/filters.py
--- old/urlwatch-2.13/lib/urlwatch/filters.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/filters.py 2018-08-30 10:36:16.000000000 +0200
@@ -32,11 +32,14 @@
import logging
import itertools
import os
+import io
import imp
import html.parser
import hashlib
+import json
from enum import Enum
+from lxml import etree
from .util import TrackSubClasses
@@ -183,6 +186,19 @@
return ical2text(data)
+class JsonFormatFilter(FilterBase):
+ """Convert to formatted json"""
+
+ __kind__ = 'format-json'
+
+ def filter(self, data, subfilter=None):
+ indentation = 4
+ if subfilter is not None:
+ indentation = int(subfilter)
+ parsed_json = json.loads(data)
+ return json.dumps(parsed_json, sort_keys=True, indent=indentation)
+
+
class GrepFilter(FilterBase):
"""Filter only lines matching a regular expression"""
@@ -349,3 +365,18 @@
return '\n'.join('%s %s' % (' '.join('%02x' % c for c in block),
''.join((chr(c) if (c > 31 and c < 127) else '.')
for c in block)) for block in blocks)
+
+
+class XPathFilter(FilterBase):
+ """Filter XML/HTML using XPath expressions"""
+
+ __kind__ = 'xpath'
+
+ def filter(self, data, subfilter=None):
+ if subfilter is None:
+ raise ValueError('Need an XPath expression for filtering')
+
+ parser = etree.HTMLParser()
+ tree = etree.parse(io.StringIO(data), parser)
+ return '\n'.join(etree.tostring(element, pretty_print=True, method='html', encoding='unicode')
+ for element in tree.xpath(subfilter))
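The two new filter classes can be exercised directly, in the same way the test suite further down constructs them (the input documents here are made up):

```python
from urlwatch.filters import JsonFormatFilter, XPathFilter

# format-json: parse, sort keys, pretty-print; the optional subfilter ('2')
# sets the indentation width and defaults to 4 when omitted.
print(JsonFormatFilter(None, None).filter('{"b": 1, "a": [2, 3]}', '2'))

# xpath: keep only the elements matched by the expression, as HTML.
print(XPathFilter(None, None).filter(
    '<html><body><h1>skip</h1><p>keep</p></body></html>', '//p'))
```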
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/handler.py new/urlwatch-2.14/lib/urlwatch/handler.py
--- old/urlwatch-2.13/lib/urlwatch/handler.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/handler.py 2018-08-30 10:36:16.000000000 +0200
@@ -51,9 +51,10 @@
self.exception = None
self.traceback = None
self.tries = 0
+ self.etag = None
def load(self):
- self.old_data, self.timestamp, self.tries = self.cache_storage.load(self.job, self.job.get_guid())
+ self.old_data, self.timestamp, self.tries, self.etag = self.cache_storage.load(self.job, self.job.get_guid())
if self.tries is None:
self.tries = 0
@@ -62,7 +63,7 @@
# If no new data has been retrieved due to an exception, use the old job data
self.new_data = self.old_data
- self.cache_storage.save(self.job, self.job.get_guid(), self.new_data, time.time(), self.tries)
+ self.cache_storage.save(self.job, self.job.get_guid(), self.new_data, time.time(), self.tries, self.etag)
def process(self):
logger.info('Processing: %s', self.job)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/jobs.py new/urlwatch-2.14/lib/urlwatch/jobs.py
--- old/urlwatch-2.13/lib/urlwatch/jobs.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/jobs.py 2018-08-30 10:36:16.000000000 +0200
@@ -180,7 +180,7 @@
__required__ = ('url',)
__optional__ = ('cookies', 'data', 'method', 'ssl_no_verify', 'ignore_cached', 'http_proxy', 'https_proxy',
- 'headers')
+ 'headers', 'ignore_connection_errors')
CHARSET_RE = re.compile('text/(html|plain); charset=([^;]*)')
@@ -197,10 +197,14 @@
'https': os.getenv('HTTPS_PROXY'),
}
+ if job_state.etag is not None:
+ headers['If-None-Match'] = job_state.etag
+
if job_state.timestamp is not None:
headers['If-Modified-Since'] = email.utils.formatdate(job_state.timestamp)
if self.ignore_cached:
+ headers['If-None-Match'] = None
headers['If-Modified-Since'] = email.utils.formatdate(0)
headers['Cache-Control'] = 'max-age=172800'
headers['Expires'] = email.utils.formatdate()
@@ -234,9 +238,12 @@
proxies=proxies)
response.raise_for_status()
- if response.status_code == 304:
+ if response.status_code == requests.codes.not_modified:
raise NotModifiedError()
+ # Save ETag from response into job_state, which will be saved in cache
+ job_state.etag = response.headers.get('ETag')
+
# If we can't find the encoding in the headers, requests gets all
# old-RFC-y and assumes ISO-8859-1 instead of UTF-8. Use the old
# urlwatch behavior and try UTF-8 decoding first.
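The ETag handling added above amounts to the standard HTTP conditional-request handshake; a minimal sketch with plain `requests` (the URL is a placeholder and would need a server that actually sends an `ETag` header):

```python
import requests

url = 'https://example.net/'  # placeholder

# First fetch: remember the validator the server hands out.
first = requests.get(url)
etag = first.headers.get('ETag')

# Next fetch: present the validator; 304 means the cached copy is current.
second = requests.get(url, headers={'If-None-Match': etag} if etag else {})
if second.status_code == requests.codes.not_modified:
    print('Unchanged, urlwatch would raise NotModifiedError')
else:
    print('Changed (or no ETag support), process the new data')
```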
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/reporters.py new/urlwatch-2.14/lib/urlwatch/reporters.py
--- old/urlwatch-2.13/lib/urlwatch/reporters.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/reporters.py 2018-08-30 10:36:16.000000000 +0200
@@ -485,7 +485,7 @@
try:
json_res = result.json()
- if (result.status_code == 200):
+ if (result.status_code == requests.codes.ok):
logger.info("Mailgun response: id '{0}'. {1}".format(json_res['id'], json_res['message']))
else:
logger.error("Mailgun error: {0}".format(json_res['message']))
@@ -506,7 +506,8 @@
def submit(self):
bot_token = self.config['bot_token']
- chat_id = self.config['chat_id']
+ chat_ids = self.config['chat_id']
+ chat_ids = [chat_ids] if isinstance(chat_ids, str) else chat_ids
text = '\n'.join(super().submit())
@@ -515,9 +516,11 @@
return
result = None
-
for chunk in self.chunkstring(text, self.MAX_LENGTH):
- result = self.submitToTelegram(bot_token, chat_id, chunk)
+ for chat_id in chat_ids:
+ res = self.submitToTelegram(bot_token, chat_id, chunk)
+                if res is None or res.status_code != requests.codes.ok:
+ result = res
return result
@@ -529,7 +532,7 @@
try:
json_res = result.json()
- if (result.status_code == 200):
+ if (result.status_code == requests.codes.ok):
logger.info("Telegram response: ok '{0}'. {1}".format(json_res['ok'], json_res['result']))
else:
logger.error("Telegram error: {0}".format(json_res['description']))
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/storage.py new/urlwatch-2.14/lib/urlwatch/storage.py
--- old/urlwatch-2.13/lib/urlwatch/storage.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/storage.py 2018-08-30 10:36:16.000000000 +0200
@@ -97,6 +97,11 @@
'enabled': False,
'api_key': '',
},
+ 'telegram': {
+ 'enabled': False,
+ 'bot_token': '',
+            'chat_id': '',
+ },
'mailgun': {
'enabled': False,
'api_key': '',
@@ -359,7 +364,7 @@
...
@abstractmethod
- def save(self, job, guid, data, timestamp, tries):
+ def save(self, job, guid, data, timestamp, tries, etag=None):
...
@abstractmethod
@@ -372,12 +377,12 @@
def backup(self):
for guid in self.get_guids():
- data, timestamp, tries = self.load(None, guid)
- yield guid, data, timestamp, tries
+ data, timestamp, tries, etag = self.load(None, guid)
+ yield guid, data, timestamp, tries, etag
def restore(self, entries):
- for guid, data, timestamp, tries in entries:
- self.save(None, guid, data, timestamp, tries)
+ for guid, data, timestamp, tries, etag in entries:
+ self.save(None, guid, data, timestamp, tries, etag)
def gc(self, known_guids):
for guid in set(self.get_guids()) - set(known_guids):
@@ -420,10 +425,10 @@
timestamp = os.stat(filename)[stat.ST_MTIME]
- return data, timestamp
+ return data, timestamp, None
- def save(self, job, guid, data, timestamp):
- # Timestamp is always ignored
+ def save(self, job, guid, data, timestamp, etag=None):
+ # Timestamp and ETag are always ignored
filename = self._get_filename(guid)
with open(filename, 'w+') as fp:
fp.write(data)
@@ -443,6 +448,7 @@
timestamp = int
data = str
tries = int
+ etag = str
class CacheMiniDBStorage(CacheStorage):
@@ -464,15 +470,15 @@
return (guid for guid, in CacheEntry.query(self.db, minidb.Function('distinct', CacheEntry.c.guid)))
def load(self, job, guid):
- for data, timestamp, tries in CacheEntry.query(self.db, CacheEntry.c.data // CacheEntry.c.timestamp // CacheEntry.c.tries,
- order_by=minidb.columns(CacheEntry.c.timestamp.desc, CacheEntry.c.tries.desc),
- where=CacheEntry.c.guid == guid, limit=1):
- return data, timestamp, tries
+ for data, timestamp, tries, etag in CacheEntry.query(self.db, CacheEntry.c.data // CacheEntry.c.timestamp // CacheEntry.c.tries // CacheEntry.c.etag,
+ order_by=minidb.columns(CacheEntry.c.timestamp.desc, CacheEntry.c.tries.desc),
+ where=CacheEntry.c.guid == guid, limit=1):
+ return data, timestamp, tries, etag
- return None, None, 0
+ return None, None, 0, None
- def save(self, job, guid, data, timestamp, tries):
- self.db.save(CacheEntry(guid=guid, timestamp=timestamp, data=data, tries=tries))
+ def save(self, job, guid, data, timestamp, tries, etag=None):
+ self.db.save(CacheEntry(guid=guid, timestamp=timestamp, data=data, tries=tries, etag=etag))
self.db.commit()
def delete(self, guid):
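With the new `etag` column, a cache record round-trips as a five-element save and a four-element load; a small sketch against `CacheMiniDBStorage` (the file path and values are arbitrary, and `close()` is assumed to be available as in the test suite):

```python
from urlwatch.storage import CacheMiniDBStorage

cache = CacheMiniDBStorage('/tmp/urlwatch-demo.db')  # arbitrary scratch path
cache.save(None, 'demo-guid', 'page content', 1535600000, 0, etag='"abc123"')

# load() returns (data, timestamp, tries, etag); entries written by older
# versions simply come back with etag == None.
print(cache.load(None, 'demo-guid'))
cache.close()
```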
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/lib/urlwatch/worker.py new/urlwatch-2.14/lib/urlwatch/worker.py
--- old/urlwatch-2.13/lib/urlwatch/worker.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/lib/urlwatch/worker.py 2018-08-30 10:36:16.000000000 +0200
@@ -70,6 +70,8 @@
if isinstance(job_state.exception, NotModifiedError):
logger.info('Job %s has not changed (HTTP 304)', job_state.job)
report.unchanged(job_state)
+ elif isinstance(job_state.exception, requests.exceptions.ConnectionError) and job_state.job.ignore_connection_errors:
+ logger.info('Connection error while executing job %s, ignored due to ignore_connection_errors', job_state.job)
elif job_state.tries < max_tries:
logger.debug('This was try %i of %i for job %s', job_state.tries,
max_tries, job_state.job)
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/setup.cfg new/urlwatch-2.14/setup.cfg
--- old/urlwatch-2.13/setup.cfg 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/setup.cfg 2018-08-30 10:36:16.000000000 +0200
@@ -1,2 +1,2 @@
-[pep8]
+[pycodestyle]
max-line-length = 120
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/setup.py new/urlwatch-2.14/setup.py
--- old/urlwatch-2.13/setup.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/setup.py 2018-08-30 10:36:16.000000000 +0200
@@ -1,6 +1,7 @@
#!/usr/bin/env python3
from setuptools import setup
+from distutils import cmd
import os
import re
@@ -16,7 +17,7 @@
m['name'] = 'urlwatch'
m['author'], m['author_email'] = re.match(r'(.*) <(.*)>', m['author']).groups()
m['description'], m['long_description'] = docs[0].strip().split('\n\n', 1)
-m['install_requires'] = ['minidb', 'PyYAML', 'requests', 'keyring', 'pycodestyle', 'appdirs']
+m['install_requires'] = ['minidb', 'PyYAML', 'requests', 'keyring', 'pycodestyle', 'appdirs', 'lxml']
m['scripts'] = ['urlwatch']
m['package_dir'] = {'': 'lib'}
m['packages'] = ['urlwatch']
@@ -29,5 +30,29 @@
]),
]
+
+class InstallDependencies(cmd.Command):
+ """Install dependencies only"""
+
+ description = 'Only install required packages using pip'
+ user_options = []
+
+ def initialize_options(self):
+ ...
+
+ def finalize_options(self):
+ ...
+
+ def run(self):
+ global m
+ try:
+ from pip._internal import main
+ except ImportError:
+ from pip import main
+ main(['install', '--upgrade'] + m['install_requires'])
+
+
+m['cmdclass'] = {'install_dependencies': InstallDependencies}
+
del m['copyright']
setup(**m)
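The new `install_dependencies` command is invoked as `python3 setup.py install_dependencies` (see the .travis.yml change above); a rough standalone equivalent of what its `run()` does, with the same fallback import because pip's entry point moved between releases:

```python
# Hand the install_requires list from setup.py to pip, upgrading as needed.
try:
    from pip._internal import main
except ImportError:
    from pip import main

main(['install', '--upgrade',
      'minidb', 'PyYAML', 'requests', 'keyring', 'pycodestyle', 'appdirs',
      'lxml'])
```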
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/test/test_filters.py new/urlwatch-2.14/test/test_filters.py
--- old/urlwatch-2.13/test/test_filters.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/test/test_filters.py 2018-08-30 10:36:16.000000000 +0200
@@ -1,5 +1,6 @@
from urlwatch.filters import GetElementById
from urlwatch.filters import GetElementByTag
+from urlwatch.filters import JsonFormatFilter
from nose.tools import eq_
@@ -35,3 +36,29 @@
""", 'div')
print(result)
eq_(result, """<div>foo</div><div>bar</div>""")
+
+
+def test_json_format_filter():
+ json_format_filter = JsonFormatFilter(None, None)
+ result = json_format_filter.filter(
+ """{"field1": {"f1.1": "value"},"field2": "value"}""")
+ print(result)
+ eq_(result, """{
+ "field1": {
+ "f1.1": "value"
+ },
+ "field2": "value"
+}""")
+
+
+def test_json_format_filter_subfilter():
+ json_format_filter = JsonFormatFilter(None, None)
+ result = json_format_filter.filter(
+ """{"field1": {"f1.1": "value"},"field2": "value"}""", "2")
+ print(result)
+ eq_(result, """{
+  "field1": {
+    "f1.1": "value"
+  },
+  "field2": "value"
+}""")
diff -urN '--exclude=CVS' '--exclude=.cvsignore' '--exclude=.svn' '--exclude=.svnignore' old/urlwatch-2.13/test/test_handler.py new/urlwatch-2.14/test/test_handler.py
--- old/urlwatch-2.13/test/test_handler.py 2018-06-03 14:42:56.000000000 +0200
+++ new/urlwatch-2.14/test/test_handler.py 2018-08-30 10:36:16.000000000 +0200
@@ -161,14 +161,14 @@
def test_number_of_tries_in_cache_is_increased():
urlwatcher, cache_storage = prepare_retry_test()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 0
urlwatcher.run_jobs()
urlwatcher.run_jobs()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 2
assert urlwatcher.report.job_states[-1].verb == 'error'
@@ -179,7 +179,7 @@
urlwatcher, cache_storage = prepare_retry_test()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 0
urlwatcher.run_jobs()
@@ -194,13 +194,13 @@
urlwatcher, cache_storage = prepare_retry_test()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 0
urlwatcher.run_jobs()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 1
# use an url that definitely exists
@@ -210,5 +210,5 @@
urlwatcher.run_jobs()
job = urlwatcher.jobs[0]
- old_data, timestamp, tries = cache_storage.load(job, job.get_guid())
+ old_data, timestamp, tries, etag = cache_storage.load(job, job.get_guid())
assert tries == 0