extended tests

Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters
fix syntax error
2022-10-28 14:08:29 +02:00 · 2022-10-28 13:50:19 +02:00 · 2022-10-27 12:05:57 -04:00 · 2022-10-27 12:03:20 -04:00 · 2022-10-27 11:57:55 -04:00 · 2022-10-27 17:56:56 +02:00
28 changed files with 741 additions and 128 deletions
--- a/.github/workflows/test-container-build.yml
+++ b/.github/workflows/test-container-build.yml
@@ -1,12 +1,21 @@
 name: ChangeDetection.io Container Build Test

 # Triggers the workflow on push or pull request events
+
+# This line doesn't work, even though it is the documented one
+#on: [push, pull_request]
+
 on:
  push:
    paths:
      - requirements.txt
      - Dockerfile

+  pull_request:
+    paths:
+      - requirements.txt
+      - Dockerfile
+
  # Changes to requirements.txt packages and Dockerfile may or may not always be compatible with arm etc, so worth testing
  # @todo: some kind of path filter for requirements.txt and Dockerfile
 jobs:
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -6,7 +6,7 @@ Otherwise, it's always best to PR into the `dev` branch.

 Please be sure that all new functionality has a matching test!

-Use `pytest` to validate/test, you can run the existing tests as `pytest tests/test_notifications.py` for example
+Use `pytest` to validate/test, you can run the existing tests as `pytest tests/test_notification.py` for example

 ```
 pip3 install -r requirements-dev
--- a/Dockerfile
+++ b/Dockerfile
@@ -26,6 +26,11 @@ RUN pip install --target=/dependencies -r /requirements.txt
 RUN pip install --target=/dependencies playwright~=1.26 \
    || echo "WARN: Failed to install Playwright. The application can still run, but the Playwright option will be disabled."

+
+RUN pip install --target=/dependencies jq~=1.3 \
+    || echo "WARN: Failed to install JQ. The application can still run, but the Jq: filter option will be disabled."
+
+
 # Final image stage
 FROM python:3.8-slim

@@ -59,6 +64,7 @@ EXPOSE 5000

 # The actual flask app
 COPY changedetectionio /app/changedetectionio
+
 # The eventlet server wrapper
 COPY changedetection.py /app/changedetection.py

--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -2,6 +2,7 @@ recursive-include changedetectionio/api *
 recursive-include changedetectionio/templates *
 recursive-include changedetectionio/static *
 recursive-include changedetectionio/model *
+recursive-include changedetectionio/tests *
 include changedetection.py
 global-exclude *.pyc
 global-exclude node_modules
--- a/README.md
+++ b/README.md
@@ -121,8 +121,8 @@ See the wiki for more information https://github.com/dgtlmoon/changedetection.io


 ## Filters
-XPath, JSONPath, jq, and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools.

+XPath, JSONPath, jq, and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools. 
 (We support LXML `re:test`, `re:math` and `re:replace`.)

 ## Notifications
@@ -161,46 +161,14 @@ This will re-parse the JSON and apply formatting to the text, making it super ea

 ### JSONPath or jq?

-For more complex parsing, filtering, and modifying of JSON data, jq is recommended due to the built-in operators and functions. Refer to the [documentation](https://stedolan.github.io/jq/manual/) for more information on jq.
+For more complex parsing, filtering, and modifying of JSON data, jq is recommended due to the built-in operators and functions. Refer to the [documentation](https://stedolan.github.io/jq/manual/) for more specific information on jq.

-The example below adds the price in dollars to each item in the JSON data, and then filters to only show items that are greater than 10.
+One big advantage of `jq` is that you can use logic in your JSON filter, for example to show only items whose value is greater than or less than a given number.

-#### Sample input data from API
-```
-{
-    "items": [
-        {
-           "name": "Product A",
-           "priceInCents": 2500
-        },
-        {
-           "name": "Product B",
-           "priceInCents": 500
-        },
-        {
-           "name": "Product C",
-           "priceInCents": 2000
-        }
-    ]
-}
-```
+See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/JSON-Selector-Filter-help for more information and examples

-#### Sample jq
-`jq:.items[] | . + { "priceInDollars": (.priceInCents / 100) } | select(.priceInDollars > 10)`
+Note: `jq` library must be added separately (`pip3 install jq`)

-#### Sample output data
-```
-{
-  "name": "Product A",
-  "priceInCents": 2500,
-  "priceInDollars": 25
-}
-{
-  "name": "Product C",
-  "priceInCents": 2000,
-  "priceInDollars": 20
-}
-```

 ### Parse JSON embedded in HTML!

@@ -216,9 +184,9 @@ When you enable a `json:` or `jq:` filter, you can even automatically extract an

 `json:$.price` or `jq:.price` would give `23.50`, or you can extract the whole structure

-## Proxy configuration
+## Proxy Configuration

-See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration
+See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration. We also support using [BrightData proxy services where possible](https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support).

 ## Raspberry Pi support?

--- a/changedetectionio/__init__.py
+++ b/changedetectionio/__init__.py
@@ -33,7 +33,7 @@ from flask_wtf import CSRFProtect
 from changedetectionio import html_tools
 from changedetectionio.api import api_v1

-__version__ = '0.39.20.1'
+__version__ = '0.39.20.4'

 datastore = None

@@ -194,6 +194,9 @@ def changedetection_app(config=None, datastore_o=None):
    watch_api.add_resource(api_v1.Watch, '/api/v1/watch/<string:uuid>',
                           resource_class_kwargs={'datastore': datastore, 'update_q': update_q})

+    watch_api.add_resource(api_v1.SystemInfo, '/api/v1/systeminfo',
+                           resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
+



@@ -636,20 +639,27 @@ def changedetection_app(config=None, datastore_o=None):
            # Only works reliably with Playwright
            visualselector_enabled = os.getenv('PLAYWRIGHT_DRIVER_URL', False) and default['fetch_backend'] == 'html_webdriver'

+            # JQ is difficult to install on windows and must be manually added (outside requirements.txt)
+            jq_support = True
+            try:
+                import jq
+            except ModuleNotFoundError:
+                jq_support = False

            output = render_template("edit.html",
-                                     uuid=uuid,
-                                     watch=datastore.data['watching'][uuid],
-                                     form=form,
-                                     has_empty_checktime=using_default_check_time,
-                                     has_default_notification_urls=True if len(datastore.data['settings']['application']['notification_urls']) else False,
-                                     using_global_webdriver_wait=default['webdriver_delay'] is None,
                                     current_base_url=datastore.data['settings']['application']['base_url'],
                                     emailprefix=os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
+                                     form=form,
+                                     has_default_notification_urls=True if len(datastore.data['settings']['application']['notification_urls']) else False,
+                                     has_empty_checktime=using_default_check_time,
+                                     jq_support=jq_support,
+                                     playwright_enabled=os.getenv('PLAYWRIGHT_DRIVER_URL', False),
                                     settings_application=datastore.data['settings']['application'],
+                                     using_global_webdriver_wait=default['webdriver_delay'] is None,
+                                     uuid=uuid,
                                     visualselector_data_is_ready=visualselector_data_is_ready,
                                     visualselector_enabled=visualselector_enabled,
-                                     playwright_enabled=os.getenv('PLAYWRIGHT_DRIVER_URL', False)
+                                     watch=datastore.data['watching'][uuid],
                                     )

        return output
@@ -809,8 +819,10 @@ def changedetection_app(config=None, datastore_o=None):

        newest_file = history[dates[-1]]

+        # Read as text, forcing UTF-8 and ignoring any bytes that fail to decode
+        # Windows may fail decode in python if we just use 'r' mode (chardet decode exception)
        try:
-            with open(newest_file, 'r') as f:
+            with open(newest_file, 'r', encoding='utf-8', errors='ignore') as f:
                newest_version_file_contents = f.read()
        except Exception as e:
            newest_version_file_contents = "Unable to read {}.\n".format(newest_file)
@@ -823,7 +835,7 @@ def changedetection_app(config=None, datastore_o=None):
            previous_file = history[dates[-2]]

        try:
-            with open(previous_file, 'r') as f:
+            with open(previous_file, 'r', encoding='utf-8', errors='ignore') as f:
                previous_version_file_contents = f.read()
        except Exception as e:
            previous_version_file_contents = "Unable to read {}.\n".format(previous_file)
@@ -900,7 +912,7 @@ def changedetection_app(config=None, datastore_o=None):
        timestamp = list(watch.history.keys())[-1]
        filename = watch.history[timestamp]
        try:
-            with open(filename, 'r') as f:
+            with open(filename, 'r', encoding='utf-8', errors='ignore') as f:
                tmp = f.readlines()

                # Get what needs to be highlighted
@@ -975,9 +987,6 @@ def changedetection_app(config=None, datastore_o=None):

        # create a ZipFile object
        backupname = "changedetection-backup-{}.zip".format(int(time.time()))
-
-        # We only care about UUIDS from the current index file
-        uuids = list(datastore.data['watching'].keys())
        backup_filepath = os.path.join(datastore_o.datastore_path, backupname)

        with zipfile.ZipFile(backup_filepath, "w",
@@ -993,12 +1002,12 @@ def changedetection_app(config=None, datastore_o=None):
            # Add the flask app secret
            zipObj.write(os.path.join(datastore_o.datastore_path, "secret.txt"), arcname="secret.txt")

-            # Add any snapshot data we find, use the full path to access the file, but make the file 'relative' in the Zip.
-            for txt_file_path in Path(datastore_o.datastore_path).rglob('*.txt'):
-                parent_p = txt_file_path.parent
-                if parent_p.name in uuids:
-                    zipObj.write(txt_file_path,
-                                 arcname=str(txt_file_path).replace(datastore_o.datastore_path, ''),
+            # Add any data in the watch data directory.
+            for uuid, w in datastore.data['watching'].items():
+                for f in Path(w.watch_data_dir).glob('*'):
+                    zipObj.write(f,
+                                 # Use the full path to access the file, but make the file 'relative' in the Zip.
+                                 arcname=os.path.join(f.parts[-2], f.parts[-1]),
                                 compress_type=zipfile.ZIP_DEFLATED,
                                 compresslevel=8)
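
Review note: the `jq_support` flag in the `edit` view above is set by attempting the import. A different way to probe for an optional package without importing it is `importlib.util.find_spec`. This is a minimal sketch of that alternative, not what the diff does; the helper name `module_available` is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located, without importing it.

    `module_available` is a hypothetical helper for illustration only.
    """
    return importlib.util.find_spec(name) is not None


# The result can be passed to a template the same way as the flag above,
# e.g. jq_support=module_available("jq")
jq_support = module_available("jq")
```

The try/except form in the diff has the advantage of also catching a package that is installed but broken at import time; `find_spec` only tells you that the module can be found.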

--- a/changedetectionio/api/api_v1.py
+++ b/changedetectionio/api/api_v1.py
@@ -122,3 +122,37 @@ class CreateWatch(Resource):
            return {'status': "OK"}, 200

        return list, 200
+
+class SystemInfo(Resource):
+    def __init__(self, **kwargs):
+        # datastore is a black box dependency
+        self.datastore = kwargs['datastore']
+        self.update_q = kwargs['update_q']
+
+    @auth.check_token
+    def get(self):
+        import time
+        overdue_watches = []
+
+        # Check all watches and report which have not been checked but should have been
+
+        for uuid, watch in self.datastore.data.get('watching', {}).items():
+            # see if now - last_checked is greater than the time that should have been
+            # this is not super accurate (maybe they just edited it) but better than nothing
+            t = watch.threshold_seconds()
+            if not t:
+                # Use the system wide default
+                t = self.datastore.threshold_seconds
+
+            time_since_check = time.time() - watch.get('last_checked')
+
+            # Allow 5 minutes of grace time before we decide it's overdue
+            if time_since_check - (5 * 60) > t:
+                overdue_watches.append(uuid)
+
+        return {
+                   'queue_size': self.update_q.qsize(),
+                   'overdue_watches': overdue_watches,
+                   'uptime': round(time.time() - self.datastore.start_time, 2),
+                   'watch_count': len(self.datastore.data.get('watching', {}))
+               }, 200
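
Review note: the overdue check in `SystemInfo.get` above can be read as a small pure function, which also makes it easy to test. The sketch below is illustrative only: `find_overdue`, its parameters, and the plain-dict watch records are assumptions made for this example and are not part of the changedetection.io API.

```python
import time

GRACE_SECONDS = 5 * 60  # same 5 minute grace period used in the diff above


def find_overdue(watches, default_threshold, now=None, grace=GRACE_SECONDS):
    """Return the UUIDs whose last check is older than threshold + grace.

    `watches` is a hypothetical mapping of uuid -> dict with the keys
    'threshold_seconds' (0 or None means "use the default") and
    'last_checked' (epoch seconds, 0 or None if never checked).
    """
    now = time.time() if now is None else now
    overdue = []
    for uuid, watch in watches.items():
        # Fall back to the system wide default when the watch has none
        threshold = watch.get('threshold_seconds') or default_threshold
        time_since_check = now - (watch.get('last_checked') or 0)
        if time_since_check - grace > threshold:
            overdue.append(uuid)
    return overdue
```

Treating a missing `last_checked` as `0` is a choice made for this sketch; it means a watch that has never been checked is reported as overdue.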
--- a/changedetectionio/changedetection.py
+++ b/changedetectionio/changedetection.py
@@ -102,6 +102,14 @@ def main():
                    has_password=datastore.data['settings']['application']['password'] != False
                    )

+    # Monitored websites will not receive a Referer header
+    # when a user clicks on an outgoing link.
+    @app.after_request
+    def hide_referrer(response):
+        if os.getenv("HIDE_REFERER", False):
+            response.headers["Referrer-Policy"] = "no-referrer"
+        return response
+
    # Proxy sub-directory support
    # Set environment var USE_X_SETTINGS=1 on this script
    # And then in your proxy_pass settings
--- a/changedetectionio/fetch_site_status.py
+++ b/changedetectionio/fetch_site_status.py
@@ -2,14 +2,14 @@ import hashlib
 import logging
 import os
 import re
-import time
 import urllib3
+import difflib
+

 from changedetectionio import content_fetcher, html_tools

 urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

-
 # Some common stuff here that can be moved to a base class
 # (set_proxy_from_list)
 class perform_site_check():
@@ -65,7 +65,9 @@ class perform_site_check():
            request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')

        timeout = self.datastore.data['settings']['requests'].get('timeout')
-        url = watch.get('url')
+
+        url = watch.link
+
        request_body = self.datastore.data['watching'][uuid].get('body')
        request_method = self.datastore.data['watching'][uuid].get('method')
        ignore_status_codes = self.datastore.data['watching'][uuid].get('ignore_status_codes', False)
@@ -287,8 +289,23 @@ class perform_site_check():
                else:
                    logging.debug("check_unique_lines: UUID {} had unique content".format(uuid))

-        # Always record the new checksum
+        if changed_detected:
+            if not watch.get("trigger_add", True) or not watch.get("trigger_del", True): # if we are supposed to filter any diff types
+                # get the diff types present in the watch
+                diff_types = watch.get_diff_types(text_content_before_ignored_filter)
+                print("Diff components found: " + str(diff_types))
+
+                # Only Additions (deletions are turned off)
+                if not watch["trigger_del"] and diff_types["del"] and not diff_types["add"]:
+                    changed_detected = False
+
+                # Only Deletions (additions are turned off)
+                elif not watch["trigger_add"] and diff_types["add"] and not diff_types["del"]:
+                    changed_detected = False
+
+        # Always record the new checksum and the new text
        update_obj["previous_md5"] = fetched_md5
+        watch.save_previous_text(text_content_before_ignored_filter)

        # On the first run of a site, watch['previous_md5'] will be None, set it the current one.
        if not watch.get('previous_md5'):
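
Review note: the addition/deletion filter added to `perform_site_check` above boils down to a small truth table. The sketch below restates it as a standalone function so the cases are easier to see; `should_trigger` is a hypothetical name and is not part of this change.

```python
def should_trigger(diff_types, trigger_add=True, trigger_del=True):
    """Decide whether an already detected change should still count.

    `diff_types` has the same shape as the dict used in the diff above:
    {'add': bool, 'del': bool}.
    """
    # The diff contains only deletions, but deletions are switched off
    if not trigger_del and diff_types['del'] and not diff_types['add']:
        return False
    # The diff contains only additions, but additions are switched off
    if not trigger_add and diff_types['add'] and not diff_types['del']:
        return False
    # Mixed diffs, or diffs of an enabled type, still trigger
    return True
```

Note that under this logic a diff containing both additions and deletions triggers even when one of the two types is switched off.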
--- a/changedetectionio/forms.py
+++ b/changedetectionio/forms.py
@@ -303,12 +303,16 @@ class ValidateCSSJSONXPATHInput(object):

                # Re #265 - maybe in the future fetch the page and offer a
                # warning/notice that its possible the rule doesnt yet match anything?
-
-            if 'jq:' in line:
                if not self.allow_json:
                    raise ValidationError("jq not permitted in this field!")

-                import jq
+            if 'jq:' in line:
+                try:
+                    import jq
+                except ModuleNotFoundError:
+                    # `jq` requires full compilation in windows and so isn't generally available
+                    raise ValidationError("jq support not found")
+
                input = line.replace('jq:', '')

                try:
@@ -319,6 +323,18 @@ class ValidateCSSJSONXPATHInput(object):
                except:
                    raise ValidationError("A system-error occurred when validating your jq expression")

+class ValidateDiffFilters(object):
+    """
+    Validates that at least one filter checkbox is selected
+    """
+    def __init__(self, message=None):
+        self.message = message
+
+    def __call__(self, form, field):
+        if not form.trigger_add.data and not form.trigger_del.data:
+            message = field.gettext('At least one filter checkbox must be selected')
+            raise ValidationError(message)
+

 class quickWatchForm(Form):
    url = fields.URLField('URL', validators=[validateURL()])
@@ -361,6 +377,8 @@ class watchForm(commonSettingsForm):
    check_unique_lines = BooleanField('Only trigger when new lines appear', default=False)
    trigger_text = StringListField('Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
    text_should_not_be_present = StringListField('Block change-detection if text matches', [validators.Optional(), ValidateListRegex()])
+    trigger_add = BooleanField('Additions', [ValidateDiffFilters()], default=True)
+    trigger_del = BooleanField('Deletions', [ValidateDiffFilters()], default=True)

    webdriver_js_execute_code = TextAreaField('Execute JavaScript before change detection', render_kw={"rows": "5"}, validators=[validators.Optional()])

--- a/changedetectionio/html_tools.py
+++ b/changedetectionio/html_tools.py
@@ -1,12 +1,11 @@
-import json
-from typing import List

 from bs4 import BeautifulSoup
-from jsonpath_ng.ext import parse
-import jq
-import re
 from inscriptis import get_text
 from inscriptis.model.config import ParserConfig
+from jsonpath_ng.ext import parse
+from typing import List
+import json
+import re

 class FilterNotFoundInResponse(ValueError):
    def __init__(self, msg):
@@ -85,9 +84,18 @@ def _parse_json(json_data, json_filter):
        jsonpath_expression = parse(json_filter.replace('json:', ''))
        match = jsonpath_expression.find(json_data)
        return _get_stripped_text_from_json_match(match)
+
    if 'jq:' in json_filter:
+
+        try:
+            import jq
+        except ModuleNotFoundError:
+            # `jq` requires full compilation in windows and so isn't generally available
+            raise Exception("jq support not found")
+
        jq_expression = jq.compile(json_filter.replace('jq:', ''))
        match = jq_expression.input(json_data).all()
+
        return _get_stripped_text_from_json_match(match)

 def _get_stripped_text_from_json_match(match):
--- a/changedetectionio/model/Watch.py
+++ b/changedetectionio/model/Watch.py
@@ -1,6 +1,8 @@
-import os
-import uuid as uuid_builder
 from distutils.util import strtobool
+import logging
+import os
+import time
+import uuid

 minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 60))
 mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
@@ -22,7 +24,7 @@ class model(dict):
            #'newest_history_key': 0,
            'title': None,
            'previous_md5': False,
-            'uuid': str(uuid_builder.uuid4()),
+            'uuid': str(uuid.uuid4()),
            'headers': {},  # Extra headers to send
            'body': None,
            'method': 'GET',
@@ -45,6 +47,8 @@ class model(dict):
            'consecutive_filter_failures': 0, # Every time the CSS/xPath filter cannot be located, reset when all is fine.
            'extract_title_as_title': False,
            'check_unique_lines': False, # On change-detected, compare against all history if its something new
+            'trigger_add': True,
+            'trigger_del': True,
            'proxy': None, # Preferred proxy connection
            # Re #110, so then if this is set to None, we know to use the default value instead
            # Requires setting to None on submit if it's the same as the default
@@ -60,7 +64,7 @@ class model(dict):
        self.update(self.__base_config)
        self.__datastore_path = kw['datastore_path']

-        self['uuid'] = str(uuid_builder.uuid4())
+        self['uuid'] = str(uuid.uuid4())

        del kw['datastore_path']

@@ -82,10 +86,19 @@ class model(dict):
        return False

    def ensure_data_dir_exists(self):
-        target_path = os.path.join(self.__datastore_path, self['uuid'])
-        if not os.path.isdir(target_path):
-            print ("> Creating data dir {}".format(target_path))
-            os.mkdir(target_path)
+        if not os.path.isdir(self.watch_data_dir):
+            print ("> Creating data dir {}".format(self.watch_data_dir))
+            os.mkdir(self.watch_data_dir)
+
+    @property
+    def link(self):
+        url = self.get('url', '')
+        if '{%' in url or '{{' in url:
+            from jinja2 import Environment
+            # Jinja2 available in URLs along with https://pypi.org/project/jinja2-time/
+            jinja2_env = Environment(extensions=['jinja2_time.TimeExtension'])
+            return str(jinja2_env.from_string(url).render())
+        return url

    @property
    def label(self):
@@ -109,16 +122,40 @@ class model(dict):

    @property
    def history(self):
+        """History index is just a text file as a list
+            {watch-uuid}/history.txt
+
+            contains a list like
+
+            {epoch-time},{filename}\n
+
+            We read in this list as the history information
+
+        """
        tmp_history = {}
-        import logging
-        import time

        # Read the history file as a dict
-        fname = os.path.join(self.__datastore_path, self.get('uuid'), "history.txt")
+        fname = os.path.join(self.watch_data_dir, "history.txt")
        if os.path.isfile(fname):
            logging.debug("Reading history index " + str(time.time()))
            with open(fname, "r") as f:
-                tmp_history = dict(i.strip().split(',', 2) for i in f.readlines())
+                for i in f.readlines():
+                    if ',' in i:
+                        # Split on the first comma only, the filename itself may contain commas
+                        k, v = i.strip().split(',', 1)
+
+                        # The index history could contain a relative path, so we need to make the fullpath
+                        # so that python can read it
+                        if '/' not in v and '\\' not in v:
+                            v = os.path.join(self.watch_data_dir, v)
+                        else:
+                            # It's possible that they moved the datadir on older versions
+                            # So the snapshot exists but is in a different path
+                            snapshot_fname = v.split('/')[-1]
+                            proposed_new_path = os.path.join(self.watch_data_dir, snapshot_fname)
+                            if not os.path.exists(v) and os.path.exists(proposed_new_path):
+                                v = proposed_new_path
+
+                        tmp_history[k] = v

        if len(tmp_history):
            self.__newest_history_key = list(tmp_history.keys())[-1]
@@ -129,7 +166,7 @@ class model(dict):

    @property
    def has_history(self):
-        fname = os.path.join(self.__datastore_path, self.get('uuid'), "history.txt")
+        fname = os.path.join(self.watch_data_dir, "history.txt")
        return os.path.isfile(fname)

    # Returns the newest key, but if theres only 1 record, then it's counted as not being new, so return 0.
@@ -148,33 +185,58 @@ class model(dict):
    # Save some text file to the appropriate path and bump the history
    # result_obj from fetch_site_status.run()
    def save_history_text(self, contents, timestamp):
-        import uuid
-        import logging
-
-        output_path = "{}/{}".format(self.__datastore_path, self['uuid'])

        self.ensure_data_dir_exists()
+        snapshot_fname = "{}.txt".format(str(uuid.uuid4()))

-        snapshot_fname = "{}/{}.stripped.txt".format(output_path, uuid.uuid4())
-        logging.debug("Saving history text {}".format(snapshot_fname))
-
-        with open(snapshot_fname, 'wb') as f:
+        # in /diff/ and /preview/ we are going to assume for now that it's UTF-8 when reading
+        # most sites are utf-8 and some are even broken utf-8
+        with open(os.path.join(self.watch_data_dir, snapshot_fname), 'wb') as f:
            f.write(contents)
            f.close()

        # Append to index
        # @todo check last char was \n
-        index_fname = "{}/history.txt".format(output_path)
+        index_fname = os.path.join(self.watch_data_dir, "history.txt")
        with open(index_fname, 'a') as f:
            f.write("{},{}\n".format(timestamp, snapshot_fname))
            f.close()

        self.__newest_history_key = timestamp
-        self.__history_n+=1
+        self.__history_n += 1

-        #@todo bump static cache of the last timestamp so we dont need to examine the file to set a proper ''viewed'' status
+        # @todo bump static cache of the last timestamp so we dont need to examine the file to set a proper ''viewed'' status
        return snapshot_fname

+    # Save previous text snapshot for diffing - used for calculating additions and deletions
+    def save_previous_text(self, contents):
+        import logging
+
+        output_path = os.path.join(self.__datastore_path, self['uuid'])
+
+        # In case the operator deleted it, check and create.
+        self.ensure_data_dir_exists()
+
+        snapshot_fname = os.path.join(self.watch_data_dir, "previous.txt")
+        logging.debug("Saving previous text {}".format(snapshot_fname))
+
+        with open(snapshot_fname, 'wb') as f:
+            f.write(contents)
+
+        return snapshot_fname
+
+    # Get previous text snapshot for diffing - used for calculating additions and deletions
+    def get_previous_text(self):
+
+        snapshot_fname = os.path.join(self.watch_data_dir, "previous.txt")
+        if self.history_n < 1:
+            return ""
+
+        with open(snapshot_fname, 'rb') as f:
+            contents = f.read()
+
+        return contents
+
    @property
    def has_empty_checktime(self):
        # using all() + dictionary comprehension
@@ -204,15 +266,40 @@ class model(dict):
        # if not, something new happened
        return not local_lines.issubset(existing_history)

+    # Get diff types (addition, deletion, modification) from the previous snapshot and new_text
+    # uses similar algorithm to customSequenceMatcher in diff.py
+    # Returns a dict of diff types and whether they are present in the diff
+    def get_diff_types(self, new_text):
+        import difflib
+
+        diff_types = {
+            'add': False,
+            'del': False,
+        }
+
+        # get diff types using difflib
+        cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \\t", a=str(self.get_previous_text()), b=str(new_text))
+
+        for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
+            if tag == 'delete':
+                diff_types["del"] = True
+            elif tag == 'insert':
+                diff_types["add"] = True
+            elif tag == 'replace':
+                diff_types["del"] = True
+                diff_types["add"] = True
+
+        return diff_types
+
    def get_screenshot(self):
-        fname = os.path.join(self.__datastore_path, self['uuid'], "last-screenshot.png")
+        fname = os.path.join(self.watch_data_dir, "last-screenshot.png")
        if os.path.isfile(fname):
            return fname

        return False

    def __get_file_ctime(self, filename):
-        fname = os.path.join(self.__datastore_path, self['uuid'], filename)
+        fname = os.path.join(self.watch_data_dir, filename)
        if os.path.isfile(fname):
            return int(os.path.getmtime(fname))
        return False
@@ -237,9 +324,14 @@ class model(dict):
    def snapshot_error_screenshot_ctime(self):
        return self.__get_file_ctime('last-error-screenshot.png')

+    @property
+    def watch_data_dir(self):
+        # The base dir of the watch data
+        return os.path.join(self.__datastore_path, self['uuid'])
+    
    def get_error_text(self):
        """Return the text saved from a previous request that resulted in a non-200 error"""
-        fname = os.path.join(self.__datastore_path, self['uuid'], "last-error.txt")
+        fname = os.path.join(self.watch_data_dir, "last-error.txt")
        if os.path.isfile(fname):
            with open(fname, 'r') as f:
                return f.read()
@@ -247,7 +339,7 @@ class model(dict):

    def get_error_snapshot(self):
        """Return path to the screenshot that resulted in a non-200 error"""
-        fname = os.path.join(self.__datastore_path, self['uuid'], "last-error-screenshot.png")
+        fname = os.path.join(self.watch_data_dir, "last-error-screenshot.png")
        if os.path.isfile(fname):
            return fname
        return False
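
Review note: `get_diff_types` above maps `difflib.SequenceMatcher` opcodes onto an add/del dict. The standalone sketch below shows the same mapping on plain strings so the behaviour can be checked in isolation. It is an illustration under assumptions: it takes both texts as `str` arguments and passes `isjunk=None`, whereas the method in the diff reads the previous text from `previous.txt` and supplies its own junk filter.

```python
import difflib


def get_diff_types(previous_text, new_text):
    """Report whether a character level diff contains additions and/or deletions."""
    diff_types = {'add': False, 'del': False}

    matcher = difflib.SequenceMatcher(isjunk=None, a=previous_text, b=new_text)

    # Each opcode is (tag, a_start, a_end, b_start, b_end); only the tag matters here
    for tag, _alo, _ahi, _blo, _bhi in matcher.get_opcodes():
        if tag in ('delete', 'replace'):
            diff_types['del'] = True
        if tag in ('insert', 'replace'):
            diff_types['add'] = True

    return diff_types
```

A `replace` opcode counts as both a deletion and an addition, which matches the branch structure in the diff.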
--- a/changedetectionio/run_all_tests.sh
+++ b/changedetectionio/run_all_tests.sh
@@ -9,6 +9,8 @@
 # exit when any command fails
 set -e

+SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
+
 find tests/test_*py -type f|while read test_name
 do
  echo "TEST RUNNING $test_name"
@@ -23,6 +25,13 @@ export BASE_URL="https://really-unique-domain.io"
 pytest tests/test_notification.py


+## JQ + JSON: filter test
+# jq is not available on windows and we should just test it when the package is installed
+# this will re-test with jq support
+pip3 install jq~=1.3
+pytest tests/test_jsonpath_jq_selector.py
+
+
 # Now for the selenium and playwright/browserless fetchers
 # Note - this is not UI functional tests - just checking that each one can fetch the content

@@ -38,7 +47,9 @@ docker kill $$-test_selenium

 echo "TESTING WEBDRIVER FETCH > PLAYWRIGHT/BROWSERLESS..."
 # Not all platforms support playwright (not ARM/rPI), so it's not packaged in requirements.txt
-pip3 install playwright~=1.24
+PLAYWRIGHT_VERSION=$(grep -i -E "RUN pip install.+" "$SCRIPT_DIR/../Dockerfile" | grep --only-matching -i -E "playwright[=><~+]+[0-9\.]+")
+echo "using $PLAYWRIGHT_VERSION"
+pip3 install "$PLAYWRIGHT_VERSION"
 docker run -d --name $$-test_browserless -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" --rm  -p 3000:3000  --shm-size="2g"  browserless/chrome:1.53-chrome-stable
 # takes a while to spin up
 sleep 5
--- a/changedetectionio/static/styles/styles.scss
+++ b/changedetectionio/static/styles/styles.scss
@@ -156,7 +156,7 @@ body:after, body:before {

 .fetch-error {
  padding-top: 1em;
-  font-size: 60%;
+  font-size: 80%;
  max-width: 400px;
  display: block;
 }
@@ -803,4 +803,4 @@ ul {
  padding: 0.5rem;
  border-radius: 5px;
  color: #ff3300;
-}
+}
--- a/changedetectionio/store.py
+++ b/changedetectionio/store.py
@@ -30,14 +30,14 @@ class ChangeDetectionStore:
    def __init__(self, datastore_path="/datastore", include_default_watches=True, version_tag="0.0.0"):
        # Should only be active for docker
        # logging.basicConfig(filename='/dev/stdout', level=logging.INFO)
-        self.needs_write = False
+        self.__data = App.model()
        self.datastore_path = datastore_path
        self.json_store_path = "{}/url-watches.json".format(self.datastore_path)
+        self.needs_write = False
        self.proxy_list = None
+        self.start_time = time.time()
        self.stop_thread = False

-        self.__data = App.model()
-
        # Base definition for all watchers
        # deepcopy part of #569 - not sure why its needed exactly
        self.generic_definition = deepcopy(Watch.model(datastore_path = datastore_path, default={}))
@@ -548,6 +548,10 @@ class ChangeDetectionStore:
    # `last_changed` not needed, we pull that information from the history.txt index
    def update_4(self):
        for uuid, watch in self.data['watching'].items():
+            # Be sure it's recalculated
+            p = watch.history
+            if watch.history_n < 2:
+                watch['last_changed'] = 0
            try:
                # Remove it from the struct
                del(watch['last_changed'])
@@ -583,3 +587,20 @@ class ChangeDetectionStore:
        for v in ['User-Agent', 'Accept', 'Accept-Encoding', 'Accept-Language']:
            if self.data['settings']['headers'].get(v):
                del self.data['settings']['headers'][v]
+
+    # Generate a previous.txt for all watches that do not have one and contain history
+    def update_8(self):
+        for uuid, watch in self.data['watching'].items():
+            # Make sure we actually have history
+            if watch.history_n == 0:
+                continue
+
+            previous_path = os.path.join(watch.watch_data_dir, "previous.txt")
+            # Nothing to do when previous.txt already exists
+            if os.path.exists(previous_path):
+                continue
+
+            # Seed previous.txt with the newest snapshot in the history
+            latest_file_name = watch.history[watch.newest_history_key]
+            with open(latest_file_name, "rb") as src, open(previous_path, "wb") as dst:
+                dst.write(src.read())
--- a/changedetectionio/templates/edit.html
+++ b/changedetectionio/templates/edit.html
@@ -40,7 +40,8 @@
                <fieldset>
                    <div class="pure-control-group">
                        {{ render_field(form.url, placeholder="https://...", required=true, class="m-d") }}
-                        <span class="pure-form-message-inline">Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a></span>
+                        <span class="pure-form-message-inline">Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a></span><br/>
+                        <span class="pure-form-message-inline">You can use variables in the URL, perfect for inserting the current date and other logic, <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">help and examples here</a></span><br/>
                    </div>
                    <div class="pure-control-group">
                        {{ render_field(form.title, class="m-d") }}
@@ -172,6 +173,16 @@ User-Agent: wonderbra 1.0") }}
                            <span class="pure-form-message-inline">Good for websites that just move the content around, and you want to know when NEW content is added, compares new lines against all history for this watch.</span>
                        </div>
                    </fieldset>
+                    <fieldset>
+                        <div class="pure-control-group">
+                            <label for="trigger-type">Filter and restrict change detection of content to</label>
+                            {{ render_checkbox_field(form.trigger_add, class="trigger-type") }}
+                            {{ render_checkbox_field(form.trigger_del, class="trigger-type") }}
+                            <span class="pure-form-message-inline">
+                                Filters the change-detection of this watch to only this type of content change. <strong>Replacements</strong> (neither additions nor deletions) are always included. The 'diff' will still include all changes.
+                            </span>
+                        </div>
+                    </fieldset>
                    <div class="pure-control-group">
                        {% set field = render_field(form.css_filter,
                            placeholder=".class-name or #some-id, or other CSS selector rule.",
@@ -184,10 +195,14 @@ User-Agent: wonderbra 1.0") }}
                        <span class="pure-form-message-inline">
                    <ul>
                        <li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
-                        <li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a>.
+                        <li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a> (if installed).
                            <ul>
                                <li>JSONPath: Prefix with <code>json:</code>, use <code>json:$</code> to force re-formatting if required,  <a href="https://jsonpath.com/" target="new">test your JSONPath here</a>.</li>
+                                {% if jq_support %}
                                <li>jq: Prefix with <code>jq:</code> and <a href="https://jqplay.org/" target="new">test your jq here</a>. Using <a href="https://stedolan.github.io/jq/" target="new">jq</a> allows for complex filtering and processing of JSON data with built-in functions, regex, filtering, and more. See examples and documentation <a href="https://stedolan.github.io/jq/manual/" target="new">here</a>.</li>
+                                {% else %}
+                                <li>jq support not installed</li>
+                                {% endif %}
                            </ul>
                        </li>
                        <li>XPath - Limit text to this XPath rule, simply start with a forward-slash,
@@ -198,7 +213,7 @@ User-Agent: wonderbra 1.0") }}
                            </ul>
                            </li>
                    </ul>
-                    Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath, or jq selector rules before filing an issue on GitHub! <a
+                    Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath{% if jq_support %}, or jq selector{%endif%} rules before filing an issue on GitHub! <a
                                href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br/>
                </span>
                    </div>
--- a/changedetectionio/templates/watch-overview.html
+++ b/changedetectionio/templates/watch-overview.html
@@ -87,7 +87,7 @@
                    <a class="state-{{'on' if watch.notification_muted}}" href="{{url_for('index', op='mute', uuid=watch.uuid, tag=active_tag)}}"><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="Mute notifications" title="Mute notifications"/></a>
                </td>
                <td class="title-col inline">{{watch.title if watch.title is not none and watch.title|length > 0 else watch.url}}
-                    <a class="external" target="_blank" rel="noopener" href="{{ watch.url.replace('source:','') }}"></a>
+                    <a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}"></a>
                    <a href="{{url_for('form_share_put_watch', uuid=watch.uuid)}}"><img style="height: 1em;display:inline-block;" src="{{url_for('static_content', group='images', filename='spread.svg')}}" /></a>

                    {%if watch.fetch_backend == "html_webdriver" %}<img style="height: 1em; display:inline-block;" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}" />{% endif %}
--- a/changedetectionio/tests/test_api.py
+++ b/changedetectionio/tests/test_api.py
@@ -147,6 +147,16 @@ def test_api_simple(client, live_server):
    # @todo how to handle None/default global values?
    assert watch['history_n'] == 2, "Found replacement history section, which is in its own API"

+    # basic systeminfo check
+    res = client.get(
+        url_for("systeminfo"),
+        headers={'x-api-key': api_key},
+    )
+    info = json.loads(res.data)
+    assert info.get('watch_count') == 1
+    assert info.get('uptime') > 0.5
+
+
    # Finally delete the watch
    res = client.delete(
        url_for("watch", uuid=watch_uuid),
--- a/changedetectionio/tests/test_backup.py
+++ b/changedetectionio/tests/test_backup.py
@@ -1,18 +1,31 @@
 #!/usr/bin/python3

-import time
+from .util import set_original_response, set_modified_response, live_server_setup
 from flask import url_for
 from urllib.request import urlopen
-from . util import set_original_response, set_modified_response, live_server_setup
+from zipfile import ZipFile
+import re
+import time


 def test_backup(client, live_server):
-
    live_server_setup(live_server)

+    set_original_response()
+
    # Give the endpoint time to spin up
    time.sleep(1)

+    # Add our URL to the import page
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": url_for('test_endpoint', _external=True)},
+        follow_redirects=True
+    )
+
+    assert b"1 Imported" in res.data
+    time.sleep(3)
+
    res = client.get(
        url_for("get_backup"),
        follow_redirects=True
@@ -20,6 +33,19 @@ def test_backup(client, live_server):

    # Should get the right zip content type
    assert res.content_type == "application/zip"
+
    # Should be PK/ZIP stream
    assert res.data.count(b'PK') >= 2

+    # ZipFile from buffer seems non-obvious, just save it instead
+    with open("download.zip", 'wb') as f:
+        f.write(res.data)
+
+    backup_zip = ZipFile('download.zip')
+    names = backup_zip.namelist()
+    uuid4hex = re.compile('^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}.*txt', re.I)
+    newlist = list(filter(uuid4hex.match, names))
+
+    # Should be three txt files in the archive (history.txt, previous.txt and the snapshot)
+    assert len(newlist) == 3
+
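
On the "ZipFile from buffer seems non-obvious" comment in the test above: `zipfile.ZipFile` accepts any seekable file-like object, so the response bytes can be wrapped in `io.BytesIO` instead of being written to `download.zip`. The sketch below is a minimal illustration. The archive member names are made up, and `txt_members` is a hypothetical helper, not part of the test suite.

```python
import io
import zipfile


def txt_members(zip_bytes: bytes) -> list:
    """List the .txt entries of a ZIP archive held entirely in memory."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if name.endswith(".txt")]


# Build a small archive in memory to stand in for res.data (names are illustrative)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("some-watch/history.txt", "1666900000,snapshot.txt\n")
    archive.writestr("url-watches.json", "{}")

print(txt_members(buffer.getvalue()))
```

In the test this would amount to passing `res.data` to such a helper, which avoids leaving a `download.zip` file behind in the working directory.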
--- a/changedetectionio/tests/test_diff_filter_changes_as_add_delete.py
+++ b/changedetectionio/tests/test_diff_filter_changes_as_add_delete.py
@@ -0,0 +1,107 @@
+#!/usr/bin/python3
+# @NOTE:  THIS RELIES ON SOME MIDDLEWARE TO MAKE CHECKBOXES WORK WITH WTFORMS UNDER TEST CONDITION, see changedetectionio/tests/util.py
+import time
+from flask import url_for
+from .util import live_server_setup
+
+def set_original_response():
+    test_return_data = """
+        Here
+        is
+        some
+        text
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def set_response_with_deleted_word():
+    test_return_data = """
+        Here
+        is
+        text
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def set_response_with_changed_word():
+    test_return_data = """
+        Here
+        ix
+        some
+        text
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def test_diff_filter_changes_as_add_delete(client, live_server):
+    live_server_setup(live_server)
+
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+
+    assert b"1 Imported" in res.data
+    # Wait for it to read the original version
+    time.sleep(sleep_time_for_fetch_thread)
+
+    #  Make a change that ONLY includes deletes
+    set_response_with_deleted_word()
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={"trigger_add": "y",
+              "trigger_del": "n",
+              "url": test_url,
+              "fetch_backend": "html_requests"},
+        follow_redirects=True
+    )
+    assert b"Updated watch." in res.data
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # We should NOT see a change because we chose to not know about any Deletions
+    res = client.get(url_for("index"))
+    assert b'unviewed' not in res.data
+    # Recheck to be sure
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+    time.sleep(sleep_time_for_fetch_thread)
+    res = client.get(url_for("index"))
+    assert b'unviewed' not in res.data
+
+
+    # Now set the original response, which will include the word, which should trigger Added (because trigger_add ==y)
+    set_original_response()
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+    time.sleep(sleep_time_for_fetch_thread)
+    res = client.get(url_for("index"))
+    assert b'unviewed' in res.data
+
+    # Now check 'changes' are always going to be triggered
+    set_original_response()
+    client.post(
+        url_for("edit_page", uuid="first"),
+        # Neither trigger add nor del? then we should see changes still
+        data={"trigger_add": "n",
+              "trigger_del": "n",
+              "url": test_url,
+              "fetch_backend": "html_requests"},
+        follow_redirects=True
+    )
+    time.sleep(sleep_time_for_fetch_thread)
+    client.get(url_for("mark_all_viewed"), follow_redirects=True)
+    set_response_with_changed_word()
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+    time.sleep(sleep_time_for_fetch_thread)
+    res = client.get(url_for("index"))
+    assert b'unviewed' in res.data
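
The behaviour this test exercises (additions and deletions can be filtered, replacements always count) can be illustrated with the standard library's `difflib`. This is a sketch under assumptions, not the project's implementation: `diff_types` and `should_trigger` are hypothetical names, and the comparison is line-based, using `SequenceMatcher` opcodes.

```python
import difflib


def diff_types(before: str, after: str) -> dict:
    """Report which kinds of line-level change occur between two texts."""
    a = [line.strip() for line in before.splitlines() if line.strip()]
    b = [line.strip() for line in after.splitlines() if line.strip()]
    found = {"add": False, "del": False, "change": False}
    for tag, *_ in difflib.SequenceMatcher(None, a, b).get_opcodes():
        if tag == "insert":
            found["add"] = True
        elif tag == "delete":
            found["del"] = True
        elif tag == "replace":
            found["change"] = True
    return found


def should_trigger(found: dict, trigger_add: bool, trigger_del: bool) -> bool:
    """Replacements always trigger; additions and deletions only when enabled."""
    return bool(
        found["change"]
        or (trigger_add and found["add"])
        or (trigger_del and found["del"])
    )


original = "Here\nis\nsome\ntext"
deleted_word = "Here\nis\ntext"
changed_word = "Here\nix\nsome\ntext"

print(should_trigger(diff_types(original, deleted_word), True, False))
print(should_trigger(diff_types(original, changed_word), False, False))
```

Note that `difflib` reports a `replace` opcode whenever a run of lines is swapped for another run, even when the two runs differ in length, so a real implementation may need to classify such cases more carefully.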
--- a/changedetectionio/tests/test_diff_filter_only_additions.py
+++ b/changedetectionio/tests/test_diff_filter_only_additions.py
@@ -0,0 +1,83 @@
+#!/usr/bin/python3
+
+import time
+from flask import url_for
+from .util import live_server_setup
+
+def set_original_response():
+    test_return_data = """
+        A few new lines
+        Where there is more lines originally
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def set_delete_response():
+    test_return_data = """
+        A few new lines
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def test_diff_filtering_no_del(client, live_server):
+    live_server_setup(live_server)
+
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+
+    assert b"1 Imported" in res.data
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # Edit the watch so that only additions trigger a change
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={"trigger_add": "y",
+              "trigger_del": "n",
+              "url": test_url,
+              "fetch_backend": "html_requests"},
+        follow_redirects=True
+    )
+    assert b"Updated watch." in res.data
+    assert b'unviewed' not in res.data
+
+    # Make a delete change
+    set_delete_response()
+
+    time.sleep(sleep_time_for_fetch_thread)
+    # Trigger a check
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # We should NOT see the change
+    res = client.get(url_for("index"))
+    assert b'unviewed' not in res.data
+
+    # Restore the original response, which re-adds the deleted line
+    set_original_response()
+
+    time.sleep(sleep_time_for_fetch_thread)
+    # Trigger a check
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # We should see the change
+    res = client.get(url_for("index"))
+    assert b'unviewed' in res.data
+
--- a/changedetectionio/tests/test_diff_filter_only_deletions.py
+++ b/changedetectionio/tests/test_diff_filter_only_deletions.py
@@ -0,0 +1,72 @@
+#!/usr/bin/python3
+
+import time
+from flask import url_for
+from .util import live_server_setup
+
+def set_original_response():
+    test_return_data = """
+        A few new lines
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def set_add_response():
+    test_return_data = """
+        A few new lines
+        Where there is more lines than before
+    """
+
+    with open("test-datastore/endpoint-content.txt", "w") as f:
+        f.write(test_return_data)
+
+def test_diff_filtering_no_add(client, live_server):
+    live_server_setup(live_server)
+
+    sleep_time_for_fetch_thread = 3
+
+    set_original_response()
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_endpoint', _external=True)
+    res = client.post(
+        url_for("import_page"),
+        data={"urls": test_url},
+        follow_redirects=True
+    )
+
+    assert b"1 Imported" in res.data
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # Edit the watch so that only deletions trigger a change
+    res = client.post(
+        url_for("edit_page", uuid="first"),
+        data={"trigger_add": "n",
+              "trigger_del": "y",
+              "url": test_url,
+              "fetch_backend": "html_requests"},
+        follow_redirects=True
+    )
+    assert b"Updated watch." in res.data
+    assert b'unviewed' not in res.data
+
+    #  Make an add change
+    set_add_response()
+
+    time.sleep(sleep_time_for_fetch_thread)
+    # Trigger a check
+    client.get(url_for("form_watch_checknow"), follow_redirects=True)
+    # Give the thread time to pick it up
+    time.sleep(sleep_time_for_fetch_thread)
+
+    # We should NOT see the change
+    res = client.get(url_for("index"))
+    # save res.data to a file
+    
+        
+        
+    assert b'unviewed' not in res.data
+
--- a/changedetectionio/tests/test_history_consistency.py
+++ b/changedetectionio/tests/test_history_consistency.py
@@ -81,4 +81,4 @@ def test_consistent_history(client, live_server):



-        assert len(files_in_watch_dir) == 2, "Should be just two files in the dir, history.txt and the snapshot"
+        assert len(files_in_watch_dir) == 3, "Should be just three files in the dir, history.txt, previous.txt, and the snapshot"
--- a/changedetectionio/tests/test_jinja2.py
+++ b/changedetectionio/tests/test_jinja2.py
@@ -0,0 +1,33 @@
+#!/usr/bin/python3
+
+import time
+from flask import url_for
+from .util import live_server_setup
+
+
+# Jinja2 expressions in the watched URL should be rendered before the page is fetched
+def test_jinja2_in_url_query(client, live_server):
+    live_server_setup(live_server)
+
+    # Give the endpoint time to spin up
+    time.sleep(1)
+
+    # Add our URL to the import page
+    test_url = url_for('test_return_query', _external=True)
+
+    # Build the query string by hand, because url_for() would URL-encode the template expression
+    full_url = "{}?{}".format(test_url,
+                              "date={% now 'Europe/Berlin', '%Y' %}.{% now 'Europe/Berlin', '%m' %}.{% now 'Europe/Berlin', '%d' %}", )
+    res = client.post(
+        url_for("form_quick_watch_add"),
+        data={"url": full_url, "tag": "test"},
+        follow_redirects=True
+    )
+    assert b"Watch added" in res.data
+    time.sleep(3)
+    # The preview should contain the rendered date from the echoed query string
+    res = client.get(
+        url_for("preview_page", uuid="first"),
+        follow_redirects=True
+    )
+    assert b'date=2' in res.data
--- a/changedetectionio/tests/test_jsonpath_jq_selector.py
+++ b/changedetectionio/tests/test_jsonpath_jq_selector.py
@@ -5,7 +5,12 @@ import time
 from flask import url_for, escape
 from . util import live_server_setup
 import pytest
+jq_support = True

+try:
+    import jq
+except ModuleNotFoundError:
+    jq_support = False

 def test_setup(live_server):
    live_server_setup(live_server)
@@ -40,22 +45,24 @@ and it can also be repeated
    assert text == "23.5"

    # also check for jq
-    text = html_tools.extract_json_as_string(content, "jq:.offers.price")
-    assert text == "23.5"
+    if jq_support:
+        text = html_tools.extract_json_as_string(content, "jq:.offers.price")
+        assert text == "23.5"
+
+        text = html_tools.extract_json_as_string('{"id":5}', "jq:.id")
+        assert text == "5"

    text = html_tools.extract_json_as_string('{"id":5}', "json:$.id")
    assert text == "5"

-    text = html_tools.extract_json_as_string('{"id":5}', "jq:.id")
-    assert text == "5"
-
    # When nothing at all is found, it should throw JSONNOTFound
    # Which is caught and shown to the user in the watch-overview table
    with pytest.raises(html_tools.JSONNotFound) as e_info:
        html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "json:$.id")

-    with pytest.raises(html_tools.JSONNotFound) as e_info:
-        html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jq:.id")
+    if jq_support:
+        with pytest.raises(html_tools.JSONNotFound) as e_info:
+            html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jq:.id")

 def set_original_ext_response():
    data = """
@@ -271,7 +278,8 @@ def test_check_jsonpath_filter(client, live_server):
    check_json_filter('json:boss.name', client, live_server)

 def test_check_jq_filter(client, live_server):
-    check_json_filter('jq:.boss.name', client, live_server)
+    if jq_support:
+        check_json_filter('jq:.boss.name', client, live_server)

 def check_json_filter_bool_val(json_filter, client, live_server):
    set_original_response()
@@ -329,7 +337,8 @@ def test_check_jsonpath_filter_bool_val(client, live_server):
    check_json_filter_bool_val("json:$['available']", client, live_server)

 def test_check_jq_filter_bool_val(client, live_server):
-    check_json_filter_bool_val("jq:.available", client, live_server)
+    if jq_support:
+        check_json_filter_bool_val("jq:.available", client, live_server)

 # Re #265 - Extended JSON selector test
 # Stuff to consider here
@@ -408,4 +417,5 @@ def test_check_jsonpath_ext_filter(client, live_server):
    check_json_ext_filter('json:$[?(@.status==Sold)]', client, live_server)

 def test_check_jq_ext_filter(client, live_server):
-    check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server)
+    if jq_support:
+        check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server)
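
With the `if jq_support:` guards above, the jq tests pass silently when the package is missing. An alternative is to mark them as skipped so the test report shows what did not run. pytest offers `pytest.importorskip` for this. The sketch below shows the same idea using only the standard library's `unittest`, and `JqFilterTests` is a hypothetical test class, not one from this repository.

```python
import importlib.util
import unittest

# True only when the optional 'jq' package can be imported
JQ_AVAILABLE = importlib.util.find_spec("jq") is not None


class JqFilterTests(unittest.TestCase):
    @unittest.skipUnless(JQ_AVAILABLE, "jq is not installed")
    def test_jq_is_usable(self):
        # Imported inside the test so that collecting the module never fails
        import jq
        self.assertTrue(hasattr(jq, "compile"))
```

Run with `python -m unittest`, this reports the test as skipped, with the reason, on platforms where jq cannot be installed.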
--- a/changedetectionio/tests/util.py
+++ b/changedetectionio/tests/util.py
@@ -4,6 +4,12 @@ from flask import make_response, request
 from flask import url_for
 import logging
 import time
+from werkzeug import Request
+import io
+
+# This is a fix for macOS running tests.
+import multiprocessing
+multiprocessing.set_start_method("fork")

 def set_original_response():
    test_return_data = """<html>
@@ -159,5 +165,42 @@ def live_server_setup(live_server):
        ret = " ".join([auth.username, auth.password, auth.type])
        return ret

+    # Make sure any checkboxes that are supposed to be defaulted to true are set during the post request
+    # This is due to the fact that defaults are set in the HTML which we are not using during tests.
+    # This does not affect the server when running outside of a test
+    class DefaultCheckboxMiddleware(object):
+        def __init__(self, app):
+            self.app = app
+
+        def __call__(self, environ, start_response):
+            request = Request(environ)
+            if request.method == "POST" and "/edit" in request.path:
+                body = environ['wsgi.input'].read()
+
+                # if the checkboxes are not set, set them to true
+                if b"trigger_add" not in body:
+                    body += b'&trigger_add=y'
+
+                if b"trigger_del" not in body:
+                    body += b'&trigger_del=y'
+
+                # remove any checkboxes set to "n" so wtforms processes them correctly
+                body = body.replace(b"trigger_add=n", b"")
+                body = body.replace(b"trigger_del=n", b"")
+                body = body.replace(b"&&", b"&")
+
+                new_stream = io.BytesIO(body)
+                environ["CONTENT_LENGTH"] = str(len(body))
+                environ['wsgi.input'] = new_stream
+
+            return self.app(environ, start_response)
+
+    live_server.app.wsgi_app = DefaultCheckboxMiddleware(live_server.app.wsgi_app)
+
+    # Just return some GET var
+    @live_server.app.route('/test-return-query', methods=['GET'])
+    def test_return_query():
+        return request.query_string
+
    live_server.start()
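
The checkbox middleware above edits the POST body with byte-level `replace` calls. A parse-and-re-encode variant is sketched below, assuming the body is `application/x-www-form-urlencoded`. `apply_checkbox_defaults` is a hypothetical helper, not part of the test utilities. It matches field names exactly, so a watched URL that happens to contain the text `trigger_add` would not be mistaken for the checkbox.

```python
from urllib.parse import parse_qsl, urlencode


def apply_checkbox_defaults(body: bytes, names=("trigger_add", "trigger_del")) -> bytes:
    """Default missing checkboxes to 'y' and drop any explicitly set to 'n'."""
    fields = parse_qsl(body.decode("utf-8"), keep_blank_values=True)
    present = {key for key, _ in fields}
    # A checkbox missing from the POST is treated as ticked, as in the middleware
    fields += [(name, "y") for name in names if name not in present]
    # Remove 'n' values so the form sees the checkbox as absent, i.e. unticked
    fields = [(k, v) for k, v in fields if not (k in names and v == "n")]
    return urlencode(fields).encode("utf-8")


print(apply_checkbox_defaults(b"url=http%3A%2F%2Fexample.com&trigger_add=n"))
```

The returned bytes could then be wrapped in `io.BytesIO` and assigned to `environ['wsgi.input']`, with `CONTENT_LENGTH` updated to match, as the middleware already does.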

--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -45,6 +45,9 @@ services:
  #        Respect proxy_pass type settings, `proxy_set_header Host "localhost";` and `proxy_set_header X-Forwarded-Prefix /app;`
  #        More here https://github.com/dgtlmoon/changedetection.io/wiki/Running-changedetection.io-behind-a-reverse-proxy-sub-directory
  #      - USE_X_SETTINGS=1
+  #
+  #        Hides the `Referer` header so that monitored websites can't see the changedetection.io hostname.
+  #      - HIDE_REFERER=true

      # Comment out ports: when using behind a reverse proxy , enable networks: etc.
      ports:
--- a/requirements.txt
+++ b/requirements.txt
@@ -1,8 +1,8 @@
-flask~= 2.0
+flask ~= 2.0
 flask_wtf
-eventlet>=0.31.0
+eventlet >= 0.31.0
 validators
-timeago ~=1.0
+timeago ~= 1.0
 inscriptis ~= 2.2
 feedgen ~= 0.9
 flask-login ~= 0.5
@@ -19,7 +19,8 @@ chardet > 2.3.0

 wtforms ~= 3.0
 jsonpath-ng ~= 1.5.3
-jq ~= 1.3.0
+
+# jq not available on Windows so must be installed manually

 # Notification library
 apprise ~= 1.1.0
@@ -45,4 +46,9 @@ selenium ~= 4.1.0
 # need to revisit flask login versions
 werkzeug ~= 2.0.0

+# Templating, so far just in the URLs but in the future can be for the notifications also
+jinja2 ~= 3.1
+jinja2-time
+
 # playwright is installed at Dockerfile build time because it's not available on all platforms
+
Author	SHA1	Message	Date
dgtlmoon	aef24c42db	extended tests	2022-10-28 14:08:29 +02:00
dgtlmoon	0f6afb9ce8	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-10-28 13:50:19 +02:00
Brandon Wees	ea2fcee4ad	fix syntax error	2022-10-27 12:05:57 -04:00
Brandon Wees	bd79c5decd	Update changedetectionio/tests/test_diff_filter_changes_as_add_delete.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 12:03:20 -04:00
Brandon Wees	74428372c3	Update changedetectionio/tests/test_diff_filter_only_deletions.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 11:57:55 -04:00
dgtlmoon	e6cdb57db0	Merge branch 'master' into diff-filters	2022-10-27 17:56:56 +02:00
dgtlmoon	ac3de58116	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-10-27 17:37:26 +02:00
Brandon Wees	e11c6aeb5f	Apply suggestions from code review Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 10:59:14 -04:00
Brandon Wees	294bb7be15	remvoe unneeded import	2022-10-27 10:57:50 -04:00
Brandon Wees	c2c8bb4de8	ensure_data_dir_exists call added	2022-10-27 10:54:30 -04:00
Brandon Wees	35d950fa74	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 10:52:42 -04:00
Brandon Wees	d24111f3a6	Apply suggestions from code review Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 10:52:20 -04:00
Brandon Wees	7011a04399	switching to os.path.join	2022-10-27 10:43:18 -04:00
Sandro	57f604dff1	UI - Make fetch error more readable (#1038 )	2022-10-27 16:40:24 +02:00
dgtlmoon	8499468749	Update README.md	2022-10-27 15:17:14 +02:00
Brandon Wees	4364521cfc	Update changedetectionio/templates/edit.html Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-27 09:11:28 -04:00
Brandon Wees	748328453e	unmerge external header server. Sorry!	2022-10-27 09:03:39 -04:00
Brandon Wees	e867e89303	Update test_backup.py	2022-10-27 08:45:44 -04:00
dgtlmoon	7f6a13ea6c	Re #1052 - Watch 'open' link should use any dynamic/template info (#1063 )	2022-10-27 13:29:24 +02:00
dgtlmoon	9874f0cbc7	Remove accidental files	2022-10-27 12:43:02 +02:00
dgtlmoon	3e7fd9570a	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-10-27 12:42:28 +02:00
dgtlmoon	99f3b01013	Merge branch 'master' into diff-filters	2022-10-27 12:38:51 +02:00
dgtlmoon	72834a42fd	Backups and Snapshots - Data directory now fully portable, (all paths are relative) , refactored backup zip export creation	2022-10-27 12:35:26 +02:00
Brandon Wees	43c2e71961	Merge branch 'master' into diff-filters	2022-10-26 08:18:27 -04:00
dgtlmoon	724cb17224	Re #1052 - Dynamic URLs, use variables in the URL (such as the current date, the date in a month, and other logic see https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL ) (#1057 )	2022-10-24 23:20:39 +02:00
Brandon Wees	9946ee66d0	Merge pull request #2 from bwees/external-header-server External header server	2022-10-24 09:08:57 -04:00
Brandon Wees	9f722cc76b	Merge branch 'dgtlmoon:master' into external-header-server	2022-10-24 08:54:22 -04:00
dgtlmoon	62b6645810	Merge branch 'master' into diff-filters	2022-10-24 11:47:08 +02:00
dgtlmoon	e5e8b3bbbd	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-10-24 11:47:05 +02:00
dgtlmoon	4eb4b401a1	API - system info - allow 5 minutes grace before watch is considered 'overdue'	2022-10-23 23:12:28 +02:00
dgtlmoon	5d40e16c73	API - Adding basic system info/system state API (#1051 )	2022-10-23 19:15:11 +02:00
dgtlmoon	492bbce6b6	Build - Fix syntax in container build test (#1050 )	2022-10-23 16:02:13 +02:00
dgtlmoon	0394a56be5	Building - Test container build on PR	2022-10-23 15:54:19 +02:00
Entepotenz	7839551d6b	Testing - Use same version of playwright while running tests as in production builds (#1047 )	2022-10-23 11:26:32 +02:00
Entepotenz	9c5588c791	update path for validation in the CONTRIBUTING.md (#1046 )	2022-10-23 11:25:29 +02:00
bwees	852a698629	add optional for field	2022-10-19 19:14:01 -04:00
bwees	76fd27dfab	fix logic error	2022-10-19 19:10:01 -04:00
bwees	83161e4fa3	fixed string None case	2022-10-19 19:03:01 -04:00
bwees	296c7c46cb	fixed empty field errors	2022-10-19 19:00:38 -04:00
bwees	0a2644d0c3	fix tests	2022-10-19 18:58:54 -04:00
bwees	495e322c9e	fixed import errors	2022-10-19 18:55:05 -04:00
bwees	0d5820932f	rename branch	2022-10-19 18:45:43 -04:00
Brandon Wees	408be08a48	Merge branch 'dgtlmoon:master' into external-auth	2022-10-19 18:42:27 -04:00
bwees	bad0909cc2	added external header server	2022-10-19 18:42:04 -04:00
dgtlmoon	5a43a350de	History index safety check - Be sure that only valid history index lines are read (#1042 )	2022-10-19 22:41:13 +02:00
Michael McMillan	3c31f023ce	Option to Hide the Referer header from monitored websites. (#996 )	2022-10-18 09:16:22 +02:00
Brandon Wees	c80f46308a	Update edit.html	2022-10-17 15:10:36 -04:00
dgtlmoon	4cbcc59461	0.39.20.4	2022-10-17 18:36:47 +02:00
dgtlmoon	4be0260381	Better cross platform file handling in diff and preview (#1034 )	2022-10-17 18:36:22 +02:00
dgtlmoon	957a3c1c16	0.39.20.3	2022-10-17 17:43:35 +02:00
dgtlmoon	85897e0bf9	Windows - diff file handling improvements (#1031)	2022-10-17 17:40:28 +02:00
dgtlmoon	63095f70ea	Also include tests in pip build	2022-10-17 17:13:15 +02:00
dgtlmoon	802daa6296	Merge branch 'master' into diff-filters	2022-10-17 12:10:59 +02:00
Brandon Wees	2f641da182	Apply suggestions from code review Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-14 07:49:28 -04:00
dgtlmoon	8d5b0b5576	Update README.md	2022-10-12 10:51:39 +02:00
dgtlmoon	1b077abd93	0.39.20.2	2022-10-12 09:53:59 +02:00
dgtlmoon	32ea1a8721	Windows - JQ - Make library optional so it doesn't break Windows pip installs (#1009)	2022-10-12 09:53:16 +02:00
Brandon Wees	4951721286	Update changedetectionio/store.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-10-11 07:59:51 -04:00
dgtlmoon	a50d6db0b2	Merge branch 'master' into diff-filters	2022-10-11 11:17:53 +02:00
dgtlmoon	f55f7967ef	Merge branch 'master' into diff-filters	2022-09-08 20:37:17 +02:00
bwees	13a96e93a2	fix linter errors after merge	2022-08-17 09:33:34 -04:00
dgtlmoon	ed93d51ae8	Merge branch 'master' into diff-filters	2022-08-17 15:26:47 +02:00
bwees	db28b30b1b	add test for situation found in https://github.com/dgtlmoon/changedetection.io/pull/749#issuecomment-1200154861	2022-07-30 09:14:06 -04:00
bwees	6bdcdfbaea	fixed replace bug in get_diff_types	2022-07-30 09:05:55 -04:00
bwees	0efc504c5d	change form wording	2022-07-30 08:47:07 -04:00
bwees	628cb2ad44	added form validation for diff filter checkboxes	2022-07-30 08:30:56 -04:00
Brandon Wees	604f2eaf02	remove unneeded debug statements	2022-07-29 08:40:47 -04:00
bwees	2a649afd22	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-07-29 08:39:32 -04:00
bwees	526f8fac45	remove unneeded import	2022-07-29 08:39:30 -04:00
dgtlmoon	e76f5efee3	Merge branch 'master' into diff-filters	2022-07-29 12:54:54 +02:00
bwees	7ac0620099	fixed merge conflict with latest version	2022-07-28 20:52:01 -04:00
bwees	14765b46bd	fix broken logic	2022-07-28 20:48:20 -04:00
bwees	4f3a15e68d	clean up test	2022-07-28 20:48:14 -04:00
bwees	c6207f729d	added middleware to fix broken default checkboxes during tests	2022-07-28 20:37:20 -04:00
bwees	fcc1a72d30	changed tests	2022-07-28 20:37:03 -04:00
bwees	6f2b7ceddb	changed UI to have checkboxes instead of dropdown	2022-07-28 20:36:53 -04:00
bwees	1e265b312e	fix macos test running	2022-07-28 20:33:01 -04:00
Brandon Wees	f379dda13d	Apply suggestions from code review Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-17 11:59:20 -04:00
Brandon Wees	4a88589a27	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-17 11:58:46 -04:00
bwees	cac53a76c0	added another step to test to cover case as described https://github.com/dgtlmoon/changedetection.io/pull/749#issuecomment-1186209681	2022-07-16 19:13:20 -04:00
bwees	8dbf2257d3	added datastore migration step	2022-07-16 19:08:57 -04:00
bwees	c0fb051dde	changed get_previous_text to not create the file if it does not exist	2022-07-16 16:02:05 -04:00
bwees	cf09f03d32	fix import statements	2022-07-16 15:54:44 -04:00
Brandon Wees	237cf7db4f	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 15:49:03 -04:00
bwees	a8e24dab01	Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters	2022-07-16 15:48:44 -04:00
bwees	5c9b7353d4	fixed difflib import	2022-07-16 15:48:43 -04:00
Brandon Wees	1e22949e3d	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 15:48:20 -04:00
Brandon Wees	68e1a64474	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 15:46:55 -04:00
Brandon Wees	151c2dab3a	Update changedetectionio/templates/edit.html Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 10:38:45 -04:00
Brandon Wees	3e43d7ad1a	Update changedetectionio/templates/edit.html Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 10:38:27 -04:00
Brandon Wees	58cb7fbc2a	Update changedetectionio/model/Watch.py Co-authored-by: dgtlmoon <leigh@morresi.net>	2022-07-16 10:37:05 -04:00
Brandon Wees	23452a1599	Remove discord change (look at https://github.com/dgtlmoon/changedetection.io/pull/753 for this change)	2022-07-13 18:05:02 -04:00
bwees	7fb432bf06	Created working tests	2022-07-13 17:58:30 -04:00
bwees	dc3fc6cfdf	used a drop down menu and rewrote checking code to fit GUI description	2022-07-13 17:58:13 -04:00
bwees	8ee42d2403	fixed my breaking change	2022-07-13 17:57:39 -04:00
bwees	8d9cac4c38	remove my tests because they wont run	2022-07-12 21:16:45 -04:00
bwees	374bb3824f	fix test to include the new previous.txt file	2022-07-12 21:11:42 -04:00
bwees	91d8600b19	fixed test naming	2022-07-12 20:53:22 -04:00
bwees	7b0ddc23d3	workaround for diff filter checkboxes getting changed on creation of form object	2022-07-12 20:40:54 -04:00
bwees	ab74377be0	fixed file based text saving system	2022-07-12 18:28:29 -04:00
bwees	2196d120a9	rewrote and broke out tests to simplify	2022-07-12 18:27:51 -04:00
bwees	5dca59a4a0	switched to file handling of previous_text	2022-07-12 17:59:46 -04:00
bwees	ee8042b54e	Fix boolean value being sent to difflib	2022-07-12 16:56:59 -04:00
bwees	4c3f233d21	Made unit test	2022-07-11 20:52:18 -04:00
bwees	159b062cb3	removed modify due to the way difflib reacts to changes	2022-07-11 20:37:01 -04:00
bwees	83565787ae	added logic for filtering based on diff attributes	2022-07-11 20:35:30 -04:00
bwees	bdab4f5e09	added diff compare function to watch class	2022-07-11 20:34:33 -04:00
bwees	69075a81c5	updated data model	2022-07-11 19:27:05 -04:00
bwees	04746cc706	Added initial UI code	2022-07-11 19:26:56 -04:00
Brandon Wees	234494d907	Added character truncation rule to URL starting with https://discord.com/api/webhooks	2022-07-11 18:02:04 -04:00