Compare commits

110 Commits

Author SHA1 Message Date
dgtlmoon
aef24c42db extended tests 2022-10-28 14:08:29 +02:00
dgtlmoon
0f6afb9ce8 Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-10-28 13:50:19 +02:00
Brandon Wees
ea2fcee4ad fix syntax error 2022-10-27 12:05:57 -04:00
Brandon Wees
bd79c5decd Update changedetectionio/tests/test_diff_filter_changes_as_add_delete.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 12:03:20 -04:00
Brandon Wees
74428372c3 Update changedetectionio/tests/test_diff_filter_only_deletions.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 11:57:55 -04:00
dgtlmoon
e6cdb57db0 Merge branch 'master' into diff-filters 2022-10-27 17:56:56 +02:00
dgtlmoon
ac3de58116 Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-10-27 17:37:26 +02:00
Brandon Wees
e11c6aeb5f Apply suggestions from code review
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 10:59:14 -04:00
Brandon Wees
294bb7be15 remvoe unneeded import 2022-10-27 10:57:50 -04:00
Brandon Wees
c2c8bb4de8 ensure_data_dir_exists call added 2022-10-27 10:54:30 -04:00
Brandon Wees
35d950fa74 Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 10:52:42 -04:00
Brandon Wees
d24111f3a6 Apply suggestions from code review
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 10:52:20 -04:00
Brandon Wees
7011a04399 switching to os.path.join 2022-10-27 10:43:18 -04:00
Sandro
57f604dff1 UI - Make fetch error more readable (#1038) 2022-10-27 16:40:24 +02:00
dgtlmoon
8499468749 Update README.md 2022-10-27 15:17:14 +02:00
Brandon Wees
4364521cfc Update changedetectionio/templates/edit.html
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-27 09:11:28 -04:00
Brandon Wees
748328453e unmerge external header server. Sorry! 2022-10-27 09:03:39 -04:00
Brandon Wees
e867e89303 Update test_backup.py 2022-10-27 08:45:44 -04:00
dgtlmoon
7f6a13ea6c Re #1052 - Watch 'open' link should use any dynamic/template info (#1063) 2022-10-27 13:29:24 +02:00
dgtlmoon
9874f0cbc7 Remove accidental files 2022-10-27 12:43:02 +02:00
dgtlmoon
3e7fd9570a Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-10-27 12:42:28 +02:00
dgtlmoon
99f3b01013 Merge branch 'master' into diff-filters 2022-10-27 12:38:51 +02:00
dgtlmoon
72834a42fd Backups and Snapshots - Data directory now fully portable, (all paths are relative) , refactored backup zip export creation 2022-10-27 12:35:26 +02:00
Brandon Wees
43c2e71961 Merge branch 'master' into diff-filters 2022-10-26 08:18:27 -04:00
dgtlmoon
724cb17224 Re #1052 - Dynamic URLs, use variables in the URL (such as the current date, the date in a month, and other logic see https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL ) (#1057) 2022-10-24 23:20:39 +02:00
Brandon Wees
9946ee66d0 Merge pull request #2 from bwees/external-header-server
External header server
2022-10-24 09:08:57 -04:00
Brandon Wees
9f722cc76b Merge branch 'dgtlmoon:master' into external-header-server 2022-10-24 08:54:22 -04:00
dgtlmoon
62b6645810 Merge branch 'master' into diff-filters 2022-10-24 11:47:08 +02:00
dgtlmoon
e5e8b3bbbd Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-10-24 11:47:05 +02:00
dgtlmoon
4eb4b401a1 API - system info - allow 5 minutes grace before watch is considered 'overdue' 2022-10-23 23:12:28 +02:00
dgtlmoon
5d40e16c73 API - Adding basic system info/system state API (#1051) 2022-10-23 19:15:11 +02:00
dgtlmoon
492bbce6b6 Build - Fix syntax in container build test (#1050) 2022-10-23 16:02:13 +02:00
dgtlmoon
0394a56be5 Building - Test container build on PR 2022-10-23 15:54:19 +02:00
Entepotenz
7839551d6b Testing - Use same version of playwright while running tests as in production builds (#1047) 2022-10-23 11:26:32 +02:00
Entepotenz
9c5588c791 update path for validation in the CONTRIBUTING.md (#1046) 2022-10-23 11:25:29 +02:00
bwees
852a698629 add optional for field 2022-10-19 19:14:01 -04:00
bwees
76fd27dfab fix logic error 2022-10-19 19:10:01 -04:00
bwees
83161e4fa3 fixed string None case 2022-10-19 19:03:01 -04:00
bwees
296c7c46cb fixed empty field errors 2022-10-19 19:00:38 -04:00
bwees
0a2644d0c3 fix tests 2022-10-19 18:58:54 -04:00
bwees
495e322c9e fixed import errors 2022-10-19 18:55:05 -04:00
bwees
0d5820932f rename branch 2022-10-19 18:45:43 -04:00
Brandon Wees
408be08a48 Merge branch 'dgtlmoon:master' into external-auth 2022-10-19 18:42:27 -04:00
bwees
bad0909cc2 added external header server 2022-10-19 18:42:04 -04:00
dgtlmoon
5a43a350de History index safety check - Be sure that only valid history index lines are read (#1042) 2022-10-19 22:41:13 +02:00
Michael McMillan
3c31f023ce Option to Hide the Referer header from monitored websites. (#996) 2022-10-18 09:16:22 +02:00
Brandon Wees
c80f46308a Update edit.html 2022-10-17 15:10:36 -04:00
dgtlmoon
4cbcc59461 0.39.20.4 2022-10-17 18:36:47 +02:00
dgtlmoon
4be0260381 Better cross platform file handling in diff and preview (#1034) 2022-10-17 18:36:22 +02:00
dgtlmoon
957a3c1c16 0.39.20.3 2022-10-17 17:43:35 +02:00
dgtlmoon
85897e0bf9 Windows - diff file handling improvements (#1031) 2022-10-17 17:40:28 +02:00
dgtlmoon
63095f70ea Also include tests in pip build 2022-10-17 17:13:15 +02:00
dgtlmoon
802daa6296 Merge branch 'master' into diff-filters 2022-10-17 12:10:59 +02:00
Brandon Wees
2f641da182 Apply suggestions from code review
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-14 07:49:28 -04:00
dgtlmoon
8d5b0b5576 Update README.md 2022-10-12 10:51:39 +02:00
dgtlmoon
1b077abd93 0.39.20.2 2022-10-12 09:53:59 +02:00
dgtlmoon
32ea1a8721 Windows - JQ - Make library optional so it doesnt break Windows pip installs (#1009) 2022-10-12 09:53:16 +02:00
Brandon Wees
4951721286 Update changedetectionio/store.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-10-11 07:59:51 -04:00
dgtlmoon
a50d6db0b2 Merge branch 'master' into diff-filters 2022-10-11 11:17:53 +02:00
dgtlmoon
f55f7967ef Merge branch 'master' into diff-filters 2022-09-08 20:37:17 +02:00
bwees
13a96e93a2 fix linter errors after merge 2022-08-17 09:33:34 -04:00
dgtlmoon
ed93d51ae8 Merge branch 'master' into diff-filters 2022-08-17 15:26:47 +02:00
bwees
db28b30b1b add test for situation found in https://github.com/dgtlmoon/changedetection.io/pull/749#issuecomment-1200154861 2022-07-30 09:14:06 -04:00
bwees
6bdcdfbaea fixed replace bug in get_diff_types 2022-07-30 09:05:55 -04:00
bwees
0efc504c5d change form wording 2022-07-30 08:47:07 -04:00
bwees
628cb2ad44 added form validation for diff filter checkboxes 2022-07-30 08:30:56 -04:00
Brandon Wees
604f2eaf02 remove unneeded debug statements 2022-07-29 08:40:47 -04:00
bwees
2a649afd22 Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-07-29 08:39:32 -04:00
bwees
526f8fac45 remove unneeded import 2022-07-29 08:39:30 -04:00
dgtlmoon
e76f5efee3 Merge branch 'master' into diff-filters 2022-07-29 12:54:54 +02:00
bwees
7ac0620099 fixed merge conflict with latest version 2022-07-28 20:52:01 -04:00
bwees
14765b46bd fix broken logic 2022-07-28 20:48:20 -04:00
bwees
4f3a15e68d clean up test 2022-07-28 20:48:14 -04:00
bwees
c6207f729d added middleware to fix broken default checkboxes during tests 2022-07-28 20:37:20 -04:00
bwees
fcc1a72d30 changed tests 2022-07-28 20:37:03 -04:00
bwees
6f2b7ceddb changed UI to have checkboxes instead of dropdown 2022-07-28 20:36:53 -04:00
bwees
1e265b312e fix macos test running 2022-07-28 20:33:01 -04:00
Brandon Wees
f379dda13d Apply suggestions from code review
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-17 11:59:20 -04:00
Brandon Wees
4a88589a27 Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-17 11:58:46 -04:00
bwees
cac53a76c0 added antoher step to test to cover case as described https://github.com/dgtlmoon/changedetection.io/pull/749#issuecomment-1186209681 2022-07-16 19:13:20 -04:00
bwees
8dbf2257d3 added datastore migration step 2022-07-16 19:08:57 -04:00
bwees
c0fb051dde changed get_previous_text to not create the file if it does not exist 2022-07-16 16:02:05 -04:00
bwees
cf09f03d32 fix import statements 2022-07-16 15:54:44 -04:00
Brandon Wees
237cf7db4f Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 15:49:03 -04:00
bwees
a8e24dab01 Merge branch 'diff-filters' of https://github.com/bwees/changedetection.io into diff-filters 2022-07-16 15:48:44 -04:00
bwees
5c9b7353d4 fixed difflib import 2022-07-16 15:48:43 -04:00
Brandon Wees
1e22949e3d Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 15:48:20 -04:00
Brandon Wees
68e1a64474 Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 15:46:55 -04:00
Brandon Wees
151c2dab3a Update changedetectionio/templates/edit.html
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 10:38:45 -04:00
Brandon Wees
3e43d7ad1a Update changedetectionio/templates/edit.html
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 10:38:27 -04:00
Brandon Wees
58cb7fbc2a Update changedetectionio/model/Watch.py
Co-authored-by: dgtlmoon <leigh@morresi.net>
2022-07-16 10:37:05 -04:00
Brandon Wees
23452a1599 Remove discord change (look at https://github.com/dgtlmoon/changedetection.io/pull/753 for this change) 2022-07-13 18:05:02 -04:00
bwees
7fb432bf06 Created working tests 2022-07-13 17:58:30 -04:00
bwees
dc3fc6cfdf used a drop down menu and rewrote checking code to fit GUI description 2022-07-13 17:58:13 -04:00
bwees
8ee42d2403 fixed my breaking change 2022-07-13 17:57:39 -04:00
bwees
8d9cac4c38 remove my tests because they wont run 2022-07-12 21:16:45 -04:00
bwees
374bb3824f fix test to include the new previous.txt file 2022-07-12 21:11:42 -04:00
bwees
91d8600b19 fixed test naming 2022-07-12 20:53:22 -04:00
bwees
7b0ddc23d3 workaround for diff filter checkboxes getting changed on creation of form object 2022-07-12 20:40:54 -04:00
bwees
ab74377be0 fixed file based text saving system 2022-07-12 18:28:29 -04:00
bwees
2196d120a9 rewrote and broke out tests to simplify 2022-07-12 18:27:51 -04:00
bwees
5dca59a4a0 switched to file handling of previous_text 2022-07-12 17:59:46 -04:00
bwees
ee8042b54e Fix boolean value being sent to difflib 2022-07-12 16:56:59 -04:00
bwees
4c3f233d21 Made unit test 2022-07-11 20:52:18 -04:00
bwees
159b062cb3 removed modify due to the way difflib reacts to changes 2022-07-11 20:37:01 -04:00
bwees
83565787ae added logic for filtering based on diff attributes 2022-07-11 20:35:30 -04:00
bwees
bdab4f5e09 added diff compare function to watch class 2022-07-11 20:34:33 -04:00
bwees
69075a81c5 updated data model 2022-07-11 19:27:05 -04:00
bwees
04746cc706 Added initial UI code 2022-07-11 19:26:56 -04:00
Brandon Wees
234494d907 Added character truncation rule to URL starting with https://discord.com/api/webhooks 2022-07-11 18:02:04 -04:00
28 changed files with 741 additions and 128 deletions

View File

@@ -1,12 +1,21 @@
name: ChangeDetection.io Container Build Test
# Triggers the workflow on push or pull request events
# This line doesnt work, even tho it is the documented one
#on: [push, pull_request]
on:
push:
paths:
- requirements.txt
- Dockerfile
pull_request:
paths:
- requirements.txt
- Dockerfile
# Changes to requirements.txt packages and Dockerfile may or may not always be compatible with arm etc, so worth testing
# @todo: some kind of path filter for requirements.txt and Dockerfile
jobs:

View File

@@ -6,7 +6,7 @@ Otherwise, it's always best to PR into the `dev` branch.
Please be sure that all new functionality has a matching test!
Use `pytest` to validate/test, you can run the existing tests as `pytest tests/test_notifications.py` for example
Use `pytest` to validate/test, you can run the existing tests as `pytest tests/test_notification.py` for example
```
pip3 install -r requirements-dev

View File

@@ -26,6 +26,11 @@ RUN pip install --target=/dependencies -r /requirements.txt
RUN pip install --target=/dependencies playwright~=1.26 \
|| echo "WARN: Failed to install Playwright. The application can still run, but the Playwright option will be disabled."
RUN pip install --target=/dependencies jq~=1.3 \
|| echo "WARN: Failed to install JQ. The application can still run, but the Jq: filter option will be disabled."
# Final image stage
FROM python:3.8-slim
@@ -59,6 +64,7 @@ EXPOSE 5000
# The actual flask app
COPY changedetectionio /app/changedetectionio
# The eventlet server wrapper
COPY changedetection.py /app/changedetection.py

View File

@@ -2,6 +2,7 @@ recursive-include changedetectionio/api *
recursive-include changedetectionio/templates *
recursive-include changedetectionio/static *
recursive-include changedetectionio/model *
recursive-include changedetectionio/tests *
include changedetection.py
global-exclude *.pyc
global-exclude node_modules

View File

@@ -121,8 +121,8 @@ See the wiki for more information https://github.com/dgtlmoon/changedetection.io
## Filters
XPath, JSONPath, jq, and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools.
XPath, JSONPath, jq, and CSS support comes baked in! You can be as specific as you need, use XPath exported from various XPath element query creation tools.
(We support LXML `re:test`, `re:match` and `re:replace`.)
## Notifications
@@ -161,46 +161,14 @@ This will re-parse the JSON and apply formatting to the text, making it super ea
### JSONPath or jq?
For more complex parsing, filtering, and modifying of JSON data, jq is recommended due to the built-in operators and functions. Refer to the [documentation](https://stedolan.github.io/jq/manual/) for more information on jq.
For more complex parsing, filtering, and modifying of JSON data, jq is recommended due to the built-in operators and functions. Refer to the [documentation](https://stedolan.github.io/jq/manual/) for more specific information on jq.
The example below adds the price in dollars to each item in the JSON data, and then filters to only show items that are greater than 10.
One big advantage of `jq` is that you can use logic in your JSON filter, such as filters to only show items that have a value greater than/less than etc.
#### Sample input data from API
```
{
"items": [
{
"name": "Product A",
"priceInCents": 2500
},
{
"name": "Product B",
"priceInCents": 500
},
{
"name": "Product C",
"priceInCents": 2000
}
]
}
```
See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/JSON-Selector-Filter-help for more information and examples
#### Sample jq
`jq:.items[] | . + { "priceInDollars": (.priceInCents / 100) } | select(.priceInDollars > 10)`
Note: `jq` library must be added separately (`pip3 install jq`)
#### Sample output data
```
{
"name": "Product A",
"priceInCents": 2500,
"priceInDollars": 25
}
{
"name": "Product C",
"priceInCents": 2000,
"priceInDollars": 20
}
```
### Parse JSON embedded in HTML!
@@ -216,9 +184,9 @@ When you enable a `json:` or `jq:` filter, you can even automatically extract an
`json:$.price` or `jq:.price` would give `23.50`, or you can extract the whole structure
## Proxy configuration
## Proxy Configuration
See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration
See the wiki https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration for details. We also support using [BrightData proxy services where possible](https://github.com/dgtlmoon/changedetection.io/wiki/Proxy-configuration#brightdata-proxy-support).
## Raspberry Pi support?

View File

@@ -33,7 +33,7 @@ from flask_wtf import CSRFProtect
from changedetectionio import html_tools
from changedetectionio.api import api_v1
__version__ = '0.39.20.1'
__version__ = '0.39.20.4'
datastore = None
@@ -194,6 +194,9 @@ def changedetection_app(config=None, datastore_o=None):
watch_api.add_resource(api_v1.Watch, '/api/v1/watch/<string:uuid>',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
watch_api.add_resource(api_v1.SystemInfo, '/api/v1/systeminfo',
resource_class_kwargs={'datastore': datastore, 'update_q': update_q})
@@ -636,20 +639,27 @@ def changedetection_app(config=None, datastore_o=None):
# Only works reliably with Playwright
visualselector_enabled = os.getenv('PLAYWRIGHT_DRIVER_URL', False) and default['fetch_backend'] == 'html_webdriver'
# JQ is difficult to install on windows and must be manually added (outside requirements.txt)
jq_support = True
try:
import jq
except ModuleNotFoundError:
jq_support = False
output = render_template("edit.html",
uuid=uuid,
watch=datastore.data['watching'][uuid],
form=form,
has_empty_checktime=using_default_check_time,
has_default_notification_urls=True if len(datastore.data['settings']['application']['notification_urls']) else False,
using_global_webdriver_wait=default['webdriver_delay'] is None,
current_base_url=datastore.data['settings']['application']['base_url'],
emailprefix=os.getenv('NOTIFICATION_MAIL_BUTTON_PREFIX', False),
form=form,
has_default_notification_urls=True if len(datastore.data['settings']['application']['notification_urls']) else False,
has_empty_checktime=using_default_check_time,
jq_support=jq_support,
playwright_enabled=os.getenv('PLAYWRIGHT_DRIVER_URL', False),
settings_application=datastore.data['settings']['application'],
using_global_webdriver_wait=default['webdriver_delay'] is None,
uuid=uuid,
visualselector_data_is_ready=visualselector_data_is_ready,
visualselector_enabled=visualselector_enabled,
playwright_enabled=os.getenv('PLAYWRIGHT_DRIVER_URL', False)
watch=datastore.data['watching'][uuid],
)
return output
@@ -809,8 +819,10 @@ def changedetection_app(config=None, datastore_o=None):
newest_file = history[dates[-1]]
# Read as binary and force decode as UTF-8
# Windows may fail decode in python if we just use 'r' mode (chardet decode exception)
try:
with open(newest_file, 'r') as f:
with open(newest_file, 'r', encoding='utf-8', errors='ignore') as f:
newest_version_file_contents = f.read()
except Exception as e:
newest_version_file_contents = "Unable to read {}.\n".format(newest_file)
@@ -823,7 +835,7 @@ def changedetection_app(config=None, datastore_o=None):
previous_file = history[dates[-2]]
try:
with open(previous_file, 'r') as f:
with open(previous_file, 'r', encoding='utf-8', errors='ignore') as f:
previous_version_file_contents = f.read()
except Exception as e:
previous_version_file_contents = "Unable to read {}.\n".format(previous_file)
@@ -900,7 +912,7 @@ def changedetection_app(config=None, datastore_o=None):
timestamp = list(watch.history.keys())[-1]
filename = watch.history[timestamp]
try:
with open(filename, 'r') as f:
with open(filename, 'r', encoding='utf-8', errors='ignore') as f:
tmp = f.readlines()
# Get what needs to be highlighted
@@ -975,9 +987,6 @@ def changedetection_app(config=None, datastore_o=None):
# create a ZipFile object
backupname = "changedetection-backup-{}.zip".format(int(time.time()))
# We only care about UUIDS from the current index file
uuids = list(datastore.data['watching'].keys())
backup_filepath = os.path.join(datastore_o.datastore_path, backupname)
with zipfile.ZipFile(backup_filepath, "w",
@@ -993,12 +1002,12 @@ def changedetection_app(config=None, datastore_o=None):
# Add the flask app secret
zipObj.write(os.path.join(datastore_o.datastore_path, "secret.txt"), arcname="secret.txt")
# Add any snapshot data we find, use the full path to access the file, but make the file 'relative' in the Zip.
for txt_file_path in Path(datastore_o.datastore_path).rglob('*.txt'):
parent_p = txt_file_path.parent
if parent_p.name in uuids:
zipObj.write(txt_file_path,
arcname=str(txt_file_path).replace(datastore_o.datastore_path, ''),
# Add any data in the watch data directory.
for uuid, w in datastore.data['watching'].items():
for f in Path(w.watch_data_dir).glob('*'):
zipObj.write(f,
# Use the full path to access the file, but make the file 'relative' in the Zip.
arcname=os.path.join(f.parts[-2], f.parts[-1]),
compress_type=zipfile.ZIP_DEFLATED,
compresslevel=8)
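
The portable archive naming used in this hunk can be tried in isolation. The following is a minimal sketch, assuming a datastore laid out as `<datastore_path>/<uuid>/<files>`; the helper names `backup_watch_dirs` and `demo_backup` are hypothetical and not part of the project.

```python
import os
import tempfile
import zipfile
from pathlib import Path


def backup_watch_dirs(datastore_path, watch_uuids, backup_filepath):
    """Zip each watch directory, storing members as '<uuid>/<filename>'."""
    with zipfile.ZipFile(backup_filepath, "w",
                         compression=zipfile.ZIP_DEFLATED,
                         compresslevel=8) as zip_obj:
        for watch_uuid in watch_uuids:
            watch_dir = Path(datastore_path) / watch_uuid
            for f in sorted(watch_dir.glob('*')):
                if f.is_file():
                    # Read via the full path, store only the last two path parts
                    zip_obj.write(f, arcname=os.path.join(f.parts[-2], f.parts[-1]))
    return backup_filepath


def demo_backup():
    """Build a throwaway datastore, back it up, and list the archive members."""
    with tempfile.TemporaryDirectory() as datastore_path:
        watch_dir = Path(datastore_path) / "uuid-a"
        watch_dir.mkdir()
        (watch_dir / "history.txt").write_text("1666870000,snap.txt\n")
        (watch_dir / "snap.txt").write_text("hello")
        backup = backup_watch_dirs(datastore_path, ["uuid-a"],
                                   os.path.join(datastore_path, "backup.zip"))
        with zipfile.ZipFile(backup) as z:
            return sorted(z.namelist())
```

Because the archive names are built from the last two path components rather than by string-replacing the datastore prefix, the resulting zip should unpack the same way wherever the data directory lives.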

View File

@@ -122,3 +122,37 @@ class CreateWatch(Resource):
return {'status': "OK"}, 200
return list, 200
class SystemInfo(Resource):
def __init__(self, **kwargs):
# datastore is a black box dependency
self.datastore = kwargs['datastore']
self.update_q = kwargs['update_q']
@auth.check_token
def get(self):
import time
overdue_watches = []
# Check all watches and report which have not been checked but should have been
for uuid, watch in self.datastore.data.get('watching', {}).items():
# see if now - last_checked is greater than the time that should have been
# this is not super accurate (maybe they just edited it) but better than nothing
t = watch.threshold_seconds()
if not t:
# Use the system wide default
t = self.datastore.threshold_seconds
time_since_check = time.time() - watch.get('last_checked')
# Allow 5 minutes of grace time before we decide it's overdue
if time_since_check - (5 * 60) > t:
overdue_watches.append(uuid)
return {
'queue_size': self.update_q.qsize(),
'overdue_watches': overdue_watches,
'uptime': round(time.time() - self.datastore.start_time, 2),
'watch_count': len(self.datastore.data.get('watching', {}))
}, 200
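
The overdue check in `SystemInfo.get` comes down to a small piece of arithmetic. Here is a minimal sketch of it, assuming plain dicts in place of the real watch objects; `find_overdue`, `GRACE_SECONDS` and the `threshold_seconds` key are names made up for illustration.

```python
import time

GRACE_SECONDS = 5 * 60  # same 5 minute allowance as in the diff


def find_overdue(watches, default_threshold, now=None):
    """Return the UUIDs whose last check is older than threshold + grace."""
    now = time.time() if now is None else now
    overdue = []
    for uuid, watch in watches.items():
        # Fall back to the system-wide default when the watch has no threshold
        threshold = watch.get('threshold_seconds') or default_threshold
        time_since_check = now - watch.get('last_checked', 0)
        if time_since_check - GRACE_SECONDS > threshold:
            overdue.append(uuid)
    return overdue
```

With a 60 second threshold, a watch last checked 400 seconds ago is reported (400 - 300 > 60), while one checked 350 seconds ago is not.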

View File

@@ -102,6 +102,14 @@ def main():
has_password=datastore.data['settings']['application']['password'] != False
)
# Monitored websites will not receive a Referer header
# when a user clicks on an outgoing link.
@app.after_request
def hide_referrer(response):
if os.getenv("HIDE_REFERER", False):
response.headers["Referrer-Policy"] = "no-referrer"
return response
# Proxy sub-directory support
# Set environment var USE_X_SETTINGS=1 on this script
# And then in your proxy_pass settings

View File

@@ -2,14 +2,14 @@ import hashlib
import logging
import os
import re
import time
import urllib3
import difflib
from changedetectionio import content_fetcher, html_tools
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
# Some common stuff here that can be moved to a base class
# (set_proxy_from_list)
class perform_site_check():
@@ -65,7 +65,9 @@ class perform_site_check():
request_headers['Accept-Encoding'] = request_headers['Accept-Encoding'].replace(', br', '')
timeout = self.datastore.data['settings']['requests'].get('timeout')
url = watch.get('url')
url = watch.link
request_body = self.datastore.data['watching'][uuid].get('body')
request_method = self.datastore.data['watching'][uuid].get('method')
ignore_status_codes = self.datastore.data['watching'][uuid].get('ignore_status_codes', False)
@@ -287,8 +289,23 @@ class perform_site_check():
else:
logging.debug("check_unique_lines: UUID {} had unique content".format(uuid))
# Always record the new checksum
if changed_detected:
if not watch.get("trigger_add", True) or not watch.get("trigger_del", True): # if we are supposed to filter any diff types
# get the diff types present in the watch
diff_types = watch.get_diff_types(text_content_before_ignored_filter)
print("Diff components found: " + str(diff_types))
# Only Additions (deletions are turned off)
if not watch["trigger_del"] and diff_types["del"] and not diff_types["add"]:
changed_detected = False
# Only Deletions (additions are turned off)
elif not watch["trigger_add"] and diff_types["add"] and not diff_types["del"]:
changed_detected = False
# Always record the new checksum and the new text
update_obj["previous_md5"] = fetched_md5
watch.save_previous_text(text_content_before_ignored_filter)
# On the first run of a site, watch['previous_md5'] will be None, set it the current one.
if not watch.get('previous_md5'):
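
The addition/deletion filtering added in this hunk can be sketched as a standalone function. This is an illustration under assumptions, not the project's code: it works on plain strings, uses a simple equality test where the real check compares MD5 checksums, and `should_trigger` is a hypothetical name. Note that `difflib.SequenceMatcher` on strings compares character by character.

```python
import difflib


def get_diff_types(previous_text, new_text):
    """Report whether the change contains additions and/or deletions."""
    diff_types = {'add': False, 'del': False}
    matcher = difflib.SequenceMatcher(isjunk=lambda x: x in " \t",
                                      a=previous_text, b=new_text)
    for tag, *_ in matcher.get_opcodes():
        # 'replace' counts as both a deletion and an addition
        if tag in ('delete', 'replace'):
            diff_types['del'] = True
        if tag in ('insert', 'replace'):
            diff_types['add'] = True
    return diff_types


def should_trigger(previous_text, new_text, trigger_add=True, trigger_del=True):
    """Decide whether a change should count, given the two filter checkboxes."""
    if previous_text == new_text:
        return False  # stand-in for the checksum comparison
    if trigger_add and trigger_del:
        return True  # no filtering requested
    diff_types = get_diff_types(previous_text, new_text)
    # Deletions are switched off and the change only deletes text
    if not trigger_del and diff_types['del'] and not diff_types['add']:
        return False
    # Additions are switched off and the change only adds text
    if not trigger_add and diff_types['add'] and not diff_types['del']:
        return False
    return True
```

A change that both adds and removes text still triggers even when one checkbox is off, which matches the two `elif` branches in the diff: only pure additions or pure deletions are suppressed.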

View File

@@ -303,12 +303,16 @@ class ValidateCSSJSONXPATHInput(object):
# Re #265 - maybe in the future fetch the page and offer a
# warning/notice that its possible the rule doesnt yet match anything?
if 'jq:' in line:
if not self.allow_json:
raise ValidationError("jq not permitted in this field!")
import jq
if 'jq:' in line:
try:
import jq
except ModuleNotFoundError:
# `jq` requires full compilation in windows and so isn't generally available
raise ValidationError("jq support not found")
input = line.replace('jq:', '')
try:
@@ -319,6 +323,18 @@ class ValidateCSSJSONXPATHInput(object):
except:
raise ValidationError("A system-error occurred when validating your jq expression")
class ValidateDiffFilters(object):
"""
Validates that at least one filter checkbox is selected
"""
def __init__(self, message=None):
self.message = message
def __call__(self, form, field):
if not form.trigger_add.data and not form.trigger_del.data:
message = field.gettext('At least one filter checkbox must be selected')
raise ValidationError(message)
class quickWatchForm(Form):
url = fields.URLField('URL', validators=[validateURL()])
@@ -361,6 +377,8 @@ class watchForm(commonSettingsForm):
check_unique_lines = BooleanField('Only trigger when new lines appear', default=False)
trigger_text = StringListField('Trigger/wait for text', [validators.Optional(), ValidateListRegex()])
text_should_not_be_present = StringListField('Block change-detection if text matches', [validators.Optional(), ValidateListRegex()])
trigger_add = BooleanField('Additions', [ValidateDiffFilters()], default=True)
trigger_del = BooleanField('Deletions', [ValidateDiffFilters()], default=True)
webdriver_js_execute_code = TextAreaField('Execute JavaScript before change detection', render_kw={"rows": "5"}, validators=[validators.Optional()])

View File

@@ -1,12 +1,11 @@
import json
from typing import List
from bs4 import BeautifulSoup
from jsonpath_ng.ext import parse
import jq
import re
from inscriptis import get_text
from inscriptis.model.config import ParserConfig
from jsonpath_ng.ext import parse
from typing import List
import json
import re
class FilterNotFoundInResponse(ValueError):
def __init__(self, msg):
@@ -85,9 +84,18 @@ def _parse_json(json_data, json_filter):
jsonpath_expression = parse(json_filter.replace('json:', ''))
match = jsonpath_expression.find(json_data)
return _get_stripped_text_from_json_match(match)
if 'jq:' in json_filter:
try:
import jq
except ModuleNotFoundError:
# `jq` requires full compilation in windows and so isn't generally available
raise Exception("jq support not found")
jq_expression = jq.compile(json_filter.replace('jq:', ''))
match = jq_expression.input(json_data).all()
return _get_stripped_text_from_json_match(match)
def _get_stripped_text_from_json_match(match):
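
The optional `jq` dependency follows a common pattern: try the import, remember whether it worked, and fail with a clear message only when the feature is actually used. Below is a minimal sketch of that pattern; `optional_import` and `apply_jq` are hypothetical helpers, and the `jq.compile(...).input(...).all()` call simply mirrors the usage shown in the diff.

```python
import importlib


def optional_import(module_name):
    """Return the imported module, or None when it is not installed."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError:
        return None


jq = optional_import('jq')
jq_support = jq is not None  # could be passed to a template to hide jq options


def apply_jq(expression, data):
    """Run a jq expression, failing clearly when the library is missing."""
    if jq is None:
        raise RuntimeError("jq support not found, install it with `pip3 install jq`")
    return jq.compile(expression).input(data).all()
```

Keeping the import attempt in one place avoids repeating the `try`/`except` block in every module that needs `jq`.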

View File

@@ -1,6 +1,8 @@
import os
import uuid as uuid_builder
from distutils.util import strtobool
import logging
import os
import time
import uuid
minimum_seconds_recheck_time = int(os.getenv('MINIMUM_SECONDS_RECHECK_TIME', 60))
mtable = {'seconds': 1, 'minutes': 60, 'hours': 3600, 'days': 86400, 'weeks': 86400 * 7}
@@ -22,7 +24,7 @@ class model(dict):
#'newest_history_key': 0,
'title': None,
'previous_md5': False,
'uuid': str(uuid_builder.uuid4()),
'uuid': str(uuid.uuid4()),
'headers': {}, # Extra headers to send
'body': None,
'method': 'GET',
@@ -45,6 +47,8 @@ class model(dict):
'consecutive_filter_failures': 0, # Every time the CSS/xPath filter cannot be located, reset when all is fine.
'extract_title_as_title': False,
'check_unique_lines': False, # On change-detected, compare against all history if its something new
'trigger_add': True,
'trigger_del': True,
'proxy': None, # Preferred proxy connection
# Re #110, so then if this is set to None, we know to use the default value instead
# Requires setting to None on submit if it's the same as the default
@@ -60,7 +64,7 @@ class model(dict):
self.update(self.__base_config)
self.__datastore_path = kw['datastore_path']
self['uuid'] = str(uuid_builder.uuid4())
self['uuid'] = str(uuid.uuid4())
del kw['datastore_path']
@@ -82,10 +86,19 @@ class model(dict):
return False
def ensure_data_dir_exists(self):
target_path = os.path.join(self.__datastore_path, self['uuid'])
if not os.path.isdir(target_path):
print ("> Creating data dir {}".format(target_path))
os.mkdir(target_path)
if not os.path.isdir(self.watch_data_dir):
print ("> Creating data dir {}".format(self.watch_data_dir))
os.mkdir(self.watch_data_dir)
@property
def link(self):
url = self.get('url', '')
if '{%' in url or '{{' in url:
from jinja2 import Environment
# Jinja2 available in URLs along with https://pypi.org/project/jinja2-time/
jinja2_env = Environment(extensions=['jinja2_time.TimeExtension'])
return str(jinja2_env.from_string(url).render())
return url
@property
def label(self):
@@ -109,16 +122,40 @@ class model(dict):
@property
def history(self):
"""History index is just a text file as a list
{watch-uuid}/history.txt
contains a list like
{epoch-time},{filename}\n
We read in this list as the history information
"""
tmp_history = {}
import logging
import time
# Read the history file as a dict
fname = os.path.join(self.__datastore_path, self.get('uuid'), "history.txt")
fname = os.path.join(self.watch_data_dir, "history.txt")
if os.path.isfile(fname):
logging.debug("Reading history index " + str(time.time()))
with open(fname, "r") as f:
tmp_history = dict(i.strip().split(',', 2) for i in f.readlines())
for i in f.readlines():
if ',' in i:
k, v = i.strip().split(',', 2)
# The index history could contain a relative path, so we need to make the fullpath
# so that python can read it
if not '/' in v and not '\\' in v:
v = os.path.join(self.watch_data_dir, v)
else:
# It's possible that they moved the datadir on older versions
# So the snapshot exists but is in a different path
snapshot_fname = v.split('/')[-1]
proposed_new_path = os.path.join(self.watch_data_dir, snapshot_fname)
if not os.path.exists(v) and os.path.exists(proposed_new_path):
v = proposed_new_path
tmp_history[k] = v
if len(tmp_history):
self.__newest_history_key = list(tmp_history.keys())[-1]
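
The history index handling can be exercised outside the class. This is a minimal sketch assuming the `{epoch-time},{filename}` format described in the docstring; `read_history_index` and `demo_history` are hypothetical names, splitting on the first comma only and treating a backslash as a path separator are this sketch's own choices, and the fallback for relocated data directories is left out.

```python
import os
import tempfile


def read_history_index(watch_data_dir):
    """Parse history.txt into {timestamp: snapshot path}, skipping bad lines."""
    history = {}
    fname = os.path.join(watch_data_dir, "history.txt")
    if not os.path.isfile(fname):
        return history
    with open(fname, "r", encoding="utf-8") as f:
        for line in f:
            if ',' not in line:
                continue  # blank or malformed line
            timestamp, path = line.strip().split(',', 1)
            # A bare filename is relative to the watch directory
            if '/' not in path and '\\' not in path:
                path = os.path.join(watch_data_dir, path)
            history[timestamp] = path
    return history


def demo_history():
    """Write a small index with one bad line and return (dir, parsed index)."""
    with tempfile.TemporaryDirectory() as d:
        with open(os.path.join(d, "history.txt"), "w", encoding="utf-8") as f:
            f.write("1666870000,abc.txt\n\nnot-a-valid-line\n"
                    "1666870600,/old/place/def.txt\n")
        return d, read_history_index(d)
```

Storing bare filenames in the index and resolving them at read time is what lets the data directory be moved without rewriting `history.txt`.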
@@ -129,7 +166,7 @@ class model(dict):
@property
def has_history(self):
fname = os.path.join(self.__datastore_path, self.get('uuid'), "history.txt")
fname = os.path.join(self.watch_data_dir, "history.txt")
return os.path.isfile(fname)
# Returns the newest key, but if theres only 1 record, then it's counted as not being new, so return 0.
@@ -148,33 +185,58 @@ class model(dict):
# Save some text file to the appropriate path and bump the history
# result_obj from fetch_site_status.run()
def save_history_text(self, contents, timestamp):
import uuid
import logging
output_path = "{}/{}".format(self.__datastore_path, self['uuid'])
self.ensure_data_dir_exists()
snapshot_fname = "{}.txt".format(str(uuid.uuid4()))
snapshot_fname = "{}/{}.stripped.txt".format(output_path, uuid.uuid4())
logging.debug("Saving history text {}".format(snapshot_fname))
with open(snapshot_fname, 'wb') as f:
# in /diff/ and /preview/ we are going to assume for now that it's UTF-8 when reading
# most sites are utf-8 and some are even broken utf-8
with open(os.path.join(self.watch_data_dir, snapshot_fname), 'wb') as f:
f.write(contents)
f.close()
# Append to index
# @todo check last char was \n
index_fname = "{}/history.txt".format(output_path)
index_fname = os.path.join(self.watch_data_dir, "history.txt")
with open(index_fname, 'a') as f:
f.write("{},{}\n".format(timestamp, snapshot_fname))
f.close()
self.__newest_history_key = timestamp
self.__history_n+=1
self.__history_n += 1
#@todo bump static cache of the last timestamp so we dont need to examine the file to set a proper ''viewed'' status
# @todo bump static cache of the last timestamp so we dont need to examine the file to set a proper ''viewed'' status
return snapshot_fname
# Save previous text snapshot for diffing - used for calculating additions and deletions
def save_previous_text(self, contents):
import logging
output_path = os.path.join(self.__datastore_path, self['uuid'])
# In case the operator deleted it, check and create.
self.ensure_data_dir_exists()
snapshot_fname = os.path.join(self.watch_data_dir, "previous.txt")
logging.debug("Saving previous text {}".format(snapshot_fname))
with open(snapshot_fname, 'wb') as f:
f.write(contents)
return snapshot_fname
# Get previous text snapshot for diffing - used for calculating additions and deletions
def get_previous_text(self):
snapshot_fname = os.path.join(self.watch_data_dir, "previous.txt")
if self.history_n < 1:
return ""
with open(snapshot_fname, 'rb') as f:
contents = f.read()
return contents
@property
def has_empty_checktime(self):
# using all() + dictionary comprehension
@@ -204,15 +266,40 @@ class model(dict):
# if not, something new happened
return not local_lines.issubset(existing_history)
# Get diff types (addition, deletion, modification) from the previous snapshot and new_text
# uses similar algorithm to customSequenceMatcher in diff.py
# Returns a dict of diff types and whether they are present in the diff
def get_diff_types(self, new_text):
import difflib
diff_types = {
'add': False,
'del': False,
}
# get diff types using difflib
# Treat spaces and tabs as junk ("\t" is a real tab here, not a backslash followed by "t")
cruncher = difflib.SequenceMatcher(isjunk=lambda x: x in " \t", a=str(self.get_previous_text()), b=str(new_text))
for tag, alo, ahi, blo, bhi in cruncher.get_opcodes():
if tag == 'delete':
diff_types["del"] = True
elif tag == 'insert':
diff_types["add"] = True
elif tag == 'replace':
diff_types["del"] = True
diff_types["add"] = True
return diff_types
def get_screenshot(self):
fname = os.path.join(self.__datastore_path, self['uuid'], "last-screenshot.png")
fname = os.path.join(self.watch_data_dir, "last-screenshot.png")
if os.path.isfile(fname):
return fname
return False
def __get_file_ctime(self, filename):
fname = os.path.join(self.__datastore_path, self['uuid'], filename)
fname = os.path.join(self.watch_data_dir, filename)
if os.path.isfile(fname):
return int(os.path.getmtime(fname))
return False
@@ -237,9 +324,14 @@ class model(dict):
def snapshot_error_screenshot_ctime(self):
return self.__get_file_ctime('last-error-screenshot.png')
@property
def watch_data_dir(self):
# The base dir of the watch data
return os.path.join(self.__datastore_path, self['uuid'])
def get_error_text(self):
"""Return the text saved from a previous request that resulted in a non-200 error"""
fname = os.path.join(self.__datastore_path, self['uuid'], "last-error.txt")
fname = os.path.join(self.watch_data_dir, "last-error.txt")
if os.path.isfile(fname):
with open(fname, 'r') as f:
return f.read()
@@ -247,7 +339,7 @@ class model(dict):
def get_error_snapshot(self):
"""Return path to the screenshot that resulted in a non-200 error"""
fname = os.path.join(self.__datastore_path, self['uuid'], "last-error-screenshot.png")
fname = os.path.join(self.watch_data_dir, "last-error-screenshot.png")
if os.path.isfile(fname):
return fname
return False
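
The `get_diff_types` method in this file classifies a change by walking `difflib.SequenceMatcher` opcodes. A minimal standalone sketch of that idea follows. It assumes plain strings instead of the stored `previous.txt`, and it compares lists of lines rather than characters, which is a simplification. `classify_diff` is a hypothetical name, not part of the codebase.

```python
import difflib


def classify_diff(previous_text, new_text):
    """Hypothetical helper: report whether a change contains additions and/or deletions."""
    diff_types = {'add': False, 'del': False}
    matcher = difflib.SequenceMatcher(
        isjunk=None,
        a=previous_text.splitlines(),
        b=new_text.splitlines(),
    )
    for tag, _alo, _ahi, _blo, _bhi in matcher.get_opcodes():
        # A 'replace' counts as both a deletion and an addition
        if tag in ('delete', 'replace'):
            diff_types['del'] = True
        if tag in ('insert', 'replace'):
            diff_types['add'] = True
    return diff_types
```

With the fixtures used by the tests in this change set, removing the line `some` should only set `del`, while changing `is` to `ix` should set both flags.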


@@ -9,6 +9,8 @@
# exit when any command fails
set -e
SCRIPT_DIR=$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )
find tests/test_*py -type f|while read test_name
do
echo "TEST RUNNING $test_name"
@@ -23,6 +25,13 @@ export BASE_URL="https://really-unique-domain.io"
pytest tests/test_notification.py
## JQ + JSON: filter test
# jq is not available on Windows, so it is only tested when the package is installed
# this will re-test with jq support
pip3 install jq~=1.3
pytest tests/test_jsonpath_jq_selector.py
# Now for the selenium and playwright/browserless fetchers
# Note - this is not UI functional tests - just checking that each one can fetch the content
@@ -38,7 +47,9 @@ docker kill $$-test_selenium
echo "TESTING WEBDRIVER FETCH > PLAYWRIGHT/BROWSERLESS..."
# Not all platforms support playwright (not ARM/rPI), so it's not packaged in requirements.txt
pip3 install playwright~=1.24
PLAYWRIGHT_VERSION=$(grep -i -E "RUN pip install.+" "$SCRIPT_DIR/../Dockerfile" | grep --only-matching -i -E "playwright[=><~+]+[0-9\.]+")
echo "using $PLAYWRIGHT_VERSION"
pip3 install "$PLAYWRIGHT_VERSION"
docker run -d --name $$-test_browserless -e "DEFAULT_LAUNCH_ARGS=[\"--window-size=1920,1080\"]" --rm -p 3000:3000 --shm-size="2g" browserless/chrome:1.53-chrome-stable
# takes a while to spin up
sleep 5
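
The script above reads the pinned Playwright requirement out of the Dockerfile with `grep` so that the test run installs the same version. The same extraction, sketched in Python for illustration only. The sample Dockerfile text is hypothetical and is not taken from the repository, and unlike `grep` this sketch returns only the first match.

```python
import re

# Mirrors: grep --only-matching -i -E "playwright[=><~+]+[0-9\.]+"
PLAYWRIGHT_SPEC = re.compile(r"playwright[=><~+]+[0-9.]+", re.IGNORECASE)


def find_playwright_spec(dockerfile_text):
    """Return the first 'playwright<operator><version>' token on a 'RUN pip install' line, or None."""
    for line in dockerfile_text.splitlines():
        if re.search(r"RUN pip install.+", line, re.IGNORECASE):
            match = PLAYWRIGHT_SPEC.search(line)
            if match:
                return match.group(0)
    return None


# Hypothetical Dockerfile content, for illustration only
sample = (
    "FROM python:3.8-slim\n"
    "RUN pip install --target=/dependencies playwright~=1.26 || echo skipped\n"
)
spec = find_playwright_spec(sample)
```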


@@ -156,7 +156,7 @@ body:after, body:before {
.fetch-error {
padding-top: 1em;
font-size: 60%;
font-size: 80%;
max-width: 400px;
display: block;
}
@@ -803,4 +803,4 @@ ul {
padding: 0.5rem;
border-radius: 5px;
color: #ff3300;
}
}


@@ -30,14 +30,14 @@ class ChangeDetectionStore:
def __init__(self, datastore_path="/datastore", include_default_watches=True, version_tag="0.0.0"):
# Should only be active for docker
# logging.basicConfig(filename='/dev/stdout', level=logging.INFO)
self.needs_write = False
self.__data = App.model()
self.datastore_path = datastore_path
self.json_store_path = "{}/url-watches.json".format(self.datastore_path)
self.needs_write = False
self.proxy_list = None
self.start_time = time.time()
self.stop_thread = False
self.__data = App.model()
# Base definition for all watchers
# deepcopy part of #569 - not sure why it's needed exactly
self.generic_definition = deepcopy(Watch.model(datastore_path = datastore_path, default={}))
@@ -548,6 +548,10 @@ class ChangeDetectionStore:
# `last_changed` not needed, we pull that information from the history.txt index
def update_4(self):
for uuid, watch in self.data['watching'].items():
# Be sure it's recalculated
p = watch.history
if watch.history_n < 2:
watch['last_changed'] = 0
try:
# Remove it from the struct
del(watch['last_changed'])
@@ -583,3 +587,23 @@ class ChangeDetectionStore:
for v in ['User-Agent', 'Accept', 'Accept-Encoding', 'Accept-Language']:
if self.data['settings']['headers'].get(v):
del self.data['settings']['headers'][v]
# Generate a previous.txt for all watches that do not have one and contain history
def update_8(self):
for uuid, watch in self.data['watching'].items():
# Make sure we actually have history
if (watch.history_n == 0):
continue
latest_file_name = watch.history[watch.newest_history_key]
# Check if the previous.txt exists
if not os.path.exists(os.path.join(watch.watch_data_dir, "previous.txt")):
# Generate a previous.txt
with open(os.path.join(watch.watch_data_dir, "previous.txt"), "wb") as f:
# Fill it with the latest history
latest_file_name = watch.history[watch.newest_history_key]
with open(latest_file_name, "rb") as f2:
f.write(f2.read())
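
`update_8` above seeds a missing `previous.txt` from the newest snapshot of each watch. A minimal standalone sketch of that single step, assuming the watch directory and the snapshot path are already known. `seed_previous_text` and its parameters are hypothetical names, and `shutil.copyfile` is a swapped-in alternative to the manual read/write shown in the diff.

```python
import os
import shutil


def seed_previous_text(watch_data_dir, latest_snapshot_path):
    """Hypothetical helper: create previous.txt from the latest snapshot if it is missing.

    Returns True when the file was created, False when previous.txt already existed.
    """
    previous_path = os.path.join(watch_data_dir, "previous.txt")
    if os.path.exists(previous_path):
        return False
    shutil.copyfile(latest_snapshot_path, previous_path)
    return True
```

Returning a flag makes the migration idempotent and easy to log: running it twice leaves an existing `previous.txt` untouched.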


@@ -40,7 +40,8 @@
<fieldset>
<div class="pure-control-group">
{{ render_field(form.url, placeholder="https://...", required=true, class="m-d") }}
<span class="pure-form-message-inline">Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a></span>
<span class="pure-form-message-inline">Some sites use JavaScript to create the content, for this you should <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Fetching-pages-with-WebDriver">use the Chrome/WebDriver Fetcher</a></span><br/>
<span class="pure-form-message-inline">You can use variables in the URL, perfect for inserting the current date and other logic, <a href="https://github.com/dgtlmoon/changedetection.io/wiki/Handling-variables-in-the-watched-URL">help and examples here</a></span><br/>
</div>
<div class="pure-control-group">
{{ render_field(form.title, class="m-d") }}
@@ -172,6 +173,16 @@ User-Agent: wonderbra 1.0") }}
<span class="pure-form-message-inline">Good for websites that just move the content around, and you want to know when NEW content is added, compares new lines against all history for this watch.</span>
</div>
</fieldset>
<fieldset>
<div class="pure-control-group">
<label for="trigger-type">Filter and restrict change detection of content to</label>
{{ render_checkbox_field(form.trigger_add, class="trigger-type") }}
{{ render_checkbox_field(form.trigger_del, class="trigger-type") }}
<span class="pure-form-message-inline">
Filters the change-detection of this watch to only this type of content change. <strong>Replacements</strong> (neither additions nor deletions) are always included. The 'diff' will still include all changes.
</span>
</div>
</fieldset>
<div class="pure-control-group">
{% set field = render_field(form.css_filter,
placeholder=".class-name or #some-id, or other CSS selector rule.",
@@ -184,10 +195,14 @@ User-Agent: wonderbra 1.0") }}
<span class="pure-form-message-inline">
<ul>
<li>CSS - Limit text to this CSS rule, only text matching this CSS rule is included.</li>
<li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a>.
<li>JSON - Limit text to this JSON rule, using either <a href="https://pypi.org/project/jsonpath-ng/" target="new">JSONPath</a> or <a href="https://stedolan.github.io/jq/" target="new">jq</a> (if installed).
<ul>
<li>JSONPath: Prefix with <code>json:</code>, use <code>json:$</code> to force re-formatting if required, <a href="https://jsonpath.com/" target="new">test your JSONPath here</a>.</li>
{% if jq_support %}
<li>jq: Prefix with <code>jq:</code> and <a href="https://jqplay.org/" target="new">test your jq here</a>. Using <a href="https://stedolan.github.io/jq/" target="new">jq</a> allows for complex filtering and processing of JSON data with built-in functions, regex, filtering, and more. See examples and documentation <a href="https://stedolan.github.io/jq/manual/" target="new">here</a>.</li>
{% else %}
<li>jq support not installed</li>
{% endif %}
</ul>
</li>
<li>XPath - Limit text to this XPath rule, simply start with a forward-slash,
@@ -198,7 +213,7 @@ User-Agent: wonderbra 1.0") }}
</ul>
</li>
</ul>
Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath, or jq selector rules before filing an issue on GitHub! <a
Please be sure that you thoroughly understand how to write CSS, JSONPath, XPath{% if jq_support %}, or jq selector{%endif%} rules before filing an issue on GitHub! <a
href="https://github.com/dgtlmoon/changedetection.io/wiki/CSS-Selector-help">here for more CSS selector help</a>.<br/>
</span>
</div>


@@ -87,7 +87,7 @@
<a class="state-{{'on' if watch.notification_muted}}" href="{{url_for('index', op='mute', uuid=watch.uuid, tag=active_tag)}}"><img src="{{url_for('static_content', group='images', filename='bell-off.svg')}}" alt="Mute notifications" title="Mute notifications"/></a>
</td>
<td class="title-col inline">{{watch.title if watch.title is not none and watch.title|length > 0 else watch.url}}
<a class="external" target="_blank" rel="noopener" href="{{ watch.url.replace('source:','') }}"></a>
<a class="external" target="_blank" rel="noopener" href="{{ watch.link.replace('source:','') }}"></a>
<a href="{{url_for('form_share_put_watch', uuid=watch.uuid)}}"><img style="height: 1em;display:inline-block;" src="{{url_for('static_content', group='images', filename='spread.svg')}}" /></a>
{%if watch.fetch_backend == "html_webdriver" %}<img style="height: 1em; display:inline-block;" src="{{url_for('static_content', group='images', filename='Google-Chrome-icon.png')}}" />{% endif %}


@@ -147,6 +147,16 @@ def test_api_simple(client, live_server):
# @todo how to handle None/default global values?
assert watch['history_n'] == 2, "Found replacement history section, which is in its own API"
# basic systeminfo check
res = client.get(
url_for("systeminfo"),
headers={'x-api-key': api_key},
)
info = json.loads(res.data)
assert info.get('watch_count') == 1
assert info.get('uptime') > 0.5
# Finally delete the watch
res = client.delete(
url_for("watch", uuid=watch_uuid),


@@ -1,18 +1,31 @@
#!/usr/bin/python3
import time
from .util import set_original_response, set_modified_response, live_server_setup
from flask import url_for
from urllib.request import urlopen
from . util import set_original_response, set_modified_response, live_server_setup
from zipfile import ZipFile
import re
import time
def test_backup(client, live_server):
live_server_setup(live_server)
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
res = client.post(
url_for("import_page"),
data={"urls": url_for('test_endpoint', _external=True)},
follow_redirects=True
)
assert b"1 Imported" in res.data
time.sleep(3)
res = client.get(
url_for("get_backup"),
follow_redirects=True
@@ -20,6 +33,19 @@ def test_backup(client, live_server):
# Should get the right zip content type
assert res.content_type == "application/zip"
# Should be PK/ZIP stream
assert res.data.count(b'PK') >= 2
# ZipFile from buffer seems non-obvious, just save it instead
with open("download.zip", 'wb') as f:
f.write(res.data)
zip = ZipFile('download.zip')
l = zip.namelist()
uuid4hex = re.compile('^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}.*txt', re.I)
newlist = list(filter(uuid4hex.match, l))
# Should be three txt files in the archive (history.txt, previous.txt, and the snapshot)
assert len(newlist) == 3
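
The test above writes the response to `download.zip` because opening a `ZipFile` from a buffer "seems non-obvious". A sketch of the in-memory alternative using `io.BytesIO`, assuming the response body is available as `bytes`. `list_watch_txt_files` is a hypothetical helper name, and the regular expression is copied from the test.

```python
import io
import re
from zipfile import ZipFile

# Same pattern as the test: a UUID4 prefix followed by anything containing 'txt'
UUID4_TXT = re.compile(
    r'^[a-f0-9]{8}-?[a-f0-9]{4}-?4[a-f0-9]{3}-?[89ab][a-f0-9]{3}-?[a-f0-9]{12}.*txt',
    re.I,
)


def list_watch_txt_files(zip_bytes):
    """Return archive member names that start with a UUID4 and match the txt pattern."""
    # ZipFile accepts any seekable file-like object, so no temporary file is needed
    with ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [name for name in archive.namelist() if UUID4_TXT.match(name)]
```

In the test this would be called as `list_watch_txt_files(res.data)`, which avoids leaving `download.zip` behind in the working directory.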


@@ -0,0 +1,107 @@
#!/usr/bin/python3
# @NOTE: THIS RELIES ON SOME MIDDLEWARE TO MAKE CHECKBOXES WORK WITH WTFORMS UNDER TEST CONDITION, see changedetectionio/tests/util.py
import time
from flask import url_for
from .util import live_server_setup
def set_original_response():
test_return_data = """
Here
is
some
text
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def set_response_with_deleted_word():
test_return_data = """
Here
is
text
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def set_response_with_changed_word():
test_return_data = """
Here
ix
some
text
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def test_diff_filter_changes_as_add_delete(client, live_server):
live_server_setup(live_server)
sleep_time_for_fetch_thread = 3
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
# Wait for it to read the original version
time.sleep(sleep_time_for_fetch_thread)
# Make a change that ONLY includes deletes
set_response_with_deleted_word()
res = client.post(
url_for("edit_page", uuid="first"),
data={"trigger_add": "y",
"trigger_del": "n",
"url": test_url,
"fetch_backend": "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
time.sleep(sleep_time_for_fetch_thread)
# We should NOT see a change because we chose to not know about any Deletions
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
# Recheck to be sure
client.get(url_for("form_watch_checknow"), follow_redirects=True)
time.sleep(sleep_time_for_fetch_thread)
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
# Now set the original response, which will include the word, which should trigger Added (because trigger_add ==y)
set_original_response()
client.get(url_for("form_watch_checknow"), follow_redirects=True)
time.sleep(sleep_time_for_fetch_thread)
res = client.get(url_for("index"))
assert b'unviewed' in res.data
# Now check 'changes' are always going to be triggered
set_original_response()
client.post(
url_for("edit_page", uuid="first"),
# Neither trigger add nor del? then we should see changes still
data={"trigger_add": "n",
"trigger_del": "n",
"url": test_url,
"fetch_backend": "html_requests"},
follow_redirects=True
)
time.sleep(sleep_time_for_fetch_thread)
client.get(url_for("mark_all_viewed"), follow_redirects=True)
set_response_with_changed_word()
client.get(url_for("form_watch_checknow"), follow_redirects=True)
time.sleep(sleep_time_for_fetch_thread)
res = client.get(url_for("index"))
assert b'unviewed' in res.data


@@ -0,0 +1,83 @@
#!/usr/bin/python3
import time
from flask import url_for
from .util import live_server_setup
def set_original_response():
test_return_data = """
A few new lines
Where there is more lines originally
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def set_delete_response():
test_return_data = """
A few new lines
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def test_diff_filtering_no_del(client, live_server):
live_server_setup(live_server)
sleep_time_for_fetch_thread = 3
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
time.sleep(sleep_time_for_fetch_thread)
# Edit the watch so it only triggers on additions, not deletions
res = client.post(
url_for("edit_page", uuid="first"),
data={"trigger_add": "y",
"trigger_del": "n",
"url": test_url,
"fetch_backend": "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
assert b'unviewed' not in res.data
# Make a delete change
set_delete_response()
time.sleep(sleep_time_for_fetch_thread)
# Trigger a check
client.get(url_for("form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# We should NOT see the change
res = client.get(url_for("index"))
assert b'unviewed' not in res.data
# Restore the original response, which adds the deleted line back
set_original_response()
time.sleep(sleep_time_for_fetch_thread)
# Trigger a check
client.get(url_for("form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# We should see the change
res = client.get(url_for("index"))
assert b'unviewed' in res.data


@@ -0,0 +1,72 @@
#!/usr/bin/python3
import time
from flask import url_for
from .util import live_server_setup
def set_original_response():
test_return_data = """
A few new lines
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def set_add_response():
test_return_data = """
A few new lines
Where there is more lines than before
"""
with open("test-datastore/endpoint-content.txt", "w") as f:
f.write(test_return_data)
def test_diff_filtering_no_add(client, live_server):
live_server_setup(live_server)
sleep_time_for_fetch_thread = 3
set_original_response()
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_endpoint', _external=True)
res = client.post(
url_for("import_page"),
data={"urls": test_url},
follow_redirects=True
)
assert b"1 Imported" in res.data
time.sleep(sleep_time_for_fetch_thread)
# Edit the watch so it only triggers on deletions, not additions
res = client.post(
url_for("edit_page", uuid="first"),
data={"trigger_add": "n",
"trigger_del": "y",
"url": test_url,
"fetch_backend": "html_requests"},
follow_redirects=True
)
assert b"Updated watch." in res.data
assert b'unviewed' not in res.data
# Make an add change
set_add_response()
time.sleep(sleep_time_for_fetch_thread)
# Trigger a check
client.get(url_for("form_watch_checknow"), follow_redirects=True)
# Give the thread time to pick it up
time.sleep(sleep_time_for_fetch_thread)
# We should NOT see the change
res = client.get(url_for("index"))
# save res.data to a file
assert b'unviewed' not in res.data


@@ -81,4 +81,4 @@ def test_consistent_history(client, live_server):
assert len(files_in_watch_dir) == 2, "Should be just two files in the dir, history.txt and the snapshot"
assert len(files_in_watch_dir) == 3, "Should be just three files in the dir, history.txt, previous.txt, and the snapshot"


@@ -0,0 +1,33 @@
#!/usr/bin/python3
import time
from flask import url_for
from .util import live_server_setup
# Jinja2 template variables in the watched URL should be rendered before the page is fetched
def test_jinja2_in_url_query(client, live_server):
live_server_setup(live_server)
# Give the endpoint time to spin up
time.sleep(1)
# Add our URL to the import page
test_url = url_for('test_return_query', _external=True)
# because url_for() would URL-encode the template variable, so the query string is appended manually
full_url = "{}?{}".format(test_url,
"date={% now 'Europe/Berlin', '%Y' %}.{% now 'Europe/Berlin', '%m' %}.{% now 'Europe/Berlin', '%d' %}", )
res = client.post(
url_for("form_quick_watch_add"),
data={"url": full_url, "tag": "test"},
follow_redirects=True
)
assert b"Watch added" in res.data
time.sleep(3)
# The preview should show the rendered date in the query string
res = client.get(
url_for("preview_page", uuid="first"),
follow_redirects=True
)
assert b'date=2' in res.data


@@ -5,7 +5,12 @@ import time
from flask import url_for, escape
from . util import live_server_setup
import pytest
jq_support = True
try:
import jq
except ModuleNotFoundError:
jq_support = False
def test_setup(live_server):
live_server_setup(live_server)
@@ -40,22 +45,24 @@ and it can also be repeated
assert text == "23.5"
# also check for jq
text = html_tools.extract_json_as_string(content, "jq:.offers.price")
assert text == "23.5"
if jq_support:
text = html_tools.extract_json_as_string(content, "jq:.offers.price")
assert text == "23.5"
text = html_tools.extract_json_as_string('{"id":5}', "jq:.id")
assert text == "5"
text = html_tools.extract_json_as_string('{"id":5}', "json:$.id")
assert text == "5"
text = html_tools.extract_json_as_string('{"id":5}', "jq:.id")
assert text == "5"
# When nothing at all is found, it should throw JSONNOTFound
# Which is caught and shown to the user in the watch-overview table
with pytest.raises(html_tools.JSONNotFound) as e_info:
html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "json:$.id")
with pytest.raises(html_tools.JSONNotFound) as e_info:
html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jq:.id")
if jq_support:
with pytest.raises(html_tools.JSONNotFound) as e_info:
html_tools.extract_json_as_string('COMPLETE GIBBERISH, NO JSON!', "jq:.id")
def set_original_ext_response():
data = """
@@ -271,7 +278,8 @@ def test_check_jsonpath_filter(client, live_server):
check_json_filter('json:boss.name', client, live_server)
def test_check_jq_filter(client, live_server):
check_json_filter('jq:.boss.name', client, live_server)
if jq_support:
check_json_filter('jq:.boss.name', client, live_server)
def check_json_filter_bool_val(json_filter, client, live_server):
set_original_response()
@@ -329,7 +337,8 @@ def test_check_jsonpath_filter_bool_val(client, live_server):
check_json_filter_bool_val("json:$['available']", client, live_server)
def test_check_jq_filter_bool_val(client, live_server):
check_json_filter_bool_val("jq:.available", client, live_server)
if jq_support:
check_json_filter_bool_val("jq:.available", client, live_server)
# Re #265 - Extended JSON selector test
# Stuff to consider here
@@ -408,4 +417,5 @@ def test_check_jsonpath_ext_filter(client, live_server):
check_json_ext_filter('json:$[?(@.status==Sold)]', client, live_server)
def test_check_jq_ext_filter(client, live_server):
check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server)
if jq_support:
check_json_ext_filter('jq:.[] | select(.status | contains("Sold"))', client, live_server)
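
The tests above guard every jq assertion with a module-level `jq_support` flag set by a `try`/`except` around `import jq`. A sketch of another way to compute such a flag without importing the module, using the standard library's `importlib.util.find_spec`. `module_available` is a hypothetical helper name and is not part of the test suite.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# True only when the optional jq package is installed
jq_support = module_available("jq")
```

One design note: a test body wrapped in `if jq_support:` is reported as passed when jq is missing. Marking such tests with `pytest.mark.skipif(not jq_support, reason="jq not installed")` would report them as skipped instead, which makes the missing coverage visible.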


@@ -4,6 +4,12 @@ from flask import make_response, request
from flask import url_for
import logging
import time
from werkzeug import Request
import io
# This is a fix for macOS running tests.
import multiprocessing
multiprocessing.set_start_method("fork")
def set_original_response():
test_return_data = """<html>
@@ -159,5 +165,42 @@ def live_server_setup(live_server):
ret = " ".join([auth.username, auth.password, auth.type])
return ret
# Make sure any checkboxes that are supposed to be defaulted to true are set during the post request
# This is due to the fact that defaults are set in the HTML which we are not using during tests.
# This does not affect the server when running outside of a test
class DefaultCheckboxMiddleware(object):
def __init__(self, app):
self.app = app
def __call__(self, environ, start_response):
request = Request(environ)
if request.method == "POST" and "/edit" in request.path:
body = environ['wsgi.input'].read()
# if the checkboxes are not set, set them to true
if b"trigger_add" not in body:
body += b'&trigger_add=y'
if b"trigger_del" not in body:
body += b'&trigger_del=y'
# remove any checkboxes set to "n" so wtforms processes them correctly
body = body.replace(b"trigger_add=n", b"")
body = body.replace(b"trigger_del=n", b"")
body = body.replace(b"&&", b"&")
new_stream = io.BytesIO(body)
# WSGI environ values are expected to be strings
environ["CONTENT_LENGTH"] = str(len(body))
environ['wsgi.input'] = new_stream
return self.app(environ, start_response)
live_server.app.wsgi_app = DefaultCheckboxMiddleware(live_server.app.wsgi_app)
# Just return some GET var
@live_server.app.route('/test-return-query', methods=['GET'])
def test_return_query():
return request.query_string
live_server.start()
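
`DefaultCheckboxMiddleware` above rewrites the raw POST body with byte-level `replace` calls and substring checks. A sketch of the same normalisation done on parsed form fields instead, assuming a URL-encoded body. `normalise_trigger_checkboxes` is a hypothetical helper, not part of the test utilities.

```python
from urllib.parse import parse_qsl, urlencode

CHECKBOXES = ("trigger_add", "trigger_del")


def normalise_trigger_checkboxes(body):
    """Hypothetical helper: apply the checkbox defaults to a URL-encoded POST body (bytes)."""
    fields = parse_qsl(body.decode("utf-8"), keep_blank_values=True)
    present = {name for name, _ in fields}
    # Checkboxes that were not sent at all default to checked
    for checkbox in CHECKBOXES:
        if checkbox not in present:
            fields.append((checkbox, "y"))
    # A checkbox explicitly sent as "n" is dropped so WTForms treats it as unchecked
    fields = [(k, v) for k, v in fields if not (k in CHECKBOXES and v == "n")]
    return urlencode(fields).encode("utf-8")
```

Working on parsed fields avoids false matches when a field value happens to contain the text `trigger_add`, and it never produces stray `&&` separators.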


@@ -45,6 +45,9 @@ services:
# Respect proxy_pass type settings, `proxy_set_header Host "localhost";` and `proxy_set_header X-Forwarded-Prefix /app;`
# More here https://github.com/dgtlmoon/changedetection.io/wiki/Running-changedetection.io-behind-a-reverse-proxy-sub-directory
# - USE_X_SETTINGS=1
#
# Hides the `Referer` header so that monitored websites can't see the changedetection.io hostname.
# - HIDE_REFERER=true
# Comment out ports: when running behind a reverse proxy, enable networks: etc.
ports:


@@ -1,8 +1,8 @@
flask~= 2.0
flask ~= 2.0
flask_wtf
eventlet>=0.31.0
eventlet >= 0.31.0
validators
timeago ~=1.0
timeago ~= 1.0
inscriptis ~= 2.2
feedgen ~= 0.9
flask-login ~= 0.5
@@ -19,7 +19,8 @@ chardet > 2.3.0
wtforms ~= 3.0
jsonpath-ng ~= 1.5.3
jq ~= 1.3.0
# jq not available on Windows so must be installed manually
# Notification library
apprise ~= 1.1.0
@@ -45,4 +46,9 @@ selenium ~= 4.1.0
# need to revisit flask login versions
werkzeug ~= 2.0.0
# Templating, so far just in the URLs but in the future can be for the notifications also
jinja2 ~= 3.1
jinja2-time
# playwright is installed at Dockerfile build time because it's not available on all platforms