Python script to monitor website changes

My local Sheriff’s office puts out a “Daily Activity Report” every weekday, and it’s a great way to keep an eye on the neighborhood. I have been meaning to find a way to watch that web page for changes and have an email alert sent whenever it updates. Here is the process flow that I came up with:

Step 1.  Hash the contents of a website
Step 2.  Wait X amount of seconds
Step 3.  Hash the contents again.
Step 4.  If there was a change, send me an email. If no change, wait X seconds and try again.
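
The four steps above can be sketched as a small polling function. This is only a rough illustration, not the script itself: the names `watch`, `fetch`, `on_change`, `max_checks` and `sleep` are made up for this sketch, and the fetching and alerting are passed in as plain functions so the flow can be tried without touching a real website.

```python
import hashlib
import time


def page_hash(content: bytes) -> str:
    """Return a SHA-224 hex digest of the raw page bytes."""
    return hashlib.sha224(content).hexdigest()


def watch(fetch, on_change, interval=240, max_checks=None, sleep=time.sleep):
    """Poll fetch() and call on_change() whenever the content hash changes.

    fetch, on_change, max_checks and sleep are hypothetical parameters for
    this sketch; max_checks=None keeps polling forever.
    """
    current = page_hash(fetch())          # Step 1: hash the contents
    checks = 0
    while max_checks is None or checks < max_checks:
        sleep(interval)                   # Step 2: wait X seconds
        new = page_hash(fetch())          # Step 3: hash the contents again
        if new != current:                # Step 4: alert on a change
            on_change()
            current = new
        checks += 1
    return current
```

Passing a fake `fetch` and a do-nothing `sleep` makes it easy to check the logic before pointing it at a live page.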

So using this simple logic, I used my rudimentary Python skills to write a short script to do this.  Let’s break the script down:


import time
import smtplib
from email.message import EmailMessage
import hashlib
from urllib.request import urlopen


Above are all of the modules we’re importing. Pretty standard stuff.


url = 'https://www.volusiasheriff.org/reports/district5-logs.stml'
response = urlopen(url).read()
currentHash = hashlib.sha224(response).hexdigest()


These are some variables we’re going to set. The ‘url’ is the page we want to monitor. ‘response’ holds the raw bytes that urlopen reads from our URL, and ‘currentHash’ is the SHA-224 hash of those bytes, that is, of the entire page. So these lines set the initial hash.

Note that while it works on this site, many pages serve dynamic content, including ads, relative dates, etc., and any change on a page can cause the hash to change.  There is a use case for the BeautifulSoup library, which I will discuss in the future.
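
To make the point about hashing concrete, here is a tiny sketch using made-up page content (the two byte strings are stand-ins, not the real report page). Identical bytes always give the same hash, and even a one-word difference gives a completely different one.

```python
import hashlib

# Stand-in bytes for two versions of a page (made-up content for illustration).
before = b"<html><body>Report for Monday</body></html>"
after = b"<html><body>Report for Tuesday</body></html>"

hash_before = hashlib.sha224(before).hexdigest()
hash_again = hashlib.sha224(before).hexdigest()
hash_after = hashlib.sha224(after).hexdigest()

print(hash_before == hash_again)   # True: same bytes, same hash
print(hash_before == hash_after)   # False: different bytes, different hash
print(len(hash_before))            # 56: a SHA-224 hex digest is 56 characters
```

This is also why an ad or a "last updated" timestamp anywhere on a page is enough to trigger an alert.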


while True:


I typically don’t use a ‘while True’ loop, but in this case it really did make the most sense: I want the script to run continuously and only take action when a check meets certain conditions.  Generally, it’s better practice to give a loop an explicit condition.


    try:

        response = urlopen(url).read()
        currentHash = hashlib.sha224(response).hexdigest()
        time.sleep(240)
        response = urlopen(url).read()
        newHash = hashlib.sha224(response).hexdigest()

        if newHash == currentHash:
            continue


As you can see above, I have things nested in a ‘try/except’ statement.  I did this because I would occasionally hit connectivity issues, and the script would fail.  Below ‘try’, the first two lines visit the page and hash the contents.  This produces an effectively unique “ID” for that page and everything on it.

I then wait 240 seconds, revisit the page, and hash it again.

Next is an ‘if/else’ statement.  I compare the old and new hash, and if they match, I just continue the loop.
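
One way to keep connectivity hiccups from killing the script is to wrap the fetch-and-hash step in its own small function. The sketch below is an assumption of mine rather than part of the script: the name `fetch_hash` and the 30-second timeout are invented for illustration.

```python
import hashlib
from urllib.request import urlopen


def fetch_hash(url, timeout=30):
    """Return the SHA-224 hex digest of the page at url, or None on failure.

    fetch_hash and the 30-second default timeout are assumptions for this
    sketch, not part of the original script.
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            return hashlib.sha224(response.read()).hexdigest()
    except OSError as error:
        # urllib.error.URLError (including HTTPError) and socket timeouts
        # are all subclasses of OSError, so this covers network failures.
        print(f"Could not fetch {url}: {error}")
        return None
```

The caller can then treat `None` as "try again later" instead of letting the exception end the loop.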


        else:

            msg = EmailMessage()
            msg.set_content(url)
            msg['From'] = 'emailaddress@gmail.com'
            msg['To'] = 'emailaddress@gmail.com'
            msg['Subject'] = 'New Daily Activity Report'
            # Port 465 uses implicit TLS, so a single SMTP_SSL connection is enough.
            server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
            server.login('emailaddress@gmail.com', 'insertpasswordhere')
            server.send_message(msg)
            server.quit()
            response = urlopen(url).read()
            currentHash = hashlib.sha224(response).hexdigest()
            time.sleep(240)
            continue


If the hashes don’t match, I want to send an email letting me know the page was updated.  Above, you can see that I set the fields for the email to be sent.  I actually made a separate Gmail account just for sending these emails. Because the password here is stored in plaintext, that limits the damage if it is ever exposed.  The ‘to’ field is my actual email address, and the ‘from’ is the new one, which is only used for this purpose.

So I set the fields, then use Google’s SMTP server to send the email.  The content is just the URL, so that I can open the email and click through.  I suppose you could scrape the site and insert the fields into the body, and I may do that one day.

Again, after it’s sent, I sleep 240 seconds and then compare hashes again.
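
Since the same email code is needed in more than one place, it could be pulled into a pair of helpers. This is a rough sketch under a few assumptions: `build_alert` and `send_alert` are made-up names, and `DAR_SMTP_PASSWORD` is a hypothetical environment variable used here so the password does not have to sit in the script.

```python
import os
import smtplib
from email.message import EmailMessage


def build_alert(url, sender, recipient, subject="New Daily Activity Report"):
    """Build the alert email; the body is just the monitored URL."""
    msg = EmailMessage()
    msg.set_content(url)
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    return msg


def send_alert(msg, password_var="DAR_SMTP_PASSWORD"):
    """Send msg through Gmail's SMTP server over implicit TLS (port 465).

    The password is read from a hypothetical environment variable instead
    of being written into the script.
    """
    password = os.environ[password_var]
    with smtplib.SMTP_SSL("smtp.gmail.com", 465) as server:
        server.login(str(msg["From"]), password)
        server.send_message(msg)
```

A call might look like `send_alert(build_alert(url, 'emailaddress@gmail.com', 'emailaddress@gmail.com'))`, with a different `subject` for the failure notice.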


    except Exception as e:

        msg = EmailMessage()
        msg.set_content(url)
        msg['From'] = 'emailaddress@gmail.com'
        msg['To'] = 'emailaddress@gmail.com'
        msg['Subject'] = 'DAR NETWORK FAILURE'
        # Port 465 uses implicit TLS, so a single SMTP_SSL connection is enough.
        server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
        server.login('emailaddress@gmail.com', 'insertpasswordhere')
        server.send_message(msg)
        server.quit()
        # Wait before retrying so a long outage doesn't send a flood of emails.
        time.sleep(240)


Here is the ‘except’ part of the try/except block. Because the main logic is nested in the ‘try’ section, if there is an issue connecting to the site, the script emails me to let me know. It then waits a while and tries everything again.
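
If the site is down for hours, retrying at a fixed interval means a failure email every few minutes. One possible refinement, which the script here does not do, is to lengthen the wait after each consecutive failure. The function name `retry_delay` and the one-hour cap are assumptions for this sketch.

```python
def retry_delay(failures, base=240, cap=3600):
    """Seconds to wait after `failures` consecutive failed checks.

    Doubles the base delay for each extra failure, up to `cap` seconds.
    retry_delay, base and cap are hypothetical names for this sketch.
    """
    return min(cap, base * 2 ** max(failures - 1, 0))


print([retry_delay(n) for n in range(1, 6)])  # [240, 480, 960, 1920, 3600]
```

The failure counter would be reset to zero after the next successful check.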

So that’s the script! It’s simple, and won’t work on a lot of websites, but it works great on this site.  Here’s the final script:


import time
import smtplib
from email.message import EmailMessage
import hashlib
from urllib.request import urlopen

url = 'https://www.volusiasheriff.org/reports/district4-logs.stml'
response = urlopen(url).read()
currentHash = hashlib.sha224(response).hexdigest()

while True:

    try:

        response = urlopen(url).read()
        currentHash = hashlib.sha224(response).hexdigest()
        time.sleep(240)
        response = urlopen(url).read()
        newHash = hashlib.sha224(response).hexdigest()

        if newHash == currentHash:
            continue

        else:

            msg = EmailMessage()
            msg.set_content(url)
            msg['From'] = 'emailaddress@gmail.com'
            msg['To'] = 'emailaddress@gmail.com'
            msg['Subject'] = 'New Daily Activity Report'
            # Port 465 uses implicit TLS, so a single SMTP_SSL connection is enough.
            server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
            server.login('emailaddress@gmail.com', 'insertpasswordhere')
            server.send_message(msg)
            server.quit()
            response = urlopen(url).read()
            currentHash = hashlib.sha224(response).hexdigest()
            time.sleep(240)
            continue

    except Exception as e:

        msg = EmailMessage()
        msg.set_content(url)
        msg['From'] = 'emailaddress@gmail.com'
        msg['To'] = 'emailaddress@gmail.com'
        msg['Subject'] = 'DAR NETWORK FAILURE'
        # Port 465 uses implicit TLS, so a single SMTP_SSL connection is enough.
        server = smtplib.SMTP_SSL('smtp.gmail.com', 465)
        server.login('emailaddress@gmail.com', 'insertpasswordhere')
        server.send_message(msg)
        server.quit()
        # Wait before retrying so a long outage doesn't send a flood of emails.
        time.sleep(240)

