2021 has been a strange year. Supply chains are in ruins, the cost of goods is sky high and there is extreme shipping congestion.
This is all really bad news for me if I want to buy a new mountain bike from the UK.
The bike I want is only available from one online store, globally. There are country-specific domains for the website, but the stock of these bikes is shared globally. The site also does not offer an alert system for these bikes due to scarcity. This creates a very interesting situation that I believe I can solve for myself (and now you).
When the bike I want does come into stock, I want to know about it instantly.
I’m going to create an AWS Lambda bot that will check the stock of the bike I want every 5 minutes and send me an SMS message if it is in stock.
Selenium is a powerful web browser automation tool.
Today I’m going to use it to load the website where I want to buy my bike and determine the stock level. First I need to load the website and locate the information.
Using the inspect tool, you can analyse the HTML of any webpage. The goal here is to determine where the key information is located, and how you can extract it.
The key information for me is the size of the bike and its stock status.
Let’s get a better look at that important bit of HTML:
<li title="Large" for="104711085" class="email-when-in-stock bem-sku-selector__option-group-item" style="display: list-item;">
<input id="104711085" data-colour="Artichoke Green " data-size="Large" data-size-cd="" data-out-of-stock-for-country="False" type="radio" data-ga-action="Size" data-ga-label="Large" data-display-buy="{"BuyType":1,"AddToBasketButtonText":"Add to Basket","ProductAvailabilityMessage":"Currently out of stock","IsAvailableToOrder":true,"IsInStock":false,"EmailWhenInStockAvailable":false,"ShowAddToBasketButton":false,"IsAddedToDefaultWishList":false,"ProductAvailabilityAdditionalMessage":"Out of stock. Normally available in 2-4 weeks"}" data-list-price="NZ$7,743.83" data-unit-price="NZ$7,743.83" data-price-reason="Regular Price Change" data-additional-message="Out of stock. Normally available in 2-4 weeks" name="id" value="104711085" data-promo-message="" data-promo-sticker="" data-promo-ends="" data-percentage-saving="" data-product-availability-message="Currently out of stock" data-ewis-subscribed="false" data-ewis-message="" data-available-to-order="0" class="js-product-sku productId_104711085">
<span class="bem-sku-selector__size js-size">Large</span>
<span class="bem-sku-selector__price pull-right">NZ$7,743.83</span>
<div class="bem-sku-selector__status">
<span class="bem-sku-selector__status-stock bem-product-selector__radio out-of-stock js-stock-status-message">Currently out of stock</span>
</div>
</li>
What I can see here is that title is the size of the bike. This is how I will extract the bike size information.
Within the sub-element of type input, there is an attribute called data-display-buy. data-display-buy contains a JSON structure, and one of its keys is called ProductAvailabilityMessage. That’s how I’ll get the bike’s stock level.
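This reading of the markup can be checked without a browser. Below is a minimal sketch using only Python's standard library. The sample HTML is a trimmed-down copy of the snippet above, and it assumes the raw page source escapes the JSON quotes as &quot; entities (the inspector shows them decoded). SizeStatusParser is a hypothetical helper name, not part of the site or of Selenium.

```python
import json
from html.parser import HTMLParser

# Trimmed-down copy of the markup above. Assumption: in the raw page source
# the quotes inside data-display-buy are escaped as &quot; entities.
SAMPLE_HTML = (
    '<li title="Large" class="bem-sku-selector__option-group-item">'
    '<input data-display-buy="{&quot;ProductAvailabilityMessage&quot;:'
    '&quot;Currently out of stock&quot;,&quot;IsInStock&quot;:false}">'
    '</li>'
)


class SizeStatusParser(HTMLParser):
    """Collects (size, availability message) pairs from the size selector."""

    def __init__(self):
        super().__init__()
        self.current_size = None
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "li" and "bem-sku-selector__option-group-item" in classes:
            # The title attribute of the list item is the bike size
            self.current_size = attrs.get("title")
        elif tag == "input" and "data-display-buy" in attrs and self.current_size:
            # The parser has already unescaped the entities, so this is plain JSON
            payload = json.loads(attrs["data-display-buy"])
            self.results.append(
                (self.current_size, payload["ProductAvailabilityMessage"])
            )

    def handle_endtag(self, tag):
        if tag == "li":
            self.current_size = None


parser = SizeStatusParser()
parser.feed(SAMPLE_HTML)
print(parser.results)
```

The Selenium code that follows does the same two lookups (title, then data-display-buy) against the live page instead of a static string.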
First we need to set things up.
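import json
import boto3
from selenium import webdriver
# SNS client used further down to send notifications (assumed to be created with boto3)
sns_client = boto3.client("sns")
# Create webdriver; options is a ChromeOptions object configured for headless Chrome (not shown)
driver = webdriver.Chrome("/opt/bin/chromedriver", options=options)
# Load the web page
driver.get("https://<insert bike product page>")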
Now the driver is set up. I want to search for all elements whose class name is bem-sku-selector__option-group-item. I know there is one of these elements for each size of the bike.
From there, all I need to do is get the title of that element.
# Wrap the scraping in try so the except block further down can report errors
try:
    # Iterate through all elements of class name bem-sku-selector__option-group-item,
    # i.e. iterate through all sizes
    for bike_driver in driver.find_elements_by_class_name("bem-sku-selector__option-group-item"):
        # Get the title from the element.
        size = bike_driver.get_attribute("title").lower()
A bike_driver variable exists for each size of the bike, which I’m iterating through. bike_driver represents the element of class name bem-sku-selector__option-group-item.
I know that element has a sub-element called input, and I want to get inside that and extract some information from one of input’s attributes.
        # Get the data-display-buy attribute of the inner input element,
        # from within the previously used 'bike_driver' (specific size)
        inner_element = bike_driver.find_element_by_xpath("input").get_attribute("data-display-buy")
        # Load the JSON and extract the value of the key 'ProductAvailabilityMessage'.
        status = json.loads(inner_element)['ProductAvailabilityMessage'].lower()
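The JSON in data-display-buy also carries an IsInStock flag, visible in the HTML shown earlier. As a small sketch of the parsing step on its own, the function below reads both the message and that flag from a shortened copy of the attribute value. read_availability is a hypothetical name, and whether the site keeps IsInStock in sync with the message is an assumption that would need checking.

```python
import json

# A shortened copy of the data-display-buy value from the HTML shown earlier.
RAW_ATTRIBUTE = (
    '{"ProductAvailabilityMessage":"Currently out of stock",'
    '"IsAvailableToOrder":true,"IsInStock":false}'
)


def read_availability(raw_attribute):
    """Return (lowercased message, IsInStock flag or None) from the JSON string."""
    data = json.loads(raw_attribute)
    message = data["ProductAvailabilityMessage"].lower()
    # .get() so a missing key yields None instead of raising KeyError
    in_stock_flag = data.get("IsInStock")
    return message, in_stock_flag


message, in_stock_flag = read_availability(RAW_ATTRIBUTE)
print(message, in_stock_flag)
```

Lowercasing the message once here means later comparisons do not depend on how the site capitalises it.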
I can see that the bike’s status when out of stock is: currently out of stock.
I also know that the in-stock status message is different depending on the level of stock.
I want to be notified for every single stock level.
I also only care about the Large size.
        # Throw away anything that's not large.
        if size == 'large':
            # Check if in stock by:
            # checking that the out-of-stock status is NOT the current status
            if 'currently out of stock' not in status:
                # Print the successful result (bike holds the product page URL)
                print(f"{bike.split('/')[-1].replace('-',' ')} available in {size}. See here - {bike}")
                # Send me an SMS via AWS SNS.
                sns_client.publish(TopicArn='arn:aws:sns:ap-southeast-2:111122223333:mobile', Message=f"{bike.split('/')[-1].replace('-',' ')} available in {size}. See here - {bike}")
            # If out of stock
            else:
                # Print unsuccessful result
                print(f"{bike.split('/')[-1].replace('-',' ')} is unavailable in {size}")
# Catch any error
except Exception as e:
    # Send me an email of the error so I can fix it asap.
    sns_client.publish(TopicArn='arn:aws:sns:ap-southeast-2:111122223333:personal-email', Message=f"MTB Lambda Error: {e}")
    # Print error
    print(e)
# We're all done! Close the driver.
driver.quit()
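The decision logic above can be pulled out into plain functions that run without Selenium or AWS, which makes it easy to try against different status messages. This is only a sketch: product_name and notification_for are hypothetical helper names, and the URL is a made-up example rather than the real product page.

```python
def product_name(url):
    """Turn the last URL segment into a readable name, as the f-strings above do."""
    return url.split("/")[-1].replace("-", " ")


def notification_for(url, size, status, wanted_size="large"):
    """Return the SMS text when the wanted size is in stock, otherwise None."""
    if size != wanted_size:
        return None
    # Any message other than the out-of-stock one counts as "in stock"
    if "currently out of stock" in status:
        return None
    return f"{product_name(url)} available in {size}. See here - {url}"


# Hypothetical URL, for illustration only
URL = "https://example.com/bikes/some-trail-bike"
print(notification_for(URL, "large", "in stock"))
print(notification_for(URL, "large", "currently out of stock"))
print(notification_for(URL, "small", "in stock"))
```

Treating every message except the out-of-stock one as a hit matches the goal of being notified at every stock level, at the cost of a false alarm if the site ever changes its out-of-stock wording.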
I want to run this bot day and night, non-stop, and be notified immediately when stock arrives. But I also don’t want to spend money running this on a server.
That’s where AWS Lambda comes in.
AWS Lambda is a serverless compute service and it costs peanuts. Peanuts, however, is more than zero! Thankfully, AWS has a huge free tier and Lambda is a key part of that. I get 400,000 GB-seconds of Lambda compute every month, forever, for free. Thanks Jeff mate!
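As a rough sanity check on that allowance, here is a back-of-the-envelope sketch. It uses the settings from the SAM template further down in this post (230 MB of memory, a 120 second timeout, one run every 5 minutes) and assumes, pessimistically, that every run lasts the full timeout. The conversion of 1 GB = 1024 MB is my assumption about how the billing is calculated.

```python
FREE_TIER_GB_SECONDS = 400_000  # monthly free allowance quoted above


def worst_case_gb_seconds(memory_mb, timeout_s, interval_min, days=31):
    """Upper bound on monthly usage if every run lasts the full timeout."""
    invocations = days * 24 * 60 // interval_min
    return invocations * timeout_s * memory_mb / 1024


usage = worst_case_gb_seconds(memory_mb=230, timeout_s=120, interval_min=5)
print(usage, usage < FREE_TIER_GB_SECONDS)
```

Even in this worst case the bot stays inside the free allowance, and real runs should finish well before the timeout.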
Now that sounds great, but it’s a little more tricky to run Selenium within Lambda… To solve that, I’ll be using Docker.
I’m not the first person to have this issue, and the good people of the internet have shared their knowledge for me to learn from. Thanks docker-selenium-lambda!
FROM public.ecr.aws/lambda/python:3.8 as build
RUN mkdir -p /opt/bin/ && \
    mkdir -p /tmp/downloads && \
    yum install -y unzip && \
    curl -SL https://chromedriver.storage.googleapis.com/2.37/chromedriver_linux64.zip > /tmp/downloads/chromedriver.zip && \
    curl -SL https://github.com/adieuadieu/serverless-chrome/releases/download/v1.0.0-37/stable-headless-chromium-amazonlinux-2017-03.zip > /tmp/downloads/headless-chromium.zip && \
    unzip /tmp/downloads/chromedriver.zip -d /opt/bin/ && \
    unzip /tmp/downloads/headless-chromium.zip -d /opt/bin/
FROM public.ecr.aws/lambda/python:3.8
COPY requirements.txt ./
RUN mkdir -p /opt/bin && python3.8 -m pip install -r requirements.txt -t .
COPY google-chrome.repo /etc/yum.repos.d/
RUN yum install -y --enablerepo=google-chrome google-chrome-stable
COPY --from=build /opt/bin/headless-chromium /opt/bin/
COPY --from=build /opt/bin/chromedriver /opt/bin/
COPY app.py ./
CMD ["app.lambda_handler"]
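The CMD line tells the Lambda runtime to call a function named lambda_handler in app.py. The post shows the scraping code but not the function that wraps it, so the following is only a sketch of the shape such an entry point could take. check_stock is a hypothetical stand-in for the Selenium code, returning canned data so the example runs without a browser.

```python
OUT_OF_STOCK = "currently out of stock"


def check_stock():
    """Hypothetical stand-in for the Selenium scraping; returns {size: status}."""
    return {"small": "in stock", "large": OUT_OF_STOCK}


def lambda_handler(event, context):
    """Entry point that Lambda calls as app.lambda_handler."""
    statuses = check_stock()
    in_stock = [
        size for size, status in statuses.items() if OUT_OF_STOCK not in status
    ]
    return {"checked": len(statuses), "in_stock": in_stock}


# Local smoke test; on Lambda the runtime supplies the event and context
print(lambda_handler({}, None))
```

Keeping the handler thin like this means the scraping logic can be exercised locally before the image is built.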
We’re almost there!
We currently have a Selenium script that checks the bike’s stock and a Docker image that can run it on Lambda.
And now we need to deploy this to AWS, and run it on a schedule.
AWS SAM (Serverless Application Model) is a framework to build and deploy serverless apps.
I’ll be using AWS Lambda, SNS, and EventBridge, so AWS SAM is perfect for me!
Now this post’s focus is on Selenium and running it within Lambda, so I won’t dwell on AWS SAM. Here’s a getting started guide.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: >
  python3.8
  SAM Template for mtb-stock-notification
# More info about Globals: https://github.com/awslabs/serverless-application-model/blob/master/docs/globals.rst
Globals:
  Function:
    Timeout: 120
Resources:
  MTBStockNotification:
    Type: AWS::Serverless::Function # More info about Function Resource: https://github.com/awslabs/serverless-application-model/blob/master/versions/2016-10-31.md#awsserverlessfunction
    Properties:
      PackageType: Image
      MemorySize: 230
      ImageUri: 'mtbstocknotification'
      Events:
        MTBStockNotificationSchedule:
          Type: Schedule
          Properties:
            Schedule: 'rate(5 minutes)'
            Name: mtb-stock-notification
            Enabled: True
      Policies:
        - SNSPublishMessagePolicy:
            TopicName: mobile
        - SNSPublishMessagePolicy:
            TopicName: personal-email
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./source
      DockerTag: python3.8-v1
      DockerBuildArgs: {"--platform": "linux/amd64"}
Outputs:
  MTBStockNotification:
    Description: "MTBStockNotification Function ARN"
    Value: !GetAtt MTBStockNotification.Arn
  MTBStockNotificationIamRole:
    Description: "Implicit IAM Role created for MTBStockNotification function"
    Value: !GetAtt MTBStockNotificationRole.Arn
sam build
sam deploy --guided
Nicely done!
We’ve successfully used Selenium to extract the important information from a website. Using serverless AWS tools, we deployed a system that will run our code over and over, reporting any MTB stock success to our mobile phone.
I’ve been running this bot for 9 months now. The bike came into stock once over that time, for 3 hours, while I was overseas. Maybe next time I’ll be more successful…