Using SelectorLib to build an API for any website in less than 10 minutes

You can also use SelectorLib to build APIs that scrape websites in real time. Here is an example API, built with Python and aiohttp, that fetches product details from any page on https://scrapeme.live/ that is structured like the Bulbasaur product page at https://scrapeme.live/shop/Bulbasaur/

  1. The YAML
  2. The Code
  3. Directory Structure
  4. Running the API Server
  5. Full Code

The YAML

We will reuse the YAML, including its formatters, from Formatting Fields:

name:
    css: h1.product_title
    type: Text
image:
    css: .woocommerce-product-gallery__wrapper img
    type: Attribute
    attribute: src
price:
    css: 'p.price span.woocommerce-Price-amount'
    type: Text
    format: Price
short_description:
    css: 'div.woocommerce-product-details__short-description p'
    type: Text
stock:
    css: p.stock
    type: Text
sku:
    css: span.sku
    type: Text
categories:
    css: 'span.posted_in a'
    multiple: true
    type: Text
tags:
    css: 'span.tagged_as a'
    multiple: true
    type: Text
description:
    css: 'div.woocommerce-Tabs-panel.woocommerce-Tabs-panel--description p'
    type: Text
additional_information:
    css: 'table.shop_attributes tr'
    multiple: true
    type: Text
    children:
        info:
            css: th
            type: Text
        value:
            css: td
            type: Text
related_products:
    css: li.product
    multiple: true
    type: Text
    children:
        name:
            css: h2.woocommerce-loop-product__title
            type: Text
        image:
            css: img.attachment-woocommerce_thumbnail
            type: Attribute
            attribute: src
        price:
            css: span.price
            type: Text
            format: Price
        url:
            css: a.woocommerce-LoopProduct-link
            type: Link

The Code

import aiohttp
from aiohttp import web
import selectorlib
from selectorlib.formatter import Formatter


class Price(Formatter):
    """Convert a price string such as '£63.00' into a float."""
    def format(self, text):
        price = text.replace('£', '').strip()
        return float(price)


# Load the extractor once at startup and register the custom Price formatter
product_page_extractor = selectorlib.Extractor.from_yaml_file(
    'ProductPage_with_Formatter.yml', formatters=[Price])


async def fetch(session, url):
    # Download the page and return its HTML as text
    async with session.get(url) as response:
        return await response.text()


async def get_product_page(request):
    # .get() returns None for a missing parameter instead of raising KeyError
    product_url = request.rel_url.query.get('product_url')
    if not product_url:
        return web.json_response({'error': 'Please provide a URL'}, status=400)
    # ssl=False disables certificate verification (replaces the deprecated verify_ssl=False)
    async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(ssl=False)) as session:
        html = await fetch(session, product_url)
    data = product_page_extractor.extract(html)
    return web.json_response(data)


app = web.Application()
app.add_routes([web.get('/', get_product_page)])

if __name__ == '__main__':
    web.run_app(app)
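As written, `get_product_page` fetches whatever URL a caller passes in, not only pages on scrapeme.live. If the API is reachable by anyone other than you, it may be worth rejecting other hosts before fetching. The sketch below uses only the standard library. `is_allowed_product_url` and `ALLOWED_HOSTS` are hypothetical names, and the rule that product pages live under `/shop/` is an assumption based on the example URL.

```python
from urllib.parse import urlparse

# Hypothetical allow-list: only product pages on scrapeme.live are fetched
ALLOWED_HOSTS = {"scrapeme.live"}


def is_allowed_product_url(url: str) -> bool:
    """Return True if url looks like a scrapeme.live product page.

    Assumes product pages use http(s) and live under the /shop/ path,
    as in https://scrapeme.live/shop/Bulbasaur/.
    """
    parsed = urlparse(url)
    return (
        parsed.scheme in ("http", "https")
        and parsed.hostname in ALLOWED_HOSTS
        and parsed.path.startswith("/shop/")
    )
```

Inside `get_product_page`, a failed check could return an error in the same style as the missing-URL case, for example `web.json_response({'error': 'Unsupported URL'}, status=400)`.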

Directory Structure

The directory structure should look like this:

aiohttp-api-example/
├── ProductPage_with_Formatter.yml
└── api.py

Running the API Server

From inside the aiohttp-api-example folder, run:

python3.7 api.py

The server should now be running at http://localhost:8080 (8080 is the default port for aiohttp's web.run_app).
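If you call the API from code rather than a browser, it is safer to percent-encode the product URL so that characters such as `?` or `&` inside it are not mistaken for the API's own query string. Below is a minimal client sketch using only the standard library, assuming the server is running on the default `localhost:8080`. `build_api_url` and `get_product` are hypothetical names.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_BASE = "http://localhost:8080/"  # assumes the default aiohttp port


def build_api_url(product_url: str, base: str = API_BASE) -> str:
    # urlencode percent-encodes ':' '/' '?' '&' so the product URL stays one value
    return base + "?" + urlencode({"product_url": product_url})


def get_product(product_url: str) -> dict:
    # Requires the API server to be running; returns the decoded JSON body
    with urlopen(build_api_url(product_url)) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(build_api_url("https://scrapeme.live/shop/Bulbasaur/"))
```

Running the script prints http://localhost:8080/?product_url=https%3A%2F%2Fscrapeme.live%2Fshop%2FBulbasaur%2F, which the server decodes back to the original product URL.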

With the server running, load http://localhost:8080/?product_url=https://scrapeme.live/shop/Bulbasaur/ in your browser. You should see data similar to:

{
    "name": "Bulbasaur",
    "image": "https://scrapeme.live/wp-content/uploads/2018/08/001.png",
    "price": 63.0,
    "short_description": "Bulbasaur can be seen napping in bright sunlight. There is a seed on its back. By soaking up the sun\u2019s rays, the seed grows progressively larger.",
    "stock": "45 in stock",
    "sku": "4391",
    "categories": [
        "Pokemon",
        "Seed"
    ],
    "tags": [
        "bulbasaur",
        "Overgrow",
        "Seed"
    ],
    "description": "Bulbasaur can be seen napping in bright sunlight. There is a seed on its back. By soaking up the sun\u2019s rays, the seed grows progressively larger.",
    "additional_information": [
        {
            "info": "Weight",
            "value": "15.2 kg"
        },
        {
            "info": "Dimensions",
            "value": "2 x 2 x 2 cm"
        }
    ],
    "related_products": [
        {
            "name": "Beedrill",
            "image": "https://scrapeme.live/wp-content/uploads/2018/08/015-350x350.png",
            "price": 168.0,
            "url": "https://scrapeme.live/shop/Beedrill/"
        },
        {
            "name": "Metapod",
            "image": "https://scrapeme.live/wp-content/uploads/2018/08/011-350x350.png",
            "price": 148.0,
            "url": "https://scrapeme.live/shop/Metapod/"
        },
        {
            "name": "Wartortle",
            "image": "https://scrapeme.live/wp-content/uploads/2018/08/008-350x350.png",
            "price": 123.0,
            "url": "https://scrapeme.live/shop/Wartortle/"
        }
    ]
}
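The `additional_information` field comes back as a list of `info`/`value` pairs, because the YAML marks it as `multiple` with `children`. If a client would rather look values up by name, it can fold the pairs into a dictionary. The sketch below works on a trimmed copy of the response shown above; `info_as_dict` is a hypothetical helper.

```python
import json

# Trimmed copy of the API response shown above
RESPONSE_TEXT = """
{
    "name": "Bulbasaur",
    "price": 63.0,
    "additional_information": [
        {"info": "Weight", "value": "15.2 kg"},
        {"info": "Dimensions", "value": "2 x 2 x 2 cm"}
    ]
}
"""


def info_as_dict(product: dict) -> dict:
    """Turn the list of {"info": ..., "value": ...} rows into {info: value}."""
    # Treat a missing or null field as an empty list
    rows = product.get("additional_information") or []
    # Skip rows that have no info label
    return {row["info"]: row["value"] for row in rows if row.get("info")}


product = json.loads(RESPONSE_TEXT)
print(info_as_dict(product))  # {'Weight': '15.2 kg', 'Dimensions': '2 x 2 x 2 cm'}
```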

Full Code

You can find the full project on GitHub.