Python to Web Scraping CoWin Vaccine Slots with Source Code

n this article, you’ll learn how to perform web scraping on CoWin vaccine slots.

Prerequisites:

  1. Python Basics
  2. HTML Basics
  3. requests module
  4. pygame module

What is Web Scrapping?

  • Web scraping, also known as web data extraction, is the process of retrieving or “scraping” data from a website.
  • This information is collected and then exported into a format that is more useful for the user.
  • Be it a spreadsheet or an API.

Two important points to be taken into consideration here:

  1. Always be respectful and try to get permission to scrape, do not bombard a website with scraping requests, otherwise, your IP address may get blocked!
  2. Be aware that websites change often, meaning your code could go from working to totally broken from one day to the next.

The Process:

  1. Request for a response from the webpage
  2. Parse and extract with the help of Beautiful soup and lxml
  3. Download and export the data with pandas into excel

Uses:

  • It can serve several purposes, most popular ones are Investment Decision Making, Competitor Monitoring, News Monitoring, Market Trend Analysis, Appraising Property Value, Estimating Rental Yields, Politics and Campaigns and many more.

Covid-19 Data Source:

  • We will use CoWin website to fetch the data because we are interested in the data contained in a table at CoWin’s website, where there are lists all the available slots for vaccination along with the types of vaccines and their prices.

HTML:

<!DOCTYPE html>
<html>
  <head>
  </head> 
  <body>
    <h1> Scrapping </h1>
    <p> Hello </p>
  </body>
</html>
  • <!DOCTYPE html>: HTML documents must start with a type declaration.
  • The HTML document is contained between <html> and </html> .
  • The meta and script declaration of the HTML document is between <head> and </head> .
  • The visible part of the HTML document is between <body> and </body> tags.
  • Title headings are defined with the <h1> through <h6> tags.
  • Paragraphs are defined with the <p> tag.
  • Other useful tags include <a> for hyperlinks, <table> for tables, <tr> for table rows, and <td> for table columns.

Install Necessary Modules:

Open your  Prompt  and type and run the following command (individually):

pip install requests pip install pygame

Requests:

  • Use the requests library to grab the page.
  • This may fail if you have a firewall blocking Python/Jupyter.
  • Sometimes you need to run this twice if it fails the first time.

Once Installed now we can import it inside our python code.

Source Code:

'''
Python Program for Web Scraping on CoWin Vaccine Slots
'''

# Import the necessary module!
import requests
from pygame import mixer 
from datetime import datetime, timedelta
import time


age = 19
pincodes = ["400092"]
num_days = 2

print_flag = 'Y'

print("Starting search for Covid vaccine slots!")

# Get today's date
actual = datetime.today()
# Run a loop to convert it into list format. 
# We will make use of timedelta method to convert it into list format
list_format = [actual + timedelta(days=i) for i in range(num_days)]
# We will again run a loop to fetch the dates from the list and 
# we will make use of strftime method to do so. 
# Note that we are storing date in date-month-year format.
actual_dates = [i.strftime("%d-%m-%Y") for i in list_format]

# Let's introduce two more loops here:
# 1. to fetch details for each pin-code.
# 2. to fetch details for each date in the given pin-code.


while True:
    counter = 0   

    for pincode in pincodes:   
        for given_date in actual_dates:

            # Now in order to get requests, let's define the URL.
            # I have added parameters in the URL itself using string formatting. 
            # We are passing in two parameters here, pincode and date. 
            # Every time the inner loop runs, this URL will be called and 
            # the respective date and pin-code will be passed as arguments for each case.
            URL = "https://cdn-api.co-vin.in/api/v2/appointment/sessions/public/calendarByPin?pincode={}&date={}".format(pincode, given_date)
            # In order to get the request, let's define the header.
            header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'} 
            
            result = requests.get(URL, headers=header)

            # This data is not structured. So, we will make use of json here. 
            # Let's cast the data into json file.
            if result.ok:
                response_json = result.json()
                if response_json["centers"]:
                    if(print_flag.lower() =='y'):
                        for center in response_json["centers"]:
                            # let's display and verify our data for each center
                            # print(center)
                            # Finally, we can retrieve the information for each session. 
                            # We will apply the check parameters as the arguments we saved earlier.
                            for session in center["sessions"]:
                                if (session["min_age_limit"] <= age and session["available_capacity"] > 0 ):
                                    # If it stratifies our conditions, let's display the result
                                    print('Pincode: ' + pincode)
                                    print("Available on: {}".format(given_date))
                                    print("\t", center["name"])
                                    print("\t", center["block_name"])
                                    print("\t Price: ", center["fee_type"])
                                    print("\t Availablity : ", session["available_capacity"])

                                    if(session["vaccine"] != ''):
                                        print("\t Vaccine type: ", session["vaccine"])
                                    print("\n")
                                    # At the end, let's increase the counter by one
                                    counter = counter + 1
            else:
                print("No Response!")
    
    # Let's cover the edge case
    if(counter == 0):
        print("No Vaccination slot available!")
    else:
        mixer.init()
        mixer.music.load('sound/chime.mp3')
        mixer.music.play()
        print("Search Completed!")

    # Lastly, let's sync the data in real-time
    dt = datetime.now() + timedelta(minutes=3)

    while datetime.now() < dt:
        time.sleep(1)

Output:

pygame 2.0.1 (SDL 2.0.14, Python 3.8.8) Hello from the pygame community. https://www.pygame.org/contribute.html Starting search for Covid vaccine slots! Pincode: 400092 Available on: 26-07-2021 HCG APEX HOSPITAL BORIVALI W Ward R North Corporation - MH Price: Paid Availablity : 3576 Vaccine type: COVISHIELD Pincode: 400092 Available on: 26-07-2021 HCG APEX HOSPITAL BORIVALI W Ward R North Corporation - MH Price: Paid Availablity : 3775 Vaccine type: COVISHIELD Pincode: 400092 Available on: 26-07-2021 Adharika Bhawan Gorai Ward R Central Corporation - MH Price: Free Availablity : 1 Vaccine type: COVISHIELD Pincode: 400092 Available on: 27-07-2021 R/C APEX HOSPITALS Ward R Central Corporation - MH Price: Paid Availablity : 1 Vaccine type: COVISHIELD Pincode: 400092 Available on: 27-07-2021 HCG APEX HOSPITAL BORIVALI W Ward R North Corporation - MH Price: Paid Availablity : 3769 Vaccine type: COVISHIELD Search Completed! --------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) <ipython-input-1-79e540851788> in <module> 65 66 while datetime.now() < dt: ---> 67 time.sleep(1) KeyboardInterrupt:
Code language: JavaScript (javascript)

Leave a Comment