Python to Find IMDB Ratings with Full Source Code

  • This script is used to fetch the Ratings and Genre of the films in your films folder that match with ones on IMDb, the data is scraped from IMDB’s official website and store in a csv file.
  • The csv file can be used for analysis then, sorting acc to rating etc.

Input:

  • Path of the directory which contains the films.

Output:

  • A new csv file is made – ‘film_ratings.csv’ which contains the ratings for the films in your directory.

Requirement:

  • beautifulsoup4==4.9.1
  • certifi==2020.6.20
  • chardet==3.0.4
  • idna==2.10
  • pandas==1.1.2
  • python-dateutil==2.8.1
  • pytz==2020.1
  • requests==2.24.0
  • six==1.15.0
  • soupsieve==2.0.1
  • urllib3==1.26.5

Prerequisites:

  • This program uses and external dependency of ‘BeautifulSoup’ (for web scraping), ‘requests’ (for fetching content of the webpage), ‘pandas’ (to make the csv file), ‘os’ (to get data from directory).
  • These libraries can be installed easily by using the following command: pip install -r requirements.txt

Run the Script:

  • Install the requirements.
  • Inside the find_IMDb_rating.py, update the directory path.
  • Type the following command: python find_IMDb_rating.py
  • A csv file with rating will be created in the same directory as the python file.

Source Code:

find_IMDb_rating.py

from bs4 import BeautifulSoup
import requests
import pandas as pd
import os

# Setting up session
s = requests.session()  

# List contaiting all the films for which data has to be scraped from IMDB
films = []

# Lists contaiting web scraped data
names = []
ratings = []
genres = []

# Define path where your films are present 
# For eg: "/Users/utkarsh/Desktop/films"
path = input("Enter the path where your films are: ")

# Films with extensions
filmswe = os.listdir(path)

for film in filmswe:
    # Append into my films list (without extensions)
    films.append(os.path.splitext(film)[0])
    # print(os.path.splitext(film)[0])

for line in films:
    # x = line.split(", ")
    title = line.lower()
    # release = x[1]
    query = "+".join(title.split()) 
    URL = "https://www.imdb.com/search/title/?title=" + query
    print(URL)
    # print(release)
    try: 
        response = s.get(URL)

        #getting contect from IMDB Website
        content = response.content

        # print(response.status_code)

        soup = BeautifulSoup(response.content, features="html.parser") 
        #searching all films containers found
        containers = soup.find_all("div", class_="lister-item-content")
        for result in containers:
            name1 = result.h3.a.text
            name = result.h3.a.text.lower()

            # Uncomment below lines if you want year specific as well, define year variable before this 
            # year = result.h3.find(
            # "span", class_="lister-item-year text-muted unbold"
            # ).text.lower() 

            #if film found (searching using name)
            if title in name:
                #scraping rating
                rating = result.find("div",class_="inline-block ratings-imdb-rating")["data-value"]
                #scraping genre
                genre = result.p.find("span", class_="genre")
                genre = genre.contents[0]

                #appending name, rating and genre to individual lists
                names.append(name1)
                ratings.append(rating)
                genres.append(genre)
                


    except Exception:
        print("Try again with valid combination of tile and release year")

#storing in pandas dataframe
df = pd.DataFrame({'Film Name':names,'Rating':ratings,'Genre':genres}) 

#making csv using pandas
df.to_csv('film_ratings.csv', index=False, encoding='utf-8')

Output:

Folder
CSV File

Leave a Comment