Python Script to Retrieve All Links from a Given Webpage with Source Code

This script retrieves all links from a given webpage and saves them to a .txt file.

Prerequisites

Required Modules

  • BeautifulSoup4
  • requests

To Install:

$ pip install -r requirements.txt
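
Alternatively, if you prefer not to use the requirements file, the same two modules can be installed directly:

$ pip install beautifulsoup4 requests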

How to Run the Script:

$ python get_links.py

You will then be asked which webpage you would like to analyze. After that, the extracted links are saved as a Python-style list in myLinks.txt.
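
For example, an illustrative run (the URL here is just a placeholder) might look like this:

$ python get_links.py
Enter Link: example.com

Once the script finishes, myLinks.txt will contain a list of the href values found on that page.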

Requirements:

  • beautifulsoup4==4.9.2
  • requests==2.24.0
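
For reference, a requirements.txt matching these pins would contain:

beautifulsoup4==4.9.2
requests==2.24.0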

Source Code:

import requests as rq
from bs4 import BeautifulSoup

url = input("Enter Link: ")
# Prepend a scheme if the user did not include one
if url.startswith(("http://", "https://")):
    data = rq.get(url)
else:
    data = rq.get("https://" + url)
soup = BeautifulSoup(data.text, "html.parser")

# Collect the href attribute of every <a> tag on the page
links = []
for link in soup.find_all("a"):
    links.append(link.get("href"))

# Writing the output to a file (myLinks.txt) instead of to stdout
# You can change 'a' to 'w' to overwrite the file each time
with open("myLinks.txt", 'a') as saved:
    print(links, file=saved)
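
Note that href values can be relative paths (e.g. /about) or missing entirely, in which case link.get("href") returns None. If you want absolute URLs instead, one possible variation (a sketch, not part of the original script) resolves each link against the response URL with urllib.parse.urljoin:

from urllib.parse import urljoin

# Collect only real hrefs, resolved against the final response URL
links = []
for link in soup.find_all("a"):
    href = link.get("href")
    if href:  # skip <a> tags that have no href attribute
        links.append(urljoin(data.url, href))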
