Python PDF to TXT Converter with Full Source Code For Beginners

This script takes a .pdf file as input and writes its text to a .txt file.

Requirements:

  • Python
  • PyPDF2

Steps:

  • When prompted, enter the path of the PDF file you want to convert. You may also enter the path where the output text file should be stored.
  • If you leave the output path empty, the text file is stored in a temp folder inside the directory the script is run from, under the same name as the PDF but with a .txt extension.
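
The default output path described above can be sketched as a small helper. This is an illustration only: `default_txt_path` and its `base_dir` parameter are hypothetical names, not part of the script or of PyPDF2. It uses `os.path.splitext` so that only the final extension is dropped, even if ".pdf" also appears elsewhere in the file name.

```python
import os


def default_txt_path(pdf_path, base_dir="temp"):
    """Return base_dir/<pdf name>.txt for the given PDF path (hypothetical helper)."""
    # Strip any directory part, keeping only the file name
    pdf_name = os.path.basename(os.path.normpath(pdf_path))
    # Drop only the last extension, e.g. "report.pdf" -> "report"
    stem, _ext = os.path.splitext(pdf_name)
    return os.path.join(base_dir, stem + ".txt")


# The path separator in the printed result depends on the operating system
print(default_txt_path("docs/report.pdf"))
```

The helper only builds the path string; creating the temp folder itself is still up to the script.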

Run the script:

  • Open the script on your device in a Python IDE, or open a terminal in the script's folder
  • Run the script:
python converter1.py

Source Code:

converter1.py

import os

import PyPDF2

# Create the default output folder if it does not exist yet
os.makedirs("temp", exist_ok=True)

# Base directory where text files are stored if no specific path is given
BASEDIR = os.path.realpath("temp")
print(BASEDIR)

pdfpath = input("Enter the path to your pdf file: ")  # Path of the PDF to convert
txtpath = input("Enter the path for your txt file (leave empty to use the temp folder): ")  # Path of the output text file

# No output path given: reuse the PDF's file name with a .txt extension
if len(txtpath) == 0:
    pdfname = os.path.basename(os.path.normpath(pdfpath))
    txtpath = os.path.join(BASEDIR, os.path.splitext(pdfname)[0] + ".txt")

# PdfReader, .pages and extract_text() replace PdfFileReader, numPages,
# getPage() and extractText(), which were removed in PyPDF2 3.0.
# The output file is opened once in "w" mode, so an existing file is overwritten.
with open(pdfpath, "rb") as pdfobj, open(txtpath, "w", encoding="utf-8") as f:
    pdfread = PyPDF2.PdfReader(pdfobj)
    for page in pdfread.pages:
        text = page.extract_text()
        f.write(text)
        print(text)  # Preview of what is being added to the output; remove it if you want
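
The writing step can also be pulled out into a small helper that is testable without a PDF. The following is a minimal sketch, assuming the page text has already been extracted into a list of strings. `write_pages` is a hypothetical name, not part of the script or of PyPDF2. The explicit UTF-8 encoding and the newline written after each page are assumptions added for illustration.

```python
import os
import tempfile


def write_pages(page_texts, txt_path):
    """Write each page's text to txt_path, one page after another (hypothetical helper)."""
    # Open the file once; "w" replaces any previous content
    with open(txt_path, "w", encoding="utf-8") as f:
        for text in page_texts:
            f.write(text)
            f.write("\n")  # Assumption: separate pages with a newline
    return txt_path


# Usage with made-up strings standing in for extracted page text
with tempfile.TemporaryDirectory() as tmp:
    out_path = write_pages(["Page one", "", "Page three"], os.path.join(tmp, "out.txt"))
    with open(out_path, encoding="utf-8") as f:
        content = f.read()
```

Opening the file once for the whole document means a second run replaces the earlier output instead of adding to it.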
