This script takes a .pdf file as input and writes its text to a .txt file.
- When you run the program, provide the path of the PDF file you want to convert to text. You may also provide the path where the output text file should be stored.
- By default, output files are stored in a temp folder in the directory the script is run from.
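The default-location rule above can be sketched as a small helper. This is a minimal illustration, not part of the script: the name `default_txt_path` and its `basedir` parameter are hypothetical, and it uses `os.path.splitext` to drop the extension where the script uses a string replace.

```python
import os


def default_txt_path(pdfpath, basedir="temp"):
    """Return the output path used when no text path is given (hypothetical helper)."""
    # Drop any directory part of the PDF path, then remove the extension
    stem = os.path.splitext(os.path.basename(os.path.normpath(pdfpath)))[0]
    # Place the .txt file inside the default folder, resolved to an absolute path
    return os.path.join(os.path.realpath(basedir), stem + ".txt")


# Prints an absolute path ending in temp/report.txt (separator depends on the OS)
print(default_txt_path(os.path.join("docs", "report.pdf")))
```

Only the file name of the PDF is reused, so two PDFs with the same name in different folders would map to the same text file.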
Run the script:
    import os

    import PyPDF2

    # Create the default output folder if it does not exist yet
    if not os.path.isdir("temp"):
        os.mkdir("temp")

    # Path of the PDF to convert
    pdfpath = input("Enter the name of your pdf file - please use backslash when typing in directory path: ")
    # Path of the output text file; leave empty to use the default folder
    txtpath = input("Enter the name of your txt file - please use backslash when typing in directory path: ")

    # Base directory where text files are stored if you do not give a specific path
    BASEDIR = os.path.realpath("temp")
    print(BASEDIR)

    if len(txtpath) == 0:
        txtpath = os.path.join(BASEDIR, os.path.basename(os.path.normpath(pdfpath)).replace(".pdf", "") + ".txt")

    with open(pdfpath, "rb") as pdfobj:
        # PdfReader, .pages and extract_text() replace PdfFileReader, numPages/getPage
        # and extractText(), which were removed in PyPDF2 3.0
        pdfread = PyPDF2.PdfReader(pdfobj)
        # Append mode: running the script again adds to an existing output file
        with open(txtpath, "a+") as f:
            for pageObj in pdfread.pages:
                text = pageObj.extract_text()
                f.write(text)
                # Overview of what is being added to the output; remove it if you want
                print(text)
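As a variation on the script, the two `input()` prompts could be replaced by command-line arguments. The following is a minimal sketch under stated assumptions, not the original author's method: `parse_args` and `convert` are hypothetical names, the installed PyPDF2 release is assumed to provide `PdfReader`, and the output file is overwritten in UTF-8 rather than appended to.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the PDF path and an optional output path (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Convert a PDF file to text.")
    parser.add_argument("pdfpath", help="path of the PDF file to convert")
    parser.add_argument("txtpath", nargs="?", default=None,
                        help="optional path of the output text file")
    return parser.parse_args(argv)


def convert(pdfpath, txtpath=None, basedir="temp"):
    """Write the text of every page of pdfpath to txtpath and return txtpath."""
    import PyPDF2  # imported here so argument parsing works without PyPDF2

    if not txtpath:
        # Fall back to <basedir>/<pdf name>.txt, creating the folder if needed
        os.makedirs(basedir, exist_ok=True)
        stem = os.path.splitext(os.path.basename(os.path.normpath(pdfpath)))[0]
        txtpath = os.path.join(os.path.realpath(basedir), stem + ".txt")
    # Assumption: overwrite ("w") and UTF-8, unlike the script's append mode
    with open(pdfpath, "rb") as pdfobj, open(txtpath, "w", encoding="utf-8") as out:
        for page in PyPDF2.PdfReader(pdfobj).pages:
            out.write(page.extract_text())
    return txtpath
```

A caller would then run `args = parse_args()` followed by `convert(args.pdfpath, args.txtpath)`. Passing paths as arguments also avoids the need to type them at a prompt.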