This script takes a .pdf file as input and writes its text to a .txt file.
Requirements:
- Python 3
- PyPDF2 (install with: pip install PyPDF2)
Steps:
- When you run the program, it asks for the path of the PDF file you want to convert to text and, optionally, the path where the output text file should be saved.
- If you leave the output path blank, the text file is saved in a temp folder inside the current working directory and named after the PDF (for example, report.pdf becomes temp/report.txt).
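The default output path is built from the PDF's file name. A minimal sketch of that step as a standalone function is below; default_txt_path is a hypothetical helper name, not part of the original script. It uses os.path.splitext instead of replacing the string ".pdf", so a file named Scan.PDF becomes Scan.txt rather than Scan.PDF.txt, and only the final extension is removed.

```python
import os


def default_txt_path(pdfpath, basedir):
    """Build the default output path: <basedir>/<pdf file name without extension>.txt."""
    # normpath drops a trailing separator so that basename returns the file name
    name = os.path.basename(os.path.normpath(pdfpath))
    # splitext strips only the last extension, whatever its case (".pdf" or ".PDF")
    stem, _ext = os.path.splitext(name)
    return os.path.join(basedir, stem + ".txt")


print(default_txt_path("docs/report.pdf", "temp"))  # temp/report.txt on Linux/macOS
print(default_txt_path("docs/Scan.PDF", "temp"))    # temp/Scan.txt on Linux/macOS
```

On Windows, os.path.join uses a backslash, so the same calls return temp\report.txt and temp\Scan.txt.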
Run the script:
python converter1.py
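The script asks for its paths interactively with input(). If you would rather pass them on the command line, a minimal sketch using the standard argparse module could look like this. parse_args is a hypothetical helper, not part of the original script; an empty default for the output path keeps the script's existing "blank means use the temp folder" check working.

```python
import argparse


def parse_args(argv=None):
    """Read the PDF path and an optional output path from the command line."""
    parser = argparse.ArgumentParser(description="Convert a PDF file to a text file.")
    parser.add_argument("pdfpath", help="path of the PDF file to convert")
    # nargs="?" makes the output path optional; "" means "use the temp folder"
    parser.add_argument(
        "txtpath",
        nargs="?",
        default="",
        help="path of the output text file (defaults to temp/<pdf name>.txt)",
    )
    return parser.parse_args(argv)


args = parse_args(["report.pdf"])
print(args.pdfpath, repr(args.txtpath))
```

Replacing the two input() calls with args = parse_args(), followed by pdfpath = args.pdfpath and txtpath = args.txtpath, would let you run a command such as python converter1.py report.pdf out.txt.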
Source Code:
converter1.py
import os
import PyPDF2

# Create the default output folder ("temp" in the current working directory) if it does not exist yet
os.makedirs("temp", exist_ok=True)

pdfpath = input("Enter the path of your pdf file: ")  # Path of the PDF to convert
txtpath = input("Enter the path of your txt file (leave blank to use the temp folder): ")  # Optional output path

BASEDIR = os.path.realpath("temp")  # Default folder for the text file when no output path is given
print(BASEDIR)

if len(txtpath) == 0:
    # Reuse the PDF's file name with a .txt extension; splitext also handles ".PDF"
    pdfname = os.path.splitext(os.path.basename(os.path.normpath(pdfpath)))[0]
    txtpath = os.path.join(BASEDIR, pdfname + ".txt")

# Open the output once in "w" mode so re-running the script overwrites the old output instead of appending to it
with open(pdfpath, "rb") as pdfobj, open(txtpath, "w", encoding="utf-8") as f:
    # PdfReader, .pages and extract_text() replace PdfFileReader, getPage() and extractText(), which raise an error in PyPDF2 3.x
    pdfread = PyPDF2.PdfReader(pdfobj)
    for pageObj in pdfread.pages:
        text = pageObj.extract_text()
        f.write(text)
        print(text)  # Preview of what is being added to your output; remove it if you want
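The loop writes each page's text directly after the previous page's, with nothing in between, so the last word on one page can run into the first word of the next. A minimal sketch that collects the pages and puts a separator between them is below. join_page_texts is a hypothetical helper; it only assumes that each page object has an extract_text() method, as the pages of a recent PyPDF2 PdfReader do. A newline is used by default; a form feed ("\f") is another common page separator.

```python
def join_page_texts(pages, separator="\n"):
    """Concatenate the extracted text of each page, with one separator between pages.

    `pages` can be any iterable of objects that have an extract_text() method.
    """
    texts = []
    for page in pages:
        text = page.extract_text()
        # Treat a missing result as an empty page so join() never sees None
        texts.append(text or "")
    return separator.join(texts)
```

With PyPDF2 installed, join_page_texts(PyPDF2.PdfReader(pdfobj).pages) would return the whole document as one string, which could then be written to the output file with a single f.write() call.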