Python to Create AudioBook from PDF with Source Code

In this article, you'll learn how to create an audiobook from a PDF using Python.

Prerequisites:

  1. Python Basics
  2. PyPDF2 module
  3. pyttsx3 module

Install Necessary Modules:

Open your command prompt and run the following commands, one at a time:

pip install PyPDF2
pip install pyttsx3
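
Before moving on, it can help to confirm that both packages are importable from the interpreter you plan to use. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical and not part of either package.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Hypothetical check for the two modules this article relies on.
missing = missing_modules(["PyPDF2", "pyttsx3"])
if missing:
    print("Please install:", ", ".join(missing))
else:
    print("All required modules are installed.")
```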

PyPDF2 is a Python library built as a PDF toolkit. It is capable of extracting document information, splitting documents page by page, merging documents page by page, cropping pages, merging multiple pages into a single page, encrypting and decrypting PDF files, and more.
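
As a small illustration of the text-extraction side of the toolkit, the sketch below pulls the text of a chosen page range, which is handy when only one chapter should be read aloud. It assumes a reader object that exposes `pages`, with each page offering `extract_text()`, as `PyPDF2.PdfReader` does in recent releases. The function names `text_for_pages` and `read_chapter`, and the 1-based page arguments, are hypothetical choices.

```python
def text_for_pages(reader, first, last):
    """Return a list with the text of pages first..last (1-based, inclusive)."""
    total = len(reader.pages)
    if not 1 <= first <= last <= total:
        raise ValueError(f"page range {first}-{last} is outside 1-{total}")
    texts = []
    for index in range(first - 1, last):
        # Fall back to an empty string if a page yields no text.
        texts.append(reader.pages[index].extract_text() or "")
    return texts


def read_chapter(path, first, last):
    """Open the PDF at `path` and return the text of the requested pages."""
    import PyPDF2  # imported here so text_for_pages stays library-agnostic

    with open(path, "rb") as book:
        return text_for_pages(PyPDF2.PdfReader(book), first, last)
```

For example, `read_chapter('demo.pdf', 1, 3)` would return the text of the first three pages as a list of strings.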

pyttsx3 is a text-to-speech conversion library in Python. Unlike alternative libraries, it works offline and is compatible with both Python 2 and 3.
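
The engine's speaking rate and volume can be adjusted before any text is queued. The sketch below assumes an engine object with the `setProperty` method that pyttsx3 provides. The helper names `configure_engine` and `make_engine` and the default values are illustrative assumptions, and comfortable values vary by platform and voice.

```python
def configure_engine(engine, rate=150, volume=0.9):
    """Set speaking rate (words per minute) and volume (0.0 to 1.0)."""
    if not 0.0 <= volume <= 1.0:
        raise ValueError("volume must be between 0.0 and 1.0")
    engine.setProperty("rate", rate)
    engine.setProperty("volume", volume)
    return engine


def make_engine():
    """Create a pyttsx3 engine with the assumed defaults applied."""
    import pyttsx3  # imported lazily so configure_engine can be used alone

    return configure_engine(pyttsx3.init())
```

An engine returned by `make_engine()` can then be used with `say` and `runAndWait` in the usual way.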

Once the modules are installed, we can import them in our Python code.

Source Code:

Python Program to Create AudioBook from PDF

# Import the necessary modules
import pyttsx3
import PyPDF2

# Open the file in binary read mode and store the file object in book
book = open('demo.pdf', 'rb')    # 'rb' stands for "read binary"

# Create a PdfReader from book and store it in pdf_reader
# (PdfReader replaces PdfFileReader, which PyPDF2 3.0 removed)
pdf_reader = PyPDF2.PdfReader(book)

# Count the pages in the PDF using the length of the pages list
num_pages = len(pdf_reader.pages)

# Initialize pyttsx3 using the init method and announce playback
play = pyttsx3.init()
print('Playing Audio Book')

# Loop over the pages of the PDF file.
# One page is retrieved on each iteration.
for num in range(0, num_pages):
    page = pdf_reader.pages[num]
    # Extract the text from the page using extract_text and store it in data.
    data = page.extract_text()
    # Queue the page text for speech using the say method.
    play.say(data)

# Speak everything that was queued, then close the file.
play.runAndWait()
book.close()


Output:

Playing Audio Book
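
Text extracted from a PDF often contains hard line breaks and words hyphenated across lines, which can make the speech sound choppy. The sketch below shows one possible cleanup step to apply to `data` before it is passed to `say`. The function name `clean_page_text` and its rules are assumptions, not part of PyPDF2 or pyttsx3, and they may need tuning for a particular document (for instance, a genuine hyphenated compound split across lines would lose its hyphen).

```python
import re


def clean_page_text(text):
    """Tidy extracted page text so it reads more smoothly aloud."""
    if not text:
        return ""
    # Rejoin words split by a hyphen at a line end: "audio-\nbook" -> "audiobook".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse remaining newlines and repeated spaces into single spaces.
    text = re.sub(r"\s+", " ", text)
    return text.strip()


print(clean_page_text("Creating an audio-\nbook from\n  a PDF file."))
# prints: Creating an audiobook from a PDF file.
```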

