Python to Store Emails in CSV with Full Source Code For Beginners

  • This project contains a simple script to extract email messages from an IMAP server.
  • The messages are written to a simple four-column CSV file.

Dependencies:

  • This depends on the BeautifulSoup library and lxml for extracting text from HTML messages.

Requirements:

  • beautifulsoup4
  • lxml

Running the script:

  • You will need to have a file credentials.txt with your IMAP server account name and password on separate lines.
  • Gmail – and many other IMAP providers – requires you to create a separate “application password” to allow this code to run, so probably do that first. Then put that password in credentials.txt.
  • Then simply run
python store_emails.py
Code language: CSS (css)

This generates mails.csv in the current directory.

The generated CSV file contains the following fields for each message:

  • Date
  • From (Sender)
  • Subject
  • Message text

Development ideas:

  • This hardcodes the IMAP server for Gmail.com and the "INBOX" folder. Perhaps this should be configured outside of the code for easier customization.
  • This brutally marks all messages as read. Perhaps make it PEEK so as to not change the message flags.
  • This will read everything in the INBOX folder. It could be useful to make it remember which messages it has already seen, and update a CSV file only with information from messages which have arrived since the previous poll.
  • It might be useful to be able to specify which messages to fetch, instead of have it fetch everything every time.
  • The exception handling is not a good example of how to do this properly.

Credentials.txt

yourEmailID yourPassword

Run the Script:

python split_files.py <csv/text_file> <split/line_number>
Code language: HTML, XML (xml)

Source Code:

store_emails.py

#!/usr/bin/env python import csv import email from email import policy import imaplib import logging import os import ssl from bs4 import BeautifulSoup credential_path = "credentials.txt" csv_path = "mails.csv" logger = logging.getLogger('imap_poller') host = "imap.gmail.com" port = 993 ssl_context = ssl.create_default_context() def connect_to_mailbox(): # get mail connection mail = imaplib.IMAP4_SSL(host, port, ssl_context=ssl_context) with open(credential_path, "rt") as fr: user = fr.readline().strip() pw = fr.readline().strip() mail.login(user, pw) # get mail box response and select a mail box status, messages = mail.select("INBOX") return mail, messages # get plain text out of html mails def get_text(email_body): soup = BeautifulSoup(email_body, "lxml") return soup.get_text(separator="\n", strip=True) def write_to_csv(mail, writer, N, total_no_of_mails): for i in range(total_no_of_mails, total_no_of_mails - N, -1): res, data = mail.fetch(str(i), "(RFC822)") response = data[0] if isinstance(response, tuple): msg = email.message_from_bytes(response[1], policy=policy.default) # get header data email_subject = msg["subject"] email_from = msg["from"] email_date = msg["date"] email_text = "" # if the email message is multipart if msg.is_multipart(): # iterate over email parts for part in msg.walk(): # extract content type of email content_type = part.get_content_type() content_disposition = str(part.get("Content-Disposition")) try: # get the email email_body email_body = part.get_payload(decode=True) if email_body: email_text = get_text(email_body.decode('utf-8')) except Exception as exc: logger.warning('Caught exception: %r', exc) if ( content_type == "text/plain" and "attachment" not in content_disposition ): # print text/plain emails and skip attachments # print(email_text) pass elif "attachment" in content_disposition: pass else: # extract content type of email content_type = msg.get_content_type() # get the email email_body email_body = msg.get_payload(decode=True) if email_body: email_text = get_text(email_body.decode('utf-8')) if email_text is not None: # Write data in the csv file row = [email_date, email_from, email_subject, email_text] writer.writerow(row) else: logger.warning('%s:%i: No message extracted', "INBOX", i) def main(): mail, messages = connect_to_mailbox() logging.basicConfig(level=logging.WARNING) total_no_of_mails = int(messages[0]) # no. of latest mails to fetch # set it equal to total_no_of_emails to fetch all mail in the inbox N = 2 with open(csv_path, "wt", encoding="utf-8", newline="") as fw: writer = csv.writer(fw) writer.writerow(["Date", "From", "Subject", "Text mail"]) try: write_to_csv(mail, writer, N, total_no_of_mails) except Exception as exc: logger.warning('Caught exception: %r', exc) if __name__ == "__main__": main()
Code language: PHP (php)

Leave a Comment