
Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes Monday through Friday.


hpr3596 :: Extracting text, tables and images from docx files using Python

In this episode, I describe how I used two Python libraries to extract important data (text, tables and images) from docx files.


Hosted by b-yeezi on Monday 2022-05-16. This episode is flagged as Clean and is released under a CC-BY-SA license.
Tags: python, docx.


Part of the series: A Little Bit of Python

Initially based on the podcast "A Little Bit of Python", by Michael Foord, Andrew Kuchling, Steve Holden, Dr. Brett Cannon and Jesse Noller. https://www.voidspace.org.uk/python/weblog/arch_d7_2009_12_19.shtml#e1138

Now the series is open to all.

Tools to extract data from docx files:

  1. docx2txt (published on PyPI as docx2txt; developed in the python-docx2txt repository)
  2. python-docx
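All of these tools read the same underlying format: a .docx file is a ZIP archive whose main body lives in word/document.xml and whose embedded images are normally stored under word/media/. As a rough, dependency-free illustration of the idea behind docx2txt (not its actual implementation), the sketch below pulls paragraph text and image files straight out of the archive with the standard library. The function name `extract_text_and_images` is hypothetical, and the sketch only reads the main document body and ignores tab and line-break elements, which a real library handles more carefully.

```python
import os
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_text_and_images(src, img_dest):
    """Hypothetical helper: return (text, image_paths) for a .docx file.

    Text comes from the main body only; every file under word/media/
    is copied into img_dest.
    """
    os.makedirs(img_dest, exist_ok=True)
    image_paths = []
    with zipfile.ZipFile(src) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        # each <w:p> is a paragraph; its visible text sits in <w:t> elements
        for p in root.iter(W_NS + "p"):
            paragraphs.append("".join(t.text or "" for t in p.iter(W_NS + "t")))
        # embedded images are stored as ordinary files inside the archive
        for name in zf.namelist():
            if name.startswith("word/media/") and not name.endswith("/"):
                out_path = os.path.join(img_dest, os.path.basename(name))
                with open(out_path, "wb") as out:
                    out.write(zf.read(name))
                image_paths.append(out_path)
    return "\n".join(paragraphs), image_paths
```

Usage would look like `text, images = extract_text_and_images("example.docx", "images")`, where both paths are placeholders. In practice docx2txt does this job with a single call, so the sketch is mainly useful for seeing where the data actually lives.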

Code Snippets

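import csv
import os

import docx      # provided by the python-docx package
import docx2txt  # provided by the docx2txt package

src = "example.docx"  # placeholder: path to the input .docx file
img_dest = "images"   # placeholder: directory that receives extracted images

# docx2txt: extract all text and save embedded images into img_dest
os.makedirs(img_dest, exist_ok=True)
text = docx2txt.process(src, img_dest)
with open("data.txt", "wt", encoding="utf-8") as f:
    f.write(text)

# python-docx: collect each table as a list of rows of cell strings
document = docx.Document(src)
tables = document.tables
data = []
for table in tables:
    table_data = []
    for row in table.rows:
        row_data = []
        for cell in row.cells:
            row_data.append(cell.text)
        table_data.append(row_data)
    data.append(table_data)

# write each extracted table to its own CSV file: 0.csv, 1.csv, ...
for i, table in enumerate(data):
    with open(f"{i}.csv", "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(table)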
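If installing python-docx is not an option, the table-to-CSV step can be approximated with the standard library by reading WordprocessingML directly: each table is a `<w:tbl>` element, each row a `<w:tr>`, and each cell a `<w:tc>` whose text sits in `<w:t>` elements. This is a hypothetical sketch (`tables_from_docx` and `write_tables_as_csv` are illustrative names), not how python-docx works internally; it makes no attempt to handle merged cells, and any nested table shows up as a separate entry in the result.

```python
import csv
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def tables_from_docx(src):
    """Hypothetical helper: return every table in word/document.xml as a
    list of rows, each row a list of cell strings (no merged-cell handling)."""
    with zipfile.ZipFile(src) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    data = []
    for tbl in root.iter(W_NS + "tbl"):
        table_data = []
        for tr in tbl.findall(W_NS + "tr"):      # rows are direct children
            row_data = []
            for tc in tr.findall(W_NS + "tc"):   # cells are direct children
                # a cell can hold several paragraphs; join them with newlines
                paras = ["".join(t.text or "" for t in p.iter(W_NS + "t"))
                         for p in tc.findall(W_NS + "p")]
                row_data.append("\n".join(paras))
            table_data.append(row_data)
        data.append(table_data)
    return data


def write_tables_as_csv(data, prefix=""):
    """Write table i to f"{prefix}{i}.csv", one CSV row per table row."""
    for i, table in enumerate(data):
        with open(f"{prefix}{i}.csv", "wt", newline="", encoding="utf-8") as f:
            csv.writer(f).writerows(table)
```

Opening the CSV files with `newline=""` matters here: it lets the csv module control line endings, so a cell that contains a newline is quoted and survives a round trip instead of splitting into extra rows.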

Transcript

The episode transcript was generated automatically using whisper:

whisper --model tiny --language en hpr3596.wav
