hpr2309 :: Crowdsourcing Accessibility

A show about my efforts to get lots of students to help correct transcriptions of my online lectures

Hosted by Jon Kulp on 2017-06-08. Flagged as Clean and released under a CC-BY-SA license.
Tags: Accessibility, scripting, audio editing, speech-to-text. Comments: 2.
The show is available on the Internet Archive at: https://archive.org/details/hpr2309

Duration: 00:22:34

Part of the series: Accessibility.

Shows about tearing down the barriers for our fellow hackers.

To meet basic accessibility standards, I need text alternatives to the audio of the online video lectures for my music appreciation class. I have a transcription tool called Dragon Dictate that does most of the heavy lifting of producing a raw transcript, but its output needs a lot of attention: correction, capitalization, and punctuation. The text also needs to be separated into logical paragraphs, and it really helps to have proper section headings.

There are 20 lectures in all. I have finished 11 of them, but I still have nine to go and no time to do them. I had the idea to crowdsource this effort by giving extra-credit points to my students for doing little bits of it at a time. They get one extra-credit point for every minute of lecture that they correct.

I got the idea from the Distributed Proofreaders project, where volunteers correct mistakes in the OCR scans of public-domain books before the books are posted on a website like Project Gutenberg. So far I've gotten about 30 minutes of lecture transcripts corrected by students who needed extra credit, and I have high hopes that we will finish the project either this summer or next fall.

One excellent tool that I found while figuring out the logistics of this project is the Linux command-line tool mp3splt. I use it to cut the long lecture files into one-minute segments like so:

mp3splt -t 1.0.0 L13audio.mp3
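
If the segments end up next to the original file, a script that later loops over *.mp3 would pick up the full-length lecture as well. A minimal sketch of one way around that, assuming mp3splt is installed, is to send the pieces to their own directory with its -d option. The helper names segment_dir_for and split_lecture, and the "_segments" directory suffix, are hypothetical and not part of the original workflow.

```shell
#!/bin/bash
# Hypothetical helpers: split one lecture mp3 into one-minute pieces
# and keep the pieces in a directory of their own.

segment_dir_for() {
  # Derive a directory name from the file name,
  # e.g. L13audio.mp3 -> L13audio_segments
  local file=$1
  printf '%s_segments\n' "$(basename "$file" .mp3)"
}

split_lecture() {
  local file=$1
  local outdir
  outdir=$(segment_dir_for "$file")
  mkdir -p "$outdir"
  # -t 1.0.0 : cut every 1 minute (same value as the command above)
  # -d DIR   : write the resulting pieces into DIR
  mp3splt -t 1.0.0 -d "$outdir" "$file"
}
```

Calling split_lecture L13audio.mp3 would then leave the one-minute files in L13audio_segments, which is the directory to run the page-generating script from.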

I also wrote my own script that will generate an HTML page with individual audio players for all of these one-minute audio files so that students can very easily choose an audio file to work on that is exactly one minute long. The script also pushes all of the audio files over to my server after creating ogg versions of the mp3s using mp32ogg.

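#!/bin/bash
# Build an HTML page with an audio player for every mp3 in the current
# directory, make ogg copies with mp32ogg, and copy everything to the server.

url='https://servername.edu/path/to/filedir'
page="$(pwd)/$(basename "$(pwd)")_page.html"

# Lesson prefix from the first mp3 name, e.g. L13audio.mp3 -> L13
LESSON=$(printf '%s\n' *.mp3 | head -n1 | sed -e 's/audio.*$//')

# Start the page fresh (> rather than >>) so a re-run does not duplicate it
cat > "$page" <<EOFtop
<h2><a href="$url/$LESSON.html">RAW TRANSCRIPT HERE</a></h2>
EOFtop

for i in *.mp3; do
  stem=$(basename "$i" .mp3)
  mp32ogg "$i"
  sleep .2
  # Append one player block per segment
  cat >> "$page" <<EOF

<h3>File: "$i"</h3>
<div class="centered">
	<audio controls>
		<source src="$url/$stem.mp3" type="audio/mpeg">
		<source src="$url/$stem.ogg" type="audio/ogg">
	</audio>
</div>
EOF
done

# Upload both audio formats, then the page itself
scp *.ogg servername:~/path/to/filedir/
sleep 1
scp *.mp3 servername:~/path/to/filedir/
scp "$page" servername:~/path/to/filedir/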
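
Because every player on the page points at both an mp3 and an ogg source, it may be worth checking that each segment really has an ogg copy before uploading. The following is a minimal sketch of such a check. The function name missing_ogg is hypothetical and is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the name of every mp3 in a directory
# that has no ogg file with the same stem.

missing_ogg() {
  local dir=${1:-.}
  local f
  for f in "$dir"/*.mp3; do
    # If the glob matched nothing, "$f" is the literal pattern; skip it
    [ -e "$f" ] || continue
    # Report the mp3 when its ogg counterpart is absent
    [ -e "${f%.mp3}.ogg" ] || printf '%s\n' "$(basename "$f")"
  done
}
```

One way to use it would be to run missing_ogg . just before the scp lines and stop if it prints anything, so that a failed conversion does not leave a player with a dead source.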

Comments

Comment #1 posted on 2017-06-11 20:31:20 by Dave Morriss

Interesting project; interesting word

I like 'bloviate' too. In investigating its etymology I found an article on "World Wide Words", where I often go for information on unusual words. I found this, which you might like: https://www.worldwidewords.org/weirdwords/ww-blo1.htm

Comment #2 posted on 2017-06-11 21:36:07 by Jonathan Kulp

absquatulate

Great page! I like the reference to the following words as well: sockdolager, hornswoggle and absquatulate. Gotta start using those...

Hacker Public Radio

Your ideas, projects, opinions - podcasted.

New episodes every weekday Monday through Friday.
This page was generated by The HPR Robot at Fri, 26 Apr 2024 10:38:03 +0000