Links

Here are several sets of links to useful sociophonetics resources. Many more sociophonetically relevant resources are online than we can catalogue and more resources appear regularly, so please consider this page just a starter.

Also please note that website locations can change. If any of the links do not work, you may have success searching the internet for the project or webpage by name. We will update this site from time to time, but we cannot guarantee that the linked resources will remain active.

Data discussed in the book

Vowels in America (VIA) Project – see Fridland’s website, https://packpages.unr.edu/fridland/
We provide some sample files from the Vowels in America Project on the Audio and Figures pages of this website.
Corpus of Regional African American Language (CORAAL) – https://oraal.uoregon.edu/coraal
All of CORAAL is freely downloadable from its website, and accessible through its online interface at http://lingtools.uoregon.edu/coraal/explorer/. The book provides a couple of TinyURLs that link directly to files discussed in the book, but you can browse or search the entire corpus directly from its website.

Also mentioned in the book:
- Samples of Arthur the Rat passage from DARE, read by speakers of different US dialects, can be accessed on the DARE website at https://dare.wisc.edu/audio/arthur-the-rat/.
- The Atlas of North American English (ANAE) used to have an interactive website, at http://www.atlas.mouton-content.com/. The site is down at the time of publication, but hopefully it will return. Vowel formant measurements from the ANAE have been used by a number of projects and may be available online as well.
- The Sociolinguistic Archive and Analysis Project (SLAAP) is a large archive housing a number of sociolinguistic data collections, at https://slaap.chass.ncsu.edu/. SLAAP’s software also includes a range of analysis tools and was the basis of Kendall’s (2009, 2013) corpus sociophonetic studies of speech rate and pause-related phenomena. Access to SLAAP requires a user account and permission from the researchers in charge of its different recording collections, but many collections can be shared for research or educational purposes. Many of SLAAP’s collections are indexed in OLAC.
- Peterson & Barney's (1952) vowel formant data are available through several channels. The easiest way to obtain these data is through Praat (see https://www.fon.hum.uva.nl/praat/manual/Create_formant_table__Peterson___Barney_1952_.html). Also see the Figures page for the code and data to recreate our plots of the Peterson & Barney data.

The International Phonetic Alphabet

International Phonetic Alphabet (IPA) – https://www.internationalphoneticassociation.org/content/ipa-chart
The official home of the International Phonetic Alphabet. Please note that many other versions of the IPA Chart are available online, including some with example sounds (such as this one: https://web.uvic.ca/ling/resources/ipa/charts/IPAlab/IPAlab.htm).

Sociophonetic-related software and online tools

AutoVOT - https://github.com/mlml/autovot/
Software for the automatic measurement of voice onset time (VOT). (Keshet et al. 2014)
DARLA: Dartmouth Linguistic Automation – http://darla.dartmouth.edu
A suite of vowel formant extraction tools tailored to research questions in sociophonetics. (Reddy & Stanford 2015)
FAVE - https://github.com/JoFrhwld/FAVE/wiki
The most widely used tool for automated vowel extraction from transcribed speech. Note that the system is trained on American English. Although it is widely used for other languages and other varieties of English, one should make sure it works appropriately before trusting its results for those varieties. (Rosenfelder et al. 2014)
ISCAN (and associated software) – https://spade.glasgow.ac.uk/software/
The main software being developed as a part of the Speech across Dialects of English (SPADE) project, which seeks to develop innovative and user-friendly software to facilitate large-scale, integrated speech corpus analysis across many datasets together. (McAuliffe et al. 2019)
LaBB-CAT (formerly ONZE Miner) – http://labbcat.sourceforge.net
LaBB-CAT is a browser-based tool that stores audio or video recordings, text transcripts, and other annotations, and facilitates processing of the files. LaBB-CAT was initially designed as a part of the Origins of New Zealand English (ONZE) project (see https://www.canterbury.ac.nz/nzilbb/research/onze/) and has been used extensively in projects related to ONZE. While LaBB-CAT is not included in our list of forced alignment systems below, LaBB-CAT includes forced alignment routines. (Fromont & Hay 2012)
NORM: The Online Vowel Normalization and Plotting Suite – http://lingtools.uoregon.edu/norm/
NORM is a web-based interface to the vowels.R package for the R programming language, which is designed to aid in the manipulation, normalization, and plotting of vowel formant data. Its easy-to-use web interface is a good starting place for plotting and normalizing vowel data. (Thomas & Kendall 2009)
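To give a sense of what vowel normalization involves, here is a minimal sketch in Python of one commonly used technique, Lobanov (z-score) normalization, in which each speaker's formant values are rescaled by that speaker's own mean and standard deviation. This is not the code behind NORM or vowels.R. The function name, the dictionary keys, and the formant values are hypothetical, and the sketch assumes the mean and (sample) standard deviation are computed over all of a speaker's tokens rather than over vowel-category means.

```python
from statistics import mean, stdev


def lobanov_normalize(tokens):
    """Return copies of the tokens with per-speaker z-scored F1 and F2 added.

    Each token is a dict with the (hypothetical) keys "speaker", "vowel",
    "f1", and "f2", with formant values in Hz.
    """
    # Group tokens by speaker so each speaker gets their own mean and SD.
    by_speaker = {}
    for token in tokens:
        by_speaker.setdefault(token["speaker"], []).append(token)

    normalized = []  # note: output is grouped by speaker, not in input order
    for speaker_tokens in by_speaker.values():
        # Assumption: statistics are taken over all tokens of the speaker.
        # stdev() needs at least two tokens, and the values must not all be
        # identical (otherwise the division below fails).
        stats = {}
        for formant in ("f1", "f2"):
            values = [t[formant] for t in speaker_tokens]
            stats[formant] = (mean(values), stdev(values))

        for t in speaker_tokens:
            out = dict(t)
            for formant in ("f1", "f2"):
                mu, sd = stats[formant]
                out[formant + "_z"] = (t[formant] - mu) / sd
            normalized.append(out)
    return normalized


# Made-up formant values for two hypothetical speakers.
sample = [
    {"speaker": "A", "vowel": "i", "f1": 300.0, "f2": 2300.0},
    {"speaker": "A", "vowel": "a", "f1": 700.0, "f2": 1200.0},
    {"speaker": "A", "vowel": "u", "f1": 320.0, "f2": 900.0},
    {"speaker": "B", "vowel": "i", "f1": 350.0, "f2": 2700.0},
    {"speaker": "B", "vowel": "a", "f1": 850.0, "f2": 1400.0},
    {"speaker": "B", "vowel": "u", "f1": 380.0, "f2": 1000.0},
]
for row in lobanov_normalize(sample):
    print(row["speaker"], row["vowel"], round(row["f1_z"], 2), round(row["f2_z"], 2))
```

After normalization each speaker's values are centered on zero, so vowels from speakers with different vocal tract sizes can be plotted on a shared scale.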
Praat - http://www.fon.hum.uva.nl/praat/
Praat is currently the main software used for acoustic (and other) phonetic analysis. We discuss it quite a bit in the book and provide some screenshots of Praat’s Editor window. (Boersma & Weenink 2020)
Praat resources and scripts
- Parselmouth, a Python interface for Praat - https://parselmouth.readthedocs.io
  Over the years, several interfaces to Praat from other programming languages have been developed. Parselmouth is a prominent one for Python, which is increasingly becoming a major programming tool for linguistic research. (Jadoul et al. 2018)
- rPraat, an R interface for aspects of Praat - https://cran.r-project.org/web/packages/rPraat/index.html
  R is a programming language and data analysis environment widely used by linguists (see just below). This package provides some control of Praat through R.
- Praat Scripts! Very many collections of Praat scripts are available online. Many are available from the websites of individual researchers/labs or as supplemental materials in publications, but several larger clearinghouses exist, such as https://sites.google.com/site/praatscripts/. Also, several researchers have compiled extensive websites on Praat scripts and Praat scripting (e.g. https://lennes.github.io/spect/, http://phonetics.linguistics.ucla.edu/facilities/acoustic/praat.html, http://mattwinn.com/praat.html, and https://www.ub.edu/phoneticslaboratory/praat-scripts.html).
- Praat Tutorials. Very many tutorials for Praat and Praat scripting are also available online (for example, https://wstyler.ucsd.edu/praat/). We recommend you search the web for "Praat Tutorial" and see what is currently available.
- See our page Scripts & Code for scripts we provide to aid in vowel formant extraction and sibilant measure extraction in Praat.
R programming language and environment – https://r-project.org/
R has become one of the main tools used by linguists in recent years and this is especially true in sociophonetics. R was used to generate all of the figures for the book. A number of packages or other resources are available for R that are specifically designed for sociophonetic or phonetic purposes.
R packages & other scripts
- Vowels.R – https://cran.r-project.org/web/packages/vowels/index.html
  Vowels.R is a package for the R programming language which provides a number of functions to help with vowel normalization and plotting. The vowel plots presented in this book were generated with the help of the Vowels.R package. (Kendall & Thomas 2012)
- PhonTools – http://www.santiagobarreda.com/rscripts.html & https://cran.r-project.org/web/packages/phonTools/
  PhonTools is another package for R, with a wide range of features supporting phonetic data processing and presentation. (Barreda 2014)
- PhonR – https://drammock.github.io/phonR/ & https://cran.r-project.org/web/packages/phonR/
  PhonR is another package for R, providing phonetic data processing and presentation features. (McCloy 2016)
- English Syllable Counter – http://lingtools.uoregon.edu/scripts/english_syllable_counter-102.R
  This R script contains a function that will count syllables in English orthographic text. It is a slightly updated version of the algorithm used in Kendall’s (2013) book Speech Rate, Pause, and Sociolinguistic Variation: Studies in Corpus Sociophonetics.
- We don’t dive deeply into statistics in the book, but R is very commonly used for statistical analysis and other more general data processing routines. Vast resources are available for these tasks in R, but a catalog of those kinds of resources is outside our scope here.
- See our Scripts & Code page and our Figures page for R scripts we provide to exemplify some sociophonetic data processing.
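As an illustration of what counting syllables in English orthographic text can look like, here is a rough sketch in Python. It is not the algorithm in the English Syllable Counter script listed above or in Kendall (2013). It is a simple vowel-group heuristic with one adjustment for silent final "e", the function names are hypothetical, and it will miscount many words.

```python
import re


def count_syllables(word):
    """Estimate the number of syllables in one English word (rough heuristic)."""
    word = re.sub(r"[^a-z]", "", word.lower())
    if not word:
        return 0
    # Treat each run of vowel letters (including y) as one syllable nucleus.
    count = len(re.findall(r"[aeiouy]+", word))
    # A final "e" is usually silent ("make"), except in consonant + "le"
    # endings such as "table" or "little".
    if word.endswith("e") and count > 1 and not re.search(r"[^aeiouy]le$", word):
        count -= 1
    return max(count, 1)


def count_text(text):
    """Sum the syllable estimates for every word in a stretch of text."""
    return sum(count_syllables(w) for w in re.findall(r"[A-Za-z']+", text))


for example in ("cat", "make", "table", "banana"):
    print(example, count_syllables(example))
print(count_text("the cat sat"))
```

Counts like these are typically divided by a duration to estimate speech rate in syllables per second, so any heuristic of this kind should be checked against hand counts for the data at hand.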
Snack Sound Toolkit - http://www.speech.kth.se/snack/
Software for several platforms, including Tcl/Tk and Python (and Ruby http://rbsnack.sourceforge.net), for sound manipulation and analysis. (Sjölander 2004)
STRAIGHT - http://web.wakayama-u.ac.jp/~kawahara/STRAIGHTadv/index_e.html
A speech analysis, modification, and synthesis system. (Kawahara 2008)
VoiceSauce - http://www.phonetics.ucla.edu/voicesauce/
A Matlab application from UCLA which provides automated voice measurements. (Shue et al. 2011)
Vowel Overlap Indication Software (VOIS3D) Project – https://depts.washington.edu/sociolab/VOIS3D/
Software to calculate and visualize vowel distributional overlap. (Wassink 2006)

Forced-alignment software

Montreal Forced Aligner (MFA) - https://montreal-forced-aligner.readthedocs.io/
A very popular and accurate forced alignment system at the time of the book’s publication. Unlike most other forced alignment systems, MFA uses Kaldi as its back-end (see below). MFA can be trained and used on any language. (McAuliffe et al. 2017)
FAVE-Align - https://github.com/JoFrhwld/FAVE/wiki/FAVE-align
A popular forced aligner; a part of the larger FAVE system. (Rosenfelder et al. 2014)
ProsodyLab Aligner - http://prosodylab.org/tools/aligner/
A forced alignment system that can be trained and run on any language. (Gorman et al. 2011)
The Munich Automatic Speech Segmentation System (MAUS) - https://www.bas.uni-muenchen.de/Bas/BasMAUS.html
A robust system for forced alignment, supporting a number of languages and also with a web-interface. (Kisler et al. 2012)
P2FA - https://babel.ling.upenn.edu/phonetics/old_website_2015/p2fa/index.html
The Penn Phonetics Lab Forced Aligner was the first widely used forced alignment system for sociophonetic work. Its code has been reused in many more recent systems. We provide a link to its original website, but Google searches for this software will find various later versions and derivatives. (Yuan & Liberman 2008)

Related to forced alignment:

CMU Pronouncing Dictionary - http://www.speech.cs.cmu.edu/cgi-bin/cmudict
A list of forced aligners compiled by Alberto Pettarin - https://github.com/pettarin/forced-alignment-tools
Eleanor Chodroff’s Corpus Phonetics Tutorial provides good discussions and tutorials for several forced alignment systems - https://eleanorchodroff.com/tutorial/intro.html
Back-ends:
- HTK - http://htk.eng.cam.ac.uk
  HTK (Hidden Markov Model Toolkit) is the main back-end processing engine behind most of the existing forced alignment systems.
- Kaldi - http://kaldi-asr.org
  Kaldi is the back-end processing engine behind some newer forced alignment systems, in particular the Montreal Forced Aligner.
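
Forced aligners generally rely on a pronunciation dictionary such as the CMU Pronouncing Dictionary listed above to map words to phone sequences. As a small sketch of how its entries can be read, the Python below parses lines in the dictionary's plain-text format: a word, whitespace, then ARPABET phones, with vowels carrying a stress digit, alternate pronunciations marked like `WORD(1)`, and comment lines starting with `;;;`. The function names are hypothetical and the sample lines are illustrative entries written in that format.

```python
import re


def parse_cmudict_line(line):
    """Return (word, phones) for one dictionary line, or None for comments/blanks."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    word, *phones = line.split()
    # Strip the variant marker from alternate pronunciations, e.g. "READ(1)".
    word = re.sub(r"\(\d+\)$", "", word)
    return word, phones


def syllable_count(phones):
    """Count syllables as the number of phones that carry a stress digit."""
    return sum(1 for phone in phones if phone[-1].isdigit())


# Illustrative lines in the dictionary's format.
sample_lines = [
    ";;; a comment line",
    "HELLO  HH AH0 L OW1",
    "CAT  K AE1 T",
]
for raw in sample_lines:
    entry = parse_cmudict_line(raw)
    if entry is not None:
        word, phones = entry
        print(word, phones, syllable_count(phones))
```

Because only vowels carry stress digits, counting them gives a dictionary-based syllable count for any word the dictionary covers.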

Software for running speech experiments

jsPsych - https://www.jspsych.org
A JavaScript library for running behavioral experiments through a web-browser. (de Leeuw 2015)
LMEDS - https://github.com/timmahrt/LMEDS
Language Markup and Experimental Design Software, for running experiments over the internet. (Mahrt 2016)
PsychoPy - https://www.psychopy.org/online/
An open-source package for building and running experiments online using Python. (Peirce et al. 2019)

Additional online sources for speech recordings

A huge amount of data of potential interest to sociophonetics is online. Here are some examples of possible sources:
- Accents & Dialects collections at the British Library - https://sounds.bl.uk/Accents-and-dialects/
- Alaska Native Language Archive at University of Alaska Fairbanks - https://www.uaf.edu/anla/
- Archive of the Indigenous Languages of Latin America - https://ailla.utexas.org
- CLARIN's collection of Spoken Corpora - https://www.clarin.eu/content/spoken-corpora-0
- Endangered Language Archive (ELAR) at SOAS - https://www.soas.ac.uk/elar/
- IFA Spoken Language Corpus of Dutch - https://www.fon.hum.uva.nl/IFA-SpokenLanguageCorpora/IFAcorpus/
- IDEA: International Dialects of English Archive - https://www.dialectsarchive.com/
- Kaipuleohone Language Archive at University of Hawai'i - http://ling.hawaii.edu/kaipuleohone-language-archive/
- Linguistic Atlas Project - http://www.lap.uga.edu
  - Including the Digital Archive of Southern Speech (DASS) - http://www.lap.uga.edu/Site/DASS.html
- OLAC: Open Language Archives Community - http://olac.ldc.upenn.edu
- PARADISEC - https://www.paradisec.org.au
- Sound Comparisons: Exploring Diversity in Phonetics across Language Families - https://soundcomparisons.com
- The Speech Accent Archive at George Mason University - http://accent.gmu.edu
- SpeechBox at Northwestern University - https://speechbox.linguistics.northwestern.edu/
- TalkBank - https://talkbank.org (in particular see https://ca.talkbank.org/access/ for conversational recordings and sociolinguistic corpora)
- University of British Columbia Library's guide to Linguistic Corpora - https://guides.library.ubc.ca/c.php?g=306932&p=2051153
- UCLA Phonetics Lab Archive - http://archive.phonetics.ucla.edu
