CORAAL: Corpus of Regional African American Language

CORAAL Example Code, Analyses, Etc.

This page collects some example code, derived data, and other resources related to CORAAL. We hope these are useful for research and educational purposes. (Note that other derivatives from the main CORAAL data are also available elsewhere on the website, e.g. MFA aligned versions of many of the transcripts.)

  1. CORAAL_web.R [ CORAAL_web.R ]
    A suite of basic functions for the R programming language is available. These let you download and manipulate the transcripts and metadata directly in R. The code provides much of the functionality behind the CORAAL Explorer Search features and also underlies many of the other examples on this page.
    The code can be loaded directly in R by executing: source('http://lingtools.uoregon.edu/coraal/explorer/R/CORAAL_web.R')

  2. Support Vector Machines as a Sociolinguistic Tool [ Presentation Slides | R code ]
    Tyler Kendall's portion of the Computational Sociolinguistics Workshop at New Ways of Analyzing Variation (NWAV) 47, 2018.
    R code that uses CORAAL to demonstrate SVMs for classification tasks, along with talk slides about SVMs as tools for sociolinguists.
    From: Grieve, Jack, Dirk Hovy, David Jurgens, Tyler Kendall, Dong Nguyen, James Stanford, Meghan Sumner, and Rachael Tatman. 2018. Workshop: Computational Sociolinguistics. New Ways of Analyzing Variation (NWAV) 47: New York, NY. October. [ https://osf.io/96st3/ ]

  3. Code and data for automatically coding variable (ING) [ Paper @ Frontiers ]
    As a part of the paper "Considering Performance in the Automated and Manual Coding of Sociolinguistic Variables: Lessons from Variable (ING)", the authors provide extensive R code and data from CORAAL.
    Supplemental files are available from the sidebar in the paper. All can be downloaded through this link: https://ndownloader.figstatic.com/collections/5407596/versions/1.
    Citation: Kendall, Tyler, Charlotte Vaughn, Charlie Farrington, Chloe Tacata, Jaidan McLean, Shelby Arnson, and Kaylynn Gunter. 2021. Considering performance in the automated and manual coding of sociolinguistic variables: Lessons from variable (ING). Frontiers in Artificial Intelligence: Language and Computation, vol 4. [ https://doi.org/10.3389/frai.2021.648543 ]

  4. English syllable counter function in R, for CORAAL [ english_syllable_counter-coraal.R ]
    A slightly modified version of the syllable counting function from Kendall (2013), Speech Rate, Pause, and Sociolinguistic Variation: Studies in Corpus Sociophonetics. The modified function deals gracefully with CORAAL's redaction codes in the transcripts.
    Based on: Kendall, Tyler. 2013. Speech Rate, Pause, and Sociolinguistic Variation: Studies in Corpus Sociophonetics. Basingstoke, UK: Palgrave Macmillan.
    The function can be loaded directly in R by executing: source('http://lingtools.uoregon.edu/coraal/explorer/R/english_syllable_counter-coraal.R')

  5. Explore CORAAL speech rate and pause in R [ ExploreCORAALSpeechRatePause_HbkofSociophonetics.R ]
    Tyler Kendall's forthcoming contribution "Sociophonetics and Speech Rate and Pause" for the Routledge Handbook of Sociophonetics includes a small empirical study of some aspects of speech rate and pause in CORAAL. Code to replicate that analysis, and to explore aspects of CORAAL using R, is available here. (This uses the english_syllable_counter-coraal.R function.) We hope it also serves as a useful example of using the CORAAL_web.R suite to work with CORAAL data.
    Citation: Kendall, Tyler. forthcoming. Sociophonetics and speech rate and pause. In Chris Strelluf (ed.), The Routledge Handbook of Sociophonetics. New York: Routledge.

  6. Explore CORAAL between-speaker intervals in R [ Presentation Slides | ExploreCORAALBetweenSpeakerIntervals_KendallNWAV2022.R ]
    Code to extract between-speaker intervals (BSIs; gaps and overlaps between speaker turns) from CORAAL and to replicate the visualizations and statistical models in the talk cited below. (This uses the english_syllable_counter-coraal.R function.) Slides are also available here.
    Citation: Kendall, Tyler. 2022. Interturn pausing, overlaps, and the co-construction of linguistic variation. Paper presented at NWAV 50. San Jose, CA. October 15, 2022.

  7. R (cop/aux) modeling examples [ Data File (.txt) | R code ]
    Tyler Kendall's forthcoming chapter "Quantitatively analyzing variation and change" for the Handbook of Variationist Sociolinguistics includes a series of examples of different quantitative analyses of a set of copula and auxiliary absence data from CORAAL. The hand-coded data (2,500 tokens of (cop/aux) data) and code to replicate the analyses and figures in the chapter are available here.
    Citation: Kendall, Tyler. forthcoming. Quantitatively analyzing variation and change. In Paul Kerswill, Yoshiyuki Asahi, and Alexandra D'Arcy (eds.), Handbook of Variationist Sociolinguistics. London: Routledge.
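
As a hedged illustration of the source() call listed for CORAAL_web.R in item 1: the sketch below loads a script into its own environment, so you can list what it defines before calling anything. The helper name load_script is hypothetical and is not part of CORAAL_web.R; source() and its local argument are base R. The call to the CORAAL URL is left commented out because it needs network access.

```r
# Source an R script (local path or URL) into a fresh environment and
# return that environment, so the objects it defines can be inspected.
# NOTE: load_script is a hypothetical helper, not part of CORAAL_web.R.
load_script <- function(path_or_url) {
  env <- new.env()
  tryCatch(
    source(path_or_url, local = env),
    error = function(e) {
      stop("Could not load ", path_or_url, ": ", conditionMessage(e),
           call. = FALSE)
    }
  )
  env
}

# Requires network access, so it is not run here:
# coraal_env <- load_script(
#   "http://lingtools.uoregon.edu/coraal/explorer/R/CORAAL_web.R"
# )
# print(ls(coraal_env))  # names of the functions and objects the script defines
```

Keeping the script's definitions in a separate environment avoids overwriting objects in your workspace; calling source() on the URL directly, as in item 1, places them in the global environment instead.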

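To make the between-speaker intervals of item 6 concrete, here is a minimal sketch on toy data. It is not the code from the talk: the column names (speaker, start, end), the function name, and the turn table are all assumptions made for illustration. A BSI is taken here as the start of one turn minus the end of the previous turn by a different speaker, so positive values are gaps and negative values are overlaps.

```r
# Toy turn table. Column names are hypothetical and are not meant to
# match the CORAAL transcript format or the code from the talk.
turns <- data.frame(
  speaker = c("A", "B", "A", "A", "B"),
  start   = c(0.0, 2.4, 4.0, 6.1, 7.0),
  end     = c(2.0, 4.3, 6.0, 7.5, 9.0),
  stringsAsFactors = FALSE
)

# Compute between-speaker intervals from time-ordered turns.
# Only adjacent turns with a change of speaker contribute an interval.
between_speaker_intervals <- function(turns) {
  turns <- turns[order(turns$start), ]
  n <- nrow(turns)
  prev <- turns[-n, ]   # every turn except the last
  nxt  <- turns[-1, ]   # every turn except the first
  change <- prev$speaker != nxt$speaker
  bsi <- nxt$start[change] - prev$end[change]
  data.frame(
    from = prev$speaker[change],
    to   = nxt$speaker[change],
    bsi  = bsi,
    # A zero-length interval is labeled "gap" in this sketch.
    type = ifelse(bsi < 0, "overlap", "gap"),
    stringsAsFactors = FALSE
  )
}

bsis <- between_speaker_intervals(turns)
print(bsis)
```

On the toy data this gives three intervals: one gap of about 0.4 seconds and two overlaps of about 0.3 and 0.5 seconds. The same-speaker transition between the third and fourth turns is skipped. Real transcripts would need additional decisions, for example how to treat pause rows or more than two speakers.
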
CORAAL is completely free for research use. It is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.

T. Kendall November 2022