../my_corpus
├── speaker1.TextGrid
├── speaker1.wav
├── speaker2.TextGrid
└── speaker2.wav
Adding Speaker Demographics
We’ve tried to make adding speaker demographics to fave-extract
output as flexible as possible, including support for Excel, CSV, YAML, and legacy-fave .speaker files.
File Formats
Excel or CSV files
To ensure demographic information in an .xlsx or .csv file is correctly included in fave-extract output, two columns are required:

- file_name: The file stem of the wav and TextGrid files (e.g. speaker1 for speaker1.wav and speaker1.TextGrid).
- speaker_num: The speaker to be analyzed in a file. The first speaker is 1.
So, if you had a corpus that looked like this:

../my_corpus
├── speaker1.TextGrid
├── speaker1.wav
├── speaker2.TextGrid
└── speaker2.wav

your Excel or CSV file would need to look something like this:
| file_name | speaker_num | age |
|---|---|---|
| speaker1 | 1 | 26 |
| speaker2 | 1 | 50 |
| speaker2 | 2 | 23 |
If a speaker demographics file is provided, fave-extract will only process data for speakers that have an entry in it.
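As a rough illustration of these rules, the sketch below reads a demographics CSV with Python's standard csv module, checks that the two required columns are present, and lists the (file_name, speaker_num) pairs that have entries. The helper name speakers_with_entries is hypothetical and not part of fave-extract; it only mirrors the behavior described above.

```python
import csv
import io

REQUIRED_COLUMNS = {"file_name", "speaker_num"}


def speakers_with_entries(csv_text):
    """Return the (file_name, speaker_num) pairs listed in a demographics CSV.

    Hypothetical helper for illustration only; fave-extract does its own parsing.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    # Accessing fieldnames reads the header row.
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing required columns: {sorted(missing)}")
    # speaker_num is 1-based: the first speaker in a file is 1.
    return [(row["file_name"], int(row["speaker_num"])) for row in reader]


# The example table above, written out as CSV text.
demographics = """file_name,speaker_num,age
speaker1,1,26
speaker2,1,50
speaker2,2,23
"""

print(speakers_with_entries(demographics))
# [('speaker1', 1), ('speaker2', 1), ('speaker2', 2)]
```

Raising an error when a required column is missing is a choice made for this sketch so the problem is visible early; it is not a claim about how fave-extract reports the same problem.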
YAML file
Speaker demographic information can also be formatted as a YAML file. YAML is a very flexible format for structured data. For this corpus:
../my_corpus
├── speaker1.TextGrid
├── speaker1.wav
├── speaker2.TextGrid
└── speaker2.wav
a speaker demographics YAML file would look like this:
# yaml
- file_name: speaker1
  speaker_num: 1
  age: 26
- file_name: speaker2
  speaker_num: 1
  age: 50
- file_name: speaker2
  speaker_num: 2
  age: 23
The file_name and speaker_num fields are required.
Outside of the required fields:

- Not every speaker has to have the same fields defined.
- The fields don’t need to appear in a consistent order.
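Because entries may define different fields, in any order, one way to picture what has to happen to them is to collect the union of all field names and leave a blank where an entry does not define a field. The sketch below works on the list of dictionaries that a YAML parser (for example PyYAML's yaml.safe_load) would return for a file like the one above. The normalize_entries helper and the extra notes field are hypothetical, added only to show an entry with a field the others lack.

```python
REQUIRED_FIELDS = ("file_name", "speaker_num")


def normalize_entries(entries):
    """Check required fields and give every entry the same set of columns.

    Hypothetical helper: `entries` is a list of dicts, as a YAML parser
    would return for a speaker demographics file.
    """
    for i, entry in enumerate(entries):
        missing = [field for field in REQUIRED_FIELDS if field not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing {missing}")

    # Required fields first, then optional fields in first-seen order.
    columns = list(REQUIRED_FIELDS)
    for entry in entries:
        for key in entry:
            if key not in columns:
                columns.append(key)

    # Fields an entry does not define are filled with None.
    rows = [{col: entry.get(col) for col in columns} for entry in entries]
    return columns, rows


entries = [
    {"file_name": "speaker1", "speaker_num": 1, "age": 26},
    {"age": 50, "speaker_num": 1, "file_name": "speaker2"},  # different field order
    {"file_name": "speaker2", "speaker_num": 2, "age": 23,
     "notes": "hypothetical extra field"},  # field the other entries lack
]

columns, rows = normalize_entries(entries)
print(columns)  # ['file_name', 'speaker_num', 'age', 'notes']
print(rows[0])  # {'file_name': 'speaker1', 'speaker_num': 1, 'age': 26, 'notes': None}
```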
Legacy-fave speaker file
If you have legacy-fave .speaker files, you can pass them to the --speakers option.
Usage
All three fave-extract subcommands support passing a demographics file. For example, with the corpus subcommand:
fave-extract corpus my_corpus/ --speakers demographics.csv
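Before running a command like the one above, it can help to confirm that every file_name in the demographics file matches the stem of a .wav/.TextGrid pair in the corpus directory, since speakers are looked up by file stem. The sketch below is a hypothetical pre-flight check, not a fave-extract feature; it assumes the .wav and .TextGrid extensions shown in the corpus listing, and the glob match is case-sensitive on most Linux file systems.

```python
from pathlib import Path


def unmatched_file_names(corpus_dir, file_names):
    """Return demographics file_name values with no matching .wav/.TextGrid pair.

    Hypothetical pre-flight check; file_name is compared against file stems.
    """
    corpus = Path(corpus_dir)
    wav_stems = {p.stem for p in corpus.glob("*.wav")}
    grid_stems = {p.stem for p in corpus.glob("*.TextGrid")}
    paired = wav_stems & grid_stems
    # Duplicates are expected (one row per speaker), so compare as a set.
    return sorted(set(file_names) - paired)


# Example (assumes the my_corpus/ directory from the listing above exists):
# print(unmatched_file_names("my_corpus", ["speaker1", "speaker2", "speaker2"]))
```

An empty result means every demographics row points at a file pair that exists; any names returned would have no audio or TextGrid to analyze.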