Navigating an AlignedTextGrid

This documentation covers reading in the output from the Montreal Forced Aligner using the Word and Phone classes from aligned_textgrid, but everything will generalize to custom classes.

from aligned_textgrid import AlignedTextGrid
from aligned_textgrid import Word, Phone

Reading in a TextGrid

To read in a one-speaker TextGrid, give AlignedTextGrid() either the path to the file or a textgrid that has already been read in with praatio.textgrid.openTextgrid().

You also need to specify the sequence class of each tier, in the order the tiers appear in the file. For MFA output, the top tier is Word and the bottom tier is Phone; if these were reversed, you would have to pass [Phone, Word] to entry_classes. Which class is the superset and which is the subset is encoded in the classes themselves and is handled automatically.

one_speaker = AlignedTextGrid(
    textgrid_path = "../resources/josef-fruehwald_speaker.TextGrid", 
    entry_classes = [Word, Phone]
)

With a TextGrid containing two or more speakers, you can pass entry_classes either a single flat list of interval classes to re-use for each speaker (for example, [Word, Phone]) or an explicit nested list of classes (for example, [[Word, Phone], [Word, Phone]]).

two_speaker = AlignedTextGrid(
    textgrid_path = "../resources/KY25A_1.TextGrid",
    entry_classes = [Word, Phone]
)
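
To make the relationship between the flat and nested forms concrete, here is a minimal pure-Python sketch. The helper expand_entry_classes() and the stand-in Word and Phone classes are hypothetical and exist only for illustration; this is not how aligned_textgrid itself necessarily does the expansion.

```python
# Stand-in classes for illustration only; the real ones come from aligned_textgrid.
class Word:
    pass


class Phone:
    pass


def expand_entry_classes(entry_classes, n_tiers):
    """Hypothetical helper: return one list of classes per speaker.

    A nested list is returned unchanged (as a copy); a flat list is
    repeated until it covers all `n_tiers` tiers in the file.
    """
    # Already nested: every element is itself a list or tuple of classes.
    if all(isinstance(group, (list, tuple)) for group in entry_classes):
        return [list(group) for group in entry_classes]

    group_size = len(entry_classes)
    if n_tiers % group_size != 0:
        raise ValueError(
            f"{n_tiers} tiers cannot be split into groups of {group_size}"
        )
    return [list(entry_classes) for _ in range(n_tiers // group_size)]


# A two-speaker MFA TextGrid has four tiers: Word, Phone, Word, Phone.
nested = expand_entry_classes([Word, Phone], n_tiers=4)
print([[cls.__name__ for cls in group] for group in nested])
```

Under these assumptions, the flat list [Word, Phone] and the nested list [[Word, Phone], [Word, Phone]] describe the same four-tier file.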

If you have a textgrid with a mixture of sequence hierarchies, you have to read it in with the fully nested list of classes.

from aligned_textgrid import custom_classes
Turn = custom_classes("Turn")

multi_hierarchy = AlignedTextGrid(
    textgrid_path = "../resources/KY25A_1_multi.TextGrid",
    entry_classes = [[Word, Phone], [Turn], [Word, Phone], [Turn]]
)

print(multi_hierarchy)
AlignedTextGrid with 4 groups, each with [2, 1, 2, 1] tiers. [['Word', 'Phone'], ['Turn'], ['Word', 'Phone'], ['Turn']]

Get interval at time

The “Get interval at time” functionality from Praat has been implemented for each level of TextGrid representation.

speaker_one = two_speaker[0]
speaker_one_word = speaker_one[0]
speaker_one_word.get_interval_at_time(11)
1

This is the index for the word that appears at 11 seconds.
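
As a rough illustration of what a lookup like this involves, the sketch below finds the interval containing a time point with a binary search over sorted start times. The function interval_index_at_time(), the half-open boundary handling, and the boundary values are all assumptions made for the example; the library's own implementation may differ.

```python
from bisect import bisect_right


def interval_index_at_time(starts, ends, time):
    """Return the index of the interval containing `time`, or None.

    `starts` and `ends` are parallel lists sorted by start time.
    Intervals are treated as half-open, [start, end), which is an
    assumption of this sketch.
    """
    # Index of the rightmost interval whose start is <= time.
    idx = bisect_right(starts, time) - 1
    if idx >= 0 and time < ends[idx]:
        return idx
    return None


# Made-up word boundaries, in seconds.
word_starts = [0.0, 10.8, 11.4]
word_ends = [10.8, 11.4, 12.0]

print(interval_index_at_time(word_starts, word_ends, 11))
# 1
```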

speaker_one.get_intervals_at_time(11)
[1, 2]

These are the indices of the intervals on the word and phone tiers at 11 seconds.

two_speaker.get_intervals_at_time(11)
[[1, 2], [39, 96]]

These are the indices of the intervals on the word and phone tiers, for both speakers, at 11 seconds.

Nested indexing

You can use the nested indices returned by .get_intervals_at_time() to get the actual sequence intervals as well.

eleven_seconds = two_speaker.get_intervals_at_time(11)
two_speaker[eleven_seconds]
[[Class Word, label: yeah, .superset_class: Top_wp, .super_instance, None, .subset_class: Phone, .subset_list: ['Y', 'AE1'],
  Class Phone, label: AE1, .superset_class: Word, .super_instance: yeah, .subset_class: Bottom_wp],
 [Class Word, label: after, .superset_class: Top_wp, .super_instance, None, .subset_class: Phone, .subset_list: ['AE1', 'F', 'T', 'ER0'],
  Class Phone, label: F, .superset_class: Word, .super_instance: after, .subset_class: Bottom_wp]]
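
The shape of this result mirrors the shape of the index list: one inner list per speaker, one entry per tier. A minimal sketch of that correspondence, using plain nested lists of labels in place of real tiers, might look like the following. The helper index_nested() and the toy labels and indices are hypothetical, not part of aligned_textgrid.

```python
def index_nested(groups, indices):
    """Hypothetical helper: pick one item per tier from nested indices.

    `groups` is a list of speakers, each a list of tiers, each a list of
    items. `indices` has one inner list per speaker and one index per tier.
    """
    return [
        [tier[i] for tier, i in zip(group, group_indices)]
        for group, group_indices in zip(groups, indices)
    ]


# Toy data: two speakers, each with a word tier and a phone tier.
groups = [
    [["", "yeah", "so"], ["", "Y", "AE1", "S", "OW1"]],
    [["", "after"], ["", "AE1", "F", "T", "ER0"]],
]

print(index_nested(groups, [[1, 2], [1, 2]]))
# [['yeah', 'AE1'], ['after', 'F']]
```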

Reuse

GPLv3