Phrase Creation

Create a phrase tier by fusing word sequences.
Author

Josef Fruehwald

Published

June 26, 2024

When working with a force-aligned TextGrid that has a Word and a Phone tier, you can also add a Phrase tier.

from aligned_textgrid import AlignedTextGrid, custom_classes

atg = AlignedTextGrid(
    "resources/josef-fruehwald_speaker.TextGrid", 
    entry_classes = custom_classes(["Word", "Phone"])
)

Interleave a new Phrase tier

First, we need to interleave a new tier class above Word, copying its timing and labels.

atg.interleave_class(
    name = "Phrase",
    above = "Word",
    timing_from = "below",
    copy_labels = True
)

print(atg)
AlignedTextGrid with 1 groups named ['group_0'] each with [3] tiers. [['Phrase', 'Word', 'Phone']]

Iterate through phrase and fuse

We need to define a function that will take an existing phrase label and add on an incoming label.

def make_phrase_label(a_label, b_label):
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"
    
    return a_label
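To see what the empty-label check does, here is a small standalone sketch. It repeats the function above and folds it over a made-up list of labels with functools.reduce. The sample labels are illustrative, and reduce is only a stand-in for repeated fusing, not a claim about how fuse_rightwards() calls the function internally.

```python
from functools import reduce

def make_phrase_label(a_label, b_label):
    # Only append the incoming label when it is not an empty (pause) label.
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"

    return a_label

# Hypothetical word labels; "" stands for a short pause between words.
words = ["when", "the", "", "sunlight"]

# Fold the labels left to right, the way repeated fusing would build a phrase.
phrase = reduce(make_phrase_label, words)
print(repr(phrase))
```

Because empty labels are skipped, short pauses absorbed into a phrase do not leave doubled spaces in the phrase label.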

The fuse_rightwards() method fuses the following interval into the current interval and pops the following interval from the tier. Because the tier changes length while we work through it, we don’t want to use a for-loop.
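The problem with a for-loop can be illustrated with a plain Python list, with no TextGrid involved. In this sketch, the hypothetical helper fuse_with_for() joins each item with its neighbor and pops the neighbor, which is only an analogy for fusing intervals. The for-loop moves on after every fuse, so it produces pairs rather than one fully fused item.

```python
def fuse_with_for(labels):
    # Work on a copy so the caller's list is left alone.
    labels = list(labels)
    for i, _ in enumerate(labels):
        if i + 1 < len(labels):
            # Pop the neighbor and join it onto the current item.
            following = labels.pop(i + 1)
            labels[i] = f"{labels[i]} {following}"
        # The loop now advances to index i + 1, even though the
        # current item could still be fused with its new neighbor.
    return labels

pairs = fuse_with_for(["when", "the", "sunlight", "strikes"])
print(pairs)
```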

Instead, we’ll use a while loop, which will end when we reach the end of the Phrase tier. We’ll stop fusing and move on to the next interval when either

  • The current interval’s label is “” (that is, it is a pause)
  • The following interval’s label is “” and it is at least 220 ms long.

The continue keyword under the if statements bumps us back to the top of the while loop, which will check to see if we’re at the end of the Phrase tier.

this_interval = atg[0].Phrase.first

while this_interval is not atg[0].Phrase.last:

    if this_interval.label == "":
        this_interval = this_interval.fol
        continue
    
    following_long_pause = (
        this_interval.fol.label == ""
        and
        this_interval.fol.duration >= 0.220
    )

    if following_long_pause:
        this_interval = this_interval.fol
        continue

    this_interval.fuse_rightwards(
        label_fun = make_phrase_label
    )
The numbered notes below follow the code above from top to bottom.

  1. Before the loop: manually begin at the first interval.
  2. The while condition: the value of .last is dynamically updated, so this is safe.
  3. The first if statement: if we are currently in a pause interval, move to the next interval.
  4. following_long_pause: True if the next interval is a pause of at least 220 ms, otherwise False.
  5. The second if statement: if the following interval is a long pause, update this_interval to be the following interval. On the next pass, the first if statement will keep bumping us along until we get to a non-pause interval.
  6. The fuse_rightwards() call: if neither of the previous if statements was triggered, we fuse this_interval with the following interval.
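The same control flow can be tried without a TextGrid file. The sketch below is a standalone analogy using a plain list: the Interval dataclass, fuse_into_phrases(), and the sample timings are all hypothetical stand-ins, not part of aligned_textgrid. It is only meant to show how the two if statements and the fuse step interact.

```python
from dataclasses import dataclass

@dataclass
class Interval:
    label: str
    start: float
    end: float

    @property
    def duration(self):
        return self.end - self.start

def make_phrase_label(a_label, b_label):
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"
    return a_label

def fuse_into_phrases(words, min_pause=0.220):
    # Copy the word intervals, like interleaving a Phrase tier above Word.
    phrases = [Interval(w.label, w.start, w.end) for w in words]
    i = 0
    while i < len(phrases) - 1:
        this, fol = phrases[i], phrases[i + 1]
        if this.label == "":
            i += 1  # currently in a pause: move along
            continue
        if fol.label == "" and fol.duration >= min_pause:
            i += 1  # long pause ahead: the current phrase is finished
            continue
        # Otherwise absorb the following interval and stay where we are.
        this.label = make_phrase_label(this.label, fol.label)
        this.end = fol.end
        del phrases[i + 1]
    return phrases

words = [
    Interval("", 0.0, 0.5),
    Interval("when", 0.5, 0.7),
    Interval("", 0.7, 0.75),      # short pause, absorbed into the phrase
    Interval("the", 0.75, 0.9),
    Interval("", 0.9, 1.3),       # long pause, ends the phrase
    Interval("rainbow", 1.3, 1.8),
]

phrases = fuse_into_phrases(words)
print([p.label for p in phrases])
```

The index only advances when nothing is fused, which is the list-based counterpart of reassigning this_interval to this_interval.fol.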

We can check on the results.

for phrase in atg[0].Phrase[0:10]:
    print(phrase.label)

when the sunlight strikes raindrops in the air they act like a prism and formza rainbow

the rainbow is a division of white light into many beautiful colors

these take the shape of a long round arch

with its path high above and its two ends apparently beyond the horizon

there is according to legend a boiling pot of gold at one end

And just for clarity, each non-pause word is now nested within a phrase interval, which we can reach through .within.

(
    atg[0].Word[1].label,
    atg[0].Word[1].within.label
)
('when',
 'when the sunlight strikes raindrops in the air they act like a prism and formza rainbow')

More ideas

We can also, for example, get the durations of the pauses that occur within a phrase.

import numpy as np

in_phrase_pauses = [
    interval
    for interval in atg[0].Word
    if interval.label == ""
    if interval.within.label != ""
]

pause_durs = np.array([
    interval.duration
    for interval in in_phrase_pauses
])

pause_durs
array([0.16, 0.11, 0.03, 0.03, 0.03, 0.22, 0.04, 0.04, 0.15, 0.08, 0.06,
       0.04, 0.06, 0.03, 0.03, 0.12, 0.03, 0.03, 0.05, 0.04, 0.14, 0.06,
       0.21, 0.05, 0.03, 0.03, 0.08, 0.03, 0.04, 0.06, 0.14, 0.03, 0.03,
       0.03, 0.03])
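As one possible next step, the durations can be summarized. The sketch below copies the rounded values printed above into a plain list so that it runs on its own, and uses the standard library statistics module. The summarize() helper is a hypothetical name, and because the inputs are rounded, the results are approximate.

```python
import statistics

# Rounded in-phrase pause durations (seconds), copied from the output above.
pause_durs = [
    0.16, 0.11, 0.03, 0.03, 0.03, 0.22, 0.04, 0.04, 0.15, 0.08, 0.06,
    0.04, 0.06, 0.03, 0.03, 0.12, 0.03, 0.03, 0.05, 0.04, 0.14, 0.06,
    0.21, 0.05, 0.03, 0.03, 0.08, 0.03, 0.04, 0.06, 0.14, 0.03, 0.03,
    0.03, 0.03,
]

def summarize(durations):
    # Collect a few descriptive statistics in one dictionary.
    return {
        "n": len(durations),
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }

summary = summarize(pause_durs)
for name, value in summary.items():
    print(f"{name}: {value}")
```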

Session Info

import sys
import aligned_textgrid

print(
    (
        f"Python version: {sys.version}\n"
        f"aligned-textgrid version: {aligned_textgrid.__version__}"
    )
)
Python version: 3.11.9 (main, May  9 2024, 14:13:20) [GCC 11.4.0]
aligned-textgrid version: 0.7.4

Reuse

GPLv3