from aligned_textgrid import AlignedTextGrid, custom_classes
atg = AlignedTextGrid(
    "resources/josef-fruehwald_speaker.TextGrid",
    entry_classes = custom_classes(["Word", "Phone"])
)
Phrase Creation
When working with a force-aligned TextGrid that has Word and Phone tiers, you can also add a Phrase tier.
Interleave a new Phrase tier
First, we need to interleave a new tier class above Word, copying its timing and labels.
atg.interleave_class(
    name = "Phrase",
    above = "Word",
    timing_from = "below",
    copy_labels = True
)
print(atg)
AlignedTextGrid with 1 groups named ['group_0'] each with [3] tiers. [['Phrase', 'Word', 'Phone']]
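To make the effect of this step concrete, here is a minimal plain-Python sketch of what copying timing and labels from the tier below amounts to. The ToyInterval class and both lists are hypothetical stand-ins, not part of aligned-textgrid; the real method also sets up the hierarchy between the tiers, which this toy ignores.

```python
from dataclasses import dataclass

@dataclass
class ToyInterval:
    # Hypothetical stand-in for an interval: start time, end time, label
    start: float
    end: float
    label: str

# A made-up Word tier: a leading pause followed by two words
word_tier = [
    ToyInterval(0.0, 0.5, ""),
    ToyInterval(0.5, 0.7, "when"),
    ToyInterval(0.7, 0.8, "the"),
]

# "Interleaving" in this toy: one new Phrase interval per Word interval,
# with the same timing and the same label
phrase_tier = [ToyInterval(w.start, w.end, w.label) for w in word_tier]

print([p.label for p in phrase_tier])
```

In this toy picture, every Phrase interval starts out as a one-to-one copy of a Word interval, which is why a fusing step is needed afterwards.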
Iterate through phrase and fuse
We need to define a function that will take an existing phrase label and add on an incoming label.
def make_phrase_label(a_label, b_label):
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"
    return a_label
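As a quick check of how this label function behaves on its own, the sketch below folds it over a short list of labels with functools.reduce. The list of labels is made up for illustration; the point is that an empty label (a pause) leaves the running phrase label unchanged.

```python
from functools import reduce

def make_phrase_label(a_label, b_label):
    # Same logic as above: append b_label only when it is non-empty
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"
    return a_label

# Made-up sequence of labels; "" stands for a short pause
labels = ["when", "the", "", "sunlight"]

# Fold the labels left to right, the way repeated fusing accumulates them
phrase = reduce(make_phrase_label, labels)
print(phrase)  # when the sunlight
```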
The fuse_rightwards() method fuses the following interval into the current interval and pops the following interval from the tier. Because the tier shrinks as we go, we don’t want to use a for loop.
Instead, we’ll use a while loop, which ends when we reach the end of the Phrase tier. Rather than fusing, we’ll move on to the next interval when
- The current interval’s label is “” (that is, a pause)
- The following interval’s label is “” and it lasts at least 220 ms.
The continue keyword under the if statements bumps us back to the top of the while loop, which checks whether we’re at the end of the Phrase tier.
this_interval = atg[0].Phrase.first                 # (1)

while this_interval is not atg[0].Phrase.last:      # (2)
    if this_interval.label == "":                   # (3)
        this_interval = this_interval.fol
        continue

    following_long_pause = (                        # (4)
        this_interval.fol.label == "" and
        this_interval.fol.duration >= 0.220
    )

    if following_long_pause:                        # (5)
        this_interval = this_interval.fol
        continue

    this_interval.fuse_rightwards(                  # (6)
        label_fun = make_phrase_label
    )
1. Manually begin at the first interval.
2. The value of .last is dynamically updated, so this is safe.
3. If we are currently in a pause interval, move to the next interval.
4. Get a True or False depending on whether the next interval is a pause of 220 ms or longer.
5. If the following interval is a long pause, update this_interval to be the following interval. The previous if statement will keep bumping us along until we get to a non-pause interval.
6. If neither of the previous if statements was triggered, we fuse this_interval with the following interval.
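The same control flow can be tried out without a TextGrid. The following is a minimal sketch that assumes a tier can be modeled as a plain list of (label, duration) pairs; the fuse_phrases function, its min_pause parameter, and the sample data are all hypothetical and only mimic the logic of the loop, not the library's own implementation.

```python
def make_phrase_label(a_label, b_label):
    # Append the incoming label only when it is non-empty
    if len(b_label) > 0:
        a_label = f"{a_label} {b_label}"
    return a_label

def fuse_phrases(intervals, min_pause=0.220):
    """Fuse (label, duration) pairs into phrases split at long pauses."""
    tier = [list(pair) for pair in intervals]  # mutable working copy
    i = 0
    # Stop at the last interval; len(tier) is re-evaluated as the list shrinks
    while i < len(tier) - 1:
        label, _ = tier[i]
        if label == "":
            i += 1  # currently in a pause: move on
            continue
        fol_label, fol_dur = tier[i + 1]
        if fol_label == "" and fol_dur >= min_pause:
            i += 1  # long pause ahead: leave it as a phrase boundary
            continue
        # Otherwise fuse the following interval into the current one
        tier[i][0] = make_phrase_label(label, fol_label)
        tier[i][1] += fol_dur
        del tier[i + 1]
    return [tuple(pair) for pair in tier]

# Made-up data: short pauses are absorbed, long pauses split phrases
toy_words = [
    ("", 0.5), ("when", 0.2), ("the", 0.1), ("", 0.05),
    ("sunlight", 0.4), ("", 0.3), ("strikes", 0.3), ("", 0.4),
]
toy_phrases = fuse_phrases(toy_words)
print([label for label, _ in toy_phrases])
```

Note that the index only advances when no fusion happens. After a fusion, the same position is examined again because it now has a new following interval, which is the reason a while loop fits better than a for loop here.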
We can check on the results.
for phrase in atg[0].Phrase[0:10]:
    print(phrase.label)
when the sunlight strikes raindrops in the air they act like a prism and formza rainbow
the rainbow is a division of white light into many beautiful colors
these take the shape of a long round arch
with its path high above and its two ends apparently beyond the horizon
there is according to legend a boiling pot of gold at one end
And just for clarity, each non-pause word is now a subset member of a phrase interval.
(
    atg[0].Word[1].label,
    atg[0].Word[1].within.label
)
('when',
'when the sunlight strikes raindrops in the air they act like a prism and formza rainbow')
More ideas
We can also, for example, get a list of the duration of pauses that occur within a phrase.
import numpy as np
in_phrase_pauses = [
    interval
    for interval in atg[0].Word
    if interval.label == ""
    if interval.within.label != ""
]

pause_durs = np.array([
    interval.duration
    for interval in in_phrase_pauses
])

pause_durs
array([0.16, 0.11, 0.03, 0.03, 0.03, 0.22, 0.04, 0.04, 0.15, 0.08, 0.06,
0.04, 0.06, 0.03, 0.03, 0.12, 0.03, 0.03, 0.05, 0.04, 0.14, 0.06,
0.21, 0.05, 0.03, 0.03, 0.08, 0.03, 0.04, 0.06, 0.14, 0.03, 0.03,
0.03, 0.03])
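One possible next step is to summarize these durations. The sketch below uses only the standard library and re-enters the values as they were printed above, so they are rounded to two decimals; the variable names and the 100 ms cutoff are illustrative choices, not something the package prescribes.

```python
from statistics import mean, median

# Durations copied from the printed array above (rounded to 2 decimals)
printed_durs = [
    0.16, 0.11, 0.03, 0.03, 0.03, 0.22, 0.04, 0.04, 0.15, 0.08, 0.06,
    0.04, 0.06, 0.03, 0.03, 0.12, 0.03, 0.03, 0.05, 0.04, 0.14, 0.06,
    0.21, 0.05, 0.03, 0.03, 0.08, 0.03, 0.04, 0.06, 0.14, 0.03, 0.03,
    0.03, 0.03,
]

# Basic summaries of the in-phrase pauses
n_pauses = len(printed_durs)
median_dur = median(printed_durs)
mean_dur = mean(printed_durs)

# Count the pauses at or above an arbitrary 100 ms cutoff
n_longer = sum(1 for d in printed_durs if d >= 0.10)

print(n_pauses, median_dur, round(mean_dur, 3), n_longer)
```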
Session Info
import sys
import aligned_textgrid
print(
    (
        f"Python version: {sys.version}\n"
        f"aligned-textgrid version: {aligned_textgrid.__version__}"
    )
)
Python version: 3.11.10 (main, Sep 9 2024, 03:20:25) [GCC 11.4.0]
aligned-textgrid version: 0.7.4