Outputting as a data frame
You can output and save any aligned_textgrid object as a polars dataframe with the to_df() function.
import polars as pl
from aligned_textgrid import AlignedTextGrid, Word, Phone, to_df

tg = AlignedTextGrid(
    textgrid_path="../resources/josef-fruehwald_speaker.TextGrid",
    entry_classes=[Word, Phone]
)
Single Intervals
Bottom of the hierarchy
If you pass a single interval from the bottom of the sequence hierarchy, you’ll get back a fairly minimal dataframe with the start and end times, the label, and an ID for the interval.
If you pass to_df() an interval from higher up in the hierarchy, by default it will output its data, as well as the data for every interval below it in the hierarchy, concatenated horizontally.
However, if you want just a simplified, single row output for an interval, regardless of its location within the hierarchy, pass to_df(..., with_subset = False).
If you pass a tier to to_df(), it will output a dataframe for every interval in the tier, concatenated vertically. By default, this means intervals high in the hierarchy will have their rows repeated for every interval they contain, but if you want one row per interval in the output, you can pass to_df(..., with_subset = False).
tier_df1 = to_df(tg[0].Word)
tier_df1.shape
(1191, 10)
tier_df1.head(10)
shape: (10, 10)
| Word_id | Word_tier_index | Word_label | Word_start | Word_end | Phone_id | Phone_tier_index | Phone_label | Phone_start | Phone_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | i64 | str | f64 | f64 | str | i64 | str | f64 | f64 |
| "0-0-0" | 0 | "" | 0.0 | 0.11 | "0-0-0-0" | 0 | "" | 0.0 | 0.11 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-0" | 1 | "HH" | 0.11 | 1.97 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-1" | 2 | "W" | 1.97 | 2.09 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-2" | 3 | "EH1" | 2.09 | 2.13 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-3" | 4 | "N" | 2.13 | 2.2 |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "0-0-2-0" | 5 | "DH" | 2.2 | 2.22 |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "0-0-2-1" | 6 | "AH0" | 2.22 | 2.26 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-0" | 7 | "S" | 2.26 | 2.39 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-1" | 8 | "AH1" | 2.39 | 2.44 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-2" | 9 | "N" | 2.44 | 2.52 |
# 1 row per interval
tier_df2 = to_df(tg[0].Word, with_subset=False)
tier_df2.shape
(377, 6)
tier_df2.head(10)
shape: (10, 6)
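| id | tier_index | label | start | end | entry_class |
| --- | --- | --- | --- | --- | --- |
| str | i64 | str | f64 | f64 | str |
| "0-0-0" | 0 | "" | 0.0 | 0.11 | "Word" |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "Word" |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "Word" |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "Word" |
| "0-0-4" | 4 | "strikes" | 2.72 | 3.22 | "Word" |
| "0-0-5" | 5 | "raindrops" | 3.22 | 3.79 | "Word" |
| "0-0-6" | 6 | "in" | 3.79 | 3.89 | "Word" |
| "0-0-7" | 7 | "the" | 3.89 | 4.02 | "Word" |
| "0-0-8" | 8 | "air" | 4.02 | 4.45 | "Word" |
| "0-0-9" | 9 | "" | 4.45 | 4.61 | "Word" |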
TierGroups and TextGrids
TierGroups and TextGrids behave similarly. By default, to_df() returns a dataframe representing the entire hierarchy structure; with with_subset = False, it returns one row for each interval in the TextGrid.
full_df1 = to_df(tg)
full_df1.shape
(1191, 10)
full_df1.head(10)
shape: (10, 10)
| Word_id | Word_tier_index | Word_label | Word_start | Word_end | Phone_id | Phone_tier_index | Phone_label | Phone_start | Phone_end |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| str | i64 | str | f64 | f64 | str | i64 | str | f64 | f64 |
| "0-0-0" | 0 | "" | 0.0 | 0.11 | "0-0-0-0" | 0 | "" | 0.0 | 0.11 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-0" | 1 | "HH" | 0.11 | 1.97 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-1" | 2 | "W" | 1.97 | 2.09 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-2" | 3 | "EH1" | 2.09 | 2.13 |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "0-0-1-3" | 4 | "N" | 2.13 | 2.2 |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "0-0-2-0" | 5 | "DH" | 2.2 | 2.22 |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "0-0-2-1" | 6 | "AH0" | 2.22 | 2.26 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-0" | 7 | "S" | 2.26 | 2.39 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-1" | 8 | "AH1" | 2.39 | 2.44 |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "0-0-3-2" | 9 | "N" | 2.44 | 2.52 |
# 1 row per interval
full_df2 = to_df(tg, with_subset=False)
full_df2.shape
(1568, 6)
full_df2.head(5)
shape: (5, 6)
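| id | tier_index | label | start | end | entry_class |
| --- | --- | --- | --- | --- | --- |
| str | i64 | str | f64 | f64 | str |
| "0-0-0" | 0 | "" | 0.0 | 0.11 | "Word" |
| "0-0-1" | 1 | "when" | 0.11 | 2.2 | "Word" |
| "0-0-2" | 2 | "the" | 2.2 | 2.26 | "Word" |
| "0-0-3" | 3 | "sunlight" | 2.26 | 2.72 | "Word" |
| "0-0-4" | 4 | "strikes" | 2.72 | 3.22 | "Word" |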
full_df2.tail(5)
shape: (5, 6)
| id | tier_index | label | start | end | entry_class |
| --- | --- | --- | --- | --- | --- |
| str | i64 | str | f64 | f64 | str |
| "0-0-374-1" | 1186 | "R" | 111.83 | 111.92 | "Phone" |
| "0-0-375-0" | 1187 | "B" | 111.92 | 112.02 | "Phone" |
| "0-0-375-1" | 1188 | "L" | 112.02 | 112.08 | "Phone" |
| "0-0-375-2" | 1189 | "UW1" | 112.08 | 112.31 | "Phone" |
| "0-0-376-0" | 1190 | "" | 112.31 | 115.065034 | "Phone" |
Saving a DataFrame
To save one of these dataframes, use one of the polars write methods, such as DataFrame.write_csv().