annotations.SearchIPUs package

Submodules

annotations.SearchIPUs.searchipus module

filename

sppas.src.annotations.SearchIPUs.searchipus.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Silences vs sounding segments segmentation.

class annotations.SearchIPUs.searchipus.SearchIPUs(channel, win_len=0.02)[source]

Bases: annotations.SearchIPUs.silences.sppasSilences

An automatic silence/tracks segmentation system.

Silence/tracks segmentation aims at finding IPUs. Inter-Pausal Units (IPUs) are blocks of speech bounded by silent pauses of more than X ms, and time-aligned on the speech signal.

DEFAULT_MIN_IPU_DUR = 0.3
DEFAULT_MIN_SIL_DUR = 0.25
DEFAULT_SHIFT_END = 0.02
DEFAULT_SHIFT_START = 0.02
DEFAULT_VOL_THRESHOLD = 0
MIN_IPU_DUR = 0.06
MIN_SIL_DUR = 0.06
__init__(channel, win_len=0.02)[source]

Create a new SearchIPUs instance.

Parameters
  • channel – (sppasChannel) the input channel

  • win_len – (float) duration of a window

get_channel()[source]

Return the channel.

get_effective_threshold()[source]

Return the threshold volume estimated automatically to search for silences.

get_min_ipu_dur()[source]

Return the minimum duration of a track.

get_min_sil_dur()[source]

Return the minimum duration of a silence.

get_rms_stats()[source]

Return min, max, mean, median, stdev of the RMS.
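To make these five values concrete, here is a minimal pure-Python sketch that computes one RMS value per window and then summarizes the list. It does not call SPPAS: window_rms and rms_stats are hypothetical helper names, and the choice of the population standard deviation is an assumption, since the estimator used by the real method is not stated here.

```python
import math
import statistics


def window_rms(samples, framerate, win_len=0.02):
    """Return one RMS value per window of win_len seconds (hypothetical helper)."""
    size = max(1, int(round(framerate * win_len)))
    values = []
    for start in range(0, len(samples), size):
        window = samples[start:start + size]
        values.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return values


def rms_stats(values):
    """Return (min, max, mean, median, stdev) of a list of RMS values."""
    # Assumption: population standard deviation; SPPAS may use another estimator.
    return (min(values), max(values), statistics.mean(values),
            statistics.median(values), statistics.pstdev(values))


# Four windows of two samples each, at an unrealistically low framerate of 100 Hz.
rms = window_rms([0, 0, 3, -3, 4, 4, 0, 0], framerate=100, win_len=0.02)
stats = rms_stats(rms)
print(rms)
print(stats)
```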

get_shift_end()[source]
get_shift_start()[source]
get_track_data(tracks)[source]

Return the audio data of tracks.

Parameters

tracks – List of tracks. A track is a tuple (start, end).

Returns

List of audio data

get_tracks(time_domain=False)[source]

Return a list of tuples (from,to) of tracks.

The (from,to) values are converted into the time domain only if time_domain is True.

The tracks are found from the current list of silences, which is first filtered with min_sil_dur.

This method requires the following members to be fixed:
  • the volume threshold,

  • the minimum duration for a silence,

  • the minimum duration for a track,

  • the duration to remove from the start boundary,

  • the duration to add to the end boundary.

Parameters

time_domain – (bool) Convert from/to values into seconds

Returns

(list of tuples) with (from,to) of the tracks
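The way the five members listed for get_tracks() interact can be sketched in pure Python, working directly in seconds. This is an interpretation of the description above, not the SPPAS implementation: tracks_from_silences is a hypothetical name, the volume threshold is assumed to have been applied already when the silences were found, and the order of the last two steps (duration check, then boundary shift) is an assumption.

```python
def tracks_from_silences(silences, duration, min_sil_dur=0.25,
                         min_ipu_dur=0.3, shift_start=0.02, shift_end=0.02):
    """Return (from, to) tracks in seconds, given (from, to) silences in seconds."""
    # 1. Ignore silences that are too short to count as pauses.
    kept = [(s, e) for (s, e) in sorted(silences) if e - s >= min_sil_dur]

    # 2. The tracks are the gaps between the remaining silences.
    tracks = []
    cursor = 0.0
    for s, e in kept:
        if s > cursor:
            tracks.append((cursor, s))
        cursor = max(cursor, e)
    if cursor < duration:
        tracks.append((cursor, duration))

    # 3. Drop tracks that are too short, then widen the boundaries
    #    (assumed order), clipping to the limits of the signal.
    result = []
    for start, end in tracks:
        if end - start >= min_ipu_dur:
            result.append((max(0.0, start - shift_start),
                           min(duration, end + shift_end)))
    return result


silences = [(0.0, 0.5), (1.2, 1.3), (2.0, 2.4), (2.5, 3.0)]
tracks = tracks_from_silences(silences, duration=3.0)
print(tracks)
```

With these values, the 0.1 s silence at 1.2 s is ignored, the 0.1 s track between 2.4 s and 2.5 s is dropped, and a single track of roughly (0.48, 2.02) remains.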

get_vol_threshold()[source]

Return the initial volume threshold used to search for silences.

get_win_length()[source]

Return the window length used to estimate the RMS.

min_channel_duration()[source]

Return the minimum duration we expect for a channel.

set_min_ipu(min_ipu_dur)[source]

Fix the default minimum duration of an IPU.

Parameters

min_ipu_dur – (float) Duration in seconds.

set_min_sil(min_sil_dur)[source]

Fix the default minimum duration of a silence.

Parameters

min_sil_dur – (float) Duration in seconds.

set_shift_end(s)[source]

Fix the end boundary shift value.

Parameters

s – (float) Duration in seconds.

set_shift_start(s)[source]

Fix the start boundary shift value.

Parameters

s – (float) Duration in seconds.

set_vol_threshold(vol_threshold)[source]

Fix the default minimum volume value to find silences.

It won’t affect the current list of silences; call search_silences() to recompute it.

Parameters

vol_threshold – (int) RMS value

set_win_length(w)[source]

Set a new window length for the estimation of volume values.

TAKE CARE: it cancels any previous estimation of volume and silence search.

Parameters

w – (float) between 0.01 and 0.04 seconds.

annotations.SearchIPUs.silences module

filename

sppas.src.annotations.SearchIPUs.silences.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Search for silences in a channel.

class annotations.SearchIPUs.silences.sppasSilences(channel, win_len=0.02, vagueness=0.005)[source]

Bases: object

Silence search on a channel of an audio file.

Silences are stored in a list of (from_pos,to_pos) values, indicating the frames at which each silence begins and ends, respectively.

__init__(channel, win_len=0.02, vagueness=0.005)[source]

Create a sppasSilences instance.

Parameters
  • channel – (sppasChannel) the input channel

  • win_len – (float) duration of a window

  • vagueness – (float) Window length to estimate the boundaries.

Maximum value of vagueness is win_len. The duration of a window (win_len) is relevant for the estimation of the rms values.

The vagueness is 2*radius of the boundaries (see sppasPoint).

extract_tracks(min_track_dur, shift_dur_start, shift_dur_end)[source]

Return the tracks, deduced from the silences and track constraints.

Parameters
  • min_track_dur – (float) The minimum duration for a track

  • shift_dur_start – (float) The time to remove from the start boundary

  • shift_dur_end – (float) The time to add to the end boundary

Returns

list of tuples (from_pos,to_pos)

Duration is in seconds.

filter_silences(threshold, min_sil_dur=0.2)[source]

Filter the current silences.

Parameters
  • threshold – (int) Expected minimum volume (rms value). If threshold is set to 0, search_minvol() will assign a value.

  • min_sil_dur – (float) Minimum silence duration in seconds

Returns

Number of silences with the expected minimum duration

filter_silences_from_tracks(min_track_dur=0.6)[source]

Filter the given silences to remove very small tracks.

Parameters

min_track_dur – (float) Minimum duration of a track

Returns

filtered silences

fix_threshold_vol()[source]

Fix the threshold for tracks/silences segmentation.

This is an observation of the distribution of rms values.

Returns

(int) volume value

get_vagueness()[source]

Get the vagueness value (=2*radius).

get_volstats()[source]

Return the sppasChannelVolume() estimated on the channel.

reset_silences()[source]

Reset silences to an empty list.

search_silences(threshold=0)[source]

Search windows with a volume lower than a given threshold.

This is then a search for silences. All windows with a volume higher than the threshold are considered as tracks and not included in the result. Blocks of silence shorter than min_sil_dur are also considered tracks.

Parameters

threshold – (int) Expected minimum volume (rms value). If threshold is set to 0, search_minvol() will assign a value.

Returns

threshold
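The core of such a search, grouping consecutive low-volume windows into (from_pos,to_pos) silences, can be sketched as follows. This is a simplified illustration rather than the SPPAS code: silent_windows_to_silences is a hypothetical name, the strict comparison and the exclusive end position are assumptions, and the min_sil_dur rule mentioned above is left out.

```python
def silent_windows_to_silences(rms_values, threshold, window_size):
    """Group consecutive windows whose RMS is below threshold.

    rms_values holds one value per window and window_size is the number of
    frames in a window. Returns a list of (from_pos, to_pos) frame positions.
    """
    silences = []
    start = None
    for i, value in enumerate(rms_values):
        if value < threshold:          # assumption: strict comparison
            if start is None:
                start = i              # a silence begins at this window
        elif start is not None:
            silences.append((start * window_size, i * window_size))
            start = None
    if start is not None:              # the signal ends inside a silence
        silences.append((start * window_size, len(rms_values) * window_size))
    return silences


rms = [2, 3, 250, 300, 280, 4, 1, 2, 310]
found = silent_windows_to_silences(rms, threshold=50, window_size=320)
print(found)
# [(0, 640), (1600, 2560)]
```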

set_channel(channel)[source]

Set a channel, then reset all previous results.

Parameters

channel – (sppasChannel)

set_silences(silences)[source]

Fix the silences manually.

To be used carefully!

Parameters

silences – (list of tuples (start_pos, end_pos))

set_vagueness(vagueness)[source]

Window length to estimate the boundaries.

Parameters

vagueness – (float) Maximum value of radius is win_len.

track_data(tracks)[source]

Yield the track data: a set of frames for each track.

Parameters

tracks – (list of tuples) List of (from_pos,to_pos)
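Because track_data() yields its results, the frames of each track are produced one at a time instead of being collected into a list first. A minimal sketch of that pattern is shown here, with a plain list standing in for the channel frames; treating to_pos as an exclusive position is an assumption.

```python
def track_data(frames, tracks):
    """Yield the slice of frames covered by each (from_pos, to_pos) track."""
    for from_pos, to_pos in tracks:
        # Assumption: to_pos is exclusive, as in Python slicing.
        yield frames[from_pos:to_pos]


frames = list(range(10))               # stand-in for the samples of a channel
chunks = list(track_data(frames, [(0, 3), (6, 9)]))
print(chunks)
# [[0, 1, 2], [6, 7, 8]]
```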

annotations.SearchIPUs.sppassearchipus module

filename

sppas.src.annotations.SearchIPUs.sppassearchipus.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

SPPAS integration of Search for IPUs automatic annotation

class annotations.SearchIPUs.sppassearchipus.sppasSearchIPUs(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the IPUs detection.

__init__(log=None)[source]

Create a new sppasSearchIPUs instance.

Parameters

log – (sppasLog) Human-readable logs.

convert(channel)[source]

Search for IPUs in the given channel.

Parameters

channel – (sppasChannel) Input channel

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

  • threshold: volume threshold to decide a window is silence or not

  • win_length: length of window for the estimation of volume values

  • min_sil: minimum duration of a silence

  • min_ipu: minimum duration of an IPU

  • shift_start: start boundary shift value.

  • shift_end: end boundary shift value.

Parameters

options – (sppasOption)
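The idea of fixing several options in one call can be illustrated with a small stand-in class. Nothing below is SPPAS code: IpuSettings is hypothetical, options are modelled as plain (key, value) pairs rather than sppasOption objects, the default values are borrowed from the SearchIPUs constants, and raising an error on an unknown key is an assumption.

```python
class IpuSettings:
    """Hypothetical stand-in holding the six options listed above."""

    def __init__(self):
        self.values = {"threshold": 0, "win_length": 0.02, "min_sil": 0.25,
                       "min_ipu": 0.3, "shift_start": 0.02, "shift_end": 0.02}

    def fix_options(self, options):
        """Apply (key, value) pairs; reject keys that are not known options."""
        for key, value in options:
            if key not in self.values:
                raise KeyError("Unknown option: {:s}".format(key))
            # Keep the type of the default (int threshold, float durations).
            self.values[key] = type(self.values[key])(value)


settings = IpuSettings()
settings.fix_options([("min_sil", "0.2"), ("threshold", 300)])
print(settings.values["min_sil"], settings.values["threshold"])
# 0.2 300
```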

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_min_ipu()[source]
get_min_sil()[source]
get_shift_end()[source]
get_shift_start()[source]
get_threshold()[source]
get_win_length()[source]
run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Audio

  • output – (str) the output file name

Returns

(sppasTranscription)

run_for_batch_processing(input_files)[source]

Perform the annotation on a file.

This method is called by ‘batch_processing’. It fixes the name of the output file. If the output file already exists, the annotation is cancelled (the file won’t be overwritten). If not, it calls the run method.

Parameters

input_files – (list of str) the inputs to perform a run

Returns

output file name or None
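The "never overwrite an existing result" rule described for run_for_batch_processing() can be sketched with the standard library only. The names run_if_missing and fake_run are hypothetical, and returning None when the annotation is cancelled is an assumption based on the return value documented above.

```python
import os
import tempfile


def run_if_missing(input_files, output_name, run):
    """Call run(input_files, output_name) unless output_name already exists."""
    if os.path.exists(output_name):
        return None                    # cancelled: never overwrite a result
    run(input_files, output_name)
    return output_name


def fake_run(input_files, output_name):
    """Stand-in for the annotation: just create the output file."""
    with open(output_name, "w") as handle:
        handle.write("IPUs of " + input_files[0])


with tempfile.TemporaryDirectory() as folder:
    out = os.path.join(folder, "sample-ipus.txt")
    first = run_if_missing(["sample.wav"], out, fake_run)
    second = run_if_missing(["sample.wav"], out, fake_run)
    print(first == out, second)
# True None
```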

set_min_ipu(value)[source]

Fix the default minimum duration of an IPU.

Parameters

value – (float) Duration in seconds.

set_min_sil(value)[source]

Fix the default minimum duration of a silence.

Parameters

value – (float) Duration in seconds.

set_shift_end(value)[source]

Fix the end boundary shift value.

Parameters

value – (float) Duration in seconds.

set_shift_start(value)[source]

Fix the start boundary shift value.

Parameters

value – (float) Duration in seconds.

set_threshold(value)[source]

Fix the threshold volume.

Parameters

value – (int) RMS value used as volume threshold

set_win_length(value)[source]

Set a new window length for the estimation of volume values.

TAKE CARE: it cancels any previous estimation of volume and silence search.

Parameters

value – (float) generally between 0.01 and 0.04 seconds.

static tracks_to_tier(tracks, end_time, vagueness)[source]

Create a sppasTier object from tracks.

Parameters
  • tracks – (List of tuple) with (from, to) values in seconds

  • end_time – (float) End-time of the tier

  • vagueness – (float) vagueness used for silence search
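One plausible reading of this conversion is that the tracks become labelled intervals and the gaps between them become silences, up to end_time. The sketch below uses plain tuples instead of a sppasTier; the function name and the labels "#" and "ipu_N" are assumptions made for illustration, and the vagueness, which would only set the radius of each boundary point, is omitted.

```python
def tracks_to_intervals(tracks, end_time):
    """Return (begin, end, label) triples covering 0..end_time."""
    intervals = []
    previous_end = 0.0
    for i, (begin, end) in enumerate(tracks, start=1):
        if begin > previous_end:
            # Gap before this track: a silence (label is an assumption).
            intervals.append((previous_end, begin, "#"))
        intervals.append((begin, end, "ipu_{:d}".format(i)))
        previous_end = end
    if previous_end < end_time:
        intervals.append((previous_end, end_time, "#"))   # trailing silence
    return intervals


intervals = tracks_to_intervals([(0.5, 2.0), (2.6, 3.1)], end_time=4.0)
for interval in intervals:
    print(interval)
```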

Module contents

filename

sppas.src.annotations.SearchIPUs.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Search for Inter-Pausal Units in an audio file.

class annotations.SearchIPUs.SearchIPUs(channel, win_len=0.02)[source]

Bases: annotations.SearchIPUs.silences.sppasSilences

An automatic silence/tracks segmentation system.

Silence/tracks segmentation aims at finding IPUs. Inter-Pausal Units (IPUs) are blocks of speech bounded by silent pauses of more than X ms, and time-aligned on the speech signal.

DEFAULT_MIN_IPU_DUR = 0.3
DEFAULT_MIN_SIL_DUR = 0.25
DEFAULT_SHIFT_END = 0.02
DEFAULT_SHIFT_START = 0.02
DEFAULT_VOL_THRESHOLD = 0
MIN_IPU_DUR = 0.06
MIN_SIL_DUR = 0.06
__init__(channel, win_len=0.02)[source]

Create a new SearchIPUs instance.

Parameters
  • channel – (sppasChannel) the input channel

  • win_len – (float) duration of a window

get_channel()[source]

Return the channel.

get_effective_threshold()[source]

Return the threshold volume estimated automatically to search for silences.

get_min_ipu_dur()[source]

Return the minimum duration of a track.

get_min_sil_dur()[source]

Return the minimum duration of a silence.

get_rms_stats()[source]

Return min, max, mean, median, stdev of the RMS.

get_shift_end()[source]
get_shift_start()[source]
get_track_data(tracks)[source]

Return the audio data of tracks.

Parameters

tracks – List of tracks. A track is a tuple (start, end).

Returns

List of audio data

get_tracks(time_domain=False)[source]

Return a list of tuples (from,to) of tracks.

The (from,to) values are converted into the time domain only if time_domain is True.

The tracks are found from the current list of silences, which is first filtered with min_sil_dur.

This method requires the following members to be fixed:
  • the volume threshold,

  • the minimum duration for a silence,

  • the minimum duration for a track,

  • the duration to remove from the start boundary,

  • the duration to add to the end boundary.

Parameters

time_domain – (bool) Convert from/to values into seconds

Returns

(list of tuples) with (from,to) of the tracks

get_vol_threshold()[source]

Return the initial volume threshold used to search for silences.

get_win_length()[source]

Return the window length used to estimate the RMS.

min_channel_duration()[source]

Return the minimum duration we expect for a channel.

set_min_ipu(min_ipu_dur)[source]

Fix the default minimum duration of an IPU.

Parameters

min_ipu_dur – (float) Duration in seconds.

set_min_sil(min_sil_dur)[source]

Fix the default minimum duration of a silence.

Parameters

min_sil_dur – (float) Duration in seconds.

set_shift_end(s)[source]

Fix the end boundary shift value.

Parameters

s – (float) Duration in seconds.

set_shift_start(s)[source]

Fix the start boundary shift value.

Parameters

s – (float) Duration in seconds.

set_vol_threshold(vol_threshold)[source]

Fix the default minimum volume value to find silences.

It won’t affect the current list of silences; call search_silences() to recompute it.

Parameters

vol_threshold – (int) RMS value

set_win_length(w)[source]

Set a new window length for the estimation of volume values.

TAKE CARE: it cancels any previous estimation of volume and silence search.

Parameters

w – (float) between 0.01 and 0.04 seconds.

class annotations.SearchIPUs.sppasSearchIPUs(log=None)[source]

Bases: annotations.baseannot.sppasBaseAnnotation

SPPAS integration of the IPUs detection.

__init__(log=None)[source]

Create a new sppasSearchIPUs instance.

Parameters

log – (sppasLog) Human-readable logs.

convert(channel)[source]

Search for IPUs in the given channel.

Parameters

channel – (sppasChannel) Input channel

Returns

(sppasTier)

fix_options(options)[source]

Fix all options.

Available options are:

  • threshold: volume threshold to decide a window is silence or not

  • win_length: length of window for the estimation of volume values

  • min_sil: minimum duration of a silence

  • min_ipu: minimum duration of an IPU

  • shift_start: start boundary shift value.

  • shift_end: end boundary shift value.

Parameters

options – (sppasOption)

static get_input_extensions()[source]

Extensions that the annotation expects for its input filename.

get_min_ipu()[source]
get_min_sil()[source]
get_shift_end()[source]
get_shift_start()[source]
get_threshold()[source]
get_win_length()[source]
run(input_files, output=None)[source]

Run the automatic annotation process on an input.

Parameters
  • input_files – (list of str) Audio

  • output – (str) the output file name

Returns

(sppasTranscription)

run_for_batch_processing(input_files)[source]

Perform the annotation on a file.

This method is called by ‘batch_processing’. It fixes the name of the output file. If the output file already exists, the annotation is cancelled (the file won’t be overwritten). If not, it calls the run method.

Parameters

input_files – (list of str) the inputs to perform a run

Returns

output file name or None

set_min_ipu(value)[source]

Fix the default minimum duration of an IPU.

Parameters

value – (float) Duration in seconds.

set_min_sil(value)[source]

Fix the default minimum duration of a silence.

Parameters

value – (float) Duration in seconds.

set_shift_end(value)[source]

Fix the end boundary shift value.

Parameters

value – (float) Duration in seconds.

set_shift_start(value)[source]

Fix the start boundary shift value.

Parameters

value – (float) Duration in seconds.

set_threshold(value)[source]

Fix the threshold volume.

Parameters

value – (int) RMS value used as volume threshold

set_win_length(value)[source]

Set a new window length for the estimation of volume values.

TAKE CARE: it cancels any previous estimation of volume and silence search.

Parameters

value – (float) generally between 0.01 and 0.04 seconds.

static tracks_to_tier(tracks, end_time, vagueness)[source]

Create a sppasTier object from tracks.

Parameters
  • tracks – (List of tuple) with (from, to) values in seconds

  • end_time – (float) End-time of the tier

  • vagueness – (float) vagueness used for silence search

class annotations.SearchIPUs.sppasSilences(channel, win_len=0.02, vagueness=0.005)[source]

Bases: object

Silence search on a channel of an audio file.

Silences are stored in a list of (from_pos,to_pos) values, indicating the frames at which each silence begins and ends, respectively.

__init__(channel, win_len=0.02, vagueness=0.005)[source]

Create a sppasSilences instance.

Parameters
  • channel – (sppasChannel) the input channel

  • win_len – (float) duration of a window

  • vagueness – (float) Window length to estimate the boundaries.

Maximum value of vagueness is win_len. The duration of a window (win_len) is relevant for the estimation of the rms values.

The vagueness is 2*radius of the boundaries (see sppasPoint).

extract_tracks(min_track_dur, shift_dur_start, shift_dur_end)[source]

Return the tracks, deduced from the silences and track constraints.

Parameters
  • min_track_dur – (float) The minimum duration for a track

  • shift_dur_start – (float) The time to remove from the start boundary

  • shift_dur_end – (float) The time to add to the end boundary

Returns

list of tuples (from_pos,to_pos)

Duration is in seconds.

filter_silences(threshold, min_sil_dur=0.2)[source]

Filter the current silences.

Parameters
  • threshold – (int) Expected minimum volume (rms value). If threshold is set to 0, search_minvol() will assign a value.

  • min_sil_dur – (float) Minimum silence duration in seconds

Returns

Number of silences with the expected minimum duration

filter_silences_from_tracks(min_track_dur=0.6)[source]

Filter the given silences to remove very small tracks.

Parameters

min_track_dur – (float) Minimum duration of a track

Returns

filtered silences

fix_threshold_vol()[source]

Fix the threshold for tracks/silences segmentation.

This is an observation of the distribution of rms values.

Returns

(int) volume value

get_vagueness()[source]

Get the vagueness value (=2*radius).

get_volstats()[source]

Return the sppasChannelVolume() estimated on the channel.

reset_silences()[source]

Reset silences to an empty list.

search_silences(threshold=0)[source]

Search windows with a volume lower than a given threshold.

This is then a search for silences. All windows with a volume higher than the threshold are considered as tracks and not included in the result. Blocks of silence shorter than min_sil_dur are also considered tracks.

Parameters

threshold – (int) Expected minimum volume (rms value). If threshold is set to 0, search_minvol() will assign a value.

Returns

threshold

set_channel(channel)[source]

Set a channel, then reset all previous results.

Parameters

channel – (sppasChannel)

set_silences(silences)[source]

Fix the silences manually.

To be used carefully!

Parameters

silences – (list of tuples (start_pos, end_pos))

set_vagueness(vagueness)[source]

Window length to estimate the boundaries.

Parameters

vagueness – (float) Maximum value of radius is win_len.

track_data(tracks)[source]

Yield the track data: a set of frames for each track.

Parameters

tracks – (list of tuples) List of (from_pos,to_pos)