analysis package

Submodules

analysis.tierfilters module

filename

sppas.src.analysis.tierfilters.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Filter system for annotations of a tier.

class analysis.tierfilters.RelationFilterTier(filters, annot_format=False, fit=False)[source]

Bases: object

This class applies predefined filters on a tier.

Example:

>>> ft = RelationFilterTier((["overlaps", "overlappedby"], [("overlap_min", 0.04)]), fit=False)
>>> res_tier = ft.filter_tier(tier_x, tier_y)

__init__(filters, annot_format=False, fit=False)[source]

Filter process of a tier.

“annot_format” affects the labels of the resulting annotations, whereas “fit” affects their localizations.

Parameters
  • filters – (tuple) ([list of functions], [list of options]); each option is a tuple (name, value)

  • annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)

  • fit – (bool) The annotation result fits the other tier.

filter_tier(tier, tier_y, out_tiername='Filtered')[source]

Apply the filters on the given tier.

Parameters
  • tier – (sppasTier) The tier to filter annotations

  • tier_y – (sppasTier) The tier to be in relation with

  • out_tiername – (str) Name of the filtered tier

functions = 'rel'

class analysis.tierfilters.SingleFilterTier(filters, annot_format=False, match_all=True)[source]

Bases: object

This class applies predefined filters on a tier.

Apply defined filters, as a list of tuples with:
  • name of the filter: one of “tag”, “loc”, “dur”, “nlab”, “rel”

  • name of the function in sppasCompare (equal, lt, …)

  • value of its expected type (str, float, int, bool)

__init__(filters, annot_format=False, match_all=True)[source]

Filter process of a tier.

Parameters
  • filters – (list) List of tuples (filter, function, [typed values])

  • annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)

  • match_all – (bool) The annotations must match all the filters (if set to True) or any of them (if set to False)

filter_tier(tier, out_tiername='Filtered')[source]

Apply the filters on the given tier.

Applicable functions are “tag”, “loc” and “dur”.

Parameters
  • tier – (sppasTier)

  • out_tiername – (str) Name of the filtered tier

Returns

sppasTier or None if no annotation is matching

functions = ('tag', 'loc', 'dur', 'nlab')

class analysis.tierfilters.sppasTierFilters(obj)[source]

Bases: sppas.src.structs.basefilters.sppasBaseFilters

This class implements the ‘SPPAS tier filter system’.

Search in tiers. The class sppasTierFilters() makes it possible to apply several types of filters (tag, duration, …), and the class sppasAnnSet() is a data set manager, i.e. it contains the annotations selected by a filter and a string representing the filter.

Create a filter:

>>> f = sppasTierFilters(tier)

Then apply a filter with a pattern like those in the following examples. sppasAnnSet() instances can be combined with the operators & and |, like any other ‘set’ in Python, ‘an unordered collection of distinct hashable objects’.

Example1

extract silences:

>>> f.tag(exact=u('#'))

Example2

extract silences longer than 200 ms:

>>> f.tag(exact=u("#")) & f.dur(gt=0.2)

Example3

find the annotations with at least one label with a tag starting with “pa” and ending with “a”, like “pa”, “papa”, “pasta”, etc.:

>>> f.tag(startswith="pa", endswith='a')

This is equivalent to writing:

>>> f.tag(startswith="pa", endswith='a', logic_bool="and")

The classical “and” and “or” logical boolean predicates are accepted; “and” is the default one. It defines whether all the functions must be True (“and”) or any of them (“or”).

The two previous lines of code give the same result as the following one, but are two times faster:

>>> f.tag(startswith="pa") & f.tag(endswith='a')

In the first case, for each tag, the method applies the logical boolean between two predicates and creates the data set matching the combined condition. In the second case, each call to the method creates a data set matching each individual condition, then the data sets are combined.
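The difference between the two strategies can be illustrated with plain Python sets. This is only a minimal sketch: the `tags` dictionary, the `select` helper and the two predicate functions are hypothetical stand-ins, not SPPAS objects or the actual sppasTierFilters implementation.

```python
# Hypothetical stand-in data: annotation index -> tag text (not SPPAS objects).
tags = {0: "pa", 1: "papa", 2: "pasta", 3: "tata", 4: "pat", 5: "#"}


def select(tags, *predicates, logic_bool="and"):
    """Return the set of keys whose tag satisfies the predicates."""
    combine = all if logic_bool == "and" else any
    return {key for key, tag in tags.items()
            if combine(p(tag) for p in predicates)}


def starts_pa(tag):
    return tag.startswith("pa")


def ends_a(tag):
    return tag.endswith("a")


# One pass: both predicates are evaluated together for each tag.
single_pass = select(tags, starts_pa, ends_a, logic_bool="and")

# Two passes: one set per predicate, then a set intersection.
two_passes = select(tags, starts_pa) & select(tags, ends_a)

# The "or" predicate corresponds to a set union.
either = select(tags, starts_pa, ends_a, logic_bool="or")
```

Both strategies select the same keys; the second one simply walks through the data once per predicate before combining the sets.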

Example4

find annotations with more than 1 label:

>>> f.nlab(lge=1)

__init__(obj)[source]

Create a sppasTierFilters instance.

Parameters

obj – (sppasTier) The tier to be filtered.

static cast_data(tier, sfilter, entry)[source]

Return an entry into the appropriate type.

Parameters
  • tier – (sppasTier)

  • sfilter – (str) Name of the filter (tag, loc, …)

  • entry – (str) The entry to cast

Returns

typed entry

dur(**kwargs)[source]

Apply functions on durations of the location of annotations.

Parameters

kwargs – logic_bool/any sppasDurationCompare() method.

Returns

(sppasAnnSet)

Examples:
>>> f.dur(ge=0.03) & f.dur(le=0.07)
>>> f.dur(ge=0.03, le=0.07, logic_bool="and")

loc(**kwargs)[source]

Apply functions on localizations of annotations.

Parameters

kwargs – logic_bool/any sppasLocalizationCompare() method.

Returns

(sppasAnnSet)

Example
>>> f.loc(rangefrom=3.01) & f.loc(rangeto=10.07)
>>> f.loc(rangefrom=3.01, rangeto=10.07, logic_bool="and")

nlab(**kwargs)[source]

Apply functions on number of labels in annotations.

Parameters

kwargs – logic_bool/any sppasListCompare() method.

Returns

(sppasAnnSet)

Example
>>> f.nlab(leq=1)

rel(other_tier, *args, **kwargs)[source]

Apply functions of relations between localizations of annotations.

Parameters
  • other_tier – the tier to be in relation with.

  • args – any sppasIntervalCompare() method.

  • kwargs – any option of the methods.

Returns

(sppasAnnSet)

Example
>>> f.rel(other_tier, "equals", "overlaps", "overlappedby",
...       overlap_min=0.04, overlapped_min=0.02)

kwargs can be:

  • max_delay=value, used by before, after

  • overlap_min=value, used by overlaps

  • overlapped_min=value, used by overlappedby

  • percent=boolean, used by overlaps and overlappedby to indicate that the minimum overlap is a percentage
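A minimal sketch of how an “overlaps” relation with the overlap_min and percent options could be evaluated on two intervals. The `overlaps` function below and the representation of intervals as (begin, end) pairs are hypothetical. Expressing the percentage relative to the duration of the first interval, on a 0-100 scale, is an assumption and not the documented SPPAS behavior.

```python
def overlaps(x, y, overlap_min=None, percent=False):
    """Return True if interval x overlaps interval y.

    x and y are hypothetical (begin, end) pairs in seconds. The relation
    holds when x starts first and ends inside y.
    """
    x1, x2 = x
    y1, y2 = y
    if not (x1 < y1 < x2 < y2):
        return False
    if overlap_min is None:
        return True
    overlap = x2 - y1
    if percent:
        # Assumption: overlap_min is a percentage (0-100) of x's duration.
        return overlap >= (x2 - x1) * overlap_min / 100.0
    return overlap >= overlap_min


x = (0.0, 1.0)
print(overlaps(x, (0.5, 2.0), overlap_min=0.04))
print(overlaps(x, (0.98, 2.0), overlap_min=0.04))
print(overlaps(x, (0.5, 2.0), overlap_min=60, percent=True))
```

The first call reports an overlap of 0.5 s, which passes the 0.04 s threshold. The two others are rejected because the shared span is too short.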

tag(**kwargs)[source]

Apply functions on all tags of all labels of annotations.

Each argument is made of a function name and its expected value. Each function can be prefixed with ‘not_’, like in the next example.

Example
>>> f.tag(startswith="pa", not_endswith='a', logic_bool="and")
>>> f.tag(startswith="pa") & f.tag(not_endswith='a')
>>> f.tag(startswith="pa") | f.tag(startswith="ta")

Parameters

kwargs – logic_bool/any sppasTagCompare() method.

Returns

(sppasAnnSet)
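The ‘not_’ prefix and the logic_bool argument can be pictured with the following sketch. The `FUNCTIONS` table, `parse_kwargs` and `match` are hypothetical names introduced for illustration only; they operate on plain strings and do not reproduce the sppasTagCompare() implementation.

```python
# Hypothetical comparison functions, standing in for a tag-compare class.
FUNCTIONS = {
    "exact": lambda tag, value: tag == value,
    "startswith": lambda tag, value: tag.startswith(value),
    "endswith": lambda tag, value: tag.endswith(value),
}


def parse_kwargs(**kwargs):
    """Turn keyword arguments into (function, value, negated) triples."""
    logic_bool = kwargs.pop("logic_bool", "and")
    triples = []
    for name, value in kwargs.items():
        negated = name.startswith("not_")
        func_name = name[len("not_"):] if negated else name
        triples.append((FUNCTIONS[func_name], value, negated))
    return logic_bool, triples


def match(tag, **kwargs):
    """Return True if the tag satisfies the given functions."""
    logic_bool, triples = parse_kwargs(**kwargs)
    # `!=` on booleans flips the result of a function when it is negated.
    results = [func(tag, value) != negated for func, value, negated in triples]
    return all(results) if logic_bool == "and" else any(results)


print(match("pat", startswith="pa", not_endswith="a"))
print(match("papa", startswith="pa", not_endswith="a"))
```

With the default “and”, “pat” matches but “papa” does not, because its final “a” fails the negated function.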

analysis.tierstats module

filename

sppas.src.analysis.tierstats.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Statistical distribution system for annotations of a tier.

class analysis.tierstats.sppasTierStats(tier=None, n=1, with_radius=0, with_alt=False)[source]

Bases: object

Estimate descriptive statistics of annotations of a tier.

Map a tier into a dictionary where:

  • key is a tag

  • value is the list of observed durations of this tag in annotations
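The mapping described above can be sketched with plain Python data. The `annotations` list of (begin, end, tag) triples is a hypothetical stand-in for a tier, and the mean is only one example of a descriptive statistic that could be derived from the dictionary.

```python
from collections import defaultdict

# Hypothetical annotations as (begin, end, tag) triples, in seconds.
annotations = [
    (0.0, 0.5, "#"),
    (0.5, 0.75, "pa"),
    (0.75, 1.0, "ta"),
    (1.0, 1.5, "pa"),
    (1.5, 2.5, "#"),
]

# key is a tag, value is the list of observed durations of this tag.
durations = defaultdict(list)
for begin, end, tag in annotations:
    durations[tag].append(end - begin)

# A descriptive statistic computed from the mapping: mean duration per tag.
means = {tag: sum(values) / len(values) for tag, values in durations.items()}
print(dict(durations))
print(means)
```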

__init__(tier=None, n=1, with_radius=0, with_alt=False)[source]

Create a new sppasTierStats instance.

Parameters
  • tier – (either sppasTier or list of them)

  • n – (int) n-gram value

  • with_radius – (int) 0 to use Midpoint, negative value to use R-, positive value to use R+

  • with_alt – (bool) Whether to use alternative labels

ds()[source]

Create a DescriptiveStatistic object for the given tier.

Returns

(DescriptiveStatistic)

get_ngram()[source]

Returns the n-gram value.

get_tier()[source]

Return the tier to estimate stats.

get_with_alt()[source]

Return if alternative labels will be used or not.

get_with_radius()[source]

Returns how to use the radius in duration estimations.

0 means to use Midpoint, negative value means to use R-, and positive value means to use R+.

set_ngram(n)[source]

Set the n value of the n-grams.

It is used to set the history size (at least 1).
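As an illustration of what the n value controls, here is a minimal sketch that builds n-grams from a sequence of tags. The `ngrams` helper is hypothetical and works on a plain list; it is not the method used by sppasTierStats.

```python
def ngrams(items, n=1):
    """Return the list of n-grams (as tuples) of a sequence of items."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


tags = ["#", "pa", "ta", "#"]
print(ngrams(tags, n=1))
print(ngrams(tags, n=2))
```

With n=1 each tag is considered alone; with n=2 each tag is considered together with the one that precedes it.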

set_with_radius(with_radius)[source]

Set the with_radius option, used to estimate the duration.

Parameters

with_radius – (int) Fix the with_radius option

with_radius can take the following values:

  • 0 means to use midpoint;

  • negative value means to use (midpoint-radius);

  • positive value means to use (midpoint+radius).
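One possible reading of these three cases is sketched below. It assumes that each boundary of an interval is a (midpoint, radius) pair and that the radii of both boundaries are subtracted from or added to the duration between midpoints. The `estimate_duration` function is hypothetical and may differ from what SPPAS actually computes.

```python
def estimate_duration(begin, end, with_radius=0):
    """Estimate the duration of an interval.

    begin and end are hypothetical (midpoint, radius) pairs, in seconds.
    """
    duration = end[0] - begin[0]
    # Assumption: the uncertainty of the duration is the sum of both radii.
    vagueness = begin[1] + end[1]
    if with_radius < 0:
        return duration - vagueness
    if with_radius > 0:
        return duration + vagueness
    return duration


begin, end = (1.0, 0.125), (2.0, 0.125)
print(estimate_duration(begin, end, with_radius=0))
print(estimate_duration(begin, end, with_radius=-1))
print(estimate_duration(begin, end, with_radius=1))
```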

set_withalt(withalt)[source]

Set the withalt option, used to select the labels.

  • False means to use only the label with the highest score of each annotation

  • True means to use all labels of each annotation

static tuple_to_dict(items)[source]

Convert into a dictionary.

Parameters

items – (tuple) the ngram items

Returns

dictionary key=text, value=list of durations.

Module contents

filename

sppas.src.analysis.__init__.py

author

Brigitte Bigi

contact

develop@sppas.org

summary

Package for the automatic data analysis of SPPAS.

analysis: automatic data analysis

This package includes all the automatic analysis of annotated data. It requires the following other packages:

  • config

  • structs

  • anndata

  • calculus

class analysis.RelationFilterTier(filters, annot_format=False, fit=False)[source]

Bases: object

This class applies predefined filters on a tier.

Example:

>>> ft = RelationFilterTier((["overlaps", "overlappedby"], [("overlap_min", 0.04)]), fit=False)
>>> res_tier = ft.filter_tier(tier_x, tier_y)

__init__(filters, annot_format=False, fit=False)[source]

Filter process of a tier.

“annot_format” affects the labels of the resulting annotations, whereas “fit” affects their localizations.

Parameters
  • filters – (tuple) ([list of functions], [list of options]); each option is a tuple (name, value)

  • annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)

  • fit – (bool) The annotation result fits the other tier.

filter_tier(tier, tier_y, out_tiername='Filtered')[source]

Apply the filters on the given tier.

Parameters
  • tier – (sppasTier) The tier to filter annotations

  • tier_y – (sppasTier) The tier to be in relation with

  • out_tiername – (str) Name of the filtered tier

functions = 'rel'

class analysis.SingleFilterTier(filters, annot_format=False, match_all=True)[source]

Bases: object

This class applies predefined filters on a tier.

Apply defined filters, as a list of tuples with:
  • name of the filter: one of “tag”, “loc”, “dur”, “nlab”, “rel”

  • name of the function in sppasCompare (equal, lt, …)

  • value of its expected type (str, float, int, bool)

__init__(filters, annot_format=False, match_all=True)[source]

Filter process of a tier.

Parameters
  • filters – (list) List of tuples (filter, function, [typed values])

  • annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)

  • match_all – (bool) The annotations must match all the filters (if set to True) or any of them (if set to False)

filter_tier(tier, out_tiername='Filtered')[source]

Apply the filters on the given tier.

Applicable functions are “tag”, “loc” and “dur”.

Parameters
  • tier – (sppasTier)

  • out_tiername – (str) Name of the filtered tier

Returns

sppasTier or None if no annotation is matching

functions = ('tag', 'loc', 'dur', 'nlab')

class analysis.sppasTierFilters(obj)[source]

Bases: sppas.src.structs.basefilters.sppasBaseFilters

This class implements the ‘SPPAS tier filter system’.

Search in tiers. The class sppasTierFilters() makes it possible to apply several types of filters (tag, duration, …), and the class sppasAnnSet() is a data set manager, i.e. it contains the annotations selected by a filter and a string representing the filter.

Create a filter:

>>> f = sppasTierFilters(tier)

Then apply a filter with a pattern like those in the following examples. sppasAnnSet() instances can be combined with the operators & and |, like any other ‘set’ in Python, ‘an unordered collection of distinct hashable objects’.

Example1

extract silences:

>>> f.tag(exact=u('#'))

Example2

extract silences longer than 200 ms:

>>> f.tag(exact=u("#")) & f.dur(gt=0.2)

Example3

find the annotations with at least one label with a tag starting with “pa” and ending with “a”, like “pa”, “papa”, “pasta”, etc.:

>>> f.tag(startswith="pa", endswith='a')

This is equivalent to writing:

>>> f.tag(startswith="pa", endswith='a', logic_bool="and")

The classical “and” and “or” logical boolean predicates are accepted; “and” is the default one. It defines whether all the functions must be True (“and”) or any of them (“or”).

The two previous lines of code give the same result as the following one, but are two times faster:

>>> f.tag(startswith="pa") & f.tag(endswith='a')

In the first case, for each tag, the method applies the logical boolean between two predicates and creates the data set matching the combined condition. In the second case, each call to the method creates a data set matching each individual condition, then the data sets are combined.

Example4

find annotations with more than 1 label:

>>> f.nlab(lge=1)

__init__(obj)[source]

Create a sppasTierFilters instance.

Parameters

obj – (sppasTier) The tier to be filtered.

static cast_data(tier, sfilter, entry)[source]

Return an entry into the appropriate type.

Parameters
  • tier – (sppasTier)

  • sfilter – (str) Name of the filter (tag, loc, …)

  • entry – (str) The entry to cast

Returns

typed entry

dur(**kwargs)[source]

Apply functions on durations of the location of annotations.

Parameters

kwargs – logic_bool/any sppasDurationCompare() method.

Returns

(sppasAnnSet)

Examples:
>>> f.dur(ge=0.03) & f.dur(le=0.07)
>>> f.dur(ge=0.03, le=0.07, logic_bool="and")

loc(**kwargs)[source]

Apply functions on localizations of annotations.

Parameters

kwargs – logic_bool/any sppasLocalizationCompare() method.

Returns

(sppasAnnSet)

Example
>>> f.loc(rangefrom=3.01) & f.loc(rangeto=10.07)
>>> f.loc(rangefrom=3.01, rangeto=10.07, logic_bool="and")

nlab(**kwargs)[source]

Apply functions on number of labels in annotations.

Parameters

kwargs – logic_bool/any sppasListCompare() method.

Returns

(sppasAnnSet)

Example
>>> f.nlab(leq=1)

rel(other_tier, *args, **kwargs)[source]

Apply functions of relations between localizations of annotations.

Parameters
  • other_tier – the tier to be in relation with.

  • args – any sppasIntervalCompare() method.

  • kwargs – any option of the methods.

Returns

(sppasAnnSet)

Example
>>> f.rel(other_tier, "equals", "overlaps", "overlappedby",
...       overlap_min=0.04, overlapped_min=0.02)

kwargs can be:

  • max_delay=value, used by before, after

  • overlap_min=value, used by overlaps

  • overlapped_min=value, used by overlappedby

  • percent=boolean, used by overlaps and overlappedby to indicate that the minimum overlap is a percentage

tag(**kwargs)[source]

Apply functions on all tags of all labels of annotations.

Each argument is made of a function name and its expected value. Each function can be prefixed with ‘not_’, like in the next example.

Example
>>> f.tag(startswith="pa", not_endswith='a', logic_bool="and")
>>> f.tag(startswith="pa") & f.tag(not_endswith='a')
>>> f.tag(startswith="pa") | f.tag(startswith="ta")

Parameters

kwargs – logic_bool/any sppasTagCompare() method.

Returns

(sppasAnnSet)

class analysis.sppasTierStats(tier=None, n=1, with_radius=0, with_alt=False)[source]

Bases: object

Estimate descriptive statistics of annotations of a tier.

Map a tier into a dictionary where:

  • key is a tag

  • value is the list of observed durations of this tag in annotations

__init__(tier=None, n=1, with_radius=0, with_alt=False)[source]

Create a new sppasTierStats instance.

Parameters
  • tier – (either sppasTier or list of them)

  • n – (int) n-gram value

  • with_radius – (int) 0 to use Midpoint, negative value to use R-, positive value to use R+

  • with_alt – (bool) Whether to use alternative labels

ds()[source]

Create a DescriptiveStatistic object for the given tier.

Returns

(DescriptiveStatistic)

get_ngram()[source]

Returns the n-gram value.

get_tier()[source]

Return the tier to estimate stats.

get_with_alt()[source]

Return if alternative labels will be used or not.

get_with_radius()[source]

Returns how to use the radius in duration estimations.

0 means to use Midpoint, negative value means to use R-, and positive value means to use R+.

set_ngram(n)[source]

Set the n value of the n-grams.

It is used to set the history size (at least 1).

set_with_radius(with_radius)[source]

Set the with_radius option, used to estimate the duration.

Parameters

with_radius – (int) Fix the with_radius option

with_radius can take the following values:

  • 0 means to use midpoint;

  • negative value means to use (midpoint-radius);

  • positive value means to use (midpoint+radius).

set_withalt(withalt)[source]

Set the withalt option, used to select the labels.

  • False means to use only the label with the highest score of each annotation

  • True means to use all labels of each annotation

static tuple_to_dict(items)[source]

Convert into a dictionary.

Parameters

items – (tuple) the ngram items

Returns

dictionary key=text, value=list of durations.