analysis package¶
Submodules¶
analysis.tierfilters module¶
- filename
sppas.src.analysis.tierfilters.py
- author
Brigitte Bigi
- contact
- summary
Filter system for annotations of a tier.
- class analysis.tierfilters.RelationFilterTier(filters, annot_format=False, fit=False)[source]¶
Bases:
object
This class applies predefined filters on a tier.
Example:
>>> ft = RelationFilterTier((["overlaps", "overlappedby"], [("overlap_min", 0.04)]), fit=False)
>>> res_tier = ft.filter_tier(tier_x, tier_y)
- __init__(filters, annot_format=False, fit=False)[source]¶
Filter process of a tier.
“annot_format” affects the labels of the resulting annotations, whereas “fit” affects their localizations.
- Parameters
filters – (tuple) ([list of functions], [list of options]); each option is a tuple with (name, value)
annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)
fit – (bool) The annotation result fits the other tier.
- filter_tier(tier, tier_y, out_tiername='Filtered')[source]¶
Apply the filters on the given tier.
- Parameters
tier – (sppasTier) The tier to filter annotations
tier_y – (sppasTier) The tier to be in relation with
out_tiername – (str) Name of the filtered tier
- functions = 'rel'¶
- class analysis.tierfilters.SingleFilterTier(filters, annot_format=False, match_all=True)[source]¶
Bases:
object
This class applies predefined filters on a tier.
- Apply defined filters, as a list of tuples with:
name of the filter: one of “tag”, “loc”, “dur”, “nlab”, “rel”
name of the function in sppasCompare (equal, lt, …)
value of its expected type (str, float, int, bool)
- __init__(filters, annot_format=False, match_all=True)[source]¶
Filter process of a tier.
- Parameters
filters – (list) List of tuples (filter, function, [typed values])
annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)
match_all – (bool) The annotations must match all the filters (if set to True) or any of them (if set to False)
- filter_tier(tier, out_tiername='Filtered')[source]¶
Apply the filters on the given tier.
Applicable functions are “tag”, “loc” and “dur”.
- Parameters
tier – (sppasTier)
out_tiername – (str) Name of the filtered tier
- Returns
sppasTier or None if no annotation is matching
- functions = ('tag', 'loc', 'dur', 'nlab')¶
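The effect of match_all can be pictured with a minimal sketch in plain Python. It does not use SPPAS: the annotations are stand-in dictionaries, and `keep` and the two predicates are hypothetical names introduced only for illustration.

```python
# Stand-in annotations: plain dicts, not sppasAnnotation objects.
annotations = [
    {"tag": "#", "dur": 0.25},
    {"tag": "#", "dur": 0.10},
    {"tag": "pa", "dur": 0.30},
]

# Hypothetical predicates playing the role of the filters
# ("tag", "exact", "#") and ("dur", "gt", 0.2).
predicates = [
    lambda ann: ann["tag"] == "#",
    lambda ann: ann["dur"] > 0.2,
]


def keep(ann, predicates, match_all=True):
    """Return True if ann passes all predicates (match_all) or any of them."""
    combine = all if match_all else any
    return combine(p(ann) for p in predicates)


matched_all = [a for a in annotations if keep(a, predicates, match_all=True)]
matched_any = [a for a in annotations if keep(a, predicates, match_all=False)]
```

With match_all=True only the long silence is kept; with match_all=False every annotation satisfying at least one filter is kept.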
- class analysis.tierfilters.sppasTierFilters(obj)[source]¶
Bases:
sppas.src.structs.basefilters.sppasBaseFilters
This class implements the ‘SPPAS tier filter system’.
Search in tiers. The class sppasTierFilters() allows applying several types of filters (tag, duration, …), and the class sppasAnnSet() is a data set manager: it contains the annotations selected by a filter and a string representing the filter.
Create a filter:
>>> f = sppasTierFilters(tier)
Then apply a filter with a pattern, as in the following examples. sppasAnnSet() instances can be combined with the operators & and |, as with any other ‘set’ in Python, ‘an unordered collection of distinct hashable objects’.
- Example1
extract silences:
>>> f.tag(exact=u('#'))
- Example2
extract silences longer than 200 ms:
>>> f.tag(exact=u("#")) & f.dur(gt=0.2)
- Example3
find the annotations with at least one label whose tag
starts with “pa” and ends with “a”, like “pa”, “papa”, “pasta”, etc.:
>>> f.tag(startswith="pa", endswith='a')
This is equivalent to writing:
>>> f.tag(startswith="pa", endswith='a', logic_bool="and")
logic_bool accepts the classical “and” and “or” boolean operators; “and” is the default. It defines whether all the functions must be True (“and”) or any of them (“or”).
The two previous lines of code give the same result as the following one, but run about twice as fast:
>>> f.tag(startswith="pa") & f.tag(endswith='a')
In the first case, for each tag, the method applies the logical boolean between two predicates and creates the data set matching the combined condition. In the second case, each call to the method creates a data set matching each individual condition, then the data sets are combined.
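The difference between the two forms can be sketched in plain Python. This is only an illustration with a hypothetical list of tags and built-in sets, not the sppasAnnSet implementation.

```python
# Hypothetical tags standing in for the tags of a tier.
tags = ["pa", "papa", "pasta", "pat", "ta", "a"]

# One call with two predicates: each tag is tested once against the
# combined condition, and a single result set is built.
one_pass = {t for t in tags if t.startswith("pa") and t.endswith("a")}

# Two calls combined with "&": each call scans every tag and builds its
# own set, then the two sets are intersected.
starts = {t for t in tags if t.startswith("pa")}
ends = {t for t in tags if t.endswith("a")}
two_pass = starts & ends
```

Both forms select the same tags; the second one scans the data twice and builds an intermediate set per call.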
- Example4
find annotations with more than 1 label
>>> f.nlab(lge=1)
- __init__(obj)[source]¶
Create a sppasTierFilters instance.
- Parameters
obj – (sppasTier) The tier to be filtered.
- static cast_data(tier, sfilter, entry)[source]¶
Return an entry into the appropriate type.
- Parameters
tier – (sppasTier)
sfilter – (str) Name of the filter (tag, loc, …)
entry – (str) The entry to cast
- Returns
typed entry
- dur(**kwargs)[source]¶
Apply functions on durations of the location of annotations.
- Parameters
kwargs – logic_bool/any sppasDurationCompare() method.
- Returns
(sppasAnnSet)
- Examples:
>>> f.dur(ge=0.03) & f.dur(le=0.07)
>>> f.dur(ge=0.03, le=0.07, logic_bool="and")
- loc(**kwargs)[source]¶
Apply functions on localizations of annotations.
- Parameters
kwargs – logic_bool/any sppasLocalizationCompare() method.
- Returns
(sppasAnnSet)
- Example
>>> f.loc(rangefrom=3.01) & f.loc(rangeto=10.07)
>>> f.loc(rangefrom=3.01, rangeto=10.07, logic_bool="and")
- nlab(**kwargs)[source]¶
Apply functions on number of labels in annotations.
- Parameters
kwargs – logic_bool/any sppasListCompare() method.
- Returns
(sppasAnnSet)
- Example
>>> f.nlab(leq=1)
- rel(other_tier, *args, **kwargs)[source]¶
Apply functions of relations between localizations of annotations.
- Parameters
other_tier – the tier to be in relation with.
args – any sppasIntervalCompare() method.
kwargs – any option of the methods.
- Returns
(sppasAnnSet)
- Example
>>> f.rel(other_tier, "equals", "overlaps", "overlappedby",
...       overlap_min=0.04, overlapped_min=0.02)
kwargs can be:
max_delay=value, used by before, after
overlap_min=value, used by overlaps
overlapped_min=value, used by overlappedby
percent=boolean, used by overlaps and overlappedby to state that overlap_min is a percentage
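As a rough illustration of how overlap_min and percent could act on an “overlaps” relation, here is a sketch on plain (begin, end) tuples. The function below is hypothetical and is not the sppasIntervalCompare code; in particular, taking the percentage relative to the duration of the first interval is an assumption.

```python
def overlaps(x, y, overlap_min=None, percent=False):
    """Return True if x starts before y and ends inside y.

    x and y are (begin, end) tuples in seconds. If overlap_min is given,
    the shared duration must reach it. With percent=True, overlap_min is
    read as a percentage of the duration of x (an assumption).
    """
    x1, x2 = x
    y1, y2 = y
    if not (x1 < y1 < x2 < y2):
        return False
    if overlap_min is None:
        return True
    threshold = overlap_min
    if percent:
        threshold = (x2 - x1) * overlap_min / 100.0
    return (x2 - y1) >= threshold


x = (0.0, 1.0)
y = (0.5, 2.0)
short_ok = overlaps(x, y, overlap_min=0.04)              # 0.5 s shared
long_ko = overlaps(x, y, overlap_min=0.6)                # threshold not reached
half_ok = overlaps(x, y, overlap_min=50, percent=True)   # 50% of 1.0 s
```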
- tag(**kwargs)[source]¶
Apply functions on all tags of all labels of annotations.
Each argument is made of a function name and its expected value. Each function can be prefixed with ‘not_’, like in the next example.
- Example
>>> f.tag(startswith="pa", not_endswith='a', logic_bool="and")
>>> f.tag(startswith="pa") & f.tag(not_endswith='a')
>>> f.tag(startswith="pa") | f.tag(startswith="ta")
- Parameters
kwargs – logic_bool/any sppasTagCompare() method.
- Returns
(sppasAnnSet)
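One way to picture the ‘not_’ prefix and logic_bool is the following sketch, which dispatches keyword arguments to small comparison functions. The names `FUNCTIONS` and `match_tag` are hypothetical and the code is independent of sppasTagCompare.

```python
def startswith(tag, value):
    return tag.startswith(value)


def endswith(tag, value):
    return tag.endswith(value)


# Hypothetical registry of comparison functions, keyed by name.
FUNCTIONS = {"startswith": startswith, "endswith": endswith}


def match_tag(tag, logic_bool="and", **kwargs):
    """Evaluate each named function on tag, negating those prefixed with 'not_'."""
    results = []
    for name, value in kwargs.items():
        negate = name.startswith("not_")
        func = FUNCTIONS[name[4:] if negate else name]
        outcome = func(tag, value)
        results.append(not outcome if negate else outcome)
    return all(results) if logic_bool == "and" else any(results)


tags = ["pa", "papa", "pat", "ta"]
selected = [t for t in tags if match_tag(t, startswith="pa", not_endswith="a")]
```

Here only “pat” is selected: it starts with “pa” and does not end with “a”.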
analysis.tierstats module¶
- filename
sppas.src.analysis.tierstats.py
- author
Brigitte Bigi
- contact
- summary
Statistical distribution system for annotations of a tier.
- class analysis.tierstats.sppasTierStats(tier=None, n=1, with_radius=0, with_alt=False)[source]¶
Bases:
object
Estimate descriptive statistics of annotations of a tier.
Map a tier into a dictionary where:
key is a tag
value is the list of observed durations of this tag in annotations
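The mapping described above can be sketched with plain Python data. The (begin, end, tag) tuples are stand-ins for annotations, and the mean is just one example of a descriptive statistic; none of this is the sppasTierStats code.

```python
from collections import defaultdict
from statistics import mean

# Stand-in annotations: (begin, end, tag) tuples, times in seconds.
annotations = [
    (0.0, 0.25, "#"),
    (0.25, 0.5, "pa"),
    (0.5, 1.0, "#"),
    (1.0, 1.5, "pa"),
]

# key: tag, value: list of observed durations of this tag.
durations = defaultdict(list)
for begin, end, tag in annotations:
    durations[tag].append(end - begin)

# Example of a descriptive statistic estimated per tag.
mean_durations = {tag: mean(values) for tag, values in durations.items()}
```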
- __init__(tier=None, n=1, with_radius=0, with_alt=False)[source]¶
Create a new TierStats instance.
- Parameters
tier – (either sppasTier or list of them)
n – (int) n-gram value
with_radius – (int) 0 to use Midpoint, negative value to use R-, positive value to use R+
with_alt – (bool) Use or not use of alternative labels
- ds()[source]¶
Create a DescriptiveStatistic object for the given tier.
- Returns
(DescriptiveStatistic)
- get_with_radius()[source]¶
Returns how to use the radius in duration estimations.
0 means to use Midpoint, negative value means to use R-, and positive value means to use R+.
- set_ngram(n)[source]¶
Set the n value of the n-grams.
It sets the history size (at least 1).
- set_with_radius(with_radius)[source]¶
Set the with_radius option, used to estimate the duration.
- Parameters
with_radius – (int) Fix the with_radius option
with_radius can take the following values:
0 means to use midpoint;
negative value means to use (midpoint-radius);
positive value means to use (midpoint+radius).
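A possible reading of this option is sketched below, assuming each boundary is a (midpoint, radius) pair and that the radii of both boundaries are subtracted from or added to the midpoint-based duration. The function `estimate_duration` is hypothetical and only illustrates that assumption.

```python
def estimate_duration(begin, end, with_radius=0):
    """Estimate a duration from two (midpoint, radius) boundaries, in seconds."""
    duration = end[0] - begin[0]     # midpoint-based duration
    vagueness = begin[1] + end[1]    # assumed: sum of both radii
    if with_radius < 0:
        return duration - vagueness  # shortest plausible duration
    if with_radius > 0:
        return duration + vagueness  # longest plausible duration
    return duration


begin = (1.0, 0.25)
end = (2.0, 0.25)
d_mid = estimate_duration(begin, end, with_radius=0)
d_minus = estimate_duration(begin, end, with_radius=-1)
d_plus = estimate_duration(begin, end, with_radius=1)
```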
Module contents¶
- filename
sppas.src.analysis.__init__.py
- author
Brigitte Bigi
- contact
- summary
Package for the automatic data analysis of SPPAS.
analysis: automatic data analysis¶
This package includes all the automatic analysis of annotated data. It requires the following other packages:
config
structs
anndata
calculus
- class analysis.RelationFilterTier(filters, annot_format=False, fit=False)[source]¶
Bases:
object
This class applies predefined filters on a tier.
Example:
>>> ft = RelationFilterTier((["overlaps", "overlappedby"], [("overlap_min", 0.04)]), fit=False)
>>> res_tier = ft.filter_tier(tier_x, tier_y)
- __init__(filters, annot_format=False, fit=False)[source]¶
Filter process of a tier.
“annot_format” affects the labels of the resulting annotations, whereas “fit” affects their localizations.
- Parameters
filters – (tuple) ([list of functions], [list of options]); each option is a tuple with (name, value)
annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)
fit – (bool) The annotation result fits the other tier.
- filter_tier(tier, tier_y, out_tiername='Filtered')[source]¶
Apply the filters on the given tier.
- Parameters
tier – (sppasTier) The tier to filter annotations
tier_y – (sppasTier) The tier to be in relation with
out_tiername – (str) Name of the filtered tier
- functions = 'rel'¶
- class analysis.SingleFilterTier(filters, annot_format=False, match_all=True)[source]¶
Bases:
object
This class applies predefined filters on a tier.
- Apply defined filters, as a list of tuples with:
name of the filter: one of “tag”, “loc”, “dur”, “nlab”, “rel”
name of the function in sppasCompare (equal, lt, …)
value of its expected type (str, float, int, bool)
- __init__(filters, annot_format=False, match_all=True)[source]¶
Filter process of a tier.
- Parameters
filters – (list) List of tuples (filter, function, [typed values])
annot_format – (bool) The annotation result contains the name of the filter (if True) or the original label (if False)
match_all – (bool) The annotations must match all the filters (if set to True) or any of them (if set to False)
- filter_tier(tier, out_tiername='Filtered')[source]¶
Apply the filters on the given tier.
Applicable functions are “tag”, “loc” and “dur”.
- Parameters
tier – (sppasTier)
out_tiername – (str) Name of the filtered tier
- Returns
sppasTier or None if no annotation is matching
- functions = ('tag', 'loc', 'dur', 'nlab')¶
- class analysis.sppasTierFilters(obj)[source]¶
Bases:
sppas.src.structs.basefilters.sppasBaseFilters
This class implements the ‘SPPAS tier filter system’.
Search in tiers. The class sppasTierFilters() allows applying several types of filters (tag, duration, …), and the class sppasAnnSet() is a data set manager: it contains the annotations selected by a filter and a string representing the filter.
Create a filter:
>>> f = sppasTierFilters(tier)
Then apply a filter with a pattern, as in the following examples. sppasAnnSet() instances can be combined with the operators & and |, as with any other ‘set’ in Python, ‘an unordered collection of distinct hashable objects’.
- Example1
extract silences:
>>> f.tag(exact=u('#'))
- Example2
extract silences longer than 200 ms:
>>> f.tag(exact=u("#")) & f.dur(gt=0.2)
- Example3
find the annotations with at least one label whose tag
starts with “pa” and ends with “a”, like “pa”, “papa”, “pasta”, etc.:
>>> f.tag(startswith="pa", endswith='a')
This is equivalent to writing:
>>> f.tag(startswith="pa", endswith='a', logic_bool="and")
logic_bool accepts the classical “and” and “or” boolean operators; “and” is the default. It defines whether all the functions must be True (“and”) or any of them (“or”).
The two previous lines of code give the same result as the following one, but run about twice as fast:
>>> f.tag(startswith="pa") & f.tag(endswith='a')
In the first case, for each tag, the method applies the logical boolean between two predicates and creates the data set matching the combined condition. In the second case, each call to the method creates a data set matching each individual condition, then the data sets are combined.
- Example4
find annotations with more than 1 label
>>> f.nlab(lge=1)
- __init__(obj)[source]¶
Create a sppasTierFilters instance.
- Parameters
obj – (sppasTier) The tier to be filtered.
- static cast_data(tier, sfilter, entry)[source]¶
Return an entry into the appropriate type.
- Parameters
tier – (sppasTier)
sfilter – (str) Name of the filter (tag, loc, …)
entry – (str) The entry to cast
- Returns
typed entry
- dur(**kwargs)[source]¶
Apply functions on durations of the location of annotations.
- Parameters
kwargs – logic_bool/any sppasDurationCompare() method.
- Returns
(sppasAnnSet)
- Examples:
>>> f.dur(ge=0.03) & f.dur(le=0.07)
>>> f.dur(ge=0.03, le=0.07, logic_bool="and")
- loc(**kwargs)[source]¶
Apply functions on localizations of annotations.
- Parameters
kwargs – logic_bool/any sppasLocalizationCompare() method.
- Returns
(sppasAnnSet)
- Example
>>> f.loc(rangefrom=3.01) & f.loc(rangeto=10.07)
>>> f.loc(rangefrom=3.01, rangeto=10.07, logic_bool="and")
- nlab(**kwargs)[source]¶
Apply functions on number of labels in annotations.
- Parameters
kwargs – logic_bool/any sppasListCompare() method.
- Returns
(sppasAnnSet)
- Example
>>> f.nlab(leq=1)
- rel(other_tier, *args, **kwargs)[source]¶
Apply functions of relations between localizations of annotations.
- Parameters
other_tier – the tier to be in relation with.
args – any sppasIntervalCompare() method.
kwargs – any option of the methods.
- Returns
(sppasAnnSet)
- Example
>>> f.rel(other_tier, "equals", "overlaps", "overlappedby",
...       overlap_min=0.04, overlapped_min=0.02)
kwargs can be:
max_delay=value, used by before, after
overlap_min=value, used by overlaps
overlapped_min=value, used by overlappedby
percent=boolean, used by overlaps and overlappedby to state that overlap_min is a percentage
- tag(**kwargs)[source]¶
Apply functions on all tags of all labels of annotations.
Each argument is made of a function name and its expected value. Each function can be prefixed with ‘not_’, like in the next example.
- Example
>>> f.tag(startswith="pa", not_endswith='a', logic_bool="and")
>>> f.tag(startswith="pa") & f.tag(not_endswith='a')
>>> f.tag(startswith="pa") | f.tag(startswith="ta")
- Parameters
kwargs – logic_bool/any sppasTagCompare() method.
- Returns
(sppasAnnSet)
- class analysis.sppasTierStats(tier=None, n=1, with_radius=0, with_alt=False)[source]¶
Bases:
object
Estimate descriptive statistics of annotations of a tier.
Map a tier into a dictionary where:
key is a tag
value is the list of observed durations of this tag in annotations
- __init__(tier=None, n=1, with_radius=0, with_alt=False)[source]¶
Create a new TierStats instance.
- Parameters
tier – (either sppasTier or list of them)
n – (int) n-gram value
with_radius – (int) 0 to use Midpoint, negative value to use R-, positive value to use R+
with_alt – (bool) Use or not use of alternative labels
- ds()[source]¶
Create a DescriptiveStatistic object for the given tier.
- Returns
(DescriptiveStatistic)
- get_with_radius()[source]¶
Returns how to use the radius in duration estimations.
0 means to use Midpoint, negative value means to use R-, and positive value means to use R+.
- set_ngram(n)[source]¶
Set the n value of the n-grams.
It sets the history size (at least 1).
- set_with_radius(with_radius)[source]¶
Set the with_radius option, used to estimate the duration.
- Parameters
with_radius – (int) Fix the with_radius option
with_radius can take the following values:
0 means to use midpoint;
negative value means to use (midpoint-radius);
positive value means to use (midpoint+radius).