annotations.StopWords package
Submodules
annotations.StopWords.sppasstpwds module
- filename
sppas.src.annotations.StopWords.sppasstpwds.py
- author
Brigitte Bigi
- contact
- summary
SPPAS integration of the StopWords automatic annotation.
- class annotations.StopWords.sppasstpwds.sppasStopWords(log=None)
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the identification of stop words in a tier.
- get_inputs(input_files)
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(lang_resources, lang=None)
Load a list of stop-words and replacements.
Override the existing loaded lists…
- Parameters
lang_resources – (str) File with extension ‘.stp’ or nothing
lang – (str)
- make_stp_tier(tier)
Return a tier indicating if entries are stop-words.
- Parameters
tier – (sppasTier)
- run(input_files, output=None)
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Time-aligned tokens
output – (str) the output file name
- Returns
(sppasTranscription)
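To illustrate what a tier "indicating if entries are stop-words" (as returned by make_stp_tier) could look like, here is a minimal sketch in plain Python. It does not use the SPPAS classes: the function name `label_stop_words` and the `(start, end, token)` tuple layout are assumptions made for this illustration, standing in for a sppasTier of time-aligned tokens.

```python
def label_stop_words(intervals, stop_words, case_sensitive=False):
    """Map each time-aligned token to a boolean stop-word flag.

    intervals: list of (start, end, token) tuples (hypothetical layout,
               not the sppasTier data structure).
    stop_words: iterable of stop-word strings.
    """
    if case_sensitive:
        known = set(stop_words)
    else:
        # Compare in lower case when the case of entries is ignored.
        known = {w.lower() for w in stop_words}

    labelled = []
    for start, end, token in intervals:
        key = token if case_sensitive else token.lower()
        labelled.append((start, end, key in known))
    return labelled


intervals = [(0.0, 0.3, "The"), (0.3, 0.8, "cat"), (0.8, 1.0, "and")]
result = label_stop_words(intervals, {"the", "and"})
print(result)
```

The time boundaries are copied unchanged, so the boolean result stays aligned with the input tokens.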
annotations.StopWords.stpwds module
- filename
sppas.src.annotations.StopWords.stpwds.py
- author
Brigitte Bigi
- contact
- summary
Stopwords detection.
- class annotations.StopWords.stpwds.StopWords(case_sensitive=False)
Bases:
sppas.src.resources.vocab.sppasVocabulary
A vocabulary that can automatically evaluate a list of Stop-Words.
An entry ‘w’ is relevant for the speaker if its probability does not exceed a threshold:
P(w) <= 1 / (alpha * V)
where ‘alpha’ is an empirical coefficient and ‘V’ is the vocabulary size of the speaker.
- MAX_ALPHA = 4.0
- MIN_ANN_NUMBER = 5
- __init__(case_sensitive=False)
Create a new StopWords instance.
- Parameters
case_sensitive – (bool) Whether the case of entries is taken into account.
- property alpha
Return the value of alpha coefficient (float).
- evaluate(tier=None, merge=True)
Add entries to the list of stop-words from the content of a tier.
Estimate whether each token is relevant; a token that is not relevant is added to the stop-list.
- Parameters
tier – (sppasTier) A tier with entries to be analyzed.
merge – (bool) Merge with the existing list (if True) or delete the existing list and create a new one (if False)
- Returns
(int) Number of entries added into the list
- Raises
EmptyInputError, TooSmallInputError
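The relevance criterion P(w) <= 1 / (alpha * V) can be sketched in a few lines of plain Python. This is only an illustration of the formula, not the SPPAS implementation: the function name `estimate_stop_words`, the flat token list used as input, and the alpha values are assumptions, and the checks behind EmptyInputError, TooSmallInputError, MAX_ALPHA and MIN_ANN_NUMBER are not reproduced.

```python
from collections import Counter


def estimate_stop_words(tokens, alpha):
    """Return the tokens whose relative frequency exceeds 1 / (alpha * V).

    tokens: list of str (hypothetical input; SPPAS works on a sppasTier).
    alpha:  empirical coefficient of the threshold.
    """
    if not tokens:
        raise ValueError("tokens must not be empty")

    counts = Counter(tokens)
    total = sum(counts.values())      # number of observed tokens
    vocab_size = len(counts)          # V: number of distinct tokens
    threshold = 1.0 / (alpha * vocab_size)

    # A token is relevant if P(w) <= threshold; otherwise it is a stop-word.
    return {w for w, n in counts.items() if n / total > threshold}


tokens = "the cat and the dog and the bird".split()
print(sorted(estimate_stop_words(tokens, alpha=1.0)))  # ['and', 'the']
```

Here V = 5 and the threshold is 0.2 for alpha = 1.0, so "the" (3/8) and "and" (2/8) exceed it. A smaller alpha raises the threshold and yields a shorter stop-list.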
Module contents
- filename
sppas.src.annotations.StopWords.__init__.py
- author
Brigitte Bigi
- contact
- summary
Stop-words boolean annotation.
- class annotations.StopWords.StopWords(case_sensitive=False)
Bases:
sppas.src.resources.vocab.sppasVocabulary
A vocabulary that can automatically evaluate a list of Stop-Words.
An entry ‘w’ is relevant for the speaker if its probability does not exceed a threshold:
P(w) <= 1 / (alpha * V)
where ‘alpha’ is an empirical coefficient and ‘V’ is the vocabulary size of the speaker.
- MAX_ALPHA = 4.0
- MIN_ANN_NUMBER = 5
- __init__(case_sensitive=False)
Create a new StopWords instance.
- Parameters
case_sensitive – (bool) Whether the case of entries is taken into account.
- property alpha
Return the value of alpha coefficient (float).
- evaluate(tier=None, merge=True)
Add entries to the list of stop-words from the content of a tier.
Estimate whether each token is relevant; a token that is not relevant is added to the stop-list.
- Parameters
tier – (sppasTier) A tier with entries to be analyzed.
merge – (bool) Merge with the existing list (if True) or delete the existing list and create a new one (if False)
- Returns
(int) Number of entries added into the list
- Raises
EmptyInputError, TooSmallInputError
- class annotations.StopWords.sppasStopWords(log=None)
Bases:
annotations.baseannot.sppasBaseAnnotation
SPPAS integration of the identification of stop words in a tier.
- get_inputs(input_files)
Return the tier with aligned tokens.
- Parameters
input_files – (list)
- Raise
NoTierInputError
- Returns
(sppasTier)
- load_resources(lang_resources, lang=None)
Load a list of stop-words and replacements.
Override the existing loaded lists…
- Parameters
lang_resources – (str) File with extension ‘.stp’ or nothing
lang – (str)
- make_stp_tier(tier)
Return a tier indicating if entries are stop-words.
- Parameters
tier – (sppasTier)
- run(input_files, output=None)
Run the automatic annotation process on an input.
- Parameters
input_files – (list of str) Time-aligned tokens
output – (str) the output file name
- Returns
(sppasTranscription)