chana package¶
Submodules¶
chana.lemmatizer module¶
Lemmatizer for shipibo-konibo. The source model is from the Chana project and uses a KNeighborsClassifier from scikit-learn.
-
class
chana.lemmatizer.
GeneralLemmatizer
(features_length=10, n_neighbors=5)[source]¶ Instance of a new lemmatizer to be trained and used
-
get_lemma
(rule, word)[source]¶ Method that returns the lemma of a word given a possible rule
Parameters: - rule (list) – a rule to transform a word
- word (str) – a word to be transformed
Returns: word transformed
Return type: str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'
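The rule strings shown in this module's examples ('bo>' and 'anwe>i') look like a `what>with` pair applied to the end of the word. The following is a minimal sketch of applying one such rule, assuming that format; `apply_rule_sketch` is a hypothetical helper and not part of chana.

```python
def apply_rule_sketch(rule, word):
    """Apply a single 'what>with' rule to the last occurrence of 'what'.

    Assumption: the rule format is inferred from the documented examples;
    this is not the chana implementation.
    """
    what, _, replacement = rule.partition(">")
    if not what or what not in word:
        return word  # nothing to replace
    head, _, tail = word.rpartition(what)
    return head + replacement + tail


print(apply_rule_sketch("bo>", "shipibobo"))   # shipibo
print(apply_rule_sketch("anwe>i", "pikanwe"))  # piki
```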
-
get_rule
(word)[source]¶ Method that returns the transformation rule for a word
Parameters: word (str) – a word to get the rule
Returns: numpy array with the rule
Return type: array
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.get_rule('perrito')
array(['ito>0'], dtype='<U16')
-
lemmatize
(word)[source]¶ Method that predicts the lemma of a word with the trained model
Parameters: word (str) – a word to get the lemma
Returns: lemma of the word
Return type: str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.lemmatize('perrito')
'perro'
-
preprocess_word
(word)[source]¶ Method that turns a word into an array of features for the classifier according to its features_length
Parameters: word (str) – a word to be transformed
Returns: list with the features
Return type: list
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.preprocess_word('perritos')
[115, 111, 116, 105, 114, 114, 101, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
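The documented output reads as the code points of the word in reverse order, padded with zeros. A minimal sketch that reproduces that output, assuming a fixed total length of 18 (read off the example, not confirmed from the chana source); `preprocess_word_sketch` is a hypothetical name.

```python
def preprocess_word_sketch(word, total_length=18):
    """Reverse the word, map each character to its code point, zero-pad.

    Assumption: total_length=18 is taken from the documented outputs.
    """
    codes = [ord(ch) for ch in reversed(word)][:total_length]
    return codes + [0] * (total_length - len(codes))


print(preprocess_word_sketch("perritos")[:8])
# [115, 111, 116, 105, 114, 114, 101, 112]
```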
-
train
(words, lemmas)[source]¶ Method that trains a new lemmatizer with a list of words and a list of lemmas of the same size
Parameters: - words (list) – list of words
- lemmas (list) – list of lemmas
Returns: none
Return type: None
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmas = ['perro','gato','mono']
>>> words = ['perritos','gatitos','monotes']
>>> lemmatizer.train(words,lemmas)
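How train derives its class labels is not described here. One plausible way to turn each (word, lemma) pair into a rule label in the `what>with` format seen in the get_rule examples is to strip the shared prefix. The sketch below illustrates that idea only; `derive_rule_sketch` is a hypothetical helper, and the actual chana procedure may differ.

```python
import os


def derive_rule_sketch(word, lemma):
    """Build a 'what>with' rule from the parts that differ after the shared prefix.

    Assumption: illustrative only; not taken from the chana source.
    """
    # os.path.commonprefix compares strings character by character.
    stem = os.path.commonprefix([word, lemma])
    return f"{word[len(stem):]}>{lemma[len(stem):]}"


words = ["perritos", "gatitos", "monotes"]
lemmas = ["perro", "gato", "mono"]
print([derive_rule_sketch(w, l) for w, l in zip(words, lemmas)])
# ['itos>o', 'itos>o', 'tes>']
```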
-
-
class
chana.lemmatizer.
ShipiboLemmatizer
[source]¶ Instance of the pre-trained shipibo lemmatizer
-
get_lemma
(rule, word)[source]¶ Method that returns the lemma of a shipibo word given a possible rule
Parameters: - rule (list) – a rule to transform a word
- word (str) – a word to be transformed
Returns: word transformed
Return type: str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'
-
get_rule
(word)[source]¶ Method that returns the transformation rule for a shipibo word
Parameters: word (str) – a word to get the rule
Returns: numpy array with the rule
Return type: array
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_rule('pikanwe')
array(['anwe>i'], dtype='<U16')
-
lemmatize
(word)[source]¶ Method that predicts the lemma of a shipibo word
Parameters: word (str) – a word to get the lemma
Returns: lemma of the word
Return type: str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.lemmatize('pikanwe')
'piki'
-
preprocess_word
(word)[source]¶ Method that turns a word into an array of features for the classifier
Parameters: word (str) – a word to be transformed
Returns: list with the features
Return type: list
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.preprocess_word('shipibobo')
[111, 98, 111, 98, 105, 112, 105, 104, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0]
-
-
chana.lemmatizer.
has_shipibo_suffix
(str)[source]¶ Function that checks whether a word may contain a shipibo suffix
Parameters: str (str) – word to evaluate
Returns: True or False
Return type: bool
Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.has_shipibo_suffix('pianra')
True
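A suffix check of this kind can be sketched with `str.endswith` and a tuple of candidates. The suffix list below is hypothetical: it only collects endings that appear in this page's examples and is not the list used by chana.

```python
# Hypothetical suffix list, collected from strings in this page's examples.
SUFFIXES_SKETCH = ("ra", "bo", "nko", "anwe", "ti")


def has_suffix_sketch(word):
    """Return True if the word ends with any of the candidate suffixes."""
    return word.endswith(SUFFIXES_SKETCH)


print(has_suffix_sketch("pianra"))  # True
```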
-
chana.lemmatizer.
longest_common_substring
(string1, string2)[source]¶ Function to find the longest common substring of two strings
Parameters: - string1 (str) – string1
- string2 (str) – string2
Returns: longest common substring
Return type: str
Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.longest_common_substring('limanko','limanra')
'liman'
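For reference, the longest common substring of two strings can be computed with a standard dynamic-programming table. This is a sketch of the textbook technique, not necessarily how chana implements it; `longest_common_substring_sketch` is a hypothetical name.

```python
def longest_common_substring_sketch(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)  # match lengths for the previous row of a
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(longest_common_substring_sketch("limanko", "limanra"))  # liman
```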
-
chana.lemmatizer.
replace_last
(source_string, replace_what, replace_with)[source]¶ Function that replaces the last occurrence of a string in a word
Parameters: - source_string (str) – the source string
- replace_what (str) – the substring to be replaced
- replace_with (str) – the string to be inserted
Returns: string with the replacement
Return type: str
Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.replace_last('piati','ti','ra')
'piara'
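Replacing only the last occurrence of a substring is straightforward with `str.rpartition`. A minimal sketch under the same signature; `replace_last_sketch` is a hypothetical stand-in, and the behavior for a missing or empty substring is an assumption.

```python
def replace_last_sketch(source_string, replace_what, replace_with):
    """Replace the last occurrence of replace_what in source_string."""
    if not replace_what:
        return source_string  # rpartition rejects an empty separator
    head, sep, tail = source_string.rpartition(replace_what)
    if not sep:
        return source_string  # substring not found: return input unchanged
    return head + replace_with + tail


print(replace_last_sketch("piati", "ti", "ra"))  # piara
```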
chana.ner module¶
Named-entity recognizer for shipibo-konibo. The source model is from the Chana project and uses predefined rules for the language as well as a CRF from pycrfsuite.
-
class
chana.ner.
ShipiboNER
[source]¶ Instance of the rule based NER for shipibo
-
check_dates
(words, entity_tag)[source]¶ Inner method that tags the dates of a sentence with ‘FEC’
Parameters: - words (list) – a list of words to be evaluated
- entity_tag (list) – the list of entity tags, one per word
Returns: none
Return type: None
-
check_locations
(words, entity_tag)[source]¶ Inner method that tags the locations of a sentence with ‘LOC’
Parameters: - words (list) – a list of words to be evaluated
- entity_tag (list) – the list of entity tags, one per word
Returns: none
Return type: None
-
check_names
(words, entity_tag)[source]¶ Inner method that tags the names/persons of a sentence with ‘PER’
Parameters: - words (list) – a list of words to be evaluated
- entity_tag (list) – the list of entity tags, one per word
Returns: none
Return type: None
-
check_numbers
(words, entity_tag)[source]¶ Inner method that tags the numbers of a sentence with ‘NUM’
Parameters: - words (list) – a list of words to be evaluated
- entity_tag (list) – the list of entity tags, one per word
Returns: none
Return type: None
-
check_organizations
(words, entity_tag)[source]¶ Inner method that tags the organizations of a sentence with ‘ORG’
Parameters: - words (list) – a list of words to be evaluated
- entity_tag (list) – the list of entity tags, one per word
Returns: none
Return type: None
-
crf_tag
(sentence)[source]¶ Method that tags a sentence with the rule based method and then with the crf model
Parameters: sentence (str) – a sentence to be evaluated
Returns: list with the ner tags
Return type: list
Example:
>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.crf_tag('Limanko enra atsawe')
['LOC', 'O', 'O']
-
rule_tag
(sentence)[source]¶ Method that tags a sentence with the rule based system
Parameters: sentence (str) – a sentence to be evaluated
Returns: list with the ner tags
Return type: list
Example:
>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.rule_tag('Limanko enra atsawe')
['LOC', 'O', 'O']
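To illustrate the general shape of rule-based tagging, the sketch below looks each word up in small per-tag tables and falls back to 'O'. The tables are hypothetical and contain only the words used in this page's examples; the real rules in chana.ner are not shown here and may work differently.

```python
# Hypothetical lookup tables built from the examples on this page.
GAZETTEER_SKETCH = {
    "FEC": {"agosto"},
    "LOC": {"limanko"},
    "PER": {"adriano"},
    "NUM": {"kimisha"},
    "ORG": {"aut"},
}


def tag_word_sketch(word):
    """Return the first matching entity tag for a word, or 'O' if none match."""
    for tag, entries in GAZETTEER_SKETCH.items():
        if word.lower() in entries:
            return tag
    return "O"


def rule_tag_sketch(sentence):
    """Tag every whitespace-separated word of a sentence."""
    return [tag_word_sketch(word) for word in sentence.split()]


print(rule_tag_sketch("Limanko enra atsawe"))  # ['LOC', 'O', 'O']
```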
-
sent2features
(sent)[source]¶ Inner method that adds features to a sentence to be tagged by the crf model
Parameters: sent (list) – a sentence in list form to be transformed into features
Returns: list with features
Return type: list
-
word2features
(sent, i)[source]¶ Inner method that adds features to the words of a sentence to be tagged by the crf model
Parameters: - sent (list) – a sentence in list form to be transformed into features
- i (int) – index of the word to be evaluated
Returns: list with the features for the indexed word
Return type: list
-
-
chana.ner.
is_date
(word)[source]¶ Function that returns ‘FEC’ if a shipibo word is a date or False if not
Parameters: word (str) – a word to be evaluated
Returns: ‘FEC’ if a shipibo word is a date or False if not
Return type: str or bool
Example:
>>> import chana.ner
>>> chana.ner.is_date('Agosto')
'FEC'
-
chana.ner.
is_location
(word)[source]¶ Function that returns ‘LOC’ if a shipibo word is a location or False if not
Parameters: word (str) – a word to be evaluated
Returns: ‘LOC’ if a shipibo word is a location or False if not
Return type: str or bool
Example:
>>> import chana.ner
>>> chana.ner.is_location('Limanko')
'LOC'
-
chana.ner.
is_name
(word)[source]¶ Function that returns ‘PER’ if a shipibo word is a proper name/person or False if not
Parameters: word (str) – a word to be evaluated
Returns: ‘PER’ if a shipibo word is a proper name/person or False if not
Return type: str or bool
Example:
>>> import chana.ner
>>> chana.ner.is_name('Adriano')
'PER'
-
chana.ner.
is_number
(word)[source]¶ Function that returns ‘NUM’ if a shipibo word is a number or False if not
Parameters: word (str) – a word to be evaluated
Returns: ‘NUM’ if a shipibo word is a number or False if not
Return type: str or bool
Example:
>>> import chana.ner
>>> chana.ner.is_number('kimisha')
'NUM'
-
chana.ner.
is_organization
(word)[source]¶ Function that returns ‘ORG’ if a shipibo word is an organization or False if not
Parameters: word (str) – a word to be evaluated
Returns: ‘ORG’ if a shipibo word is an organization or False if not
Return type: str or bool
Example:
>>> import chana.ner
>>> chana.ner.is_organization('AUT')
'ORG'
chana.pos_tagger module¶
Part-of-Speech (POS) Tagger for shipibo-konibo. Source model is from the Chana project
-
class
chana.pos_tagger.
ShipiboPosTagger
[source]¶ Instance of the pre-trained shipibo part-of-speech tagger
-
features
(sentence, tags, index)[source]¶ Method that returns the features of a word in a sentence to be used by the model
Parameters: - sentence (str) – a sentence in shipibo-konibo
- tags (list) – tags to be returned for the word
- index (int) – position of the word in the sentence
Returns: dict of features for the indexed word
Return type: dict
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.features('Atsa ea piai',['','',''],2)
{'word': 's', 'prevWord': 't', 'nextWord': 'a', 'isFirst': False, 'isLast': False, 'isCapitalized': False, 'isAllCaps': False, 'isAllLowers': True, 'prefix-1': 's', 'prefix-2': 's', 'prefix-3': 's', 'prefix-4': 's', 'suffix-1': 's', 'suffix-2': 's', 'suffix-3': 's', 'suffix-4': 's', 'tag-1': '', 'tag-2': ''}
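In the documented call the sentence is passed as a plain string, so indexing it yields a single character, which is consistent with every value in the output being one letter. The sketch below builds a dictionary with the same keys and reproduces that output; passing a list of tokens instead gives word-level features. `features_sketch` is a hypothetical reconstruction from the example output, not the chana source, and the empty-string defaults at sentence boundaries are assumptions.

```python
def features_sketch(sentence, tags, index):
    """Build a feature dict for sentence[index], mirroring the documented keys.

    Assumption: reconstructed from the example output only.
    """
    token = sentence[index]
    last = len(sentence) - 1
    feats = {
        "word": token,
        "prevWord": "" if index == 0 else sentence[index - 1],
        "nextWord": "" if index == last else sentence[index + 1],
        "isFirst": index == 0,
        "isLast": index == last,
        "isCapitalized": token[:1].isupper(),
        "isAllCaps": token.isupper(),
        "isAllLowers": token.islower(),
        "tag-1": "" if index == 0 else tags[index - 1],
        "tag-2": "" if index < 2 else tags[index - 2],
    }
    # Prefixes and suffixes of length 1 to 4 (shorter tokens repeat themselves).
    for n in range(1, 5):
        feats[f"prefix-{n}"] = token[:n]
        feats[f"suffix-{n}"] = token[-n:]
    return feats


# Word-level usage with a pre-split sentence (tags here are illustrative).
print(features_sketch(["Atsa", "ea", "piai"], ["NOUN", "PRON", ""], 2)["suffix-2"])  # ai
```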
-
full_pos_tag
(sentence)[source]¶ Method that predicts the pos-tags of a shipibo sentence and returns the full tags in spanish
Parameters: sentence (str) – a sentence in shipibo-konibo
Returns: list of the tags in spanish
Return type: list
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.full_pos_tag('Atsa ea piai')
['Nombre', 'Pronombre', 'Verbo']
-
get_complete_tag
(pos)[source]¶ Method that returns the full spanish name of a pos tag
Parameters: pos (str) – a pos tag in the UD format
Returns: str with the tag in spanish
Return type: str
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.get_complete_tag('ADJ')
'Adjetivo'
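A tag-name lookup like this is typically a dictionary from UD tags to display names. The sketch below contains only the four pairs that can be read off the examples on this page; the complete table in chana is not shown here, and returning unknown tags unchanged is an assumption.

```python
# Only the pairs visible in this page's examples; the full table is not shown.
UD_TO_SPANISH_SKETCH = {
    "ADJ": "Adjetivo",
    "NOUN": "Nombre",
    "PRON": "Pronombre",
    "VERB": "Verbo",
}


def complete_tag_sketch(pos):
    """Return the Spanish name for a UD tag, or the tag itself if unknown."""
    return UD_TO_SPANISH_SKETCH.get(pos, pos)


print([complete_tag_sketch(t) for t in ["NOUN", "PRON", "VERB"]])
# ['Nombre', 'Pronombre', 'Verbo']
```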
-
pos_tag
(sentence)[source]¶ Method that predicts the pos-tags of a shipibo sentence in the UD format
Parameters: sentence (str) – a sentence in shipibo-konibo
Returns: list of the tags in UD format
Return type: list
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.pos_tag('Atsa ea piai')
['NOUN', 'PRON', 'VERB']
-
chana.syllabificator module¶
Syllabificator for shipibo-konibo. General functions and rules to syllabify a shipibo-konibo word
-
chana.syllabificator.
accentuate
(letter)[source]¶ Function that adds an accent mark to a letter
Parameters: letter (str) – a letter to be accentuated
Returns: letter accentuated
Return type: str
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.accentuate('a')
á
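One generic way to add an acute accent to a letter is to append the combining acute accent (U+0301) and normalize to NFC with the standard library. This is a sketch of that Unicode technique; chana may use a simple lookup table instead, and `accentuate_sketch` is a hypothetical name.

```python
import unicodedata


def accentuate_sketch(letter):
    """Append a combining acute accent and compose it into a single character."""
    return unicodedata.normalize("NFC", letter + "\u0301")


print(accentuate_sketch("a"))  # á
```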
-
chana.syllabificator.
change
(syllable)[source]¶ Function that returns the original form of a syllable
Parameters: syllable (str) – a syllable to be transformed
Returns: syllable with its original form
Return type: str
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.change('1a')
cha
-
chana.syllabificator.
get_vc
(word)[source]¶ Function that returns all the vowels and consonants of a word
Parameters: word (str) – word to get its vowels and consonants
Returns: list of ‘V’ and ‘C’ for each letter of the word
Return type: list
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.get_vc('piti')
[['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
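The documented output pairs each letter with 'V' or 'C'. A minimal sketch that reproduces it, assuming the vowel set is a, e, i, o, u (an assumption; the set chana uses and any handling of digraphs are not shown here). `get_vc_sketch` is a hypothetical name.

```python
# Assumed vowel set; the one used by chana is not documented on this page.
VOWELS_SKETCH = set("aeiou")


def get_vc_sketch(word):
    """Pair each letter with 'V' (vowel) or 'C' (consonant)."""
    return [[ch, "V" if ch.lower() in VOWELS_SKETCH else "C"] for ch in word]


print(get_vc_sketch("piti"))
# [['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
```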
Module contents¶
Basic toolkit for the shipibo-konibo language
Modules that are implemented:
- Lemmatizer
- NER
- Syllabificator
- Pos-Tagger
For more information on these modules check help(chana.module_name)
All the information and code is from the Chana project