API

This part of the documentation covers all the functions of Chana.

Modules

Lemmatizer

Lemmatizer for Shipibo-Konibo. The source model is from the Chana project and uses a KNeighborsClassifier from scikit-learn.

class chana.lemmatizer.GeneralLemmatizer(features_length=10, n_neighbors=5)[source]

Instance of a new lemmatizer to be trained and used

get_lemma(rule, word)[source]

Method that returns the lemma of a word given a possible rule

Parameters:
  • rule (list) – a rule to transform a word
  • word (str) – a word to be transformed
Returns:

word transformed

Return type:

str

Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'       
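
The rules shown in this documentation ('bo>', 'anwe>i') read as "suffix>replacement". A minimal sketch of applying such a rule, assuming that reading is right; `apply_rule` is a hypothetical helper that takes a single rule string rather than a list, and it is not Chana's implementation:

```python
def apply_rule(rule, word):
    """Apply a 'suffix>replacement' rule to the end of a word (hypothetical helper)."""
    suffix, _, replacement = rule.partition(">")
    # Only rewrite the word when it actually ends with the suffix.
    if suffix and word.endswith(suffix):
        return word[: len(word) - len(suffix)] + replacement
    return word


print(apply_rule("bo>", "shipibobo"))   # shipibo
print(apply_rule("anwe>i", "pikanwe"))  # piki
```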
get_rule(word)[source]

Method that returns the transformation rule for a word

Parameters:word (str) – a word to get the rule
Returns:numpy array with the rule
Return type:array
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.get_rule('perrito')
array(['ito>0'], dtype='<U16')       
lemmatize(word)[source]

Method that predicts the lemma of a word with the trained model

Parameters:word (str) – a word to get the lemma
Returns:lemma of the word
Return type:str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.lemmatize('perrito')
'perro'       
preprocess_word(word)[source]

Method that turns a word into an array of features for the classifier according to its features_length

Parameters:word (str) – a word to be transformed
Returns:list with the features
Return type:list
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.preprocess_word('perritos')
[115, 111, 116, 105, 114, 114, 101, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]        
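
The output above matches the character codes of the word read backwards, padded with zeros. A minimal sketch of that encoding, assuming this is how the features are built; `encode_word` and its `length` parameter are hypothetical, and the value 18 is only inferred from the example output:

```python
def encode_word(word, length):
    """Reverse the word, map each character to its code point, pad with zeros."""
    codes = [ord(ch) for ch in reversed(word)][:length]
    return codes + [0] * (length - len(codes))


print(encode_word("perritos", 18))
# [115, 111, 116, 105, 114, 114, 101, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Reading the word from the end puts the suffix, which carries most of the inflection, in the first feature positions.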
train(words, lemmas)[source]

Method that trains a new lemmatizer with a list of words and a list of lemmas of the same size

Parameters:
  • words (list) – list of words
  • lemmas (list) – list of lemmas
Returns:

none

Return type:

None

Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmas = ['perro','gato','mono']
>>> words = ['perritos','gatitos','monotes']
>>> lemmatizer.train(words,lemmas)       
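
One plausible way to turn each word/lemma training pair into a "suffix>replacement" rule is to strip their common prefix. This is a sketch under that assumption; `derive_rule` is a hypothetical helper and may differ from what `train` does internally:

```python
def derive_rule(word, lemma):
    """Build a 'suffix>replacement' rule from a word and its lemma (hypothetical)."""
    i = 0
    # Advance while both strings share the same leading characters.
    while i < min(len(word), len(lemma)) and word[i] == lemma[i]:
        i += 1
    return "%s>%s" % (word[i:], lemma[i:])


words = ["perritos", "gatitos", "monotes"]
lemmas = ["perro", "gato", "mono"]
print([derive_rule(w, l) for w, l in zip(words, lemmas)])
# ['itos>o', 'itos>o', 'tes>']
```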
class chana.lemmatizer.ShipiboLemmatizer[source]

Instance of the pre-trained Shipibo lemmatizer

get_lemma(rule, word)[source]

Method that returns the lemma of a Shipibo word given a possible rule

Parameters:
  • rule (list) – a rule to transform a word
  • word (str) – a word to be transformed
Returns:

word transformed

Return type:

str

Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'       
get_rule(word)[source]

Method that returns the transformation rule for a Shipibo word

Parameters:word (str) – a word to get the rule
Returns:numpy array with the rule
Return type:array
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_rule('pikanwe')
array(['anwe>i'], dtype='<U16')       
lemmatize(word)[source]

Method that predicts the lemma of a Shipibo word

Parameters:word (str) – a word to get the lemma
Returns:lemma of the word
Return type:str
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.lemmatize('pikanwe')
'piki'       
preprocess_word(word)[source]

Method that turns a word into an array of features for the classifier

Parameters:word (str) – a word to be transformed
Returns:list with the features
Return type:list
Example:
>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.preprocess_word('shipibobo')
[111, 98, 111, 98, 105, 112, 105, 104, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0]        
chana.lemmatizer.has_shipibo_suffix(str)[source]

Function that returns whether a word may contain a Shipibo suffix

Parameters:str (str) – word to evaluate
Returns:True or False
Return type:bool
Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.has_shipibo_suffix('pianra')
True
chana.lemmatizer.longest_common_substring(string1, string2)[source]

Function to find the longest common substring of two strings

Parameters:
  • string1 (str) – string1
  • string2 (str) – string2
Returns:

longest common substring

Return type:

str

Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.longest_common_substring('limanko','limanra')
'liman'
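
For reference, the longest common substring can be computed with the standard dynamic-programming approach. The sketch below is a generic version of that technique, not necessarily the one Chana uses; `common_substring` is a hypothetical name:

```python
def common_substring(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


print(common_substring("limanko", "limanra"))  # liman
```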
chana.lemmatizer.replace_last(source_string, replace_what, replace_with)[source]

Function that replaces the last occurrence of a string in a word

Parameters:
  • source_string (str) – the source string
  • replace_what (str) – the substring to be replaced
  • replace_with (str) – the string to be inserted
Returns:

string with the replacement

Return type:

str

Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.replace_last('piati','ti','ra')
'piara'
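
The same behaviour can be reproduced with `str.rpartition`, which splits on the last occurrence of a separator. A minimal sketch; `replace_last_sketch` is a hypothetical name, and returning the source unchanged when nothing matches is an assumption:

```python
def replace_last_sketch(source_string, replace_what, replace_with):
    """Replace only the last occurrence of replace_what in source_string."""
    if not replace_what:
        return source_string  # rpartition rejects an empty separator
    head, sep, tail = source_string.rpartition(replace_what)
    if not sep:
        return source_string  # no occurrence found
    return head + replace_with + tail


print(replace_last_sketch("piati", "ti", "ra"))  # piara
```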
chana.lemmatizer.shipibo_suffixes()[source]

Function that returns a list with all the Shipibo suffixes

Returns:list with all the suffixes
Return type:list
Example:
>>> import chana.lemmatizer
>>> chana.lemmatizer.shipibo_suffixes()
['naan', 'yama', 'men', 'iosma', ..., 'shoko']

NER

Named-entity recognizer for Shipibo-Konibo. The source model is from the Chana project and uses predefined rules for the language as well as a CRF from pycrfsuite.

class chana.ner.ShipiboNER[source]

Instance of the rule-based NER for Shipibo

check_dates(words, entity_tag)[source]

Inner method that tags the dates of a sentence with ‘FEC’

Parameters:
  • words (list) – a list of words to be evaluated
  • entity_tag (list) – the list of entity tags to be updated
Returns:

none

Return type:

None

check_locations(words, entity_tag)[source]

Inner method that tags the locations of a sentence with ‘LOC’

Parameters:
  • words (list) – a list of words to be evaluated
  • entity_tag (list) – the list of entity tags to be updated
Returns:

none

Return type:

None

check_names(words, entity_tag)[source]

Inner method that tags the names/persons of a sentence with ‘PER’

Parameters:
  • words (list) – a list of words to be evaluated
  • entity_tag (list) – the list of entity tags to be updated
Returns:

none

Return type:

None

check_numbers(words, entity_tag)[source]

Inner method that tags the numbers of a sentence with ‘NUM’

Parameters:
  • words (list) – a list of words to be evaluated
  • entity_tag (list) – the list of entity tags to be updated
Returns:

none

Return type:

None

check_organizations(words, entity_tag)[source]

Inner method that tags the organizations of a sentence with ‘ORG’

Parameters:
  • words (list) – a list of words to be evaluated
  • entity_tag (list) – the list of entity tags to be updated
Returns:

none

Return type:

None

crf_tag(sentence)[source]

Method that tags a sentence with the rule-based method and then with the CRF model

Parameters:sentence (str) – a sentence to be evaluated
Returns:list with the ner tags
Return type:list
Example:
>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.crf_tag('Limanko enra atsawe')
['LOC', 'O', 'O']
rule_tag(sentence)[source]

Method that tags a sentence with the rule-based system

Parameters:sentence (str) – a sentence to be evaluated
Returns:list with the ner tags
Return type:list
Example:
>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.rule_tag('Limanko enra atsawe')
['LOC', 'O', 'O']
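
The check_* methods take the word list plus a tag list and return None, which suggests they fill the tag list in place. A minimal sketch of that pattern, assuming simple lexicon lookups; `LEXICONS`, `check_entities` and `rule_tag_sketch` are hypothetical, the entries come only from the examples in this documentation, and the case-insensitive match is an assumption:

```python
# Tiny illustrative lexicons; the real resources are loaded by the library.
LEXICONS = {
    "LOC": {"limanko"},
    "FEC": {"agosto"},
    "PER": {"adriano"},
    "NUM": {"kimisha"},
    "ORG": {"aut"},
}


def check_entities(words, entity_tag, lexicon, label):
    """Write label into entity_tag for every untagged word found in lexicon."""
    for i, word in enumerate(words):
        if entity_tag[i] == "O" and word.lower() in lexicon:
            entity_tag[i] = label


def rule_tag_sketch(sentence):
    words = sentence.split()
    entity_tag = ["O"] * len(words)  # 'O' marks a word outside any entity
    for label, lexicon in LEXICONS.items():
        check_entities(words, entity_tag, lexicon, label)
    return entity_tag


print(rule_tag_sketch("Limanko enra atsawe"))  # ['LOC', 'O', 'O']
```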
sent2features(sent)[source]

Inner method that adds features to a sentence to be tagged by the CRF model

Parameters:sent (list) – a sentence in list form to be transformed into features
Returns:list with features
Return type:list
word2features(sent, i)[source]

Inner method that adds features to the words of a sentence to be tagged by the CRF model

Parameters:
  • sent (list) – a sentence in list form to be transformed into features
  • i (int) – index of the word to be evaluated
Returns:

list with the features for the indexed word

Return type:

list
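
CRF taggers usually describe each word by its own surface form plus a little context. The sketch below shows that general idea with string features; the feature names and the `_sketch` functions are hypothetical and are not the features Chana extracts:

```python
def word2features_sketch(sent, i):
    """Build a list of string features for the word at position i."""
    word = sent[i]
    features = [
        "bias",
        "word.lower=" + word.lower(),
        "word.istitle=%s" % word.istitle(),
        "suffix3=" + word[-3:],
    ]
    # Context features: neighbouring words, or sentence-boundary markers.
    if i > 0:
        features.append("prev.lower=" + sent[i - 1].lower())
    else:
        features.append("BOS")
    if i < len(sent) - 1:
        features.append("next.lower=" + sent[i + 1].lower())
    else:
        features.append("EOS")
    return features


def sent2features_sketch(sent):
    return [word2features_sketch(sent, i) for i in range(len(sent))]


for feats in sent2features_sketch(["Limanko", "enra", "atsawe"]):
    print(feats)
```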

chana.ner.is_date(word)[source]

Function that returns ‘FEC’ if a Shipibo word is a date or False if not

Parameters:word (str) – a word to be evaluated
Returns:‘FEC’ if a Shipibo word is a date or False if not
Return type:str
Example:
>>> import chana.ner
>>> chana.ner.is_date('Agosto')
'FEC'
chana.ner.is_location(word)[source]

Function that returns ‘LOC’ if a Shipibo word is a location or False if not

Parameters:word (str) – a word to be evaluated
Returns:‘LOC’ if a Shipibo word is a location or False if not
Return type:str
Example:
>>> import chana.ner
>>> chana.ner.is_location('Limanko')
'LOC'
chana.ner.is_name(word)[source]

Function that returns ‘PER’ if a Shipibo word is a proper name/person or False if not

Parameters:word (str) – a word to be evaluated
Returns:‘PER’ if a Shipibo word is a proper name/person or False if not
Return type:str
Example:
>>> import chana.ner
>>> chana.ner.is_name('Adriano')
'PER'
chana.ner.is_number(word)[source]

Function that returns ‘NUM’ if a Shipibo word is a number or False if not

Parameters:word (str) – a word to be evaluated
Returns:‘NUM’ if a Shipibo word is a number or False if not
Return type:str
Example:
>>> import chana.ner
>>> chana.ner.is_number('kimisha')
'NUM'
chana.ner.is_organization(word)[source]

Function that returns ‘ORG’ if a Shipibo word is an organization or False if not

Parameters:word (str) – a word to be evaluated
Returns:‘ORG’ if a Shipibo word is an organization or False if not
Return type:str
Example:
>>> import chana.ner
>>> chana.ner.is_organization('AUT')
'ORG'
chana.ner.load_array(file, array)[source]

Inner function that loads the information of a file into a list

Parameters:
  • file (File) – a file to be loaded
  • array (list) – a list to be populated with the information from the file
Returns:

none

Return type:

None

Pos_Tagger

Part-of-Speech (POS) Tagger for Shipibo-Konibo. The source model is from the Chana project.

class chana.pos_tagger.ShipiboPosTagger[source]

Instance of the pre-trained Shipibo part-of-speech tagger

features(sentence, tags, index)[source]

Method that returns the features of a word in a sentence to be used by the model

Parameters:
  • sentence (str) – a sentence in Shipibo-Konibo
  • tags (list) – tags to be returned for the word
  • index (int) – position of the word in the sentence
Returns:

dict of features for the indexed word

Return type:

dict

Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.features('Atsa ea piai',['','',''],2)
{'word': 's', 'prevWord': 't', 'nextWord': 'a', 'isFirst': False, 'isLast': False, 'isCapitalized': False, 'isAllCaps': False, 'isAllLowers': True, 'prefix-1': 's', 'prefix-2': 's', 'prefix-3': 's', 'prefix-4': 's', 'suffix-1': 's', 'suffix-2': 's', 'suffix-3': 's', 'suffix-4': 's', 'tag-1': '', 'tag-2': ''}
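
In the example above the sentence is passed as a plain string, so index 2 selects the single character 's'. A minimal sketch of the same feature keys computed over a list of tokens instead, assuming the sentence is split on whitespace first; `token_features` is a hypothetical helper, not the library method:

```python
def token_features(tokens, tags, index):
    """Build the feature dict for tokens[index], using already-predicted tags."""
    word = tokens[index]
    feats = {
        "word": word,
        "prevWord": tokens[index - 1] if index > 0 else "",
        "nextWord": tokens[index + 1] if index < len(tokens) - 1 else "",
        "isFirst": index == 0,
        "isLast": index == len(tokens) - 1,
        "isCapitalized": word[:1].isupper(),
        "isAllCaps": word.isupper(),
        "isAllLowers": word.islower(),
        "tag-1": tags[index - 1] if index > 0 else "",
        "tag-2": tags[index - 2] if index > 1 else "",
    }
    # Leading and trailing character n-grams, n = 1..4.
    for n in range(1, 5):
        feats["prefix-%d" % n] = word[:n]
        feats["suffix-%d" % n] = word[-n:]
    return feats


feats = token_features("Atsa ea piai".split(), ["NOUN", "PRON", ""], 2)
print(feats["word"], feats["prevWord"], feats["isLast"])  # piai ea True
```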
full_pos_tag(sentence)[source]

Method that predicts the POS tags of a Shipibo sentence and returns the full tags in Spanish

Parameters:sentence (str) – a sentence in Shipibo-Konibo
Returns:list of the tags in Spanish
Return type:list
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.full_pos_tag('Atsa ea piai')
['Nombre', 'Pronombre', 'Verbo']
get_complete_tag(pos)[source]

Method that returns the full Spanish name of a tag

Parameters:pos (str) – a pos tag in the UD format
Returns:str with the tag in Spanish
Return type:str
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.get_complete_tag('ADJ')
'Adjetivo'
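
The relation between pos_tag and full_pos_tag can be pictured as a lookup from UD tags to Spanish names. A minimal sketch with only the four pairs that appear in this documentation; `UD_TO_SPANISH` and `complete_tags` are hypothetical, and leaving unknown tags unchanged is an assumption:

```python
# Only the pairs shown in the examples of this page; the full table is larger.
UD_TO_SPANISH = {
    "NOUN": "Nombre",
    "PRON": "Pronombre",
    "VERB": "Verbo",
    "ADJ": "Adjetivo",
}


def complete_tags(tags):
    """Map each UD tag to its Spanish name, leaving unknown tags unchanged."""
    return [UD_TO_SPANISH.get(tag, tag) for tag in tags]


print(complete_tags(["NOUN", "PRON", "VERB"]))  # ['Nombre', 'Pronombre', 'Verbo']
```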
pos_tag(sentence)[source]

Method that predicts the POS tags of a Shipibo sentence in the UD format

Parameters:sentence (str) – a sentence in Shipibo-Konibo
Returns:list of the tags in UD format
Return type:list
Example:
>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.pos_tag('Atsa ea piai')
['NOUN', 'PRON', 'VERB']

Syllabificator

Syllabificator for Shipibo-Konibo. General functions and rules to syllabify a Shipibo-Konibo word

chana.syllabificator.accentuate(letter)[source]

Function that adds the accent mark to a letter

Parameters:letter (str) – a letter to be accentuated
Returns:letter accentuated
Return type:str
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.accentuate('a')
'á'
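
One generic way to add an acute accent is to append the Unicode combining mark and normalize to the composed form. This sketch uses the standard `unicodedata` module; `add_acute` is a hypothetical name and Chana may use a fixed lookup table instead:

```python
import unicodedata


def add_acute(letter):
    """Append U+0301 (combining acute accent) and compose with NFC."""
    return unicodedata.normalize("NFC", letter + "\u0301")


print(add_acute("a"))  # á
```

Letters without a precomposed accented form keep the combining mark as a separate code point.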
chana.syllabificator.change(syllable)[source]

Function that returns the original form of a syllable

Parameters:syllable (str) – a syllable to be transformed
Returns:syllable with its original form
Return type:str
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.change('1a')
'cha'
chana.syllabificator.get_vc(word)[source]

Function that returns all the vowels and consonants of a word

Parameters:word (str) – word to get its vowels and consonants
Returns:list of ‘V’ and ‘C’ for each letter of the word
Return type:list
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.get_vc('piti')
[['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
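
The classification above can be sketched as a membership test against a vowel set. The vowel inventory used here is an assumption, and `classify_letters` is a hypothetical name:

```python
VOWELS = set("aeiou")  # assumed inventory, including accented forms is left out


def classify_letters(word):
    """Pair each letter with 'V' (vowel) or 'C' (consonant)."""
    return [[ch, "V" if ch.lower() in VOWELS else "C"] for ch in word]


print(classify_letters("piti"))
# [['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
```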
chana.syllabificator.syllabify(word)[source]

Function that returns all the syllables of a word

Parameters:word (str) – a word to get its syllables
Returns:list of syllables
Return type:list
Example:
>>> import chana.syllabificator
>>> chana.syllabificator.syllabify('atsabo')
['a', 'tsa', 'bo']