API

This part of the documentation covers all the functions of Chana.

Modules

Lemmatizer

Lemmatizer for Shipibo-Konibo. The source model comes from the Chana project and uses KNeighborsClassifier from scikit-learn.

class chana.lemmatizer.GeneralLemmatizer(features_length=10, n_neighbors=5)

Instance of a new lemmatizer to be trained and used

get_lemma(rule, word)

Method that returns the lemma of a word given a possible rule

Parameters:	rule (list) – a rule to transform a word
	word (str) – a word to be transformed
Returns:	word transformed
Return type:	str
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'       
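
The rule strings in the examples on this page (`'bo>'`, `'anwe>i'`) appear to follow a `suffix>replacement` pattern. The sketch below applies a rule under that reading; `apply_suffix_rule` is a hypothetical helper written for illustration and is not part of Chana.

```python
def apply_suffix_rule(rule, word):
    """Apply a 'suffix>replacement' rule to the end of a word (illustrative only)."""
    suffix, _, replacement = rule.partition(">")
    # Assumption: a rule that does not match leaves the word unchanged.
    if not suffix or not word.endswith(suffix):
        return word
    return word[: len(word) - len(suffix)] + replacement


print(apply_suffix_rule("bo>", "shipibobo"))   # shipibo
print(apply_suffix_rule("anwe>i", "pikanwe"))  # piki
```

Returning the word unchanged on a non-matching rule is a design choice of this sketch; the library may behave differently.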

get_rule(word)

Method that returns the transformation rule for a word

Parameters:	word (str) – the word to get the rule for
Returns:	numpy array with the rule
Return type:	numpy.ndarray
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> # assumes lemmatizer.train(words, lemmas) has already been called
>>> lemmatizer.get_rule('perrito')
array(['ito>0'], dtype='<U16')       

lemmatize(word)

Method that predicts the lemma of a word with the trained model

Parameters:	word (str) – the word to lemmatize
Returns:	lemma of the word
Return type:	str
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> # assumes lemmatizer.train(words, lemmas) has already been called
>>> lemmatizer.lemmatize('perrito')
'perro'       

preprocess_word(word)

Method that turns a word into a list of features for the classifier, according to its features_length

Parameters:	word (str) – a word to be transformed
Returns:	list with the features
Return type:	list
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmatizer.preprocess_word('perritos')
[115, 111, 116, 105, 114, 114, 101, 112, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]        
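
The example output looks like the character codes of the word in reverse order, padded with zeros to a fixed size. The sketch below reproduces that output under this reading. `word_to_features` and its `length` parameter are hypothetical, and the value 18 is simply read off the example output; how it relates to `features_length` is not stated here.

```python
def word_to_features(word, length):
    """Encode a word as reversed character codes, zero-padded to `length`."""
    codes = [ord(ch) for ch in reversed(word)]
    # Assumption: longer words are truncated so every vector has the same size.
    codes = codes[:length]
    return codes + [0] * (length - len(codes))


print(word_to_features("perritos", 18))
```

Reversing the word puts the ending first, which is where suffix information lives and is plausibly why the features start from the last letter.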

train(words, lemmas)

Method that trains a new lemmatizer with a list of words and a list of lemmas of the same size

Parameters:	words (list) – list of words
	lemmas (list) – list of lemmas, one per word
Returns:	none
Return type:	None
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.GeneralLemmatizer()
>>> lemmas = ['perro','gato','mono']
>>> words = ['perritos','gatitos','monotes']
>>> lemmatizer.train(words,lemmas)       
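
One way to picture what a classifier could learn from such pairs is to turn each (word, lemma) pair into a `suffix>replacement` rule. The sketch below does this with a shared-prefix comparison. It is an assumption about the general idea, not Chana's training code, and `derive_rule` is a hypothetical name.

```python
import os


def derive_rule(word, lemma):
    """Build a 'suffix>replacement' rule from the part of word and lemma that differs."""
    # os.path.commonprefix compares plain strings character by character.
    prefix = os.path.commonprefix([word, lemma])
    return f"{word[len(prefix):]}>{lemma[len(prefix):]}"


words = ["perritos", "gatitos", "monotes"]
lemmas = ["perro", "gato", "mono"]
rules = [derive_rule(w, l) for w, l in zip(words, lemmas)]
print(rules)  # ['itos>o', 'itos>o', 'tes>']
```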

class chana.lemmatizer.ShipiboLemmatizer

Instance of the pre-trained Shipibo lemmatizer

get_lemma(rule, word)

Method that returns the lemma of a Shipibo word given a possible rule

Parameters:	rule (list) – a rule to transform a word
	word (str) – a word to be transformed
Returns:	word transformed
Return type:	str
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_lemma(['bo>'],'shipibobo')
'shipibo'       

get_rule(word)

Method that returns the transformation rule for a Shipibo word

Parameters:	word (str) – the word to get the rule for
Returns:	numpy array with the rule
Return type:	numpy.ndarray
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.get_rule('pikanwe')
array(['anwe>i'], dtype='<U16')       

lemmatize(word)

Method that predicts the lemma of a Shipibo word

Parameters:	word (str) – the word to lemmatize
Returns:	lemma of the word
Return type:	str
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.lemmatize('pikanwe')
'piki'       

preprocess_word(word)

Method that turns a word into a list of features for the classifier

Parameters:	word (str) – a word to be transformed
Returns:	list with the features
Return type:	list
Example:

>>> import chana.lemmatizer
>>> lemmatizer = chana.lemmatizer.ShipiboLemmatizer()
>>> lemmatizer.preprocess_word('shipibobo')
[111, 98, 111, 98, 105, 112, 105, 104, 115, 0, 0, 0, 0, 0, 0, 0, 0, 0]        

chana.lemmatizer.has_shipibo_suffix(str)

Function that reports whether a word may contain a Shipibo suffix

Parameters:	str (str) – word to evaluate
Returns:	True or False
Return type:	bool
Example:

>>> import chana.lemmatizer
>>> chana.lemmatizer.has_shipibo_suffix('pianra')
True

chana.lemmatizer.longest_common_substring(string1, string2)

Function to find the longest common substring of two strings

Parameters:	string1 (str) – the first string
	string2 (str) – the second string
Returns:	longest common substring
Return type:	str
Example:

>>> import chana.lemmatizer
>>> chana.lemmatizer.longest_common_substring('limanko','limanra')
'liman'
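
For readers unfamiliar with the technique, the longest common substring can be found with a standard dynamic-programming table. The sketch below is a generic textbook version under the hypothetical name `lcs_sketch`; it is not taken from Chana's source.

```python
def lcs_sketch(a, b):
    """Return the longest contiguous substring shared by a and b."""
    best_len = 0
    best_end = 0  # end index (exclusive) of the best match inside a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the match that ended at the previous pair of characters.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]


print(lcs_sketch("limanko", "limanra"))  # liman
```

When several substrings tie for the longest length, this version returns the first one found in `a`.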

chana.lemmatizer.replace_last(source_string, replace_what, replace_with)

Function that replaces the last occurrence of a substring in a word

Parameters:	source_string (str) – the source string
	replace_what (str) – the substring to be replaced
	replace_with (str) – the string to be inserted
Returns:	string with the replacement
Return type:	str
Example:

>>> import chana.lemmatizer
>>> chana.lemmatizer.replace_last('piati','ti','ra')
'piara'
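
Replacing only the last occurrence is not built into `str.replace`, but `str.rpartition` makes it short. The sketch below shows one possible way to get the documented behaviour; `replace_last_sketch` is a hypothetical name and this is not necessarily how Chana implements it.

```python
def replace_last_sketch(source_string, replace_what, replace_with):
    """Replace only the final occurrence of replace_what in source_string."""
    head, found, tail = source_string.rpartition(replace_what)
    # rpartition returns an empty separator when nothing was found.
    if not found:
        return source_string
    return head + replace_with + tail


print(replace_last_sketch("piati", "ti", "ra"))  # piara
```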

chana.lemmatizer.shipibo_suffixes()

Function that returns a list with all the Shipibo suffixes

Returns:	list with all the suffixes
Return type:	list
Example:

>>> import chana.lemmatizer
>>> chana.lemmatizer.shipibo_suffixes()
['naan', 'yama', 'men', 'iosma', ..., 'shoko']

NER

Named-entity recognizer for Shipibo-Konibo. The source model comes from the Chana project and uses predefined rules for the language as well as a CRF from pycrfsuite.

class chana.ner.ShipiboNER

Instance of the rule-based NER for Shipibo

check_dates(words, entity_tag)

Internal method that tags the dates of a sentence with ‘FEC’

Parameters:	words (list) – a list of words to be evaluated
	entity_tag (list) – the list of entity tags for those words
Returns:	none
Return type:	None

check_locations(words, entity_tag)

Internal method that tags the locations of a sentence with ‘LOC’

Parameters:	words (list) – a list of words to be evaluated
	entity_tag (list) – the list of entity tags for those words
Returns:	none
Return type:	None

check_names(words, entity_tag)

Internal method that tags the names/persons of a sentence with ‘PER’

Parameters:	words (list) – a list of words to be evaluated
	entity_tag (list) – the list of entity tags for those words
Returns:	none
Return type:	None

check_numbers(words, entity_tag)

Internal method that tags the numbers of a sentence with ‘NUM’

Parameters:	words (list) – a list of words to be evaluated
	entity_tag (list) – the list of entity tags for those words
Returns:	none
Return type:	None

check_organizations(words, entity_tag)

Internal method that tags the organizations of a sentence with ‘ORG’

Parameters:	words (list) – a list of words to be evaluated
	entity_tag (list) – the list of entity tags for those words
Returns:	none
Return type:	None

crf_tag(sentence)

Method that tags a sentence with the rule-based method and then with the CRF model

Parameters:	sentence (str) – a sentence to be evaluated
Returns:	list with the ner tags
Return type:	list
Example:

>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.crf_tag('Limanko enra atsawe')
['LOC', 'O', 'O']

rule_tag(sentence)

Method that tags a sentence with the rule-based system

Parameters:	sentence (str) – a sentence to be evaluated
Returns:	list with the ner tags
Return type:	list
Example:

>>> import chana.ner
>>> ner = chana.ner.ShipiboNER()
>>> ner.rule_tag('Limanko enra atsawe')
['LOC', 'O', 'O']
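
The check_* methods return None, which suggests they fill a shared tag list in place before rule_tag returns it. The sketch below illustrates that pattern with tiny word lists taken only from the examples on this page. `GAZETTEERS`, `check_category` and `rule_tag_sketch` are hypothetical names, and the real rules are certainly richer than a set lookup.

```python
# Illustrative word lists built only from the words used in this page's examples.
GAZETTEERS = {
    "FEC": {"Agosto"},
    "LOC": {"Limanko"},
    "PER": {"Adriano"},
    "NUM": {"kimisha"},
    "ORG": {"AUT"},
}


def check_category(words, entity_tag, label):
    """Tag, in place, every still-untagged word found in the label's word list."""
    for i, word in enumerate(words):
        if entity_tag[i] == "O" and word in GAZETTEERS[label]:
            entity_tag[i] = label


def rule_tag_sketch(sentence):
    words = sentence.split()
    entity_tag = ["O"] * len(words)  # 'O' marks a word outside any entity
    for label in GAZETTEERS:
        check_category(words, entity_tag, label)
    return entity_tag


print(rule_tag_sketch("Limanko enra atsawe"))  # ['LOC', 'O', 'O']
```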

sent2features(sent)

Internal method that adds features to a sentence to be tagged by the CRF model

Parameters:	sent (list) – a sentence in list form to be transformed into features
Returns:	list with features
Return type:	list

word2features(sent, i)

Internal method that adds features to the words of a sentence to be tagged by the CRF model

Parameters:	sent (list) – a sentence in list form to be transformed into features
	i (int) – index of the word to be evaluated
Returns:	list with the features for the indexed word
Return type:	list

chana.ner.is_date(word)

Function that returns ‘FEC’ if a Shipibo word is a date, or False otherwise

Parameters:	word (str) – a word to be evaluated
Returns:	‘FEC’ if the word is a date, False otherwise
Return type:	str or bool
Example:

>>> import chana.ner
>>> chana.ner.is_date('Agosto')
'FEC'

chana.ner.is_location(word)

Function that returns ‘LOC’ if a Shipibo word is a location, or False otherwise

Parameters:	word (str) – a word to be evaluated
Returns:	‘LOC’ if the word is a location, False otherwise
Return type:	str or bool
Example:

>>> import chana.ner
>>> chana.ner.is_location('Limanko')
'LOC'

chana.ner.is_name(word)

Function that returns ‘PER’ if a Shipibo word is a proper name/person, or False otherwise

Parameters:	word (str) – a word to be evaluated
Returns:	‘PER’ if the word is a proper name/person, False otherwise
Return type:	str or bool
Example:

>>> import chana.ner
>>> chana.ner.is_name('Adriano')
'PER'

chana.ner.is_number(word)

Function that returns ‘NUM’ if a Shipibo word is a number, or False otherwise

Parameters:	word (str) – a word to be evaluated
Returns:	‘NUM’ if the word is a number, False otherwise
Return type:	str or bool
Example:

>>> import chana.ner
>>> chana.ner.is_number('kimisha')
'NUM'

chana.ner.is_organization(word)

Function that returns ‘ORG’ if a Shipibo word is an organization, or False otherwise

Parameters:	word (str) – a word to be evaluated
Returns:	‘ORG’ if the word is an organization, False otherwise
Return type:	str or bool
Example:

>>> import chana.ner
>>> chana.ner.is_organization('AUT')
'ORG'

chana.ner.load_array(file, array)

Internal function that loads the contents of a file into a list

Parameters:	file (File) – a file to be loaded
	array (list) – a list to be populated with the information from the file
Returns:	none
Return type:	None

Pos_Tagger

Part-of-Speech (POS) tagger for Shipibo-Konibo. The source model comes from the Chana project.

class chana.pos_tagger.ShipiboPosTagger

Instance of the pre-trained Shipibo part-of-speech tagger

features(sentence, tags, index)

Method that returns the features of a word in a sentence to be used by the model

Parameters:	sentence (str) – a sentence in Shipibo-Konibo
	tags (list) – tags to be returned for the word
	index (int) – position of the word in the sentence
Returns:	dict of features for the indexed word
Return type:	dict
Example:

>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.features('Atsa ea piai',['','',''],2)
{'word': 's', 'prevWord': 't', 'nextWord': 'a', 'isFirst': False, 'isLast': False, 'isCapitalized': False, 'isAllCaps': False, 'isAllLowers': True, 'prefix-1': 's', 'prefix-2': 's', 'prefix-3': 's', 'prefix-4': 's', 'suffix-1': 's', 'suffix-2': 's', 'suffix-3': 's', 'suffix-4': 's', 'tag-1': '', 'tag-2': ''}
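
In the example above the sentence is passed as a plain string, so index 2 selects the character 's' rather than the third word. The sketch below shows what a comparable feature dictionary looks like when the sentence is first split into tokens. `token_features` is a hypothetical function; the key names follow the example output, but the exact definition of each feature is an assumption.

```python
def token_features(tokens, tags, index):
    """Build a feature dict for tokens[index], mirroring the keys shown above."""
    word = tokens[index]
    last = len(tokens) - 1
    return {
        "word": word,
        "prevWord": tokens[index - 1] if index > 0 else "",
        "nextWord": tokens[index + 1] if index < last else "",
        "isFirst": index == 0,
        "isLast": index == last,
        "isCapitalized": word[:1].isupper(),
        "isAllCaps": word.isupper(),
        "isAllLowers": word.islower(),
        "prefix-1": word[:1],
        "prefix-2": word[:2],
        "suffix-1": word[-1:],
        "suffix-2": word[-2:],
        # Assumption: tag-1 and tag-2 are the tags of the two preceding words.
        "tag-1": tags[index - 1] if index > 0 else "",
        "tag-2": tags[index - 2] if index > 1 else "",
    }


feats = token_features("Atsa ea piai".split(), ["NOUN", "PRON", ""], 2)
print(feats["word"], feats["prevWord"], feats["tag-1"])  # piai ea PRON
```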

full_pos_tag(sentence)

Method that predicts the POS tags of a Shipibo sentence and returns the full tag names in Spanish

Parameters:	sentence (str) – a sentence in Shipibo-Konibo
Returns:	list of the tags in Spanish
Return type:	list
Example:

>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.full_pos_tag('Atsa ea piai')
['Nombre', 'Pronombre', 'Verbo']

get_complete_tag(pos)

Method that returns the full Spanish name of a tag

Parameters:	pos (str) – a POS tag in the UD format
Returns:	str with the tag in Spanish
Return type:	str
Example:

>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.get_complete_tag('ADJ')
'Adjetivo'
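
The relationship between the UD tags and the Spanish names can be pictured as a lookup table. The sketch below contains only the four pairs that appear in the examples on this page; `UD_TO_SPANISH` and `complete_tags` are hypothetical names, and falling back to the original tag for unknown entries is an assumption of the sketch.

```python
# Only the pairs visible in this page's examples; the real table is larger.
UD_TO_SPANISH = {
    "NOUN": "Nombre",
    "PRON": "Pronombre",
    "VERB": "Verbo",
    "ADJ": "Adjetivo",
}


def complete_tags(ud_tags):
    """Translate a list of UD tags, leaving unknown tags unchanged."""
    return [UD_TO_SPANISH.get(tag, tag) for tag in ud_tags]


print(complete_tags(["NOUN", "PRON", "VERB"]))  # ['Nombre', 'Pronombre', 'Verbo']
```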

pos_tag(sentence)

Method that predicts the POS tags of a Shipibo sentence in the UD format

Parameters:	sentence (str) – a sentence in Shipibo-Konibo
Returns:	list of the tags in UD format
Return type:	list
Example:

>>> import chana.pos_tagger
>>> tagger = chana.pos_tagger.ShipiboPosTagger()
>>> tagger.pos_tag('Atsa ea piai')
['NOUN', 'PRON', 'VERB']

Syllabificator

Syllabificator for Shipibo-Konibo. General functions and rules to syllabify a Shipibo-Konibo word.

chana.syllabificator.accentuate(letter)

Function that adds an accent mark to a letter

Parameters:	letter (str) – a letter to be accentuated
Returns:	letter accentuated
Return type:	str
Example:

>>> import chana.syllabificator
>>> chana.syllabificator.accentuate('a')
'á'
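
A generic way to put an acute accent on a letter in Python is to append the combining accent and normalize to the composed form. The sketch below shows that Unicode technique; `add_acute_accent` is a hypothetical name and the library may instead use a fixed lookup table.

```python
import unicodedata


def add_acute_accent(letter):
    """Return the letter with an acute accent, composed into a single character."""
    # U+0301 is the combining acute accent; NFC merges it with the base letter.
    return unicodedata.normalize("NFC", letter + "\u0301")


print(add_acute_accent("a"))  # á
```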

chana.syllabificator.change(syllable)

Function that returns the original form of a syllable

Parameters:	syllable (str) – a syllable to be transformed
Returns:	syllable with its original form
Return type:	str
Example:

>>> import chana.syllabificator
>>> chana.syllabificator.change('1a')
'cha'

chana.syllabificator.get_vc(word)

Function that labels each letter of a word as a vowel or a consonant

Parameters:	word (str) – word to get its vowels and consonants
Returns:	list of [letter, type] pairs, where type is ‘V’ (vowel) or ‘C’ (consonant)
Return type:	list
Example:

>>> import chana.syllabificator
>>> chana.syllabificator.get_vc('piti')
[['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
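
The output shape is a list of two-element lists, one per letter. The sketch below reproduces it with a simple set lookup. `classify_letters` is a hypothetical name and the vowel set is an assumption; every character outside that set, including punctuation, is labelled 'C' here.

```python
VOWELS = set("aeiou")  # assumed vowel set, for illustration only


def classify_letters(word):
    """Pair each character with 'V' if it is in VOWELS, otherwise 'C'."""
    return [[ch, "V" if ch.lower() in VOWELS else "C"] for ch in word]


print(classify_letters("piti"))
# [['p', 'C'], ['i', 'V'], ['t', 'C'], ['i', 'V']]
```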

chana.syllabificator.syllabify(word)

Function that returns all the syllables of a word

Parameters:	word (str) – a word to get its syllables
Returns:	list of syllables
Return type:	list
Example:

>>> import chana.syllabificator
>>> chana.syllabificator.syllabify('atsabo')
['a', 'tsa', 'bo']
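
To give a feel for how such a split can be computed, the sketch below uses one deliberately simplified rule: a consonant that stands directly before a vowel opens a new syllable. It assumes that ch, sh and ts count as single consonants and that the vowels are a, e, i and o. All names are hypothetical, vowel sequences are never split, and this is not the library's rule set.

```python
DIGRAPHS = ("ch", "sh", "ts")  # assumed to behave as single consonants
VOWELS = set("aeio")           # assumed vowel set


def split_units(word):
    """Split a word into letters, keeping each digraph together as one unit."""
    units, i = [], 0
    while i < len(word):
        step = 2 if word[i:i + 2] in DIGRAPHS else 1
        units.append(word[i:i + step])
        i += step
    return units


def syllabify_sketch(word):
    units = split_units(word.lower())
    syllables, current = [], ""
    for k, unit in enumerate(units):
        next_is_vowel = k + 1 < len(units) and units[k + 1] in VOWELS
        # A consonant right before a vowel starts a new syllable.
        if unit not in VOWELS and next_is_vowel and current:
            syllables.append(current)
            current = ""
        current += unit
    if current:
        syllables.append(current)
    return syllables


print(syllabify_sketch("atsabo"))  # ['a', 'tsa', 'bo']
```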