String Tools
The string_tools module gathers utilities for string manipulation, mainly cleaning.
class string_tools.StringCleaner
Provides tools to clean strings, such as accent removal and standardisation.
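The methods of StringCleaner are not listed on this page, so the following is only a minimal sketch of what accent removal and standardisation could look like, using the standard library. The function names `strip_accents` and `standardise` are hypothetical, and reading "standardisation" as lowercasing plus whitespace collapsing is an assumption, not the module's documented behaviour.

```python
import unicodedata


def strip_accents(text):
    """Remove combining accent marks from a string (hypothetical helper)."""
    # NFKD decomposition splits an accented letter into its base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def standardise(text):
    """Strip accents, lowercase, and collapse whitespace (assumed meaning)."""
    return " ".join(strip_accents(text).lower().split())
```

Under these assumptions, `standardise("  Élève   Doué ")` gives `"eleve doue"`.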
class string_tools.StringHasher(n=1)
Provides tools to transform a sentence into a bag-of-words vector.
Parameters: n (int) – the n-gram size
hash(s)
Transforms a string into an n-gram count representation.
Parameters: s (string) – the string to hash
Returns: the n-gram count representation of the input string.
Return type: np.ndarray
init_ngrams(tokens)
Computes the n-grams from a list of words and assigns them to self.ngrams.
Todo: deal with the case n != 1
Parameters: tokens (list of strings) – list of words from which to compute the n-grams
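To illustrate the idea behind init_ngrams and hash, here is a standalone sketch of a word-level n-gram count vector. It does not call string_tools: the helpers `word_ngrams`, `build_vocabulary` and `count_vector` are hypothetical, the index order (first appearance) is an assumption, and a plain list stands in for the documented np.ndarray to keep the example dependency-free.

```python
from collections import Counter


def word_ngrams(tokens, n=1):
    """Return the word n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_vocabulary(tokens, n=1):
    """Map each distinct n-gram to a vector index, in order of first appearance."""
    vocabulary = {}
    for gram in word_ngrams(tokens, n):
        # setdefault keeps the index assigned at the first occurrence.
        vocabulary.setdefault(gram, len(vocabulary))
    return vocabulary


def count_vector(sentence, vocabulary, n=1):
    """Count the known n-grams of a sentence; unknown n-grams are ignored."""
    counts = Counter(word_ngrams(sentence.split(), n))
    vector = [0] * len(vocabulary)
    for gram, index in vocabulary.items():
        vector[index] = counts[gram]
    return vector


vocabulary = build_vocabulary(["the", "cat", "sat"])
vector = count_vector("the cat the dog", vocabulary)
```

Here "dog" is absent from the vocabulary and is dropped. How the module itself treats unseen words is not stated on this page.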
class string_tools.WordHasher(n=3, bord='#')
Provides tools to transform a string into a bag-of-ngrams vector.
Parameters:
- n (int) – the n-gram size
- bord (string) – delimiter character placed around each word
hash(s)
Transforms a string into an n-gram count representation.
Parameters: s (string) – the string to hash
Returns: the n-gram count representation of the input string.
Return type: np.ndarray
init_ngrams(tokens)
Computes the n-grams from a list of words and assigns them to self.ngrams.
Parameters: tokens (list of strings) – list of words from which to compute the n-grams
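The role of the bord delimiter can be sketched as follows. This is an illustration written independently of string_tools: `char_ngrams` and `ngram_counts` are hypothetical names, and padding each word with exactly one delimiter on each side is an assumption.

```python
from collections import Counter


def char_ngrams(word, n=3, bord="#"):
    """Return the character n-grams of a word surrounded by the delimiter."""
    # Assumption: one delimiter is added on each side of the word.
    padded = bord + word + bord
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def ngram_counts(text, n=3, bord="#"):
    """Count character n-grams over all whitespace-separated words."""
    counts = Counter()
    for word in text.split():
        counts.update(char_ngrams(word, n, bord))
    return counts


trigrams = char_ngrams("cat")
counts = ngram_counts("cat hat")
```

The delimiter lets an n-gram record whether it occurs at the start or end of a word: "cat" and "hat" share the word-final trigram "at#", while their word-initial trigrams "#ca" and "#ha" differ.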