This blog post will explain how to encode a text sequence with Python. A text sequence has to be encoded into numbers in order to be processed by a neural network. There has to be a function named encode declared that takes two arguments. The first argument being a list of words that is encoded. The second argument being a dictionary.
def encode(words,values):
seq = []
for word in words:
seq.append(values[word])
return seq
A list of words is declared containing words to be encoded.
words = ["maple", "walnut", "maple", "oak", "walnut"]
The variable unique is set to a list function that takes the function set(words) as an argument. The set function eliminates all duplicates from a list.
unique = list(set(words))
A dictionary called dic is declared that will hold all of the words encoding values.
dic = {word: i + 1 for i, word in enumerate(unique)}
The variable result holds the result of the encode function. After that, the results are printed to the screen.
result = encode(unique,dic)
print(result)
This is what the whole source code looks like.
def encode(words,values):
seq = []
for word in words:
seq.append(values[word])
return seq
words = ["maple", "walnut", "maple", "oak", "walnut"]
unique = list(set(words))
dic = {word: i + 1 for i, word in enumerate(unique)}
result = encode(unique,dic)
print(result)
