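This blog post explains how to tokenize a string in Python, that is, how to turn it into a list of lowercase words with the punctuation removed.
First, the re module, Python's built-in regular expression library, has to be imported.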
import re
Then a string variable is declared.
s = "How's it going?"
The variable tokens holds the result of three chained operations. First, re.sub removes every character that is neither a word character (a letter, digit or underscore) nor whitespace, which strips punctuation marks such as the apostrophe and the question mark. Next, the string method lower converts all characters to lowercase. Finally, the string method split breaks the string into a list of tokens (words), using a single space as the delimiter.
tokens = re.sub(r'[^\w\s]', '', s).lower().split(' ')
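To make the chained call easier to follow, here is a sketch that performs the same three steps one at a time and prints each intermediate result. The variable names no_punct and lowered are chosen here for illustration only. The last line shows that underscores and digits count as word characters for \w, so they are kept rather than stripped.

```python
import re

s = "How's it going?"

# Step 1: drop every character that is neither a word character
# (letter, digit, underscore) nor whitespace.
no_punct = re.sub(r'[^\w\s]', '', s)
print(no_punct)  # Hows it going

# Step 2: lowercase the remaining text.
lowered = no_punct.lower()
print(lowered)   # hows it going

# Step 3: split on single spaces.
tokens = lowered.split(' ')
print(tokens)    # ['hows', 'it', 'going']

# \w matches underscores and digits, so they survive step 1.
print(re.sub(r'[^\w\s]', '', "snake_case costs $5!"))  # snake_case costs 5
```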
Then the tokens are printed out, which gives ['hows', 'it', 'going'].
print(tokens)
This is what the whole code looks like.
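import re  # Python's regular expression module

s = "How's it going?"
# Remove everything except word characters and whitespace,
# lowercase the result, then split it on single spaces.
tokens = re.sub(r'[^\w\s]', '', s).lower().split(' ')
print(tokens)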
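Splitting on a single space character works for the sample string, but it produces empty strings or merged tokens when the input contains repeated spaces, tabs or newlines. The following is a minimal sketch, assuming a whitespace-robust variant is wanted: it wraps the same steps in a helper called tokenize (a hypothetical name chosen for illustration) and calls split() with no argument, which splits on runs of any whitespace and discards empty strings.

```python
import re

def tokenize(text):
    """Drop non-word, non-space characters, lowercase, split on whitespace."""
    cleaned = re.sub(r'[^\w\s]', '', text).lower()
    # str.split() with no argument treats any run of spaces, tabs or
    # newlines as one separator and drops leading/trailing whitespace.
    return cleaned.split()

messy = "How's  it\tgoing?\n"

# Splitting on ' ' leaves an empty token and glues "it" and "going" together.
print(re.sub(r'[^\w\s]', '', messy).lower().split(' '))  # ['hows', '', 'it\tgoing\n']
print(tokenize(messy))                                   # ['hows', 'it', 'going']
```

For the original sample string "How's it going?" both versions return the same list, so the choice only matters for less tidy input.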