Extract Noun Phrases From Sentence With Python

This blog post explains how to extract noun phrases from a sentence with Python, using the NLTK library.

Import pos_tag and RegexpParser from the nltk library, along with Python's built-in re module. The re module is used to break the sample text into a list of words (tokens), pos_tag assigns a part-of-speech tag to each token, and RegexpParser parses the tagged sentence into chunks.

import re
from nltk import pos_tag, RegexpParser

This is the sample text that will be used for this example.

sample = "i saw the big dog on the hill"

Tokenize the sample text: strip punctuation, convert everything to lowercase, and split on spaces.

words = re.sub(r'[^\w\s]', '', sample).lower().split(' ')
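The one-liner above works for the sample, but splitting on a single space produces empty tokens when the text contains consecutive spaces. As a minimal sketch, the same steps can be wrapped in a helper (the name tokenize is an assumption for illustration, not part of NLTK) that splits on any run of whitespace instead:

```python
import re

def tokenize(text):
    """Hypothetical helper: strip punctuation, lowercase, split on whitespace."""
    # Remove every character that is neither a word character nor whitespace.
    cleaned = re.sub(r"[^\w\s]", "", text)
    # split() with no argument splits on any whitespace run,
    # so double spaces do not produce empty strings.
    return cleaned.lower().split()

print(tokenize("I saw the big dog, on  the hill!"))
# ['i', 'saw', 'the', 'big', 'dog', 'on', 'the', 'hill']
```

For comparison, "a  b".split(' ') returns ['a', '', 'b'], and the empty string would be passed on to the tagger as if it were a word.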

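Determine each word's part-of-speech tag. pos_tag relies on a tagger model that NLTK stores separately; if it raises a LookupError, the error message names the resource to fetch with nltk.download().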

tagged = pos_tag(words)

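RegexpParser chunks the tagged words according to a grammar whose rules are regular expressions over part-of-speech tags. The NP rule below matches an optional determiner (DT), followed by any number of adjectives (JJ, JJR, JJS), followed by one or more nouns (NN, NNS, NNP, NNPS).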

chunker = RegexpParser("""
    NP: {<DT>?<JJ.*>*<NN.*>+}
""")

Declare an empty list to hold noun phrases extracted from the sample.

noun_phrases = []

Parse the tagged words with the chunker. The result is a tree in which every matched noun phrase is a subtree labeled NP.

tree = chunker.parse(tagged)

Traverse the tree and, for each NP subtree, join its words into a single string. Each leaf is a (word, tag) pair, so only the word is kept.

for subtree in tree.subtrees():
    if subtree.label() == 'NP':
        np = " ".join(word for word, pos in subtree.leaves())
        noun_phrases.append(np)

Print the results.

for np in noun_phrases:
    print(np)

This is what the whole source code looks like.

import re
from nltk import pos_tag, RegexpParser
 
sample = "i saw the big dog on the hill"
 
words = re.sub(r'[^\w\s]', '', sample).lower().split(' ')

tagged = pos_tag(words)
 
chunker = RegexpParser("""
    NP: {<DT>?<JJ.*>*<NN.*>+}
""")

noun_phrases = []

tree = chunker.parse(tagged)

for subtree in tree.subtrees():
    if subtree.label() == 'NP':
        np = " ".join(word for word, pos in subtree.leaves())
        noun_phrases.append(np)

for np in noun_phrases:
    print(np)
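If the same extraction is needed in more than one place, the chunking and traversal steps can be wrapped in a function. The sketch below is one possible arrangement, not the only one: the function name extract_noun_phrases is an assumption, and the input is tagged by hand so the example runs without the tagger model (in practice the list would come from pos_tag).

```python
from nltk import RegexpParser

# Build the chunker once and reuse it for every sentence.
chunker = RegexpParser("NP: {<DT>?<JJ.*>*<NN.*>+}")

def extract_noun_phrases(tagged_tokens):
    """Hypothetical helper: return the noun phrases in a list of (word, tag) pairs."""
    tree = chunker.parse(tagged_tokens)
    phrases = []
    for subtree in tree.subtrees():
        # The root of the tree is labeled 'S'; only NP chunks are wanted.
        if subtree.label() == "NP":
            phrases.append(" ".join(word for word, tag in subtree.leaves()))
    return phrases

# Hand-tagged input (an assumption; pos_tag may choose different tags).
hand_tagged = [("i", "PRP"), ("saw", "VBD"), ("the", "DT"), ("big", "JJ"),
               ("dog", "NN"), ("on", "IN"), ("the", "DT"), ("hill", "NN")]

print(extract_noun_phrases(hand_tagged))
```

Using a name like phrase or phrases instead of np also avoids clashing with the common import numpy as np alias if the code is later pasted into a larger script.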
