dedtech.info

Information about computer technology.

Load A Bigram Language Model From A CSV File With Python

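This blog post will explain how to load a bigram language model from a CSV file with Python. A bigram language model records how often each word is followed by another word. Those counts can be used to predict the next word or to generate random text. The model is just a list of word pairs and their counts, so it is easy to store in a CSV file.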

This is what the CSV file looks like. Each row holds a word, a word that follows it, and how many times that pair was seen.

word1,word2,count
natural,language,1
language,processing,1
processing,can,1
can,be,2
be,fun,1
fun,language,1
language,models,1
models,can,1
be,statistical,1
statistical,or,1
or,neural,1
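The post does not say how bigram_model.csv was created. The sketch below assumes it came from counting adjacent word pairs in the sentence "natural language processing can be fun language models can be statistical or neural". That sentence produces exactly the rows shown above. The sketch writes those rows to the file, so the rest of the example has something to load.

```python
import csv
from collections import Counter

# Assumed source text: the original post does not say where the counts came from,
# but counting the word pairs in this sentence reproduces the rows shown above.
text = "natural language processing can be fun language models can be statistical or neural"

words = text.split()
# zip pairs each word with the word that follows it; Counter tallies each pair.
bigram_counts = Counter(zip(words, words[1:]))

# The csv module documentation recommends newline="" for files it writes.
with open("bigram_model.csv", mode="w", encoding="utf-8", newline="") as file:
    writer = csv.writer(file)
    writer.writerow(["word1", "word2", "count"])
    for (word1, word2), count in bigram_counts.items():
        writer.writerow([word1, word2, count])
```

The pair "can be" appears twice in the sentence, which is why it is the only row with a count of 2.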

The csv module has to be imported so the program can read a CSV file. The defaultdict class is imported from the collections module so the program can store the language model that is read from the file.

import csv
from collections import defaultdict

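An empty defaultdict is created to hold the language model that is read from the CSV file. Its default value is dict, so model[word1] starts as an empty dictionary the first time word1 is seen. This means the following words can be added without first checking whether word1 already exists.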

model = defaultdict(dict)
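A short demonstration of why defaultdict(dict) is convenient here. With a plain dict, model["can"]["be"] = 2 would raise a KeyError, because "can" is not a key yet. The words and counts below are taken from the CSV file above.

```python
from collections import defaultdict

model = defaultdict(dict)

# "can" has not been added yet, so defaultdict calls dict() to create an
# empty inner dictionary before the assignment to ["be"] runs.
model["can"]["be"] = 2
model["language"]["processing"] = 1
model["language"]["models"] = 1

print(model["can"])       # {'be': 2}
print(model["language"])  # {'processing': 1, 'models': 1}
```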

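A with statement opens the CSV file and closes it automatically when the block ends. csv.DictReader reads each row as a dictionary keyed by the column names in the header row. The rows are traversed and the language model is built. DictReader returns every value as a string, so the count should be converted to an integer before it is stored.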

with open("bigram_model.csv", mode='r', encoding='utf-8', newline='') as file:
    rows = csv.DictReader(file)
    for row in rows:
        word1 = row["word1"]
        word2 = row["word2"]
        # DictReader returns strings, so convert the count to an integer
        count = int(row["count"])
        model[word1][word2] = count
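The following sketch shows that csv.DictReader hands back every field as text, including the count column. It reads two rows of the CSV data from an in-memory string (io.StringIO) instead of a file, so it runs on its own.

```python
import csv
import io

# Two rows from the CSV above, held in memory so no file is needed.
sample = "word1,word2,count\ncan,be,2\nbe,fun,1\n"

rows = csv.DictReader(io.StringIO(sample))
first = next(rows)

print(first["count"], type(first["count"]))  # 2 <class 'str'>

# Converting with int() gives a number that can be used in arithmetic.
count = int(first["count"])
print(count + 1)  # 3
```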

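Once loading is finished, the defaultdict is converted into a regular dict. Afterwards, looking up an unknown word can no longer silently add an empty entry to the model.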

model = dict(model)
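A small sketch of the difference the conversion makes. The names loose and strict are only illustrative. Reading a missing key from a defaultdict inserts it. Reading a missing key from a plain dict raises a KeyError and leaves the model untouched.

```python
from collections import defaultdict

loose = defaultdict(dict)
loose["can"]["be"] = 2
strict = dict(loose)  # same contents, but plain dict behaviour

# Reading a missing key from a defaultdict silently inserts an empty entry.
print(loose["unknown"], "unknown" in loose)  # {} True

# A plain dict raises KeyError instead and leaves the model unchanged.
try:
    strict["unknown"]
except KeyError:
    print("KeyError", "unknown" in strict)  # KeyError False
```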

Next, choose the search word whose following words should be looked up.

search_word = "models"

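The model.get method looks up the search word and returns the dictionary of words that follow it. If the search word is not in the model, the default value {} is returned instead, so an unknown word does not raise a KeyError.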

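# The name next_words avoids hiding Python's built-in next() function.
next_words = model.get(search_word, {})

The matching words and their counts are printed in alphabetical order. With the CSV file above, searching for models prints "can 1".

for word, count in sorted(next_words.items()):
    print(word, count)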
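The raw counts can also be turned into next-word probabilities. Divide each count by the total count for the search word: P(word2 | word1) = count(word1, word2) / total count of pairs that start with word1. The sketch below uses a hypothetical helper named next_word_probabilities. It assumes the model stores integer counts. The small model is copied from a few rows of the CSV file above.

```python
def next_word_probabilities(model, word):
    """Turn the raw counts for `word` into relative frequencies.

    `model` is assumed to map word1 -> {word2: int count}.
    An unknown word, or one with no recorded followers, gives an empty dict.
    """
    followers = model.get(word, {})
    total = sum(followers.values())
    if total == 0:
        return {}
    return {w: c / total for w, c in followers.items()}


model = {
    "language": {"processing": 1, "models": 1},
    "can": {"be": 2},
    "be": {"fun": 1, "statistical": 1},
}

for w, p in sorted(next_word_probabilities(model, "language").items()):
    print(w, p)
# models 0.5
# processing 0.5
```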

This is what the whole source code looks like.

import csv
from collections import defaultdict

model = defaultdict(dict)

with open("bigram_model.csv", mode='r', encoding='utf-8', newline='') as file:
    rows = csv.DictReader(file)
    for row in rows:
        word1 = row["word1"]
        word2 = row["word2"]
        # DictReader returns strings, so convert the count to an integer
        count = int(row["count"])
        model[word1][word2] = count

model = dict(model)

search_word = "models"

# The name next_words avoids hiding Python's built-in next() function.
next_words = model.get(search_word, {})

for word, count in sorted(next_words.items()):
    print(word, count)
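The introduction describes a language model as something that can generate random text. The sketch below shows one way to do that with the loaded model, assuming the counts are stored as integers. Starting from a chosen word, it repeatedly picks a following word at random, weighted by its count. The generate function and its max_words and seed parameters are hypothetical names chosen for this example. They are not part of the original program. The model dictionary holds the same pairs and counts as the CSV file above.

```python
import random


def generate(model, start, max_words=8, seed=None):
    """Build a word sequence by repeatedly sampling a follower of the last word.

    Followers are chosen with probability proportional to their counts.
    Generation stops early when the last word has no recorded followers.
    """
    rng = random.Random(seed)  # a seed makes the output repeatable
    words = [start]
    while len(words) < max_words:
        followers = model.get(words[-1], {})
        if not followers:
            break
        choices = list(followers)
        weights = list(followers.values())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


# The pairs and counts from the CSV file above, with counts stored as int.
model = {
    "natural": {"language": 1},
    "language": {"processing": 1, "models": 1},
    "processing": {"can": 1},
    "can": {"be": 2},
    "be": {"fun": 1, "statistical": 1},
    "fun": {"language": 1},
    "models": {"can": 1},
    "statistical": {"or": 1},
    "or": {"neural": 1},
}

print(generate(model, "natural", seed=1))
```

The pairs in this small model form a loop (language, models, can, be, fun, language), so without the max_words limit, generation could continue indefinitely.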
