I am not very experienced with programming and need some help solving a problem. I have a .csv file with 4 columns and about 5k rows, filled with questions and answers. I want to find word collocations in each cell.
Starting point: Pandas dataframe with 4 columns and about 5k rows. (Id, Title, Body, Body2)
Goal: Dataframe with 7 columns (Id, Title, Title-Collocations, Body, Body-Collocations, Body2, Body2-Collocations), where each collocation column is the result of applying a function to the corresponding text column, row by row.
I have found an example of bigram collocations in the NLTK documentation:
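import nltk
from nltk.collocations import BigramCollocationFinder

bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
finder.apply_freq_filter(3)  # ignore bigrams that occur fewer than 3 times
print(finder.nbest(bigram_measures.pmi, 5))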
>>>[('Beer', 'Lahai'), ('Lahai', 'Roi'), ('gray', 'hairs'), ('Most', 'High'), ('ewe', 'lambs')]
I want to adapt this function to my Pandas Dataframe. I am aware of the apply function for Pandas Dataframes, but I can't manage to get it to work.
This is my test-approach for one of the columns:
df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Body']),axis=1)
but if I print that out for an example row, I get
print (df['Body-Collocation'][1])
>>> <nltk.collocations.BigramCollocationFinder object at 0x113c47ef0>
I am not even sure if this is the right way. Can someone point me in the right direction?
If you want to apply BigramCollocationFinder.from_words() to each value in the Body column, you'd have to do:
df['Body-Collocation'] = df.Body.apply(lambda x: BigramCollocationFinder.from_words(x))
In essence, apply allows you to loop through the rows and provide the corresponding value of the Body column to the applied function.
But as suggested in the comments, providing a data sample would make it easier to address your specific case.
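Note that from_words() expects a sequence of tokens and returns a finder object, so passing it a raw string makes it iterate over single characters, and the column still holds finder objects rather than collocations. Below is a minimal sketch of one way to get the bigrams themselves into new columns. It assumes the cells hold plain text, that simple whitespace tokenization is good enough (nltk.word_tokenize would be an alternative), and that ranking by PMI as in the NLTK example is what you want. The helper name top_collocations, its parameters, and the sample rows are made up for illustration.

```python
import pandas as pd
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

bigram_measures = BigramAssocMeasures()


def top_collocations(text, n=5, min_freq=1):
    """Return up to n bigrams found in text, ranked by PMI (hypothetical helper)."""
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    tokens = str(text).lower().split()
    finder = BigramCollocationFinder.from_words(tokens)
    # Drop bigrams that occur fewer than min_freq times.
    finder.apply_freq_filter(min_freq)
    # nbest() returns a plain list of (word1, word2) tuples.
    return finder.nbest(bigram_measures.pmi, n)


# Made-up sample data with the same column layout as in the question.
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["how to merge two data frames", "read a csv file in python"],
    "Body": ["new york is big new york is old", "i want to read a csv file"],
    "Body2": ["use the merge function", "use the read csv function"],
})

# Apply the helper to every value of each text column.
for col in ["Title", "Body", "Body2"]:
    df[col + "-Collocations"] = df[col].apply(top_collocations)

print(df["Body-Collocations"][0])
```

With short texts most bigrams occur only once, so a frequency filter of 3 as in the NLTK example would usually leave nothing; that is why min_freq defaults to 1 here. Adjust it to your data.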