How to apply a function (BigramCollocationFinder) to a Pandas DataFrame

slm Published at Dev

slm

I am not very experienced with programming and need some help solving a problem. I have a .csv file with 4 columns and about 5k rows, filled with questions and answers. I want to find word collocations in each cell.

Starting point: Pandas dataframe with 4 columns and about 5k rows. (Id, Title, Body, Body2)

Goal: Dataframe with 7 columns (Id, Title, Title-Collocations, Body, Body-Collocations, Body2, Body2-Collocations), where each new column holds the result of applying a function to the matching text column, row by row.

I have found an example for bigram collocations in the NLTK documentation.

import nltk
from nltk.collocations import BigramCollocationFinder

bigram_measures = nltk.collocations.BigramAssocMeasures()
finder = BigramCollocationFinder.from_words(nltk.corpus.genesis.words('english-web.txt'))
finder.apply_freq_filter(3)  # ignore bigrams that occur fewer than 3 times
print(finder.nbest(bigram_measures.pmi, 5))
>>>[('Beer', 'Lahai'), ('Lahai', 'Roi'), ('gray', 'hairs'), ('Most', 'High'), ('ewe', 'lambs')]

I want to adapt this function to my Pandas DataFrame. I am aware of the apply function for Pandas DataFrames, but can't manage to get it to work.

This is my test approach for one of the columns:

df['Body-Collocation'] = df.apply(lambda df: BigramCollocationFinder.from_words(df['Body']),axis=1)

But if I print that out for an example row, I get:

print (df['Body-Collocation'][1])
>>> <nltk.collocations.BigramCollocationFinder object at 0x113c47ef0>

I am not even sure if this is the right way. Can someone point me in the right direction?

Stefan

If you want to apply BigramCollocationFinder.from_words() to each value in the Body column, you'd have to do:

df['Body-Collocation'] = df.Body.apply(lambda x: BigramCollocationFinder.from_words(x))

In essence, apply on a single column loops through its rows and passes each value of the Body column to the applied function.
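To make the difference between the two apply variants concrete, here is a minimal sketch. The sample data and the word-count lambda are made up for illustration and have nothing to do with collocations yet:

```python
import pandas as pd

# Hypothetical two-row sample, standing in for the real data.
df = pd.DataFrame({"Id": [1, 2], "Body": ["first question text", "second one"]})

# Series.apply: the function receives one cell value of Body at a time.
per_value = df["Body"].apply(lambda text: len(text.split()))

# DataFrame.apply with axis=1: the function receives a whole row (a Series),
# so the column has to be looked up inside the function.
per_row = df.apply(lambda row: len(row["Body"].split()), axis=1)

print(per_value.tolist())
print(per_row.tolist())
```

Both calls give the same result here. The column-wise form is simply the shorter way to write it when only one column is involved.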

But as suggested in the comments, providing a data sample would make it easier to address your specific case.
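Without a data sample, the following is only a rough sketch of how the 7-column goal from the question might be reached. It assumes each cell holds plain text. The helper name top_collocations, the whitespace tokenizer, the raw-frequency ranking and the sample rows are all illustrative choices, not something taken from the question. Two things matter here: from_words expects a sequence of tokens (a raw string would be read character by character), and calling nbest on the finder is what turns the BigramCollocationFinder object into a plain list of bigrams.

```python
import pandas as pd
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder


def top_collocations(text, n=5, min_freq=1):
    """Return up to n bigrams from one cell, ranked by raw frequency."""
    # str.split() is a simplistic stand-in for a real tokenizer.
    tokens = str(text).lower().split()
    finder = BigramCollocationFinder.from_words(tokens)
    # Drop bigrams that occur fewer than min_freq times.
    finder.apply_freq_filter(min_freq)
    # nbest returns a list of (word, word) tuples instead of the finder object.
    return finder.nbest(BigramAssocMeasures().raw_freq, n)


# Hypothetical sample rows standing in for the real CSV.
df = pd.DataFrame({
    "Id": [1, 2],
    "Title": ["new york pizza", "red wine"],
    "Body": ["new york is big and new york is loud", "red wine and red wine"],
    "Body2": ["hello world", "good morning world"],
})

# Add one collocation column per text column.
for col in ["Title", "Body", "Body2"]:
    df[col + "-Collocations"] = df[col].apply(top_collocations)

print(df[["Id", "Body-Collocations"]])
```

With short cells, a score such as pmi tends to favour bigrams that occur only once, so raw frequency (or a higher min_freq) may be the more useful choice per row. For real text, a proper tokenizer and stop-word filtering would likely be needed as well.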


Edited at 2021-02-24
