How to sort a DataFrame by the occurrences in a column in Python (pandas)

user5103234

I'm trying to create a dataframe from my data (scores between chemicals and proteins) with pandas in Python.

I want my dataframe to display the proteins with the most occurrences first, so I sorted my data beforehand. But when I build the dataframe, I don't get the expected result.

Here's a sample of my data:

chemicals   prots   scores
CID000000006    10116.ENSRNOP00000003921    196
CID000000051    10116.ENSRNOP00000003921    246
CID000000085    10116.ENSRNOP00000003921    196
CID000000119    10116.ENSRNOP00000003921    247
CID000000134    10116.ENSRNOP00000008952    159
CID000000135    10116.ENSRNOP00000008952    157
CID000000174    10116.ENSRNOP00000008952    439
CID000000175    10116.ENSRNOP00000001021    858
CID000000177    10116.ENSRNOP00000004027    760

As you can see, "10116.ENSRNOP00000003921" is the protein with the most occurrences in my data.

So I'd like to get something like:

             10116.ENSRNOP00000003921     10116.ENSRNOP00000008952  
CID000000006   196                 
CID000000051   246 
CID000000085   196 
CID000000119   247 
CID000000134                                  159   
CID000000135                                  157   
CID000000174                                  439

And here's my code:

import pandas as pd

# header=0: the first row of the file holds the column names
df_rat = pd.read_csv("dt_matrix_rat.csv", sep="\t", header=0)
df_rat.columns = ['chemicals', 'proteins', 'scores']
df_rat1 = df_rat.pivot(index='chemicals', columns='proteins', values='scores')

df_rat1.to_csv("rat_matrix.csv", sep='\t', index=True)

jezrael

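pivot orders the columns by label, so the row order of the input is lost. I think you need to count the non-null values in each column with notnull and sum, sort those counts with sort_values, and take the index as cols. Lastly, select the columns with a subset: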

df1 = df.pivot(index='chemicals', columns='proteins', values='scores')

cols = df1.notnull().sum(axis=0).sort_values(ascending=False).index
print(cols)
Index([u'10116.ENSRNOP00000003921', u'10116.ENSRNOP00000008952',
       u'10116.ENSRNOP00000004027', u'10116.ENSRNOP00000001021'],
      dtype='object', name=u'proteins')

print(df1[cols])
proteins      10116.ENSRNOP00000003921  10116.ENSRNOP00000008952  \
chemicals                                                          
CID000000006                     196.0                       NaN   
CID000000051                     246.0                       NaN   
CID000000085                     196.0                       NaN   
CID000000119                     247.0                       NaN   
CID000000134                       NaN                     159.0   
CID000000135                       NaN                     157.0   
CID000000174                       NaN                     439.0   
CID000000175                       NaN                       NaN   
CID000000177                       NaN                       NaN   

proteins      10116.ENSRNOP00000004027  10116.ENSRNOP00000001021  
chemicals                                                         
CID000000006                       NaN                       NaN  
CID000000051                       NaN                       NaN  
CID000000085                       NaN                       NaN  
CID000000119                       NaN                       NaN  
CID000000134                       NaN                       NaN  
CID000000135                       NaN                       NaN  
CID000000174                       NaN                       NaN  
CID000000175                       NaN                     858.0  
CID000000177                     760.0                       NaN  

Or use reindex (reindex_axis, the older method, was removed in pandas 1.0):

print(df1.reindex(columns=cols))
proteins      10116.ENSRNOP00000003921  10116.ENSRNOP00000008952  \
chemicals                                                          
CID000000006                     196.0                       NaN   
CID000000051                     246.0                       NaN   
CID000000085                     196.0                       NaN   
CID000000119                     247.0                       NaN   
CID000000134                       NaN                     159.0   
CID000000135                       NaN                     157.0   
CID000000174                       NaN                     439.0   
CID000000175                       NaN                       NaN   
CID000000177                       NaN                       NaN   

proteins      10116.ENSRNOP00000004027  10116.ENSRNOP00000001021  
chemicals                                                         
CID000000006                       NaN                       NaN  
CID000000051                       NaN                       NaN  
CID000000085                       NaN                       NaN  
CID000000119                       NaN                       NaN  
CID000000134                       NaN                       NaN  
CID000000135                       NaN                       NaN  
CID000000174                       NaN                       NaN  
CID000000175                       NaN                     858.0  
CID000000177                     760.0                       NaN  
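
As a self-contained variant of the steps above, here is a minimal sketch that counts the occurrences on the long-format data with value_counts before pivoting. It assumes each (chemical, protein) pair appears only once and has a score, in which case the counts match the non-null counts per column after the pivot. The names raw, order and wide are made up for this illustration, and the sample is read from a whitespace-separated string instead of the original tab-separated file.

```python
import io

import pandas as pd

# Sample rows copied from the question; whitespace-separated here
# (the real file is tab-separated).
raw = """chemicals prots scores
CID000000006 10116.ENSRNOP00000003921 196
CID000000051 10116.ENSRNOP00000003921 246
CID000000085 10116.ENSRNOP00000003921 196
CID000000119 10116.ENSRNOP00000003921 247
CID000000134 10116.ENSRNOP00000008952 159
CID000000135 10116.ENSRNOP00000008952 157
CID000000174 10116.ENSRNOP00000008952 439
CID000000175 10116.ENSRNOP00000001021 858
CID000000177 10116.ENSRNOP00000004027 760
"""

df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
df.columns = ["chemicals", "proteins", "scores"]

# value_counts() sorts by frequency, most frequent first.
# The relative order of proteins with equal counts is not relied on here.
order = df["proteins"].value_counts().index

# Pivot to the wide chemical x protein matrix, then put the columns
# in order of decreasing occurrence.
wide = df.pivot(index="chemicals", columns="proteins", values="scores")
wide = wide.reindex(columns=order)

print(wide.iloc[:, :2])
```

Counting before the pivot avoids building the boolean notnull matrix, which may matter when the wide table is large and mostly empty.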
