I'm using Python 3.7 and pandas in JupyterLab and trying to use reindex to add the missing rows to a MultiIndex table (the index should be the product of the elements of Col1 and Col2).
import pandas as pd

df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2012'],
                    'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])
df2 = df2.reindex(newIndex, columns=['result'])
print(df2)
The output is:
           result
Col1 Col2
10   2012     NaN
     2013     NaN
Can someone tell me how to keep the values of the result column?
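Use DataFrame.set_index to build a MultiIndex from Col1 and Col2, so that reindex matches against a MultiIndex. Otherwise reindex works against the default RangeIndex (labels 0 to 5), none of the (Col1, Col2) tuples match any label, and the result column comes out as all NaN.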
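To see why set_index matters, the sketch below counts how many of the 15 target labels already exist as labels of each index. It reuses the answer's data (last year 2016); the variable names `indexed`, `matches_default` and `matches_multi` are illustrative only.

```python
import pandas as pd

# Answer's data (last year 2016), so every (Col1, Col2) pair is unique
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
                    'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])

# Without set_index the frame keeps its default RangeIndex (0 to 5)
print(type(df2.index).__name__, list(df2.index))

# A (Col1, Col2) tuple is never an integer label, so no target matches
matches_default = sum(label in df2.index for label in newIndex)

# After set_index the tuples are real labels of a MultiIndex
indexed = df2.set_index(['Col1', 'Col2'])
matches_multi = sum(label in indexed.index for label in newIndex)

print(matches_default, matches_multi)  # 0 6
```

The 6 matches correspond to the six rows that keep 1.0 in the answer's output; every other target label becomes NaN.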
# Last year is 2016 (the question's data has the pair (11, '2012') twice,
# and reindex raises ValueError on an index with duplicate labels)
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2016'],
                    'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])
print(newIndex)
MultiIndex([(10, '2012'),
            (10, '2013'),
            (10, '2014'),
            (10, '2015'),
            (10, '2016'),
            (11, '2012'),
            (11, '2013'),
            (11, '2014'),
            (11, '2015'),
            (11, '2016'),
            (12, '2012'),
            (12, '2013'),
            (12, '2014'),
            (12, '2015'),
            (12, '2016')],
           names=['Col1', 'Col2'])
df2 = df2.set_index(['Col1','Col2']).reindex( newIndex, columns=['result'] )
print(df2)
           result
Col1 Col2
10   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     NaN
11   2012     1.0
     2013     NaN
     2014     NaN
     2015     NaN
     2016     1.0
12   2012     NaN
     2013     1.0
     2014     1.0
     2015     1.0
     2016     NaN
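The answer avoids a problem in the question's original data: the pair (11, '2012') occurs twice, and reindex raises ValueError on an index with duplicate labels (the exact message differs between pandas versions). If the original rows must be kept, one option is to collapse the duplicates with groupby before reindexing. This is a minimal sketch, not part of the original answer; summing the duplicates is an assumption, and `deduped`, `filled` and `raised` are illustrative names.

```python
import pandas as pd

# Question's original data: the pair (11, '2012') appears twice
df2 = pd.DataFrame({'Col1': [10, 11, 12, 12, 12, 11],
                    'Col2': ['2012', '2012', '2013', '2014', '2015', '2012'],
                    'result': [1, 1, 1, 1, 1, 1]})
start, end = df2['Col1'].min(), df2['Col1'].max()
Col2_values = df2['Col2'].sort_values().unique()
newIndex = pd.MultiIndex.from_product([range(start, end + 1), Col2_values],
                                      names=['Col1', 'Col2'])

# reindex refuses an index with duplicate labels (message varies by version)
raised = False
try:
    df2.set_index(['Col1', 'Col2']).reindex(newIndex)
except ValueError as exc:
    raised = True
    print('reindex failed:', exc)

# Assumption: duplicate pairs should be summed; swap in another aggregation if needed
deduped = df2.groupby(['Col1', 'Col2'])['result'].sum()
filled = deduped.reindex(newIndex).to_frame()
print(filled)
```

With summing, (11, '2012') gets 2.0; using `first()` or `max()` instead of `sum()` would keep it at 1.0, so the right aggregation depends on what the duplicates mean in your data.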