How do I use a function that returns a numpy array in pandas apply?

I have a DataFrame that looks like this:

import pandas as pd
df_dict = {'var1': {(1, 1.0, 'obj1'): 1.0, (1, 1.0, 'obj4'): 1.0, (1, 1.0, 'obj3'): 2.0, (1, 1.0, 'obj5'): 2.0, (1, 1.0, 'obj2'): 3.0, (1, 2.0, 'obj1'): 1.0, (1, 2.0, 'obj4'): 1.0, (1, 2.0, 'obj3'): 2.0, (1, 2.0, 'obj5'): 2.0, (1, 2.0, 'obj2'): 3.0, (1, 3.0, 'obj1'): 1.0, (1, 3.0, 'obj4'): 1.0, (1, 3.0, 'obj3'): 2.0, (1, 3.0, 'obj5'): 2.0, (1, 3.0, 'obj2'): 3.0, (1, 4.0, 'obj1'): 1.0, (1, 4.0, 'obj4'): 1.0, (1, 4.0, 'obj3'): 2.0, (1, 4.0, 'obj5'): 2.0, (1, 4.0, 'obj2'): 3.0}, 'var2': {(1, 1.0, 'obj1'): -0.9799804687499858, (1, 1.0, 'obj4'): 0.009998139880948997, (1, 1.0, 'obj3'): -1.0299944196428612, (1, 1.0, 'obj5'): 0.029994419642846992, (1, 1.0, 'obj2'): 1.9999999999999574, (1, 2.0, 'obj1'): -1.0200195312500426, (1, 2.0, 'obj4'): 0.07001023065477341, (1, 2.0, 'obj3'): -0.6900111607143344, (1, 2.0, 'obj5'): -0.03999255952379599, (1, 2.0, 'obj2'): 1.9400111607142634, (1, 3.0, 'obj1'): -1.0599888392857082, (1, 3.0, 'obj4'): 0.1399972098214164, (1, 3.0, 'obj3'): -0.36002604166661456, (1, 3.0, 'obj5'): -0.12002418154757777, (1, 3.0, 'obj2'): 1.8699776785714306, (1, 4.0, 'obj1'): -1.09000651041665, (1, 4.0, 'obj4'): 0.1900111607142918, (1, 4.0, 'obj3'): -0.029994419642918047, (1, 4.0, 'obj5'): -0.2000093005952408, (1, 4.0, 'obj2'): 1.8099888392857366}, 'var3': {(1, 1.0, 'obj1'): 0.0, (1, 1.0, 'obj4'): -1.9899974149816302, (1, 1.0, 'obj3'): -0.020033892463189318, (1, 1.0, 'obj5'): -0.03999597886028994, (1, 1.0, 'obj2'): -0.029979032628659752, (1, 2.0, 'obj1'): 0.050012925091920124, (1, 2.0, 'obj4'): -1.999978458180145, (1, 2.0, 'obj3'): 0.19003475413597926, (1, 2.0, 'obj5'): 0.18996294806989056, (1, 2.0, 'obj2'): -0.029979032628730806, (1, 3.0, 'obj1'): 0.10002585018380472, (1, 3.0, 'obj4'): -2.03001134535846, (1, 3.0, 'obj3'): 0.3900146484375, (1, 3.0, 'obj5'): 0.41001263786760944, (1, 3.0, 'obj2'): -0.040031881893369814, (1, 4.0, 'obj1'): 0.1499669692095651, (1, 4.0, 'obj4'): -2.040010340073515, (1, 4.0, 'obj3'): 0.5999755859375, (1, 4.0, 'obj5'): 0.6100284352022101, 
(1, 4.0, 'obj2'): -0.05999396829039938}}
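df = pd.DataFrame.from_dict(df_dict)
# Name the index levels so they can be used as groupby keys below
df.index.names = ['measurement_id', 'repeat_id', 'object']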

                                 var1      var2      var3
measurement_id repeat_id object                          
1              1.0       obj1     1.0 -0.979980  0.000000
                         obj4     1.0  0.009998 -1.989997
                         obj3     2.0 -1.029994 -0.020034
                         obj5     2.0  0.029994 -0.039996
                         obj2     3.0  2.000000 -0.029979
               2.0       obj1     1.0 -1.020020  0.050013
                         obj4     1.0  0.070010 -1.999978
                         obj3     2.0 -0.690011  0.190035
                         obj5     2.0 -0.039993  0.189963
                         obj2     3.0  1.940011 -0.029979
               3.0       obj1     1.0 -1.059989  0.100026
                         obj4     1.0  0.139997 -2.030011
                         obj3     2.0 -0.360026  0.390015
                         obj5     2.0 -0.120024  0.410013
                         obj2     3.0  1.869978 -0.040032
               4.0       obj1     1.0 -1.090007  0.149967
                         obj4     1.0  0.190011 -2.040010
                         obj3     2.0 -0.029994  0.599976
                         obj5     2.0 -0.200009  0.610028
                         obj2     3.0  1.809989 -0.059994

I want to filter var2 with scipy.signal.savgol_filter, but I need to do so separately for each object. So my call looks like this:

import scipy.signal as signal
df.groupby(['measurement_id', 'object'])['var2'].apply(lambda x: signal.savgol_filter(x, window_length=3, polyorder=2))

measurement_id  object
1               obj1      [-0.9799804687499857, -1.0200195312500429, -1....
                obj2      [1.9999999999999565, 1.9400111607142636, 1.869...
                obj3      [-1.0299944196428608, -0.6900111607143345, -0....
                obj4      [0.009998139880949027, 0.07001023065477342, 0....
                obj5      [0.02999441964284698, -0.039992559523796, -0.1...
Name: var2, dtype: object

However, because savgol_filter returns an np.ndarray, I don't know how to correctly assign the output as a new column, var4. I tried pandas explode, but I still can't get the values assigned back in the right order.

Solution:

I think you need GroupBy.transform, which converts the numpy array returned for each group into a Series aligned with the original index:

df['var4'] = (df.groupby(['measurement_id', 'object'])['var2']
                .transform(lambda x: signal.savgol_filter(x, window_length=3, polyorder=2)))
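
As a small self-contained check of the transform approach, here is a sketch on toy data. The frame `toy`, the column name `value` and the helper `smooth` are made up for illustration and are not part of the question. It uses polyorder=1 instead of 2 so that the smoothing is visible, and it shows that the result comes back with the same index as the input, so it can be assigned directly as a column.

```python
import numpy as np
import pandas as pd
from scipy import signal

# Toy data (hypothetical): two objects measured over five repeats each,
# stored interleaved to mimic the layout in the question.
index = pd.MultiIndex.from_product(
    [[1], [1.0, 2.0, 3.0, 4.0, 5.0], ['obj1', 'obj2']],
    names=['measurement_id', 'repeat_id', 'object'],
)
toy = pd.DataFrame(
    {'value': [0.0, 10.0, 1.0, 12.0, 0.0, 10.0, 1.0, 12.0, 0.0, 10.0]},
    index=index,
)

def smooth(x):
    # savgol_filter returns a numpy array with the same length as the group
    return signal.savgol_filter(x, window_length=3, polyorder=1)

# transform maps each group's array back onto that group's original rows
toy['smoothed'] = (toy.groupby(['measurement_id', 'object'])['value']
                      .transform(smooth))
print(toy)
```

Because the function returns exactly one value per input row, transform can line the values up with the rows of each group, which is the step that apply followed by explode does not do for you.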

Another approach is to write a custom function that adds the new column to each group:

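import scipy.signal as signal

def func(x):
    # Work on a copy so the group object handed over by pandas is not mutated
    x = x.copy()
    x['var4'] = signal.savgol_filter(x['var2'], window_length=3, polyorder=2)
    return x

# group_keys=False keeps the group keys from being prepended to the index
df = df.groupby(['measurement_id', 'object'], group_keys=False).apply(func)

print(df)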
                                 var1      var2      var3      var4
measurement_id repeat_id object                                    
1              1.0       obj1     1.0 -0.979980  0.000000 -0.979980
                         obj4     1.0  0.009998 -1.989997  0.009998
                         obj3     2.0 -1.029994 -0.020034 -1.029994
                         obj5     2.0  0.029994 -0.039996  0.029994
                         obj2     3.0  2.000000 -0.029979  2.000000
               2.0       obj1     1.0 -1.020020  0.050013 -1.020020
                         obj4     1.0  0.070010 -1.999978  0.070010
                         obj3     2.0 -0.690011  0.190035 -0.690011
                         obj5     2.0 -0.039993  0.189963 -0.039993
                         obj2     3.0  1.940011 -0.029979  1.940011
               3.0       obj1     1.0 -1.059989  0.100026 -1.059989
                         obj4     1.0  0.139997 -2.030011  0.139997
                         obj3     2.0 -0.360026  0.390015 -0.360026
                         obj5     2.0 -0.120024  0.410013 -0.120024
                         obj2     3.0  1.869978 -0.040032  1.869978
               4.0       obj1     1.0 -1.090007  0.149967 -1.090007
                         obj4     1.0  0.190011 -2.040010  0.190011
                         obj3     2.0 -0.029994  0.599976 -0.029994
                         obj5     2.0 -0.200009  0.610028 -0.200009
                         obj2     3.0  1.809989 -0.059994  1.809989
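
Note that var4 in the output above is identical to var2. That is expected with these settings: with window_length=3 and polyorder=2, the filter fits a quadratic through three points, and such a fit passes through the points exactly, so nothing is smoothed. A quick check of this, using made-up sample values rather than the question's data:

```python
import numpy as np
from scipy import signal

# Hypothetical sample values, four points like each group in the question
y = np.array([0.0, 1.0, 0.0, 1.0])

# A degree-2 polynomial through 3 points is an exact fit: output equals input
same = signal.savgol_filter(y, window_length=3, polyorder=2)
print(np.allclose(same, y))

# A lower polyorder (or a wider window) makes the filter actually smooth
smoothed = signal.savgol_filter(y, window_length=3, polyorder=1)
print(smoothed)
```

If real smoothing is wanted, polyorder needs to be smaller than window_length - 1. Which values suit the data is a judgment call that the original answer does not address.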
