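I have this code, which runs against a single data frame. However, I would like to be able to loop it over a list of data frames.
Here is the basic code: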
# Run RFM Analysis on df_0
df_1 <- rfm_table_order(df_0, customer = customer_id, order_date = txn_date, revenue = txn_price, analysis_date = analysis_date,
recency_bins=3, frequency_bins=3, monetary_bins=3)
df_2 <- as.data.frame(df_1$rfm)
# Add weighting to the scores
df_2$finalscore <- (df_2$recency_score*3 + df_2$frequency_score*2 + df_2$monetary_score*3)/8
# Add labels according to weighted score
df_2<- df_2 %>%
mutate(segment = case_when(
.$finalscore >= 2.5 ~ "Loyal",
.$finalscore <= 1.5 ~ "Lapsed",
TRUE ~ "Regular"
))
# Add the analysis date
df_2$analysis_date <- rep(analysis_date,nrow(df_2))
# Output the final dataset with required columns
df_final <- df_2[,c("customer_id","segment","analysis_date")]
df_0 looks like this:
customer_id txn_date txn_price category
123 01/01/2019 12 a
456 01/02/2019 7 b
...
After running the code above, df_final looks like this:
customer_id segment analysis_date
123 Loyals 01/05/2019
456 Loyals 01/05/2019
...
I would like to see how the results differ when category is used as a factor. So I have made a list of data frames, one per category.
cat_list <- split(df_0, as.factor(df_0$category))
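For illustration, here is a minimal sketch of what split() produces, using a small made-up data frame (toy and its values are assumptions, not the real df_0). The point is that the category levels become the names of the list, which is what makes it possible to attach the category to each result later.

```r
# Toy data standing in for df_0 (values are made up for illustration)
toy <- data.frame(
  customer_id = c(123, 456, 789),
  txn_price   = c(12, 7, 5),
  category    = c("a", "b", "a"),
  stringsAsFactors = FALSE
)

# split() returns a named list with one data frame per category level
toy_list <- split(toy, as.factor(toy$category))

names(toy_list)        # the category levels become the list names
nrow(toy_list[["a"]])  # number of rows that belong to category "a"
```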
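I need to add a loop that runs over the list of data frames. The last step of the loop should also append the name of the data frame to the results.
Desired output: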
customer_id segment category analysis_date
123 Loyals a 01/05/2019
456 Loyals b 01/05/2019
...
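Simply generalize your process into a function that takes a data frame as input, then run by() (roughly equivalent to split() + lapply()) to subset the main data frame by category and pass each subset into the function. Also consider within() and ifelse() for adding the needed columns (the base R, or tinyverse, counterparts of mutate() and case_when()).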
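As a rough sketch of the by() versus split() + lapply() equivalence mentioned above, the example below runs a hypothetical stand-in function (summarise_sub, not part of the original code) on made-up data. It only assumes base R; the real analysis would use the RFM function instead.

```r
# Toy data standing in for df_0 (values are made up for illustration)
toy <- data.frame(
  customer_id = c(123, 456, 789, 321),
  txn_price   = c(12, 7, 5, 20),
  category    = c("a", "b", "a", "b"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the real per-category analysis
summarise_sub <- function(sub_df) {
  data.frame(
    category = sub_df$category[[1]],   # every row in a subset shares one category
    total    = sum(sub_df$txn_price),
    stringsAsFactors = FALSE
  )
}

# Two ways to pass each category subset to the function
res_by     <- by(toy, toy$category, summarise_sub)
res_lapply <- lapply(split(toy, toy$category), summarise_sub)

# Both hand the same subsets to the function, in factor-level order
res_by[[1]]$total
res_lapply[[1]]$total
```

The main practical difference is the return type: by() returns an object of class "by", while lapply() returns a plain list.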
Function
my_func <- function(sub_df) {
# Run RFM Analysis on df
df_1 <- rfm_table_order(sub_df, customer = customer_id, order_date = txn_date,
revenue = txn_price, analysis_date = analysis_date,
recency_bins=3, frequency_bins=3, monetary_bins=3)
df_2 <- within(as.data.frame(df_1$rfm), {
# Add weighting to the scores
finalscore <- (recency_score*3 + frequency_score*2 + monetary_score*3)/8
# Add labels according to weighted score
segment <- ifelse(finalscore >= 2.5, "Loyal",
ifelse(finalscore <= 1.5, "Lapsed", "Regular")
)
# Add the analysis date
analysis_date <- analysis_date
# Add category
category <- sub_df$category[[1]]
})
# Output the final dataset with required columns
df_final <- df_2[,c("customer_id", "segment", "category", "analysis_date")]
return(df_final)
}
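To see the within() + nested ifelse() labelling on its own, without needing the rfm package, here is a small sketch on made-up scores (the scores data frame is an assumption for illustration). It uses the same weights and thresholds as the function above.

```r
# Made-up RFM scores, one row per customer (illustration only)
scores <- data.frame(
  customer_id     = c(1, 2, 3),
  recency_score   = c(3, 1, 2),
  frequency_score = c(3, 1, 2),
  monetary_score  = c(3, 1, 2)
)

labelled <- within(scores, {
  # Weighted average of the three scores (weights 3, 2, 3)
  finalscore <- (recency_score * 3 + frequency_score * 2 + monetary_score * 3) / 8
  # Nested ifelse() plays the role of case_when()
  segment <- ifelse(finalscore >= 2.5, "Loyal",
                    ifelse(finalscore <= 1.5, "Lapsed", "Regular"))
})

labelled[, c("customer_id", "finalscore", "segment")]
```

With these inputs the weighted scores are 3, 1 and 2, so the three customers fall into "Loyal", "Lapsed" and "Regular" respectively.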
Call
cat_list <- by(df_0, df_0$category, my_func)
# cat_list <- lapply(split(df_0, df_0$category), my_func)
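Either call returns one data frame per category rather than a single table. If one combined data frame is wanted, as in the desired output, a common base R idiom is do.call(rbind, ...). The sketch below uses a made-up list (toy_results) in place of the real cat_list.

```r
# Made-up per-category results standing in for cat_list (illustration only)
toy_results <- list(
  a = data.frame(customer_id = 123, segment = "Loyal", category = "a",
                 stringsAsFactors = FALSE),
  b = data.frame(customer_id = 456, segment = "Regular", category = "b",
                 stringsAsFactors = FALSE)
)

# Stack the per-category data frames into one
combined <- do.call(rbind, toy_results)

# rbind() builds row names from the list names; drop them for a clean index
rownames(combined) <- NULL

combined
```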