How can I run the table function across multiple variables in R and use a function to compile the results into a new dataset?


ellejay

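I have a dataset of about 100 variables, and I would like to create a table containing summaries of about 30 of them. To do this, I ran table and other functions on these variables by hand and bound the results together. Since I need to do this for 30 or more variables, I would like to automate the process with a function.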

Here is an example dataset:


df <- data.frame(v1=c('a','b','c','c','b'),v2=c('d','d','e','e','e'),cat=c('1low','1low','2med','3high','2med'))

The goal is to create a table like the one below (without the NAs). [image: example of the final table]

Here is my code:

library(formattable)

# For var1 & var2, apply the table function and convert to dataframe so that the row labels are incorporated into dataset
var1.df <- as.data.frame(table(df$v1, df$cat))

# reshape to achieve wide format (goal: view the count of each var1 level across the low, med, high cats)
var1.df <- reshape(var1.df, idvar = "Var1", timevar = "Var2", direction = "wide")

# add col names
names(var1.df) <- c("vcat","low","med","high"); var1.df

# repeat above steps for next variable. in true dataset, I will need to repeat for 30 vars...
var2.df <- as.data.frame(table(df$v2, df$cat))
var2.df <- reshape(var2.df, idvar = "Var1", timevar = "Var2", direction = "wide")
names(var2.df) <- c("vcat","low","med","high")

# Create variable headings
var1.heading <- data.frame("variable 1",NA,NA,NA) # ideally, the NAs are blanks
names(var1.heading) <- c("vcat","low","med","high")

var2.heading <- data.frame("variable 2","","","")
names(var2.heading) <- c("vcat","low","med","high")

# Rbind the category headings and the table result data
table01 <- do.call("rbind", list(var1.heading, var1.df, 
                                 var2.heading, var2.df))

# Format the table for presentation
heading.list <- c("variable 1", "variable 2")
x <- formattable(table01, 
                 align =c("l","c","c","c","c"),
                 list(vcat = formatter("span", style = x ~ ifelse(x %in% heading.list, 
                                                                  style(font.weight = "bold"), NA))))

My attempts below to automate the code above are either incomplete (a) or do not run correctly (b):

# (a)
lapply(df, function(x) as.data.frame(table(x, df$cat)))

# (b)
myfxn <- function(x){
  y <- as.data.frame(table(x, df$cat))
  y <- reshape(y, idvar = "x", timevar = "Var2", direction = "wide")
  names(y) <- c("vcat","low","med","high")
}
lapply(df, myfxn(x))

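Do you have any suggestions for automating this process across several more variables? Also, is there another way to insert category headings into the table, other than manually creating one-row data frames to insert? Note that I put NAs in var1.heading because it is the first data frame. When I tried to insert "" for blanks instead (as in var2.heading), the subsequent data frames would not bind because they held factor variables rather than character. Thanks very much in advance!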

rawr

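You were very close, so I will start from your attempt (b). I think the only reason you are reshaping is the data.frame(table(...)) step, and that is unnecessary if you remove the class "table" from the tables.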
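To make the difference concrete, here is a minimal sketch using the v1 and cat columns from the question. The names tab, long and wide are only illustrative. It compares what as.data.frame() and unclass() each return for the same table.

```r
# Toy data from the question
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'))

tab <- table(df$v1, df$cat)

# as.data.frame() on a "table" object returns one row per cell (long format),
# which is why a reshape() back to wide was needed
long <- as.data.frame(tab)
dim(long)
# [1] 9 3

# unclass() drops the "table" class and leaves a plain integer matrix that is
# already wide: one row per level of v1, one column per level of cat
wide <- unclass(tab)
dim(wide)
# [1] 3 3
colnames(wide)
# [1] "1low"  "2med"  "3high"
```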

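I would also try to complete the entire operation for one variable inside the function, that is, adding the header, labels, and so on. That way you can test the function on a single variable to make sure it works as intended, and then loop over all the variables.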

# (b)
myfxn <- function(x, header = 'variable') {
  y <- unclass(table(x, df$cat))
  colnames(y) <- gsub('\\d', '', colnames(y))
  y <- data.frame(vcat = rownames(y), y, stringsAsFactors = FALSE)
  rbind(c(header, rep('', ncol(y) - 1)), y)
}

myfxn(df$v1)
#       vcat low med high
# 1 variable             
# a        a   1   0    0
# b        b   1   1    0
# c        c   0   1    1
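The stringsAsFactors = FALSE in myfxn matters for the header row. The sketch below uses two hypothetical one-column data frames (f and ch are not part of the original code) to show what tends to happen when a label is appended to a factor column versus a character column.

```r
# Hypothetical one-column frames, used only to illustrate the point
f  <- data.frame(vcat = c('a', 'b'), stringsAsFactors = TRUE)   # factor column
ch <- data.frame(vcat = c('a', 'b'), stringsAsFactors = FALSE)  # character column

# Appending a label that is not an existing factor level produces NA
# (R also emits an "invalid factor level" warning, suppressed here)
f2 <- suppressWarnings(rbind(f, 'variable 1'))
is.na(f2$vcat[3])
# [1] TRUE

# With a character column the label is kept as-is
ch2 <- rbind(ch, 'variable 1')
ch2$vcat[3]
# [1] "variable 1"
```

This is most likely the same mechanism behind the binding problem described in the question, where the heading rows were text and the data columns were factors.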

Then use Map or mapply instead of lapply to pass multiple arguments to myfxn:

l <- Map(myfxn, df[-3], heading.list)

formattable(
  do.call('rbind', l), row.names = FALSE,
  align = c('l', rep('c', nlevels(factor(df$cat)))),  # factor() also covers a character cat column (the default in R >= 4.0)
  list(
    vcat = formatter('span', style = x ~ ifelse(x %in% heading.list, style(font.weight = 'bold'), NA))
  )
)
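As a small illustration of why Map fits here, the sketch below uses a hypothetical helper, label_it, and toy inputs (none of these names come from the code above). Map walks the variables and the headers in parallel, while lapply expects the function itself, not a call such as myfxn(x), and passes any extra arguments unchanged to every call.

```r
# Hypothetical helper and inputs, used only to show how the arguments are paired
label_it <- function(x, header) paste(header, length(unique(x)), sep = ': ')

vars    <- list(v1 = c('a','b','c','c','b'), v2 = c('d','d','e','e','e'))
headers <- c('variable 1', 'variable 2')

# Map() calls label_it(vars[[1]], headers[1]), then label_it(vars[[2]], headers[2])
res <- Map(label_it, vars, headers)
res$v1
# [1] "variable 1: 3"

# lapply() receives the function object; header is the same for every element
res2 <- lapply(vars, label_it, header = 'same header')
res2$v2
# [1] "same header: 2"
```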

## apply for 30 variables
heading.list <- sprintf('variable %s', 1:30)
l <- Map(myfxn, df[sample(1:2, 30, TRUE)], heading.list)

formattable(
  do.call('rbind', l), row.names = FALSE,
  align = c('l', rep('c', nlevels(factor(df$cat)))),  # factor() also covers a character cat column (the default in R >= 4.0)
  list(
    vcat = formatter('span', style = x ~ ifelse(x %in% heading.list, style(font.weight = 'bold'), NA))
  )
)
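For the real dataset, where only about 30 of the 100 columns are summarised, one possible variation is to select the columns by name and pass the grouping vector as an argument instead of reading df from the global environment. This is only a sketch: summarise_var, its by argument, and the vars vector are illustrative names, and the column names double as headings here.

```r
df <- data.frame(v1 = c('a','b','c','c','b'),
                 v2 = c('d','d','e','e','e'),
                 cat = c('1low','1low','2med','3high','2med'))

# Variant of myfxn that takes the grouping vector as an argument
# (summarise_var and by are illustrative names, not from the answer above)
summarise_var <- function(x, header, by) {
  y <- unclass(table(x, by))
  colnames(y) <- gsub('\\d', '', colnames(y))       # '1low' -> 'low', etc.
  y <- data.frame(vcat = rownames(y), y, stringsAsFactors = FALSE)
  rbind(c(header, rep('', ncol(y) - 1)), y)         # header row goes first
}

# Names of the variables to summarise; the real data would list about 30 names
vars <- c('v1', 'v2')

# MoreArgs supplies the same grouping vector to every call
l <- Map(summarise_var, df[vars], vars, MoreArgs = list(by = df$cat))
out <- do.call('rbind', l)
rownames(out) <- NULL
out
```

The resulting out can be passed to formattable in the same way as do.call('rbind', l) above, with vars in place of heading.list in the formatter.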


Edited 2021-06-12
