Association Analysis: the transactions Class

2022-03-13 1 min read

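Association analysis does not use data in the idi form; it uses data in the itl form, where each transaction is stored as one set of items rather than as one row per item.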

Comparing the data formats

data frame
Id	Data
001	apple
001	orange
001	banana
002	carrot
002	pickle

transaction data frame
Id	Data
001	{apple, orange, banana}
002	{carrot, pickle}
003	...
004
005

In R (with the arules package), the data is either read in directly with read.transactions or converted with as(data, "transactions").
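As a minimal sketch of that conversion, the ID/item rows from the comparison above can be grouped by Id with base R's split(), which yields the list shape that as(data, "transactions") accepts. The variable names long, baskets, and trans are illustrative, and the last step only runs if arules is installed.

```r
# The ID/item rows from the comparison above, one row per purchased item.
long <- data.frame(
  Id   = c("001", "001", "001", "002", "002"),
  Data = c("apple", "orange", "banana", "carrot", "pickle"),
  stringsAsFactors = FALSE
)

# Group the items by Id: a named list with one item vector per transaction.
baskets <- split(long$Data, long$Id)
print(baskets)

# Optional step, assuming the arules package is installed:
# coerce the list to the transactions class and display it.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```

Because the list is named, the Id values should carry over as transaction labels.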

How to convert the data

1) Case 1 (converting a list to transaction data): example

> library(arules)  # provides the transactions class, read.transactions() and inspect()
> buylist <- list( c("우유","버터","시리얼") , c("우유","시리얼"),  c("우유","빵"), c("버터","맥주","오징어") )
> buylist <- as(buylist,"transactions")
> inspect(buylist)
    items               
[1] {버터, 시리얼, 우유}
[2] {시리얼, 우유}      
[3] {빵, 우유}          
[4] {맥주, 버터, 오징어}

2) Case 2 (file attached)

※ File format: IDs run down the rows, and the product (dummy) names are listed across each row

> tmpath <- file( 'transact_ex.csv' , encoding = "EUC-KR")  # the file must be saved with "Save with Encoding..." set to EUC-KR
> buylist <- read.transactions( tmpath , format="basket" ,sep =",", header=TRUE ) # workaround for garbled Korean text on unix
> inspect(buylist)
    items         
[1] {배, 사과}    
[2] {감}          
[3] {배, 사과}    
[4] {감, 사과}    
[5] {감, 배}      
[6] {감, 배, 사과}
[7] {감, 배, 사과}

3) Case 3 (file attached)

※ File format: IDs repeat, and the products are laid out vertically, one per row

> buylist <- read.transactions( 'transact_ex2.csv' , format="single", cols=c(1,2) ,sep =",", rm.duplicates=TRUE ,header=TRUE )
> inspect(buylist)
    items          transactionID
[1] {사과}         1            
[2] {배, 사과}     2            
[3] {감}           3            
[4] {배, 사과}     4            
[5] {감, 사과}     5            
[6] {감, 배}       6            
[7] {감, 배, 사과} 7            
[8] {감, 배, 사과} 8
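The file transact_ex2.csv is not shown, so the following is only a sketch under assumptions: it writes a small hypothetical file in the same ID/item layout (the column names id and item and the ASCII item names are made up) and shows with base R what grouping by ID and dropping repeated items produces. The read.transactions call mirrors the one above and only runs if arules is installed.

```r
# Hypothetical long-format file: a header line, then one "ID,item" pair per row.
# Transaction 2 lists "pear" twice on purpose.
path <- tempfile(fileext = ".csv")
writeLines(c("id,item",
             "1,apple",
             "2,apple",
             "2,pear",
             "2,pear",
             "3,persimmon"), path)

# Base R view of the same grouping: collect items per ID, then drop repeats.
long <- read.csv(path, stringsAsFactors = FALSE)
baskets <- lapply(split(long$item, long$id), unique)
print(baskets)

# Optional step, assuming the arules package is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- read.transactions(path, format = "single", cols = c(1, 2),
                             sep = ",", rm.duplicates = TRUE, header = TRUE)
  inspect(trans)
}
```

Here cols = c(1, 2) names the transaction-ID column and the item column, in that order.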

4) Case 4 (file attached)

※ File format: each ID is unique, and the product data is stored as T/F boolean values; this format cannot be read with read.transactions

> tempfile <- read.csv('transact_ex3.csv', fileEncoding = "EUC-KR", header = TRUE)
> buylist <- as(tempfile[2:4], "transactions")
> inspect(buylist)
    items          transactionID
[1] {사과}         1            
[2] {사과, 배}     2            
[3] {감}           3            
[4] {사과, 배}     4            
[5] {사과, 감}     5            
[6] {배, 감}       6            
[7] {사과, 배, 감} 7            
[8] {사과, 배, 감} 8

[Note] Converting the T/F product data to a list and then applying as(data, "transactions")

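> totaltemp <- list()
> for( i in seq_len(nrow(tempfile)) ) {
   x <- character(0)                 # start empty so a row with no TRUE flags still gets its own slot
   for(j in 2:ncol(tempfile)) {      # the loop skips column 1 (the ID) and visits the T/F product columns
     if ( isTRUE(tempfile[i,j]) ) {  # isTRUE() also guards against NA values
       x <- c(x, names(tempfile)[j])
     }
   }
   totaltemp[[i]] <- x
 }
> buylist <- as(totaltemp,"transactions")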
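The nested loop can also be written without explicit indexing over columns. The sketch below is one possible alternative, not the original author's method: it uses a small hypothetical data frame (flags, with made-up ASCII item names) in place of transact_ex3.csv, and the final coercion only runs if arules is installed.

```r
# Hypothetical stand-in for the T/F file: column 1 is the ID,
# the remaining columns are logical item flags.
flags <- data.frame(
  ID        = 1:3,
  apple     = c(TRUE,  TRUE,  FALSE),
  pear      = c(FALSE, TRUE,  FALSE),
  persimmon = c(FALSE, FALSE, TRUE)
)

# Drop the ID column and work on a logical matrix.
m <- as.matrix(flags[-1])

# For each row, keep the column names whose flag is TRUE.
# `%in% TRUE` treats NA as FALSE, so missing flags do not raise an error.
baskets <- lapply(seq_len(nrow(m)), function(i) colnames(m)[m[i, ] %in% TRUE])
print(baskets)

# Optional step, assuming the arules package is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  library(arules)
  trans <- as(baskets, "transactions")
  inspect(trans)
}
```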
