Text Mining Example (Using R) - 2

2022-03-25 Less than 1 minute read

An example that builds a word cloud from Korean text using simple text mining.

1. Required libraries

> library(KoNLP)
> library(RColorBrewer)
> library(wordcloud)

2. Reading the data

> result <- file("tax.txt", encoding="UTF-8")
> result2 <- readLines(result)
> head(result2, 3)

-> A file of sample Korean text about income deductions is attached.
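The read step can be tried without the attached file. The sketch below writes two made-up Korean lines to a temporary file and reads them back; the sample sentences are assumptions, not the contents of tax.txt. It passes `encoding` to `readLines()` directly, which only marks the strings as UTF-8 rather than re-encoding them, so it is a slight variation on the `file(..., encoding="UTF-8")` call above.

```r
# Hypothetical sample text (assumes this script itself is saved as UTF-8)
sample_text <- c("소득공제는 세금을 줄여 준다.", "연말정산 때 공제 항목을 확인한다.")

# Write the raw bytes to a temporary file standing in for tax.txt
tmp <- tempfile(fileext = ".txt")
writeLines(sample_text, tmp, useBytes = TRUE)

# Read the lines back and mark them as UTF-8
lines <- readLines(tmp, encoding = "UTF-8")

# Peek at the first lines, as head(result2, 3) does in the post
print(head(lines, 3))
```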

3. Removing unneeded words

> # \\b limits each match to a whole word, so "is" is not cut out of "this"
> result2 <- gsub("\\band\\b", "", result2)
> result2 <- gsub("\\bof\\b", "", result2)
> result2 <- gsub("\\bis\\b", "", result2)
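A small sketch of why the pattern matters when removing words with `gsub()`: a plain pattern such as "is" also deletes those letters inside longer words. The sample sentence and the `stopwords` vector are made up for illustration.

```r
# Hypothetical sample line, not taken from tax.txt
x <- "this is a list of taxes and deductions"

# Plain pattern: also strips "is" inside "this" and "list"
plain <- gsub("is", "", x)

# Word-boundary pattern: only whole words are removed
stopwords <- c("and", "of", "is")
pattern <- paste0("\\b(", paste(stopwords, collapse = "|"), ")\\b")
whole <- gsub(pattern, "", x)

# Collapse the extra spaces left behind by the removed words
whole <- gsub("\\s+", " ", trimws(whole))

print(plain)  # "th  a lt of taxes and deductions"
print(whole)  # "this a list taxes deductions"
```

Building one pattern from a vector keeps the stopword list in a single place, which is easier to extend than one `gsub()` call per word.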

4. Extracting and checking nouns



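> # Extract the nouns from each line with KoNLP's extractNoun
> result3 <- sapply(result2, extractNoun, USE.NAMES=FALSE)
> head(unlist(result3), 20)

> # Save the nouns to a file, read them back, and count each word
> write(unlist(result3), "tax_word.txt")
> myword <- read.table("tax_word.txt")
> nrow(myword)
> wordcount <- table(myword)
> head(sort(wordcount, decreasing=TRUE), 20)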
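The counting step can be checked in isolation with base R. In this sketch the `nouns` vector is a made-up stand-in for `unlist(result3)`, so it runs without KoNLP, and it calls `table()` on the vector directly instead of going through a file.

```r
# Hypothetical noun vector standing in for unlist(result3)
nouns <- c("소득", "공제", "소득", "세금", "공제", "소득")

# Count how often each noun occurs
wordcount <- table(nouns)

# Most frequent nouns first
top <- sort(wordcount, decreasing = TRUE)
print(head(top, 20))
```

Here "소득" appears three times, "공제" twice, and "세금" once, so the sorted table lists them in that order.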

5. Plotting the extracted nouns

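# Nine-color "Set1" palette from RColorBrewer
pal <- brewer.pal(9, "Set1")

wordcloud(
  names(wordcount),
  freq=wordcount,
  scale=c(5, 1),        # size range from the largest to the smallest word
  rot.per=0.5,          # proportion of words drawn rotated
  min.freq=4,           # skip words that occur fewer than 4 times
  random.order=FALSE,   # plot words in decreasing frequency
  random.color=TRUE,    # pick colors at random from the palette
  colors=pal
)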
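With min.freq=4, wordcloud leaves out words that occur fewer than four times. A rough base R preview of which words would survive that cutoff is sketched below; the counts are made up and stand in for the `wordcount` table from step 4, and `min_freq` is a hypothetical helper variable.

```r
# Hypothetical counts standing in for wordcount from step 4
wordcount <- table(c(rep("소득", 6), rep("공제", 4), rep("세금", 2), "연말"))

# Keep only the words that meet the min.freq threshold
min_freq <- 4
shown <- wordcount[wordcount >= min_freq]
shown <- sort(shown, decreasing = TRUE)
print(shown)

# wordcloud places and colors words at random, so setting a seed
# before the wordcloud() call makes the picture reproducible
set.seed(1234)
```

If the cloud comes out nearly empty, checking this filtered table first shows whether the threshold is too high for the size of the text.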
