Day 44.

conda activate torch313

Enter this, then open Cursor.

https://parksrazor.tistory.com/93

Python / NLP / 2020-05-09 / Analyzing Samsung's 2018 report



Unzip the data file and put the three files in place.
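A minimal sketch of this step using only the standard library, assuming the download is a single zip archive and that the files belong in ./data, the folder the code below reads stopwords.txt and D2Coding.ttf from. The extract_data helper and the archive name are hypothetical.

```python
import zipfile
from pathlib import Path


def extract_data(archive, dest="./data"):
    """Extract every file in `archive` into `dest` and return the extracted file names.

    Hypothetical helper: the archive name and its layout are assumptions.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        # Report regular files only, not directory entries
        return sorted(name for name in zf.namelist() if not name.endswith("/"))
```

Usage, with a hypothetical archive name: extract_data("data.zip") returns the extracted file names, which makes it easy to confirm that all three files arrived.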

Copy the def __init__ part from the Emma section.

    def extract_noun(self):
        # Keep only the nouns of each token, e.g. 삼성전자의 스마트폰은 -> 삼성전자 스마트폰
        noun_tokens = []
        # read_file, extract_hangeul and change_token are other methods of this class (not shown here)
        tokens = self.change_token(self.extract_hangeul(self.read_file()))
        for i in tokens:
            pos = self.okt.pos(i)
            temp = [j[0] for j in pos if j[1] == 'Noun']
            # Skip tokens whose nouns add up to a single character or nothing
            if len(''.join(temp)) > 1:
                noun_tokens.append(''.join(temp))
        texts = ' '.join(noun_tokens)
        ic(texts[:100])
        return texts

    def read_stopword(self):
        # Quick Okt sanity check; the result is not used
        self.okt.pos("삼성전자 글로벌센터 전자사업부", stem=True)
        fname = './data/stopwords.txt'
        with open(fname, 'r', encoding='utf-8') as f:
            # Split into a set so remove_stopword matches whole words,
            # not substrings of the raw file text
            stopwords = set(f.read().split())
        return stopwords

    def remove_stopword(self):
        texts = self.extract_noun()
        tokens = self.change_token(texts)
        # print('------- 1 명사 -------')
        # print(texts[:30])
        stopwords = self.read_stopword()
        # print('------- 2 스톱 -------')
        # print(list(stopwords)[:30])
        # print('------- 3 필터 -------')
        texts = [text for text in tokens
                 if text not in stopwords]
        # print(texts[:30])
        return texts

    def find_freq(self):
        texts = self.remove_stopword()
        # Count each word and sort from most to least frequent
        freqtxt = pd.Series(dict(FreqDist(texts))).sort_values(ascending=False)
        ic(freqtxt[:30])
        return freqtxt

    def draw_wordcloud(self):
        texts = self.remove_stopword()
        # font_path must point to a font that has Hangul glyphs (D2Coding here)
        wcloud = WordCloud(font_path='./data/D2Coding.ttf', relative_scaling=0.2,
                           background_color='white').generate(" ".join(texts))
        plt.figure(figsize=(12, 12))
        plt.imshow(wcloud, interpolation='bilinear')
        plt.axis('off')
        plt.show()
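The noun-filtering logic in extract_noun, remove_stopword and find_freq can be tried without Okt or a data file. This stdlib-only sketch takes (word, tag) pairs in the shape Okt.pos() returns, keeps the Noun parts of each token, filters stopwords by whole-word set membership, and counts with collections.Counter in place of FreqDist and pd.Series. The helper names are hypothetical, and the tagged sample in the usage note is hand-written, not real Okt output.

```python
from collections import Counter


def nouns_from_pos(tagged_tokens):
    """Join the Noun parts of each token, mirroring extract_noun.

    tagged_tokens: one list of (word, tag) pairs per token,
    in the shape Okt.pos() returns for a single token.
    """
    nouns = []
    for pos in tagged_tokens:
        joined = "".join(word for word, tag in pos if tag == "Noun")
        if len(joined) > 1:  # drop empty and one-character results
            nouns.append(joined)
    return nouns


def remove_stopwords(tokens, stopwords):
    """Whole-word filtering: stopwords should be a set, not one long string."""
    return [t for t in tokens if t not in stopwords]


def top_words(tokens, n=30):
    """Most frequent words first, like find_freq's sort_values(ascending=False)."""
    return Counter(tokens).most_common(n)
```

For example, nouns_from_pos([[("삼성전자", "Noun"), ("의", "Josa")], [("것", "Noun")]]) returns ["삼성전자"]: the particle is dropped and the one-character noun is skipped. Filtering against a set matters because "삼성" in "삼성전자 글로벌센터" is True for a plain string, so substring matching would also drop words that merely appear inside a stopword.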

Pasted the instructor's version, shared by the lab leader.

My init was a bit different, so I pasted Minsol's instead.

import warnings

import matplotlib.pyplot as plt
import nltk
import pandas as pd
from icecream import ic
from konlpy.tag import Okt
from nltk import FreqDist
from wordcloud import WordCloud


class SamsungWordCloud:
    """
    Generate a word cloud from Samsung's 2018 report.

    The class downloads the required NLTK resources on first use and
    builds the word cloud from noun frequencies extracted with Okt.
    """

    def __init__(self, quiet: bool = True):
        """
        Initializer.

        Args:
            quiet: suppress output while downloading NLTK data (default: True)
        """
        # Download NLTK data (needed for word_tokenize)
        try:
            nltk.download('punkt', quiet=quiet)
            nltk.download('punkt_tab', quiet=quiet)  # required by recent NLTK versions
            nltk.download('stopwords', quiet=quiet)
        except Exception as e:
            # If the download fails, only warn and keep going
            warnings.warn(f"Error while downloading NLTK resources: {e}")

        self.okt = Okt()

The imports have to be added; Tab (Cursor autocomplete) fills them in.

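@ai.minsol.kr/mlservice/app/nlp/data/kr-Report_2018.txt I want to download a Korean dictionary library to use for building a Korean BoW from this file. Which one do you recommend, and what is a strategy for setting it up once and using it from then on?

Not needed right now since we use Okt; keep this for reference later at work.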
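Whichever analyzer ends up producing the nouns (Okt now, a dictionary-based one later), a Bag-of-Words is just a shared vocabulary plus per-document counts. A stdlib-only sketch with a hypothetical build_bow helper; scikit-learn's CountVectorizer covers the same idea for larger corpora.

```python
from collections import Counter


def build_bow(docs):
    """Build a Bag-of-Words matrix from pre-tokenized documents.

    docs: list of token lists (e.g. the nouns extracted per sentence).
    Returns (vocab, matrix): vocab is a sorted word list and
    matrix[i][j] is how often vocab[j] occurs in docs[i].
    """
    vocab = sorted({token for doc in docs for token in doc})
    index = {word: j for j, word in enumerate(vocab)}
    matrix = []
    for doc in docs:
        row = [0] * len(vocab)
        for word, count in Counter(doc).items():
            row[index[word]] = count
        matrix.append(row)
    return vocab, matrix
```

build_bow([["삼성전자", "반도체", "삼성전자"], ["반도체", "스마트폰"]]) gives one row per document, with a count of 2 under 삼성전자 in the first row.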

If the Korean characters come out broken:

Please fix it with reference to D2Coding.
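Broken Korean characters in the word cloud usually mean the font in use has no Hangul glyphs, which is why font_path points at D2Coding.ttf. A small sketch that checks the font file exists before WordCloud is built and reports every path it tried; resolve_font and any fallback paths passed to it are hypothetical.

```python
from pathlib import Path


def resolve_font(candidates):
    """Return the first existing font file from `candidates` as a string.

    Raises FileNotFoundError listing every path tried, so a missing
    D2Coding.ttf is reported clearly before the word cloud is built.
    """
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return str(path)
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"No Korean font found; tried: {tried}")
```

In draw_wordcloud this could read WordCloud(font_path=resolve_font(["./data/D2Coding.ttf"]), background_color='white').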

@ai.seoeunjin.com/mlservice/app/nlp/samsung/samsung_wordcloud.py @ai.seoeunjin.com/mlservice/app/nlp/save Using these, add code that sets the path for saving into the save folder.
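A hedged sketch of what the added save-path code could look like: build the target path inside the save folder, create the folder if it does not exist yet, and pass the result to WordCloud.to_file(). The wordcloud_save_path name is hypothetical; the default file name matches the one in the usage notes.

```python
from pathlib import Path


def wordcloud_save_path(save_dir, filename="samsung_wordcloud.png"):
    """Return save_dir/filename, creating save_dir (and its parents) if missing."""
    directory = Path(save_dir)
    directory.mkdir(parents=True, exist_ok=True)
    return directory / filename
```

Inside draw_wordcloud this would be something like wcloud.to_file(str(wordcloud_save_path("app/nlp/save"))), in place of or alongside plt.show(), which has no display to show on inside a server container.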

In @ai.minsol.kr/mlservice/app/nlp/nlp_router.py, call this method from @ai.minsol.kr/mlservice/app/nlp/samsung/samsung_wordcloud.py so that it works at localhost:8080/api/ml~~service~~/nlp/samsung, and save the file generated by the word cloud to @ai.minsol.kr/mlservice/app/nlp/save.

In @ai.seoeunjin.com/mlservice/app/nlp/nlp_router.py, call this method from @ai.seoeunjin.com/mlservice/app/nlp/samsung/samsung_wordcloud.py so that it works at localhost:8080/api/ml/samsung, and save the file generated by the word cloud to @ai.seoeunjin.com/mlservice/app/nlp/save.

Usage:

GET localhost:8080/api/mlservice/nlp/samsung - saves to the default path (app/nlp/save/samsung_wordcloud.png)

GET localhost:8080/api/mlservice/nlp/samsung?save=path/filename.png - saves to the specified path

Important: the server must be restarted for the changes to take effect. After restarting, calling localhost:8080/api/mlservice/nlp/samsung from Postman saves the file to app/nlp/save/samsung_wordcloud.png.
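Besides Postman, the endpoint can be called from Python. A stdlib sketch, assuming the service answers plain HTTP on localhost:8080 at the /api/mlservice/nlp/samsung route; the helper names are hypothetical and the 120-second timeout is an arbitrary allowance for word-cloud generation.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: plain HTTP on localhost:8080, route as in the usage notes
BASE_URL = "http://localhost:8080/api/mlservice/nlp/samsung"


def samsung_url(save=None):
    """Build the endpoint URL, adding ?save=... only when a path is given."""
    if save is None:
        return BASE_URL
    return f"{BASE_URL}?{urlencode({'save': save})}"


def call_samsung(save=None, timeout=120):
    """Send the same GET request Postman makes and return the HTTP status code."""
    with urlopen(samsung_url(save), timeout=timeout) as resp:
        return resp.status
```

call_samsung() should save to the default path, while call_samsung("save/test.png") sends the path URL-encoded as ?save=save%2Ftest.png, which the server decodes back to save/test.png.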

1) Rebuild/restart only mlservice

conda activate torch313

Inside torch313, run docker compose up!!

Create the folders and files, then download about 50 data files into the corpus folder.

https://github.com/e9t/nsmc/

GitHub - e9t/nsmc: Naver sentiment movie corpus


From the raw folder here, download about 50 JSON files.
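A stdlib sketch for fetching the raw JSON files and loading them back, assuming the repository's default branch is master (check on GitHub) and that you copy the roughly 50 file names from the raw folder listing yourself; no file names are hard-coded. download_raw, load_corpus and the corpus folder path are hypothetical.

```python
import json
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: the default branch of e9t/nsmc is "master"
RAW_BASE = "https://raw.githubusercontent.com/e9t/nsmc/master/raw/"


def raw_url(filename):
    """Direct download URL for one file in the nsmc raw/ folder."""
    return RAW_BASE + filename


def download_raw(filenames, corpus_dir):
    """Download the given raw/ files into corpus_dir (created if missing)."""
    target = Path(corpus_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name in filenames:
        urlretrieve(raw_url(name), str(target / name))


def load_corpus(corpus_dir):
    """Load every *.json file in corpus_dir into a {file name: parsed JSON} dict."""
    corpus = {}
    for path in sorted(Path(corpus_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            corpus[path.name] = json.load(f)
    return corpus
```

Pass download_raw the names copied from the GitHub listing, then call load_corpus on the same folder to check that every file parses.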

https://huggingface.co/monologg/koelectra-small-v3-discriminator/tree/main

Download the 4 files; .gitattributes is not needed.
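A stdlib sketch that builds the direct download URLs for the model files and skips .gitattributes, using the Hugging Face resolve/<revision>/<file> URL pattern. The file names are passed in from the repository page rather than hard-coded, and the helper names and target folder are hypothetical. The huggingface_hub package's snapshot_download with ignore_patterns is a packaged alternative.

```python
from pathlib import Path
from urllib.request import urlretrieve

REPO_ID = "monologg/koelectra-small-v3-discriminator"


def model_file_urls(filenames, repo_id=REPO_ID, revision="main"):
    """Map each file name to its Hugging Face download URL, skipping .gitattributes."""
    return {
        name: f"https://huggingface.co/{repo_id}/resolve/{revision}/{name}"
        for name in filenames
        if name != ".gitattributes"
    }


def download_model_files(filenames, target_dir):
    """Download the listed files into target_dir (created if missing)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for name, url in model_file_urls(filenames).items():
        urlretrieve(url, str(target / name))
```

Passing the full file listing, including .gitattributes, is safe because model_file_urls filters it out.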

Other posts in the 'Project ESG+AI > [삼정KPMG]ESG 데이터 활용 풀스텍 개발' category

Day 46. (1)	2025.12.16
Day 45. (0)	2025.12.15
Day 43. Seoul crime statistics & natural language processing (0)	2025.12.11
Day 42. (0)	2025.12.10
Day 41. (1)	2025.12.09

Net:SEED Lab 🌱🤍
