Project ESG+AI / [Samjong KPMG] Full-Stack Development with ESG Data

Day 40.

by GreenJin_S2 2025. 12. 8.

@ai.seoeunjin.com/mlservice/app/titanic/train.csv I want to load this into a DataFrame. In @ai.seoeunjin.com/mlservice/app/titanic/titanic_method.py, write a method that returns the result of pd.read_csv.

 

 

@titanic_method.py (31-43) Here, write code that takes the DF from @titanic_method.py (16-30) and returns it with the Survived column removed.

@titanic_method.py (47-59) Here, create a DF that contains only the Survived label, split off from the DF in @titanic_method.py (16-29).

 

@titanic_method.py (46-61) Below this method, add a method that drops features and a method that checks for nulls. Pick the names yourself and leave the bodies as pass.

 

@titanic_method.py (67-69) Write a method that returns the number of nulls in df.

 

 

 

 

    def preprocess(self):
        # NOTE: `this` is assumed to be a dataset object (with .train and .test) defined elsewhere.
        ic("Preprocessing started")
        the_method = TitanicMethod()
        ic(f'1. Type of train \n {type(this.train)} ')
        ic(f'2. Columns of train \n {this.train.columns} ')
        ic(f'3. First 5 rows of train\n {this.train.head()} ')
        ic(f'4. Null counts per column in train\n {this.train.isnull().sum()}')
        ic(f'5. Type of test \n {type(this.test)}')
        ic(f'6. Columns of test \n {this.test.columns}')
        ic(f'7. First 5 rows of test\n {this.test.head()}')
        ic(f'8. Null counts per column in test\n {this.test.isnull().sum()}')
        ic("Preprocessing finished")
 

Add this part.

 

 

from pathlib import Path
import pandas as pd
from app.titanic.titanic_dataset import TitanicDataSet
from icecream import ic

class TitanicMethod(object):

    def __init__(self):
        self.dataset = TitanicDataSet()

    def new_model(self, fname: str) -> pd.DataFrame:
        # Load the CSV at fname into a DataFrame
        return pd.read_csv(fname)

    def create_train(self, fname: str) -> pd.DataFrame:
        # Features only: drop the Survived label column
        return self.new_model(fname).drop(columns=['Survived'])

    def create_label(self, fname: str) -> pd.DataFrame:
        # Label only: keep just the Survived column
        return self.new_model(fname)[['Survived']]

    def drop_feature(self, fname: str, *feature: str) -> pd.DataFrame:
        df_train = self.create_train(fname)
        feature_list = list(feature)
        df_dropped = df_train.drop(columns=feature_list)
        return df_dropped

    def null_check(self, fname: str) -> int:
        ic('🔍 Checking missing values in the data')
        df_train = self.create_train(fname)
        # Total number of missing cells across all columns
        null_count = df_train.isnull().sum().sum()
        return int(null_count)

 

Replaced the method file with this.

 

from pathlib import Path
import pandas as pd
from pandas import DataFrame
from app.titanic.titanic_dataset import TitanicDataSet
from icecream import ic

class TitanicMethod(object):

    def __init__(self):
        self.dataset = TitanicDataSet()

    def new_model(self, fname: str) -> pd.DataFrame:
        # Load the CSV at fname into a DataFrame
        return pd.read_csv(fname)

    def create_train(self, df: DataFrame, label: str) -> pd.DataFrame:
        # Features only: drop the label column from the given df
        return df.drop(columns=[label])

    def create_label(self, df: DataFrame, label: str) -> pd.DataFrame:
        # Label only: keep just the label column of the given df
        return df[[label]]

    def drop_feature(self, df: DataFrame, *feature: str) -> pd.DataFrame:
        feature_list = list(feature)
        df_dropped = df.drop(columns=feature_list)
        return df_dropped

    def null_check(self, df: DataFrame) -> int:
        # Total number of missing cells across all columns
        return int(df.isnull().sum().sum())
 

Changed the contents once more.
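A quick way to sanity-check helpers like these without the real train.csv is to feed them a tiny in-memory CSV. The sketch below is only an illustration: the rows are made up, and the functions are written as standalone stand-ins for the class methods above.

```python
import io

import pandas as pd

# Made-up rows standing in for train.csv (not real Titanic data)
CSV_TEXT = """PassengerId,Survived,Pclass,Age,Cabin
1,0,3,22,
2,1,1,38,C85
3,1,3,,
"""


def new_model(source) -> pd.DataFrame:
    # pd.read_csv accepts a path or any file-like object
    return pd.read_csv(source)


def create_train(df: pd.DataFrame, label: str) -> pd.DataFrame:
    # Features only: everything except the label column
    return df.drop(columns=[label])


def create_label(df: pd.DataFrame, label: str) -> pd.DataFrame:
    # Label only, kept as a one-column DataFrame
    return df[[label]]


def null_check(df: pd.DataFrame) -> int:
    # Total number of missing cells across all columns
    return int(df.isnull().sum().sum())


df = new_model(io.StringIO(CSV_TEXT))
train = create_train(df, "Survived")
label = create_label(df, "Survived")

print(train.columns.tolist())
print(label.shape)
print(null_check(train))
```

Passing the DataFrame in as an argument, as the latest version of the method file does, is what makes this kind of check possible without touching the file system.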

 

 

 

@titanic_method.py (19-23) Change this to a list comprehension.

 

Changes:

Before:

After:

 

Changed it to a list comprehension. The behavior is the same: each element of the *feature tuple is collected into a list.
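The before and after snippets did not survive in this post. As a rough reconstruction, assuming the "before" was the `list(feature)` call in `drop_feature` shown earlier, the change could look like this (written as a standalone function, with a made-up sample frame):

```python
import pandas as pd


def drop_feature(df: pd.DataFrame, *feature: str) -> pd.DataFrame:
    # Before (assumed): feature_list = list(feature)
    # After: the same list built with a list comprehension
    feature_list = [name for name in feature]
    return df.drop(columns=feature_list)


# Made-up sample frame for illustration
df = pd.DataFrame({"Age": [22, 38], "Cabin": [None, "C85"], "Ticket": ["A/5", "PC"]})
print(drop_feature(df, "Cabin", "Ticket").columns.tolist())
```

For a plain copy like this, both forms produce the same list; the comprehension only pays off once each element needs filtering or transforming.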

 

 

 

251208 How to save resources in an MSA setup that integrates Spring, Python, and Next by running only the containers you need: rename the existing docker-compose.yaml to the temporary name docker-compose.local.yaml and keep a copy of the original. Then use a prompt to request an edit so that only the services you will actually use are created as containers, and run docker compose up on the result.

 

@docker-compose.yaml @api.seoeunjin.com @ai.seoeunjin.com Change the code so that only these two containers are created.

 

 

@ai.seoeunjin.com/mlservice/app/titanic/router.py Here, write a router that goes through @ai.seoeunjin.com/mlservice/app/main.py and shows the log from @titanic_service.py (103-114). Name it /titanic.

 

Goal: when the link is called from Postman, the content above shows up in the terminal!

Implemented it while fixing errors along the way.

 


@titanic_method.py (8-26) I want to turn this into a class for preprocessing the Titanic data.

Based on the method signatures and comments below, implement preprocessing that fits each feature's measurement scale (nominal, ordinal, ratio/interval).

 

Preconditions

- The input df is a pandas DataFrame of the Titanic dataset.

- Assume the main columns ["Pclass", "Fare", "Embarked", "Sex", "Age", "Name"] exist.

- Every method modifies df and then returns the modified df itself.

- Use in-place operations where possible, but return df explicitly so that calls can be chained.

- Please also include a simple strategy for handling missing values.

 

Below is the class skeleton.

 

```python
import numpy as np
import pandas as pd
from pandas import DataFrame


class TitanicPreprocessor:

    def pclass_ordinal(self, df: DataFrame) -> pd.DataFrame:
        """
        Pclass: cabin class (1, 2, 3)
        - Treated as an ordinal scale.
        - 1st class > 2nd class > 3rd class, so in terms of survival rate 1 is best and 3 is worst.
        Implementation requirements:
        - Assume df["Pclass"] is int or category.
        - Choose sensibly between a separate encoded column name and overwriting Pclass.
          (e.g. creating a new "Pclass_ordinal" column and keeping Pclass is fine.)
        """
        pass

    def fare_ordinal(self, df: DataFrame) -> pd.DataFrame:
        """
        Fare: ticket price (a continuous ratio scale, but here it is binned and used as ordinal.)
        Implementation requirements:
        - Fill missing values with the median.
        - Bin Fare by quartiles or other suitable intervals to build an ordinal feature.
          e.g. something like pd.qcut(df["Fare"], q=4, labels=[0,1,2,3]).
        - New column name, e.g. "Fare_band" or "Fare_ordinal".
        - Keep the original Fare column as is and only add the band column.
        """
        pass

    def embarked_ordinal(self, df: DataFrame) -> pd.DataFrame:
        """
        Embarked: port of embarkation (C, Q, S)
        - Essentially a nominal scale. The method name stays embarked_ordinal,
          but one-hot encoding, which suits nominal data, is the more appropriate preprocessing.
        Implementation requirements:
        - Fill missing values with the most frequent value (mode).
        - Create one-hot columns from df["Embarked"].
          e.g. "Embarked_C", "Embarked_Q", "Embarked_S"
        - The original "Embarked" column may be kept or dropped. Implement whichever is more reasonable.
        - pandas.get_dummies may be used.
        """
        pass

    def gender_nominal(self, df: DataFrame) -> pd.DataFrame:
        """
        Sex: gender (male, female)
        - Nominal scale.
        Implementation requirements:
        - Convert df["Sex"] with binary encoding or one-hot encoding.
        - Example 1) "Sex_male" = 1, "Sex_female" = 0
        - Example 2) use get_dummies(df["Sex"], prefix="Sex")
        - Decide sensibly in the implementation whether to keep the original "Sex" column.
        - The returned df must contain numeric columns usable for training.
        """
        pass

    def age_ratio(self, df: DataFrame) -> pd.DataFrame:
        """
        Age: age
        - Originally a ratio scale, but here an ordinal feature is built by splitting age into intervals.
        - The bins list is already provided below.
        """
        bins = [-1, 0, 5, 12, 18, 24, 35, 60, np.inf]

        """
        Implementation requirements:
        - Fill missing Age values with the median or the mean (median recommended).
        - Bin the ages using the bins above.
          e.g. pd.cut(df["Age"], bins=bins, labels=False)
        - New column name, e.g. "Age_band" or "Age_ordinal".
        - If needed, leave comments with each category (label) and its meaning (infant/child/teenager/adult, etc.).
        - Keep the original Age column and add the band column.
        """
        pass

    def title_nominal(self, df: DataFrame) -> pd.DataFrame:
        """
        Title: form of address (Mr, Mrs, Miss, Master, Dr, etc.)
        - The title extracted from the Name column.
        - Nominal scale.
        Implementation requirements:
        - Extract Title from the df["Name"] column with a regular expression or similar.
          e.g. df["Name"].str.extract(r',\s*([^\.]+)\.', expand=False)
        - Group rare titles into a "Rare" group.
          e.g. ["Lady", "Countess", "Capt", "Col", "Don", ...] → "Rare"
        - Finally, create or clean up the df["Title"] column,
          and convert it to numeric form with one-hot encoding, LabelEncoding, or similar.
        - The preprocessing result must be in a form that can be fed straight into a model.
        """
        pass
```

 

-- I approve this strategy, so Cursor, please proceed exactly as described.
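To make the two binning requirements concrete, here is a minimal sketch of `fare_ordinal` and `age_ratio` as standalone functions. It is one possible reading of the spec above, not the code Cursor actually generated: the column names "Fare_band" and "Age_band" are taken from the suggested examples, the sample rows are made up, and the functions work on a copy of the frame instead of in place.

```python
import numpy as np
import pandas as pd


def fare_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Missing fares are filled with the median, as the spec asks
    df["Fare"] = df["Fare"].fillna(df["Fare"].median())
    # Quartile binning; labels=False yields integer codes, and
    # duplicates="drop" avoids an error when quantile edges coincide
    df["Fare_band"] = pd.qcut(df["Fare"], q=4, labels=False, duplicates="drop")
    return df


def age_ratio(df: pd.DataFrame) -> pd.DataFrame:
    bins = [-1, 0, 5, 12, 18, 24, 35, 60, np.inf]
    df = df.copy()
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # Right-closed intervals: (-1, 0] -> 0, (0, 5] -> 1, ..., (60, inf] -> 7
    df["Age_band"] = pd.cut(df["Age"], bins=bins, labels=False)
    return df


# Made-up rows for illustration
sample = pd.DataFrame({
    "Age": [2, 15, np.nan, 40, 70],
    "Fare": [7.25, 71.28, 8.05, np.nan, 30.0],
})
out = age_ratio(fare_ordinal(sample))
print(out[["Age", "Age_band", "Fare", "Fare_band"]])
```

Both original columns stay in the frame, so the continuous values remain available if a later model prefers them over the bands.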

 

 

    def gender_nominal(self, df: DataFrame) -> pd.DataFrame:
        """
        Sex: gender (male, female)
        - Nominal scale.
        - Converted with one-hot encoding.
        - The original "Sex" column is kept and the one-hot columns are added.
        """
        df = df.copy()
        # One-hot encode Sex into Sex_female / Sex_male columns
        sex_dummies = pd.get_dummies(df["Sex"], prefix="Sex")
        df = pd.concat([df, sex_dummies], axis=1)
        return df

 

@titanic_method.py (88-99) Rename the feature to Gender, as the comment says.

 

 


 

 

 

    def preprocess(self):
        ic("😎😎 Preprocessing started")
        the_method = TitanicMethod()
        df_train = the_method.new_model('train.csv')
        this_train = the_method.create_train(df_train, 'Survived')
        ic(f'1. Type of train \n {type(this_train)} ')
        ic(f'2. Columns of train \n {this_train.columns} ')
        ic(f'3. First 5 rows of train\n {this_train.head(5)} ')
        ic(f'4. Number of nulls in train\n {the_method.null_check(this_train)}')
        drop_features = ['SibSp', 'Parch', 'Cabin', 'Ticket']
        this_train = the_method.drop_feature(this_train, *drop_features)
        this_train = the_method.pclass_ordinal(this_train)
        this_train = the_method.fare_ordinal(this_train)
        this_train = the_method.embarked_ordinal(this_train)
        this_train = the_method.gender_nominal(this_train)
        this_train = the_method.age_ratio(this_train)
        this_train = the_method.title_nominal(this_train)
        # Name is only needed for title extraction, so drop it afterwards
        drop_name = ['Name']
        this_train = the_method.drop_feature(this_train, *drop_name)
        ic("😎😎 Preprocessing finished")
        ic(f'1. Type of train \n {type(this_train)} ')
        ic(f'2. Columns of train \n {this_train.columns} ')
        ic(f'3. First 5 rows of train\n {this_train.head(5)} ')
        ic(f'4. Number of nulls in train\n {the_method.null_check(this_train)}')
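The two nominal steps in this pipeline, `embarked_ordinal` and `title_nominal`, can be sketched as follows. This is an illustration under assumptions, not the generated code: the functions are standalone, the sample rows are made up, the original "Embarked" column is dropped after encoding, and the rare-title list contains only the five names given in the prompt (the rest of that list was elided there).

```python
import pandas as pd

# Only the titles named in the prompt; the remainder of the list was elided
RARE_TITLES = ["Lady", "Countess", "Capt", "Col", "Don"]


def embarked_ordinal(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Fill missing ports with the most frequent value
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])
    # One-hot encode; a column is created only for ports present in the data
    dummies = pd.get_dummies(df["Embarked"], prefix="Embarked", dtype=int)
    return pd.concat([df.drop(columns=["Embarked"]), dummies], axis=1)


def title_nominal(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Capture the text between the comma and the first period, e.g. "Mr"
    df["Title"] = df["Name"].str.extract(r",\s*([^\.]+)\.", expand=False).str.strip()
    df["Title"] = df["Title"].replace(RARE_TITLES, "Rare")
    # Numeric form for the model: one indicator column per title
    title_dummies = pd.get_dummies(df["Title"], prefix="Title", dtype=int)
    return pd.concat([df, title_dummies], axis=1)


# Made-up rows for illustration
sample = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley",
        "Smith, Capt. Edward John",
        "Heikkinen, Miss. Laina",
    ],
    "Embarked": ["S", None, "C", "S"],
})
out = title_nominal(embarked_ordinal(sample))
print(out.drop(columns=["Name"]))
```

Because `title_nominal` reads the Name column, it has to run before Name is dropped, which matches the order of calls in `preprocess`.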
 

 

 

docker compose up gateway

docker compose up mlservice

 

 

 

 

When re-running after a change:

docker compose build mlservice
docker compose up -d mlservice

 

 

 

 

 

Why did the error occur? Show me the mlservice container logs.

 

 

Error" -Context 5

 

INFO: 172.18.0.1:43250 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.18.0.1:43250 - "GET /openapi.json HTTP/1.1" 200 OK
> INFO: 172.18.0.1:52272 - "GET /titanic/preprocess HTTP/1.1" 500 Internal Server Error
INFO: 172.18.0.1:52388 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.18.0.1:52388 - "GET /openapi.json HTTP/1.1" 200 OK
> INFO: 172.18.0.1:52392 - "GET /titanic/preprocess HTTP/1.1" 500 Internal Server Error
INFO: 172.18.0.1:40128 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.18.0.1:40128 - "GET /openapi.json HTTP/1.1" 200 OK
> INFO: 172.18.0.1:40128 - "GET /titanic/preprocess HTTP/1.1" 500 Internal Server Error
> INFO: 172.18.0.1:40128 - "GET /titanic/preprocess HTTP/1.1" 500 Internal Server Error
INFO: 172.18.0.1:51412 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.18.0.1:51412 - "GET /openapi.json HTTP/1.1" 200 OK
INFO: 172.18.0.1:51412 - "GET /docs HTTP/1.1" 200 OK
INFO: 172.18.0.1:51412 - "GET /openapi.json HTTP/1.1" 200 OK
> INFO: 172.18.0.1:48824 - "GET /titanic/preprocess HTTP/1.1" 500 Internal

- Look at this log and fix the error.
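The access-log lines in this paste only record the 500 status; they do not say which preprocessing step raised. One way to narrow that down is to wrap each step so a failure logs its own name and traceback. The helper below is a hypothetical sketch (`run_step` and the logger name are not part of the project), using only the standard library.

```python
import logging
import traceback

logger = logging.getLogger("titanic")


def run_step(name, func, *args):
    """Run one preprocessing step; log the full traceback if it fails, then re-raise."""
    try:
        return func(*args)
    except Exception:
        logger.error("step %s failed:\n%s", name, traceback.format_exc())
        raise


# Usage with a deliberately failing step
def broken_step(value):
    raise KeyError("Sex")


try:
    run_step("gender_nominal", broken_step, 1)
except KeyError as exc:
    print(f"caught: {exc!r}")
```

Re-raising keeps the HTTP behavior unchanged (the request still fails with 500) while the container log now names the step that failed.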

 

 

 

@titanic_method.py (91-102) Map male to 0 and female to 1. Right now the values come out as male and female.

 

The gender column disappeared.
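A minimal sketch of the requested mapping, assuming the encoded feature should be called "Gender" and replace the original "Sex" column (the post does not show the final code, and the sample rows are made up):

```python
import pandas as pd


def gender_nominal(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # male -> 0, female -> 1; any other value would become NaN
    df["Gender"] = df["Sex"].map({"male": 0, "female": 1})
    # Drop the raw text column only after the encoded one exists
    return df.drop(columns=["Sex"])


# Made-up rows for illustration
sample = pd.DataFrame({"Sex": ["male", "female", "female"], "Age": [22, 38, 26]})
out = gender_nominal(sample)
print(out.columns.tolist())
print(out["Gender"].tolist())
```

Checking `out.columns` right after this step is a cheap way to confirm that the Gender column is still present before the later steps run.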

 

 
