Project ESG+AI/[Samjong KPMG] Full-Stack Development Using ESG Data

Day 41.

by GreenJin_S2 2025. 12. 9.

 

 

    def preprocess(self):
        ic("😎😎 Preprocessing started")
        the_method = TitanicMethod()
        train_csv_path = self._get_data_path('train.csv')
        ic(f'train.csv path: {train_csv_path}')
        df_train = the_method.new_model(str(train_csv_path))
        this_train = the_method.create_train(df_train, 'Survived')
        ic(f'1. Train type \n {type(this_train)} ')
        ic(f'2. Train columns \n {this_train.columns} ')
        ic(f'3. Train top 5 rows\n {this_train.head(5)} ')
        ic(f'4. Train null count\n {the_method.check_null(this_train)}')

        test_csv_path = self._get_data_path('test.csv')
        ic(f'test.csv path: {test_csv_path}')
        df_test = the_method.new_model(str(test_csv_path))
        # Kaggle's test.csv has no 'Survived' column, so this call only works
        # if create_train tolerates a missing label column.
        this_test = the_method.create_train(df_test, 'Survived')
        ic(f'1. Test type \n {type(this_test)} ')
        ic(f'2. Test columns \n {this_test.columns} ')
        ic(f'3. Test top 5 rows\n {this_test.head(5)} ')
        ic(f'4. Test null count\n {the_method.check_null(this_test)}')
 

 


In the Titanic preprocessing step, I want to process the train set and the test set together. So I wrote this:

    drop_features = ['SibSp', 'Parch', 'Cabin', 'Ticket']
    this_train, this_test = the_method.drop_feature(this_train, this_test, *drop_features)
    this_train, this_test = the_method.pclass_ordinal(this_train, this_test)
    this_train, this_test = the_method.fare_ordinal(this_train, this_test)
    this_train, this_test = the_method.embarked_ordinal(this_train, this_test)
    this_train, this_test = the_method.gender_nominal(this_train, this_test)
    this_train, this_test = the_method.extract_title(this_train, this_test)  # extract Title from Name
    this_train, this_test = the_method.age_ratio(this_train, this_test)
    this_train, this_test = the_method.title_nominal(this_train, this_test)
    drop_name = ['Name']
    this_train, this_test = the_method.drop_feature(this_train, this_test, *drop_name)

Is this valid Python syntax? I thought a function could only have one return value, so please explain whether this is possible.

 

Short answer: yes. A Python function can return two or more values;
they are packed into a tuple and returned as one object,
so the line below is perfectly valid syntax.

 
this_train, this_test = the_method.drop_feature(this_train, this_test, *drop_features)

✔ Why does this work?

Strictly speaking, a Python function always returns exactly one value.
When you return several values, they are automatically packed into a tuple, and that tuple is the one value.

Example:

 
def f(): return 1, 2

This function is equivalent to:

 
def f(): return (1, 2) # automatically packed into a tuple

So at the call site:

 
a, b = f()

performs tuple unpacking.
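To make the packing and unpacking visible, here is a minimal sketch that inspects what `f()` actually hands back. The variable name `result` is only for illustration.

```python
def f():
    # The comma builds the tuple; the parentheses are optional.
    return 1, 2


result = f()   # a single object comes back: the tuple (1, 2)
a, b = f()     # tuple unpacking binds each element to its own name

print(type(result).__name__)  # tuple
print(result == (1, 2))       # True
print(a, b)                   # 1 2
```

If the number of names on the left does not match the number of elements, Python raises a ValueError, so the method's return statement and the call site have to agree.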


✔ Your code works on the same principle

For example, suppose the method looks like this:

    def drop_feature(self, train, test, *features):
        # *features arrives as a tuple, and pandas treats a tuple as a single
        # label, so convert it to a list before dropping
        train = train.drop(columns=list(features))
        test = test.drop(columns=list(features))
        return train, test

Because it returns train, test,
Python treats the last line as:

 
return (train, test)

→ That is why the code below works as expected.

 
this_train, this_test = the_method.drop_feature(this_train, this_test, *drop_features)

✔ In short, the whole approach is structurally sound Python

Overall flow:

    this_train, this_test = the_method.pclass_ordinal(this_train, this_test)
    this_train, this_test = the_method.fare_ordinal(this_train, this_test)
    ...

As long as every function returns a (train, test) tuple,
both datasets can be updated step by step.


✔ It can also be condensed into a more Pythonic form (for reference)

To chain several transformations, you can also bundle them together like this:
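The snippet that followed here was not captured in this post. As a rough sketch of the idea, assuming every step takes and returns a (train, test) pair, the steps can be put in a list and applied in order. Plain dicts stand in for the DataFrames so the sketch runs without pandas, and `drop_columns`, `encode_sex`, and `run_pipeline` are hypothetical names, not methods of TitanicMethod.

```python
from functools import reduce


def drop_columns(*names):
    """Build a step that removes the given columns from both datasets."""
    def step(train, test):
        def keep(data):
            return {k: v for k, v in data.items() if k not in names}
        return keep(train), keep(test)
    return step


def encode_sex(train, test):
    """Stand-in step: map 'male'/'female' to 0/1 (the mapping is assumed)."""
    def encode(data):
        out = dict(data)
        out["Sex"] = [0 if s == "male" else 1 for s in data["Sex"]]
        return out
    return encode(train), encode(test)


def run_pipeline(train, test, steps):
    """Feed the (train, test) pair through each step in order."""
    return reduce(lambda pair, step: step(*pair), steps, (train, test))


# Made-up miniature data, for illustration only
train = {"Sex": ["male", "female"], "Ticket": ["A", "B"], "Survived": [0, 1]}
test = {"Sex": ["female", "male"], "Ticket": ["C", "D"]}

steps = [drop_columns("Ticket"), encode_sex]
train, test = run_pipeline(train, test, steps)
print(train)
print(test)
```

Whether this reads better than one explicit assignment per line is a matter of taste; the explicit version is easier to debug step by step.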

 


 

Handle test the same way as yesterday's train


    def drop_feature(self, this, *feature: str) -> object:
        # Drop each named column from both train and test in place
        for df in [this.train, this.test]:
            for col in feature:
                df.drop(col, axis=1, inplace=True)

        return this

    def check_null(self, this) -> None:
        # Print the per-column null counts for train and test
        for df in [this.train, this.test]:
            print(df.isnull().sum())

 

 dataset = TitanicDatasets()

 

Add this part to the service file

 

 

 

 

Kept fixing errors, and asked for the output formatting to be changed so it prints on one line

 

 

 


I'm about to start modeling in machine learning. The preprocessing step is done, so I now have a df made up of integers. Explain what "model" means here in a way a beginner can easily understand

 

 

 

Think of a model as a “smart calculator that learned from data”!

For example, say you want to predict house prices. Suppose the integer df you already preprocessed has columns like these:

  • Number of rooms
  • Floor area (in pyeong)
  • Distance to the nearest subway station
  • Whether it is an older building (0 or 1)

→ plus the answer column: house price (in units of 10,000 KRW)

You show this data to the computer and tell it, “Hey, look through this data and find the rule for how house prices are decided!”

The computer then learns a rule (pattern) such as “Ah, the more rooms, the bigger the floor area, and the closer the subway, the more expensive the house!” That learned rule itself is what we call a “model”.

An even simpler analogy

Imagine showing an elementary school student a pile of addition problems like “2 + 3 = 5, 4 + 7 = 11, 10 + 8 = 18 …” and saying, “Now figure out how to add on your own!”

Once the child really understands addition, they can answer right away when you ask “35 + 27 = ?”, right?

→ The “ability to add” inside that child's head = the model!

A machine learning model is the same. You show it lots of past data (questions + answers) so it learns the pattern, and when new data (a new house's details) comes in, it predicts (or classifies): “This house is probably worth about this much.”

One-line beginner explanations of commonly used models

Model | What is it used for? | Beginner analogy
Linear Regression | Predicting a number (house price, stock price, etc.) | The one that draws a straight line and makes a rough guess
Logistic Regression | Classifying into 0 or 1 (spam or not spam?) | The one that says “yes/no” with a probability
Decision Tree | Classifying with a tree of questions: “if this, go left; if that, go right” | Like asking a friend, “Should I buy this?”
Random Forest | Hundreds of decision trees vote to get a more accurate answer | Asking 100 friends and going with the majority
XGBoost, LightGBM, CatBoost | Often the top performers in competitions today | Friends who rank at the top nationwide
Neural Network / Deep Learning | Very complex data such as images, audio, and text | A super-smart one that imitates the brain

In summary

  • The integer df you just built = the “question sheet + answer key”
  • Model = the “smart student” who learned the rules from it
  • Training (fit) = the student studying the question sheet
  • Prediction (predict) = giving the answer when handed a new question

So all that is left for you is three steps: “Which model (which student) should I use?” → “Show that student the data and make them study!” → “Give them new data and have them produce an answer!”
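The three steps (pick a model, fit it, predict) can be sketched without any library. This is a minimal illustration under stated assumptions, not the code used in the project: `TinyLinearModel` is a hypothetical one-feature linear regression, and the data is made up so that price = 3 × area + 10.

```python
class TinyLinearModel:
    """One-feature linear regression fitted by ordinary least squares."""

    def fit(self, xs, ys):
        # "Studying": find the slope and intercept that best match the data
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        self.slope = cov_xy / var_x
        self.intercept = mean_y - self.slope * mean_x
        return self

    def predict(self, xs):
        # "Answering new questions": apply the learned rule
        return [self.slope * x + self.intercept for x in xs]


# Made-up data, for illustration only: floor area -> price
areas = [10, 20, 30, 40]
prices = [40, 70, 100, 130]

model = TinyLinearModel().fit(areas, prices)  # steps 1 and 2: choose and train
print(model.slope, model.intercept)           # 3.0 10.0
print(model.predict([50]))                    # [160.0]
```

The learned `slope` and `intercept` are the "rule" from the analogy above. Library models generally expose the same fit-then-predict pattern.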

 

 

 

    def learning(self):
        ic("Training started")
        ic("Training finished")

    # Same stub with ic replaced by logger
    def learning(self):
        logger.info("Training started")
        logger.info("Training finished")

    def evaluation(self):
        logger.info("Evaluation started")

        logger.info(f'Validation accuracy with logistic regression {None}')
        logger.info(f'Validation accuracy with random forest {None}')
        logger.info(f'Validation accuracy with naive Bayes {None}')
        logger.info(f'Validation accuracy with LightGBM {None}')
        logger.info(f'Validation accuracy with SVM {None}')

 

Add this part (service side)

 

@router.get("/evaluate")
async def evaluate_model():
    """
    Run model evaluation,
    then return the evaluation results
    """

 

Add this part to the router

 


In @ai.seoeunjin.com/mlservice/app/titanic/titanic_service.py, combine the algorithms listed in the modeling comments with this.train and write the code through learning and evaluation, then in @ai.seoeunjin.com/mlservice/app/titanic/titanic_router.py @titanic_router.py (73-78) run it as the comments describe
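The evaluation stub above logs None where each validation accuracy should go. As a sketch of what could fill those slots, assuming each trained model has already produced predictions for a held-out validation set, accuracy is simply the share of matching labels. The labels and predictions below are made up for illustration, and the `accuracy` helper is hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Made-up validation labels and per-model predictions, for illustration only
y_valid = [0, 1, 1, 0, 1]
predictions = {
    "logistic_regression": [0, 1, 0, 0, 1],
    "random_forest": [0, 1, 1, 0, 0],
    "svm": [0, 1, 1, 0, 1],
}

scores = {name: accuracy(y_valid, pred) for name, pred in predictions.items()}
for name, score in scores.items():
    print(f"{name} validation accuracy: {score:.2f}")

# Pick the model with the highest validation accuracy
best = max(scores, key=scores.get)
print("best model:", best)
```

Selecting the best-scoring name this way is one simple option for deciding which model to use for the submission file.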

 

 


I'm going to enter the Titanic competition on Kaggle. I've just created the submission file. Tell me step by step how to submit it, from the site address to which tab I need to click

 

 

Connecting to Kaggle

 

Titanic - Machine Learning from Disaster | Kaggle

 


 

1. First, open the Kaggle site

https://www.kaggle.com → check that you are logged in (sign in with Google or email)

2. Go to the Titanic competition page

Direct link: https://www.kaggle.com/competitions/titanic (or from the kaggle.com home page → click “Competitions” in the top menu → type “titanic” in the search box and it comes right up)

3. Exact steps for submitting from the competition page

  1. The competition page has tabs along the top: Overview → Data → Code → Discussion → Leaderboard → Rules → Submit Predictions → click the “Submit Predictions” tab at the far right!
  2. On the “Submit Predictions” page there are two ways to upload
    • Drag and drop your file here
    • Or select file (the file picker button)
  3. Upload the submission file you created
    • The file is usually named submission.csv (recommended if you built it from gender_submission.csv)
    • Required columns: only PassengerId and Survived
    • Example of the first few lines:
      PassengerId,Survived
      892,0
      893,1
      894,0
      ...
  4. After the upload, two options appear below
    • Message (optional): note what you used (e.g. “LightGBM 5-fold”, “my first submission”)
    • Submission type (important!)
      • Normally choose Regular (this is the usual case)
      • There is also “From notebook” for submitting straight from a notebook, but you are submitting a file, so choose Regular
  5. Click the blue “Submit” button!
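The file format from step 3 (a header row, then one PassengerId,Survived row per test passenger) can be produced with the standard csv module. This is a minimal sketch: `write_submission` is a hypothetical helper, and the IDs and predictions are the illustrative sample rows from step 3, not real model output.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out):
    """Write PassengerId,Survived rows to a file-like object."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["PassengerId", "Survived"])
    for pid, pred in zip(passenger_ids, predictions):
        writer.writerow([pid, int(pred)])  # Survived must be 0 or 1


# Write to an in-memory buffer so the result can be inspected directly
buffer = io.StringIO()
write_submission([892, 893, 894], [0, 1, 0], buffer)
print(buffer.getvalue())
```

To create the actual file, pass `open("submission.csv", "w", newline="")` instead of the in-memory buffer.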

4. How to check after submitting

  • Once the submission is complete, go straight to the “My Submissions” tab
  • It shows “Processing” at the top → once scoring finishes, the Public Score appears
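  • According to this answer, Titanic's Public LB is 40% and the Private LB is 60%, so the final ranking stays hidden until the competition ends (worth verifying on the competition page, since Titanic is an ongoing Getting Started competition with a rolling leaderboard)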

Tips

  • On a first submission, a score around 0.7xxx (in the 70% range) is normal
  • Submitting gender_submission.csv as-is scores exactly 0.76555 (the baseline where only women survive)
  • You can submit up to 10 times per day (for Titanic)

 

 


 

Create the submission csv file; I'm going to submit it to Kaggle

 

 

 

@download I want to submit to Kaggle, so create the model and generate the submission.csv file here

 

 

 


 

 

from typing import Optional

from fastapi import HTTPException, Query

# router and titanic_service are defined elsewhere in titanic_router.py


@router.get("/submit")
async def submit_prediction(model_name: Optional[str] = Query(None, description="Model name to use (optional)")):
    """
    Generate the submission.csv file for Kaggle

    Args:
        model_name: model name to use (query parameter, optional)
                    random_forest is used when None
                    available: logistic_regression, random_forest, naive_bayes, svm, knn

    Returns:
        Information about the generated submission.csv file
    """
    try:
        # Run the full pipeline if no models have been trained yet
        if not titanic_service.models:
            titanic_service.preprocess()
            titanic_service.modeling()
            titanic_service.learning()
            titanic_service.evaluation()

        # Generate submission.csv
        submission_path = titanic_service.submit(model_name=model_name)

        return {
            "status": "success",
            "message": "submission.csv file created",
            "file_path": submission_path,
            "model_used": model_name or "random_forest"
        }
    except Exception as e:
        # Chain the original exception so the traceback is preserved
        raise HTTPException(status_code=500, detail=f"Error while creating the submission file: {e}") from e

 

@titanic_router.py (101-132) When this runs, save the csv for Kaggle submission into @download, built with SVM, the model with the highest accuracy

 

- It kept failing, but after a long wrestle with Claude I finally won: the csv file was generated, and after uploading it I could see my ranking

 

To move up the ranking, adding more preprocessing steps and the like is said to help

The instructor also suggested trying this with my own data (my paper)

- Tomorrow, look into this with GPT and try it if I can!