DataAnalysis




Data Analysis – Interpreting and Merging the Tables – Evaluating Outliers

Our goal is to predict whether an applicant will be able to repay a loan, using past loan application data, the applicant's personal data and similar information.

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

The column to predict is 'TARGET'. It is found in the application_train table.

TARGET takes the value 0 or 1: 0 means the loan was repaid, 1 means it was not.

Because this value is known in the training data, it is always exactly 0 or 1 there.

For the application_test table, we will try to predict this value.

What we produce is a prediction (a probability of non-repayment), so it can take any value in the 0-1 range.
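This sketch is not part of the original notebook. It shows what a prediction in the 0-1 range looks like, using the crudest possible prediction: every test applicant gets the share of TARGET == 1 in the training data. In the summary table further below, TARGET has a mean of about 0.081, so this baseline would predict roughly 0.08 for everyone. A real model such as LightGBM should score risky applicants above that and safer ones below. The helper `base_rate_prediction` is hypothetical.

```python
import numpy as np
import pandas as pd


def base_rate_prediction(train_target: pd.Series, n_test: int) -> np.ndarray:
    """Predict the training share of TARGET == 1 for every test row (hypothetical helper)."""
    rate = float(train_target.mean())  # the mean of a 0/1 column is the share of 1s
    return np.full(n_test, rate)


# Toy TARGET values (0 = repaid, 1 = not repaid), not taken from the dataset
toy_target = pd.Series([0, 0, 0, 1, 0, 0, 0, 0, 1, 0])
preds = base_rate_prediction(toy_target, n_test=3)
print(preds)  # [0.2 0.2 0.2]
```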

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

The dataset I use comes from the finance company Home Credit Group and was released for a competition on Kaggle.

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

There are 8 tables in total, two of which are the application tables (train and test).

There is also one more table, 'HomeCredit_columns_description', which partially explains the columns in all the tables.

The main table we work on is the 'application_' table. I will describe the other tables as they come up.
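In the summary statistics further below, most columns have a count of 356,255 while TARGET has 307,511. This suggests the train and test application tables were stacked into one frame before the analysis. The sketch below shows that step, assuming the CSV files sit in a local folder. `load_application` and `combine_train_test` are hypothetical helper names; the file names are the ones published on the Kaggle data page.

```python
import pandas as pd

# The eight data tables, named as on the Kaggle competition's data page
TABLE_FILES = [
    "application_train.csv", "application_test.csv", "bureau.csv",
    "bureau_balance.csv", "previous_application.csv", "POS_CASH_balance.csv",
    "installments_payments.csv", "credit_card_balance.csv",
]


def combine_train_test(train: pd.DataFrame, test: pd.DataFrame) -> pd.DataFrame:
    """Stack train and test rows; test rows get NaN in TARGET because it is unknown."""
    return pd.concat([train, test], ignore_index=True, sort=False)


def load_application(data_dir: str) -> pd.DataFrame:
    """Read both application tables from data_dir (an assumed local folder) and stack them."""
    train = pd.read_csv(f"{data_dir}/application_train.csv")
    test = pd.read_csv(f"{data_dir}/application_test.csv")
    return combine_train_test(train, test)


# Toy frames standing in for the real CSV files
train = pd.DataFrame({"SK_ID_CURR": [1, 2], "TARGET": [1, 0], "AMT_CREDIT": [1000.0, 2000.0]})
test = pd.DataFrame({"SK_ID_CURR": [3], "AMT_CREDIT": [1500.0]})
df = combine_train_test(train, test)
print(len(df), df["TARGET"].isna().sum())  # 3 1
```

Building encodings and features on the stacked frame keeps the train and test columns identical. The rows can be split again later by checking whether TARGET is NaN.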

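The loan IDs are the primary keys, and the tables are linked to each other through them. "SK_ID_CURR" identifies the current loan application and "SK_ID_PREV" a previous Home Credit loan. Credits from other institutions in the bureau table are identified by "SK_ID_BUREAU".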
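The tables cannot simply be joined as they are, because SK_ID_CURR is unique in the application table but repeats in bureau, with one row per past credit. The toy sketch below uses invented values. It shows that a plain merge duplicates application rows, while aggregating first keeps one row per applicant. `validate="one_to_many"` is a real pandas `merge` option that raises an error if the left-hand keys are not unique.

```python
import pandas as pd

# Toy tables: SK_ID_CURR is unique in app, repeated in bureau
app = pd.DataFrame({"SK_ID_CURR": [1, 2], "AMT_CREDIT": [100000.0, 250000.0]})
bureau = pd.DataFrame({
    "SK_ID_CURR": [1, 1, 1, 2],
    "SK_ID_BUREAU": [10, 11, 12, 20],
    "AMT_CREDIT_SUM": [5000.0, 7000.0, 3000.0, 9000.0],
})

# validate="one_to_many" raises pandas.errors.MergeError if SK_ID_CURR repeats in app
joined = app.merge(bureau, on="SK_ID_CURR", how="left", validate="one_to_many")
print(len(joined))  # 4: client 1's application row now appears three times

# Aggregating bureau per client first keeps exactly one row per application
per_client = bureau.groupby("SK_ID_CURR")["AMT_CREDIT_SUM"].mean().rename("mean_credit_sum")
app_with_bureau = app.join(per_client, on="SK_ID_CURR")
print(len(app_with_bureau))  # 2
```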

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

For the data analysis, and wherever financial knowledge was needed, I drew on this solution:

https://www.kaggle.com/jsaguiar/lightgbm-with-simple-features/code

The dataset is available here:

https://www.kaggle.com/c/home-credit-default-risk/data

Summary statistics of the application data:

AMT_ANNUITY AMT_CREDIT AMT_GOODS_PRICE AMT_INCOME_TOTAL AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_YEAR APARTMENTS_AVG APARTMENTS_MEDI APARTMENTS_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI BASEMENTAREA_MODE CNT_CHILDREN CNT_FAM_MEMBERS CODE_GENDER COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE DAYS_BIRTH DAYS_EMPLOYED DAYS_ID_PUBLISH DAYS_LAST_PHONE_CHANGE DAYS_REGISTRATION DEF_30_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE ELEVATORS_AVG ELEVATORS_MEDI ELEVATORS_MODE EMERGENCYSTATE_MODE ENTRANCES_AVG ENTRANCES_MEDI ENTRANCES_MODE EXT_SOURCE_1 EXT_SOURCE_2 EXT_SOURCE_3 FLAG_CONT_MOBILE FLAG_DOCUMENT_10 FLAG_DOCUMENT_11 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_14 FLAG_DOCUMENT_15 FLAG_DOCUMENT_16 FLAG_DOCUMENT_17 FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_2 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 FLAG_DOCUMENT_3 FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_6 FLAG_DOCUMENT_7 FLAG_DOCUMENT_8 FLAG_DOCUMENT_9 FLAG_EMAIL FLAG_EMP_PHONE FLAG_MOBIL FLAG_OWN_CAR FLAG_OWN_REALTY FLAG_PHONE FLAG_WORK_PHONE FLOORSMAX_AVG FLOORSMAX_MEDI FLOORSMAX_MODE FLOORSMIN_AVG FLOORSMIN_MEDI FLOORSMIN_MODE FONDKAPREMONT_MODE HOUR_APPR_PROCESS_START HOUSETYPE_MODE LANDAREA_AVG LANDAREA_MEDI LANDAREA_MODE LIVE_CITY_NOT_WORK_CITY LIVE_REGION_NOT_WORK_REGION LIVINGAPARTMENTS_AVG LIVINGAPARTMENTS_MEDI LIVINGAPARTMENTS_MODE LIVINGAREA_AVG LIVINGAREA_MEDI LIVINGAREA_MODE NAME_CONTRACT_TYPE NAME_EDUCATION_TYPE NAME_FAMILY_STATUS NAME_HOUSING_TYPE NAME_INCOME_TYPE NAME_TYPE_SUITE NONLIVINGAPARTMENTS_AVG NONLIVINGAPARTMENTS_MEDI NONLIVINGAPARTMENTS_MODE NONLIVINGAREA_AVG NONLIVINGAREA_MEDI NONLIVINGAREA_MODE OBS_30_CNT_SOCIAL_CIRCLE OBS_60_CNT_SOCIAL_CIRCLE OCCUPATION_TYPE ORGANIZATION_TYPE OWN_CAR_AGE REGION_POPULATION_RELATIVE REGION_RATING_CLIENT REGION_RATING_CLIENT_W_CITY REG_CITY_NOT_LIVE_CITY REG_CITY_NOT_WORK_CITY REG_REGION_NOT_LIVE_REGION REG_REGION_NOT_WORK_REGION SK_ID_CURR 
TARGET TOTALAREA_MODE WALLSMATERIAL_MODE WEEKDAY_APPR_PROCESS_START YEARS_BEGINEXPLUATATION_AVG YEARS_BEGINEXPLUATATION_MEDI YEARS_BEGINEXPLUATATION_MODE YEARS_BUILD_AVG YEARS_BUILD_MEDI YEARS_BUILD_MODE
count 356219.000000 3.562550e+05 3.559770e+05 3.562550e+05 308687.000000 308687.000000 308687.000000 308687.000000 308687.000000 308687.000000 176307.000000 176307.000000 176307.000000 148671.000000 148671.000000 148671.000000 356255.000000 356253.000000 356255 107895.000000 107895.000000 107895.000000 356255.000000 356255.000000 356255.000000 356254.000000 356255.000000 355205.000000 355205.000000 167175.000000 167175.000000 167175.000000 188291 177848.000000 177848.000000 177848.000000 162345.000000 3.555870e+05 286622.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.00000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255 356255 356255.000000 356255.000000 179914.000000 179914.000000 179914.000000 115147.000000 115147.000000 115147.000000 113163 356255.000000 178339 145411.000000 145411.000000 145411.000000 356255.000000 356255.000000 113276.000000 113276.000000 113276.000000 178353.000000 178353.000000 178353.000000 356255 356255 356255 356255 356255 354052 109394.000000 109394.000000 109394.000000 160489.000000 160489.000000 160489.000000 355205.000000 355205.000000 244259 356255 121014.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 356255.000000 307511.000000 185200.000000 176021 356255 183392.000000 183392.000000 183392.000000 119949.000000 119949.000000 119949.000000
unique NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 2 NaN NaN NaN NaN NaN NaN NaN NaN 4 NaN 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2 5 6 6 8 7 NaN NaN NaN NaN NaN NaN NaN NaN 18 58 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7 7 NaN NaN NaN NaN NaN NaN
top NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN F NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN No NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN N Y NaN NaN NaN NaN NaN NaN NaN NaN reg oper account NaN block of flats NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Cash loans Secondary / secondary special Married House / apartment Working Unaccompanied NaN NaN NaN NaN NaN NaN NaN NaN Laborers Business Entity Type 3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Panel TUESDAY NaN NaN NaN NaN NaN NaN
freq NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 235126 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 185607 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 235235 246970 NaN NaN NaN NaN NaN NaN NaN NaN 85954 NaN 175162 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 326537 252379 228715 316513 183307 288253 NaN NaN NaN NaN NaN NaN NaN NaN 63841 78832 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 77309 63652 NaN NaN NaN NaN NaN NaN
mean 27425.560657 5.877674e+05 5.280200e+05 1.701161e+05 0.006281 0.005808 0.231697 0.304399 0.029995 1.911564 0.118138 0.118549 0.114914 0.088673 0.088178 0.087750 0.414316 2.151858 NaN 0.045045 0.044994 0.042930 -16041.248841 64317.231413 -3002.071163 -978.580852 -4983.593527 0.143452 0.100198 0.079819 0.078930 0.075346 NaN 0.150015 0.149494 0.145471 0.501965 5.148900e-01 0.509350 0.998170 0.000020 0.003537 0.000006 0.003043 0.002535 0.001044 0.008570 0.00023 0.007231 0.000514 0.000036 0.000438 0.000289 0.720504 0.000084 0.015065 0.087976 0.000171 0.082346 0.003977 0.071213 0.818498 0.999994 NaN NaN 0.278612 0.200098 0.227331 0.226922 0.223315 0.232817 0.232504 0.228878 NaN 12.055749 NaN 0.066454 0.067296 0.065092 0.178824 0.040847 0.101495 0.102674 0.106382 0.108089 0.109279 0.106641 NaN NaN NaN NaN NaN NaN 0.008868 0.008697 0.008116 0.028503 0.028386 0.027183 1.425729 1.409468 NaN NaN 12.023741 0.020917 2.050506 2.028932 0.078076 0.229661 0.015649 0.051371 278128.000000 0.080729 0.103193 NaN NaN 0.977889 0.977903 0.977239 0.752283 0.755548 0.759452
std 14732.808190 3.986237e+05 3.660650e+05 2.235068e+05 0.104250 0.079736 0.855949 0.786915 0.191374 1.865338 0.108954 0.109824 0.108745 0.082312 0.082017 0.084076 0.720378 0.907937 NaN 0.077045 0.077140 0.075437 4358.803980 141705.532576 1517.901735 835.063902 3526.968986 0.456579 0.368259 0.135249 0.135133 0.133025 NaN 0.100139 0.100450 0.101088 0.210045 1.897531e-01 0.194141 0.042741 0.004433 0.059366 0.002369 0.055077 0.050282 0.032297 0.092175 0.01517 0.084726 0.022659 0.006041 0.020921 0.017001 0.448752 0.009176 0.121812 0.283261 0.013084 0.274891 0.062942 0.257181 0.385434 0.002369 NaN NaN 0.448317 0.400074 0.145051 0.145453 0.144126 0.161909 0.162419 0.161725 NaN 3.267576 NaN 0.081287 0.082267 0.081911 0.383206 0.197936 0.093418 0.094541 0.098779 0.111194 0.112881 0.112555 NaN NaN NaN NaN NaN NaN 0.047876 0.047519 0.046330 0.069880 0.070574 0.070723 2.599914 2.577724 NaN NaN 11.880848 0.013915 0.510947 0.504586 0.268292 0.420616 0.124113 0.220753 102842.104413 0.272419 0.108041 NaN NaN 0.057929 0.058562 0.063165 0.113267 0.112057 0.110112
min 1615.500000 4.500000e+04 4.050000e+04 2.565000e+04 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 NaN 0.000000 0.000000 0.000000 -25229.000000 -17912.000000 -7197.000000 -4361.000000 -24672.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 0.000000 0.000000 0.000000 0.013458 8.173617e-08 0.000527 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 0.000000 NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN NaN NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN 0.000000 0.000253 1.000000 -1.000000 0.000000 0.000000 0.000000 0.000000 100001.000000 0.000000 0.000000 NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
25% 16731.000000 2.700000e+05 2.340000e+05 1.125000e+05 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.057700 0.058300 0.052500 0.044500 0.044100 0.041000 0.000000 2.000000 NaN 0.007900 0.007900 0.007300 -19676.000000 -2781.000000 -4318.000000 -1592.000000 -7477.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 0.069000 0.069000 0.069000 0.335503 3.949551e-01 0.368969 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 NaN NaN 0.000000 0.000000 0.166700 0.166700 0.166700 0.083300 0.083300 0.083300 NaN 10.000000 NaN 0.018700 0.018800 0.016600 0.000000 0.000000 0.050400 0.051300 0.054200 0.045800 0.046200 0.043100 NaN NaN NaN NaN NaN NaN 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN NaN 5.000000 0.010006 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 189064.500000 0.000000 0.041500 NaN NaN 0.976700 0.976700 0.976700 0.687200 0.691400 0.699400
50% 25078.500000 5.002110e+05 4.500000e+05 1.530000e+05 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.088000 0.087400 0.084000 0.076500 0.076100 0.074900 0.000000 2.000000 NaN 0.021300 0.021000 0.019200 -15755.000000 -1224.000000 -3252.000000 -771.000000 -4502.000000 0.000000 0.000000 0.000000 0.000000 0.000000 NaN 0.137900 0.137900 0.137900 0.506155 5.648491e-01 0.533482 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 NaN NaN 0.000000 0.000000 0.166700 0.166700 0.166700 0.208300 0.208300 0.208300 NaN 12.000000 NaN 0.048200 0.048700 0.045900 0.000000 0.000000 0.075600 0.077000 0.077100 0.074900 0.075400 0.073300 NaN NaN NaN NaN NaN NaN 0.000000 0.000000 0.000000 0.003600 0.003100 0.001100 0.000000 0.000000 NaN NaN 9.000000 0.018850 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 278128.000000 0.000000 0.069000 NaN NaN 0.981600 0.981600 0.981600 0.755200 0.758500 0.764800
75% 34960.500000 7.975575e+05 6.750000e+05 2.025000e+05 0.000000 0.000000 0.000000 0.000000 0.000000 3.000000 0.148500 0.149400 0.146000 0.112300 0.111800 0.112700 1.000000 3.000000 NaN 0.051900 0.051800 0.049300 -12425.000000 -290.000000 -1717.000000 -286.000000 -1995.000000 0.000000 0.000000 0.120000 0.120000 0.120800 NaN 0.206900 0.206900 0.206900 0.673344 6.629285e-01 0.665855 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.00000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 1.000000 NaN NaN 1.000000 0.000000 0.333300 0.333300 0.333300 0.375000 0.375000 0.375000 NaN 14.000000 NaN 0.085800 0.087000 0.084300 0.000000 0.000000 0.121000 0.123100 0.131300 0.131000 0.131200 0.125800 NaN NaN NaN NaN NaN NaN 0.003900 0.003900 0.003900 0.027800 0.026800 0.023200 2.000000 2.000000 NaN NaN 15.000000 0.028663 2.000000 2.000000 0.000000 0.000000 0.000000 0.000000 367191.500000 0.000000 0.128700 NaN NaN 0.986600 0.986600 0.986600 0.823200 0.825600 0.823600
max 258025.500000 4.050000e+06 4.050000e+06 1.170000e+08 9.000000 4.000000 27.000000 261.000000 8.000000 25.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 20.000000 21.000000 NaN 1.000000 1.000000 1.000000 -7338.000000 365243.000000 0.000000 0.000000 0.000000 34.000000 24.000000 1.000000 1.000000 1.000000 NaN 1.000000 1.000000 1.000000 0.962693 8.549997e-01 0.896010 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.00000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 NaN NaN 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 NaN 23.000000 NaN 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 NaN NaN NaN NaN NaN NaN 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 354.000000 351.000000 NaN NaN 91.000000 0.072508 3.000000 3.000000 1.000000 1.000000 1.000000 1.000000 456255.000000 1.000000 1.000000 NaN NaN 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000

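The maximum of the DAYS_EMPLOYED column looks far too large. The other quantiles are negative (days counted backwards from the application date), yet the maximum is 365243.

An outlier like this can seriously distort both the statistics and the model.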


As the summary table above shows, the maximum value in DAYS_EMPLOYED, 365243, is a wildly unrealistic number.

Dividing it by 365 gives roughly 1000 years. For this reason I will set such large values in this column to NaN.
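Here is a minimal sketch of this replacement, assuming the data sits in a pandas DataFrame with a DAYS_EMPLOYED column. The function name is hypothetical, and the notebook's own code cell is not shown in this export.

```python
import numpy as np
import pandas as pd

DAYS_EMPLOYED_ANOMALY = 365243  # placeholder value, about 1000 years of employment


def fix_days_employed(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy where the 365243 placeholder in DAYS_EMPLOYED becomes NaN."""
    out = df.copy()
    out["DAYS_EMPLOYED"] = out["DAYS_EMPLOYED"].replace(DAYS_EMPLOYED_ANOMALY, np.nan)
    return out


print(f"{DAYS_EMPLOYED_ANOMALY / 365:.1f}")  # 1000.7 (years)

# Toy column: three realistic (negative) values and one placeholder
toy = pd.DataFrame({"DAYS_EMPLOYED": [-637, -1188, 365243, -225]})
fixed = fix_days_employed(toy)
print(fixed["DAYS_EMPLOYED"].max())  # -225.0
```

A common variation, not shown in this notebook, is to keep a 0/1 flag for the placeholder before replacing it, in case the anomaly itself carries information.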


As the summary table shows, CODE_GENDER has 3 unique values: besides M and F there is a value called XNA. We need to remove these rows.
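Here is a minimal sketch of dropping those rows with a boolean mask. It uses toy data and a hypothetical function name.

```python
import pandas as pd


def drop_xna_gender(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows whose CODE_GENDER is not the XNA placeholder."""
    return df[df["CODE_GENDER"] != "XNA"].reset_index(drop=True)


toy = pd.DataFrame({"SK_ID_CURR": [1, 2, 3, 4], "CODE_GENDER": ["F", "M", "XNA", "F"]})
clean = drop_xna_gender(toy)
print(sorted(clean["CODE_GENDER"].unique()))  # ['F', 'M']
```

If this filter runs after train and test have been stacked, check that no test rows are removed, since every test applicant needs a prediction.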


Label Encoder and One-Hot Encoding

In this part we convert the categorical data into a form the model can use.

The label encoder replaces text-valued categories with integer labels such as 0, 1, 2.

The model works on numbers rather than text, so this conversion lets it process these columns quickly.

One-hot encoding, in turn, helps machine learning algorithms work effectively.

In this transformation each category gets its own column, filled with 0/1 (boolean) values.

These transformations also help LightGBM's bundling step (Exclusive Feature Bundling) work effectively, because that step merges sparse, mutually exclusive columns such as one-hot columns.
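The merged output further below stores CODE_GENDER as 0/1 and has column names such as WALLSMATERIAL_MODE_nan. This is consistent with label-encoding the binary columns via `pd.factorize` and one-hot encoding the rest with `pd.get_dummies(..., dummy_na=True)`, as the solution cited at the top does. The sketch below works under that assumption on toy data. The caller passes in which columns count as binary.

```python
import pandas as pd


def encode_categoricals(df: pd.DataFrame, binary_cols: list) -> pd.DataFrame:
    out = df.copy()
    # Label encoding: each binary text column becomes integer codes (0, 1) in order of appearance
    for col in binary_cols:
        out[col], _ = pd.factorize(out[col])
    # One-hot encoding: every remaining text column becomes one 0/1 column per category,
    # plus a <column>_nan indicator for missing values
    text_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    return pd.get_dummies(out, columns=text_cols, dummy_na=True, dtype=int)


toy = pd.DataFrame({
    "CODE_GENDER": ["M", "F", "F"],
    "FLAG_OWN_CAR": ["N", "Y", "N"],
    "WALLSMATERIAL_MODE": ["Panel", None, "Block"],
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0],
})
encoded = encode_categoricals(toy, ["CODE_GENDER", "FLAG_OWN_CAR"])
print(sorted(encoded.columns))
```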


• DAYS_EMPLOYED_PERC: days employed as a fraction of the client's age in days (DAYS_EMPLOYED / DAYS_BIRTH)

• INCOME_CREDIT_PERC: the client's income relative to the credit amount (AMT_INCOME_TOTAL / AMT_CREDIT)

• INCOME_PER_PERSON: income per family member (AMT_INCOME_TOTAL / CNT_FAM_MEMBERS)

• ANNUITY_INCOME_PERC: the loan annuity relative to the client's income (AMT_ANNUITY / AMT_INCOME_TOTAL)

• PAYMENT_RATE: the annuity relative to the credit amount (AMT_ANNUITY / AMT_CREDIT)

These features require banking knowledge, so they were taken from the solution cited at the top.
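Below is a pandas sketch of these five ratios, written from the formulas in the list above; the function name is hypothetical. It takes the first applicant of the merged output table further below: income 202500, credit 406597.5, annuity 24700.5, DAYS_EMPLOYED -637, DAYS_BIRTH -9461 and one family member. Plugging these in reproduces the values shown there: 0.067329, 0.498036, 202500.0, 0.121978 and 0.060749.

```python
import pandas as pd


def add_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["DAYS_EMPLOYED_PERC"] = out["DAYS_EMPLOYED"] / out["DAYS_BIRTH"]
    out["INCOME_CREDIT_PERC"] = out["AMT_INCOME_TOTAL"] / out["AMT_CREDIT"]
    out["INCOME_PER_PERSON"] = out["AMT_INCOME_TOTAL"] / out["CNT_FAM_MEMBERS"]
    out["ANNUITY_INCOME_PERC"] = out["AMT_ANNUITY"] / out["AMT_INCOME_TOTAL"]
    out["PAYMENT_RATE"] = out["AMT_ANNUITY"] / out["AMT_CREDIT"]
    return out


# First applicant of the merged output table in this notebook
row = pd.DataFrame({
    "DAYS_EMPLOYED": [-637.0], "DAYS_BIRTH": [-9461],
    "AMT_INCOME_TOTAL": [202500.0], "AMT_CREDIT": [406597.5],
    "CNT_FAM_MEMBERS": [1.0], "AMT_ANNUITY": [24700.5],
})
result = add_ratio_features(row).iloc[0]
print(result.round(6))
```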


•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••


bureau and bureau_balance tables

These tables hold the credit history behind a loan application.

bureau consists of data on the client's previous loans from other financial institutions, as reported to the credit bureau. One application can have many past records in it.

bureau_balance covers the monthly repayment status of each of those past loans, that is, how each past credit was used over time.

First rows of the bureau table:

SK_ID_CURR SK_ID_BUREAU CREDIT_ACTIVE CREDIT_CURRENCY DAYS_CREDIT CREDIT_DAY_OVERDUE DAYS_CREDIT_ENDDATE DAYS_ENDDATE_FACT AMT_CREDIT_MAX_OVERDUE CNT_CREDIT_PROLONG AMT_CREDIT_SUM AMT_CREDIT_SUM_DEBT AMT_CREDIT_SUM_LIMIT AMT_CREDIT_SUM_OVERDUE CREDIT_TYPE DAYS_CREDIT_UPDATE AMT_ANNUITY
0 215354 5714462 Closed currency 1 -497 0 -153.0 -153.0 NaN 0 91323.0 0.0 NaN 0.0 Consumer credit -131 NaN
1 215354 5714463 Active currency 1 -208 0 1075.0 NaN NaN 0 225000.0 171342.0 NaN 0.0 Credit card -20 NaN
2 215354 5714464 Active currency 1 -203 0 528.0 NaN NaN 0 464323.5 NaN NaN 0.0 Consumer credit -16 NaN
3 215354 5714465 Active currency 1 -203 0 NaN NaN NaN 0 90000.0 NaN NaN 0.0 Credit card -16 NaN
4 215354 5714466 Active currency 1 -629 0 1197.0 NaN 77674.5 0 2700000.0 NaN NaN 0.0 Consumer credit -21 NaN

First rows of the bureau_balance table:

SK_ID_BUREAU MONTHS_BALANCE STATUS
0 5715448 0 C
1 5715448 -1 C
2 5715448 -2 C
3 5715448 -3 C
4 5715448 -4 C
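bureau_balance has one row per month per bureau credit, so it has to be summarized per SK_ID_BUREAU before it can be attached to bureau. The b_ columns in the merged output further below do not appear to include anything derived from bureau_balance. This sketch is therefore an optional extension rather than a step of this notebook. MONTHS_COUNT and the STATUS_* names are hypothetical. In the dataset's column description, STATUS C means closed, X means unknown, and 0 to 5 are days-past-due buckets.

```python
import pandas as pd


def summarize_bureau_balance(bb: pd.DataFrame) -> pd.DataFrame:
    """One row per SK_ID_BUREAU: number of monthly records and the share of each STATUS code."""
    status = pd.get_dummies(bb["STATUS"], prefix="STATUS", dtype=float)
    tmp = pd.concat([bb[["SK_ID_BUREAU", "MONTHS_BALANCE"]], status], axis=1)
    spec = {"MONTHS_BALANCE": "count", **{c: "mean" for c in status.columns}}
    agg = tmp.groupby("SK_ID_BUREAU").agg(spec)
    return agg.rename(columns={"MONTHS_BALANCE": "MONTHS_COUNT"})


# The five sample rows shown above
bb = pd.DataFrame({
    "SK_ID_BUREAU": [5715448] * 5,
    "MONTHS_BALANCE": [0, -1, -2, -3, -4],
    "STATUS": ["C"] * 5,
})
summary = summarize_bureau_balance(bb)

# Attach the summary to bureau rows through SK_ID_BUREAU (toy bureau row, hypothetical SK_ID_CURR)
bureau = pd.DataFrame({"SK_ID_CURR": [1], "SK_ID_BUREAU": [5715448]})
bureau = bureau.join(summary, on="SK_ID_BUREAU")
print(bureau)
```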

The application data after the encoded bureau features are merged in (first rows):

AMT_ANNUITY AMT_CREDIT AMT_GOODS_PRICE AMT_INCOME_TOTAL AMT_REQ_CREDIT_BUREAU_DAY AMT_REQ_CREDIT_BUREAU_HOUR AMT_REQ_CREDIT_BUREAU_MON AMT_REQ_CREDIT_BUREAU_QRT AMT_REQ_CREDIT_BUREAU_WEEK AMT_REQ_CREDIT_BUREAU_YEAR APARTMENTS_AVG APARTMENTS_MEDI APARTMENTS_MODE BASEMENTAREA_AVG BASEMENTAREA_MEDI BASEMENTAREA_MODE CNT_CHILDREN CNT_FAM_MEMBERS CODE_GENDER COMMONAREA_AVG COMMONAREA_MEDI COMMONAREA_MODE DAYS_BIRTH DAYS_EMPLOYED DAYS_ID_PUBLISH DAYS_LAST_PHONE_CHANGE DAYS_REGISTRATION DEF_30_CNT_SOCIAL_CIRCLE DEF_60_CNT_SOCIAL_CIRCLE ELEVATORS_AVG ELEVATORS_MEDI ELEVATORS_MODE ENTRANCES_AVG ENTRANCES_MEDI ENTRANCES_MODE EXT_SOURCE_1 EXT_SOURCE_2 EXT_SOURCE_3 FLAG_CONT_MOBILE FLAG_DOCUMENT_10 FLAG_DOCUMENT_11 FLAG_DOCUMENT_12 FLAG_DOCUMENT_13 FLAG_DOCUMENT_14 FLAG_DOCUMENT_15 FLAG_DOCUMENT_16 FLAG_DOCUMENT_17 FLAG_DOCUMENT_18 FLAG_DOCUMENT_19 FLAG_DOCUMENT_2 FLAG_DOCUMENT_20 FLAG_DOCUMENT_21 FLAG_DOCUMENT_3 FLAG_DOCUMENT_4 FLAG_DOCUMENT_5 FLAG_DOCUMENT_6 FLAG_DOCUMENT_7 FLAG_DOCUMENT_8 FLAG_DOCUMENT_9 FLAG_EMAIL FLAG_EMP_PHONE ORGANIZATION_TYPE_University ORGANIZATION_TYPE_XNA ORGANIZATION_TYPE_nan WALLSMATERIAL_MODE_Block WALLSMATERIAL_MODE_Mixed WALLSMATERIAL_MODE_Monolithic WALLSMATERIAL_MODE_Others WALLSMATERIAL_MODE_Panel WALLSMATERIAL_MODE_Stone, brick WALLSMATERIAL_MODE_Wooden WALLSMATERIAL_MODE_nan WEEKDAY_APPR_PROCESS_START_FRIDAY WEEKDAY_APPR_PROCESS_START_MONDAY WEEKDAY_APPR_PROCESS_START_SATURDAY WEEKDAY_APPR_PROCESS_START_SUNDAY WEEKDAY_APPR_PROCESS_START_THURSDAY WEEKDAY_APPR_PROCESS_START_TUESDAY WEEKDAY_APPR_PROCESS_START_WEDNESDAY WEEKDAY_APPR_PROCESS_START_nan DAYS_EMPLOYED_PERC INCOME_CREDIT_PERC INCOME_PER_PERSON ANNUITY_INCOME_PERC PAYMENT_RATE b_SK_ID_BUREAU b_DAYS_CREDIT b_CREDIT_DAY_OVERDUE b_DAYS_CREDIT_ENDDATE b_DAYS_ENDDATE_FACT b_AMT_CREDIT_MAX_OVERDUE b_CNT_CREDIT_PROLONG b_AMT_CREDIT_SUM b_AMT_CREDIT_SUM_DEBT b_AMT_CREDIT_SUM_LIMIT b_AMT_CREDIT_SUM_OVERDUE b_DAYS_CREDIT_UPDATE b_AMT_ANNUITY b_CREDIT_ACTIVE_Active b_CREDIT_ACTIVE_Bad debt 
b_CREDIT_ACTIVE_Closed b_CREDIT_ACTIVE_Sold b_CREDIT_CURRENCY_currency 1 b_CREDIT_CURRENCY_currency 2 b_CREDIT_CURRENCY_currency 3 b_CREDIT_CURRENCY_currency 4 b_CREDIT_TYPE_Another type of loan b_CREDIT_TYPE_Car loan b_CREDIT_TYPE_Cash loan (non-earmarked) b_CREDIT_TYPE_Consumer credit b_CREDIT_TYPE_Credit card b_CREDIT_TYPE_Interbank credit b_CREDIT_TYPE_Loan for business development b_CREDIT_TYPE_Loan for purchase of shares (margin lending) b_CREDIT_TYPE_Loan for the purchase of equipment b_CREDIT_TYPE_Loan for working capital replenishment b_CREDIT_TYPE_Microloan b_CREDIT_TYPE_Mobile operator loan b_CREDIT_TYPE_Mortgage b_CREDIT_TYPE_Real estate loan b_CREDIT_TYPE_Unknown type of loan b_buro_count
0 24700.5 406597.5 351000.0 202500.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0247 0.0250 0.0252 0.0369 0.0369 0.0383 0 1.0 1 0.0143 0.0144 0.0144 -9461 -637.0 -2120 -1134.0 -3648.0 2.0 2.0 0.00 0.00 0.0000 0.0690 0.0690 0.0690 0.083037 0.262949 0.139376 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0.067329 0.498036 202500.0 0.121978 0.060749 6153272.125 -874.00 0.0 -349.0 -697.500000 1681.029 0.0 108131.945625 49156.2 7997.14125 0.0 -499.875 0.0 0.25 0.0 0.75 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 8.0
1 35698.5 1293502.5 1129500.0 270000.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0959 0.0968 0.0924 0.0529 0.0529 0.0538 0 2.0 0 0.0605 0.0608 0.0497 -16765 -1188.0 -291 -828.0 -1186.0 0.0 0.0 0.08 0.08 0.0806 0.0345 0.0345 0.0345 0.311267 0.622246 NaN 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0.070862 0.208736 135000.0 0.132217 0.027598 5885878.500 -1400.75 0.0 -544.5 -1097.333333 0.000 0.0 254350.125000 0.0 202500.00000 0.0 -816.000 NaN 0.25 0.0 0.75 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.5 0.5 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 4.0
2 6750.0 135000.0 135000.0 67500.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN 0 1.0 1 NaN NaN NaN -19046 -225.0 -2531 -815.0 -4260.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN 0.555912 0.729567 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0.011814 0.500000 67500.0 0.100000 0.050000 6829133.500 -867.00 0.0 -488.5 -532.500000 0.000 0.0 94518.900000 0.0 0.00000 0.0 -532.000 NaN 0.00 0.0 1.00 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 2.0
3 29686.5 312682.5 297000.0 135000.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 0 2.0 0 NaN NaN NaN -19005 -3039.0 -2437 -617.0 -9833.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN 0.650442 NaN 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0.159905 0.431748 67500.0 0.219900 0.094941 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 21865.5 513000.0 513000.0 121500.0 0.0 0.0 0.0 0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN 0 1.0 1 NaN NaN NaN -19932 -3038.0 -3458 -1106.0 -4311.0 0.0 0.0 NaN NaN NaN NaN NaN NaN NaN 0.322738 NaN 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0.152418 0.236842 121500.0 0.179963 0.042623 5987200.000 -1149.00 0.0 -783.0 -783.000000 0.000 0.0 146250.000000 0.0 0.00000 0.0 -783.000 NaN 0.00 0.0 1.00 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

5 rows × 295 columns
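Several of the b_ columns above hint at how bureau was processed. b_SK_ID_BUREAU holds a mean ID, b_CREDIT_ACTIVE_Active holds a fraction, and b_buro_count holds a count. Together they suggest that bureau was one-hot encoded, averaged per SK_ID_CURR, prefixed with b_ and counted, then left-joined to the application rows.

The sketch below reconstructs that process under those assumptions. It uses the first four bureau sample rows plus one hypothetical applicant with no bureau history. That applicant gets NaN in every b_ column, as in the fourth row of the output.

```python
import pandas as pd


def bureau_features(bureau: pd.DataFrame) -> pd.DataFrame:
    """One-hot encode bureau, average every column per SK_ID_CURR, prefix with b_, add a count."""
    text_cols = [c for c in bureau.columns if not pd.api.types.is_numeric_dtype(bureau[c])]
    encoded = pd.get_dummies(bureau, columns=text_cols, dtype=float)
    grouped = encoded.groupby("SK_ID_CURR")
    agg = grouped.mean().add_prefix("b_")
    agg["b_buro_count"] = grouped.size()  # number of bureau records per applicant
    return agg


# First four bureau sample rows (subset of columns)
bureau = pd.DataFrame({
    "SK_ID_CURR": [215354, 215354, 215354, 215354],
    "SK_ID_BUREAU": [5714462, 5714463, 5714464, 5714465],
    "CREDIT_ACTIVE": ["Closed", "Active", "Active", "Active"],
    "DAYS_CREDIT": [-497, -208, -203, -203],
})
# 999999 is a hypothetical applicant with no bureau records
app = pd.DataFrame({"SK_ID_CURR": [215354, 999999]})
merged = app.join(bureau_features(bureau), on="SK_ID_CURR")  # left join keeps every applicant
print(merged)
```

Averaging SK_ID_BUREAU yields a meaningless number, as b_SK_ID_BUREAU in the output shows. Dropping the ID column before aggregating would be cleaner. The sketch keeps it only to mirror the output.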

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

previous_application table

This table holds earlier loan applications made at Home Credit Group by clients who have an application in the application table.

The bureau table, by contrast, held credits taken from other institutions.

First rows of the previous_application table:

SK_ID_PREV SK_ID_CURR NAME_CONTRACT_TYPE AMT_ANNUITY AMT_APPLICATION AMT_CREDIT AMT_DOWN_PAYMENT AMT_GOODS_PRICE WEEKDAY_APPR_PROCESS_START HOUR_APPR_PROCESS_START FLAG_LAST_APPL_PER_CONTRACT NFLAG_LAST_APPL_IN_DAY RATE_DOWN_PAYMENT RATE_INTEREST_PRIMARY RATE_INTEREST_PRIVILEGED NAME_CASH_LOAN_PURPOSE NAME_CONTRACT_STATUS DAYS_DECISION NAME_PAYMENT_TYPE CODE_REJECT_REASON NAME_TYPE_SUITE NAME_CLIENT_TYPE NAME_GOODS_CATEGORY NAME_PORTFOLIO NAME_PRODUCT_TYPE CHANNEL_TYPE SELLERPLACE_AREA NAME_SELLER_INDUSTRY CNT_PAYMENT NAME_YIELD_GROUP PRODUCT_COMBINATION DAYS_FIRST_DRAWING DAYS_FIRST_DUE DAYS_LAST_DUE_1ST_VERSION DAYS_LAST_DUE DAYS_TERMINATION NFLAG_INSURED_ON_APPROVAL
0 2030495 271877 Consumer loans 1730.430 17145.0 17145.0 0.0 17145.0 SATURDAY 15 Y 1 0.0 0.182832 0.867336 XAP Approved -73 Cash through the bank XAP NaN Repeater Mobile POS XNA Country-wide 35 Connectivity 12.0 middle POS mobile with interest 365243.0 -42.0 300.0 -42.0 -37.0 0.0
1 2802425 108129 Cash loans 25188.615 607500.0 679671.0 NaN 607500.0 THURSDAY 11 Y 1 NaN NaN NaN XNA Approved -164 XNA XAP Unaccompanied Repeater XNA Cash x-sell Contact center -1 XNA 36.0 low_action Cash X-Sell: low 365243.0 -134.0 916.0 365243.0 365243.0 1.0
2 2523466 122040 Cash loans 15060.735 112500.0 136444.5 NaN 112500.0 TUESDAY 11 Y 1 NaN NaN NaN XNA Approved -301 Cash through the bank XAP Spouse, partner Repeater XNA Cash x-sell Credit and cash offices -1 XNA 12.0 high Cash X-Sell: high 365243.0 -271.0 59.0 365243.0 365243.0 1.0
3 2819243 176158 Cash loans 47041.335 450000.0 470790.0 NaN 450000.0 MONDAY 7 Y 1 NaN NaN NaN XNA Approved -512 Cash through the bank XAP NaN Repeater XNA Cash x-sell Credit and cash offices -1 XNA 12.0 middle Cash X-Sell: middle 365243.0 -482.0 -152.0 -182.0 -177.0 1.0
4 1784265 202054 Cash loans 31924.395 337500.0 404055.0 NaN 337500.0 THURSDAY 9 Y 1 NaN NaN NaN Repairs Refused -781 Cash through the bank HC NaN Repeater XNA Cash walk-in Credit and cash offices -1 XNA 24.0 high Cash Street: high NaN NaN NaN NaN NaN NaN
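The 365243 placeholder discussed for DAYS_EMPLOYED also appears here, in DAYS_FIRST_DRAWING, DAYS_LAST_DUE and DAYS_TERMINATION. The sketch below treats it as missing, then one-hot encodes the table and averages per client. This notebook does not show its previous_application code, so the column list, the prev_ prefix and the mean aggregation are all assumptions.

```python
import numpy as np
import pandas as pd

DAYS_ANOMALY = 365243
PREV_DAYS_COLS = ["DAYS_FIRST_DRAWING", "DAYS_FIRST_DUE", "DAYS_LAST_DUE_1ST_VERSION",
                  "DAYS_LAST_DUE", "DAYS_TERMINATION"]


def previous_application_features(prev: pd.DataFrame) -> pd.DataFrame:
    out = prev.copy()
    # Same 365243 placeholder as in DAYS_EMPLOYED: treat it as missing
    cols = [c for c in PREV_DAYS_COLS if c in out.columns]
    out[cols] = out[cols].replace(DAYS_ANOMALY, np.nan)
    text_cols = [c for c in out.columns if not pd.api.types.is_numeric_dtype(out[c])]
    out = pd.get_dummies(out, columns=text_cols, dtype=float)
    grouped = out.drop(columns="SK_ID_PREV").groupby("SK_ID_CURR")
    agg = grouped.mean().add_prefix("prev_")  # "prev_" is a hypothetical prefix
    agg["prev_count"] = grouped.size()
    return agg


# Subset of rows 1, 3 and 4 of the sample above
prev = pd.DataFrame({
    "SK_ID_PREV": [2802425, 2819243, 1784265],
    "SK_ID_CURR": [108129, 176158, 202054],
    "NAME_CONTRACT_STATUS": ["Approved", "Approved", "Refused"],
    "DAYS_FIRST_DUE": [-134.0, -482.0, np.nan],
    "DAYS_LAST_DUE": [365243.0, -182.0, np.nan],
})
features = previous_application_features(prev)
print(features)
```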


•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

POS_CASH_balance table

This table is similar to bureau_balance. It holds monthly data on the repayment of the previous point-of-sale (POS) and cash loans listed in the previous_application table.

First rows of the POS_CASH_balance table:

SK_ID_PREV SK_ID_CURR MONTHS_BALANCE CNT_INSTALMENT CNT_INSTALMENT_FUTURE NAME_CONTRACT_STATUS SK_DPD SK_DPD_DEF
0 1803195 182943 -31 48.0 45.0 Active 0 0
1 1715348 367990 -33 36.0 35.0 Active 0 0
2 1784872 397406 -32 12.0 9.0 Active 0 0
3 1903291 269225 -35 48.0 42.0 Active 0 0
4 2341044 334279 -35 36.0 35.0 Active 0 0
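A client can have several previous loans, each with many monthly rows, so POS_CASH_balance also has to be reduced to one row per SK_ID_CURR before merging. The sketch below uses pandas named aggregation on toy data; the chosen statistics and output names are hypothetical. SK_DPD is the number of days past due in that month. SK_DPD_DEF is the same count with a tolerance that ignores debts with small amounts.

```python
import pandas as pd


def pos_cash_features(pos: pd.DataFrame) -> pd.DataFrame:
    """Per-client summary of monthly POS/cash balances (feature names are hypothetical)."""
    return pos.groupby("SK_ID_CURR").agg(
        pos_months=("MONTHS_BALANCE", "count"),    # monthly records across all previous loans
        pos_loans=("SK_ID_PREV", "nunique"),       # distinct previous loans
        pos_dpd_max=("SK_DPD", "max"),             # worst days past due in any month
        pos_dpd_def_mean=("SK_DPD_DEF", "mean"),   # average tolerance-adjusted DPD
    )


# Toy data: client 1 has two previous loans and one late month, client 2 has one loan
pos = pd.DataFrame({
    "SK_ID_PREV": [10, 10, 11, 20],
    "SK_ID_CURR": [1, 1, 1, 2],
    "MONTHS_BALANCE": [-2, -1, -1, -3],
    "SK_DPD": [0, 5, 0, 0],
    "SK_DPD_DEF": [0, 3, 0, 0],
})
pos_summary = pos_cash_features(pos)
print(pos_summary)
```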


•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

installments_payments table

This table holds the repayment history for previous loans at Home Credit.

There is one row for every payment made and one row for every missed payment.

First rows of the installments_payments table:

SK_ID_PREV SK_ID_CURR NUM_INSTALMENT_VERSION NUM_INSTALMENT_NUMBER DAYS_INSTALMENT DAYS_ENTRY_PAYMENT AMT_INSTALMENT AMT_PAYMENT
0 1054186 161674 1.0 6 -1180.0 -1187.0 6948.360 6948.360
1 1330831 151639 0.0 34 -2156.0 -2156.0 1716.525 1716.525
2 2085231 193053 2.0 1 -63.0 -63.0 25425.000 25425.000
3 2452527 199697 1.0 3 -2418.0 -2426.0 24350.130 24350.130
4 2714724 167756 1.0 2 -1383.0 -1366.0 2165.040 2160.585
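Each row can be turned into lateness and underpayment signals. Lateness comes from comparing the due date (DAYS_INSTALMENT) with the actual payment date (DAYS_ENTRY_PAYMENT). Underpayment comes from comparing the instalment amount with the amount paid. The solution cited at the top builds features along these lines. The names below are used for illustration; this notebook's own code is not shown.

The sketch uses rows 0 and 4 of the sample above. The first was paid 7 days early. The second was paid 17 days late and 4.455 short.

```python
import pandas as pd


def installment_features(ins: pd.DataFrame) -> pd.DataFrame:
    out = ins.copy()
    out["PAYMENT_PERC"] = out["AMT_PAYMENT"] / out["AMT_INSTALMENT"]  # share of instalment paid
    out["PAYMENT_DIFF"] = out["AMT_INSTALMENT"] - out["AMT_PAYMENT"]  # amount left unpaid
    # Positive DPD = days paid after the due date; positive DBD = days paid before it
    out["DPD"] = (out["DAYS_ENTRY_PAYMENT"] - out["DAYS_INSTALMENT"]).clip(lower=0)
    out["DBD"] = (out["DAYS_INSTALMENT"] - out["DAYS_ENTRY_PAYMENT"]).clip(lower=0)
    return out


ins = pd.DataFrame({  # rows 0 and 4 of the sample above (subset of columns)
    "SK_ID_CURR": [161674, 167756],
    "DAYS_INSTALMENT": [-1180.0, -1383.0],
    "DAYS_ENTRY_PAYMENT": [-1187.0, -1366.0],
    "AMT_INSTALMENT": [6948.360, 2165.040],
    "AMT_PAYMENT": [6948.360, 2160.585],
})
feat = installment_features(ins)
print(feat[["SK_ID_CURR", "DPD", "DBD", "PAYMENT_DIFF"]])
```

These per-row values would then be aggregated per SK_ID_CURR (for example with max and mean) before merging.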


•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

credit_card_balance table

This table holds monthly data on the previous credit cards that clients held with Home Credit.

Each row is one month of a credit card's balance, so a single credit card can have many rows.

First rows of the credit_card_balance table:

SK_ID_PREV SK_ID_CURR MONTHS_BALANCE AMT_BALANCE AMT_CREDIT_LIMIT_ACTUAL AMT_DRAWINGS_ATM_CURRENT AMT_DRAWINGS_CURRENT AMT_DRAWINGS_OTHER_CURRENT AMT_DRAWINGS_POS_CURRENT AMT_INST_MIN_REGULARITY AMT_PAYMENT_CURRENT AMT_PAYMENT_TOTAL_CURRENT AMT_RECEIVABLE_PRINCIPAL AMT_RECIVABLE AMT_TOTAL_RECEIVABLE CNT_DRAWINGS_ATM_CURRENT CNT_DRAWINGS_CURRENT CNT_DRAWINGS_OTHER_CURRENT CNT_DRAWINGS_POS_CURRENT CNT_INSTALMENT_MATURE_CUM NAME_CONTRACT_STATUS SK_DPD SK_DPD_DEF
0 2562384 378907 -6 56.970 135000 0.0 877.5 0.0 877.5 1700.325 1800.0 1800.0 0.000 0.000 0.000 0.0 1 0.0 1.0 35.0 Active 0 0
1 2582071 363914 -1 63975.555 45000 2250.0 2250.0 0.0 0.0 2250.000 2250.0 2250.0 60175.080 64875.555 64875.555 1.0 1 0.0 0.0 69.0 Active 0 0
2 1740877 371185 -7 31815.225 450000 0.0 0.0 0.0 0.0 2250.000 2250.0 2250.0 26926.425 31460.085 31460.085 0.0 0 0.0 0.0 30.0 Active 0 0
3 1389973 337855 -4 236572.110 225000 2250.0 2250.0 0.0 0.0 11795.760 11925.0 11925.0 224949.285 233048.970 233048.970 1.0 1 0.0 0.0 10.0 Active 0 0
4 1891521 126868 -1 453919.455 450000 0.0 11547.0 0.0 11547.0 22924.890 27000.0 27000.0 443044.395 453919.455 453919.455 0.0 1 0.0 1.0 101.0 Active 0 0
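As with the other history tables, credit_card_balance needs to be reduced to one row per applicant. The sketch below computes several statistics per numeric column and flattens the resulting two-level column index into single names. The cc_ prefix and the choice of statistics are assumptions, and the toy data is invented.

```python
import pandas as pd


def credit_card_features(cc: pd.DataFrame) -> pd.DataFrame:
    # Keep numeric columns only; SK_ID_PREV is an ID, not a feature
    numeric = cc.drop(columns=["SK_ID_PREV"]).select_dtypes("number")
    agg = numeric.groupby("SK_ID_CURR").agg(["min", "max", "mean", "sum"])
    # Flatten the (column, statistic) MultiIndex into names like cc_AMT_BALANCE_MAX
    agg.columns = [f"cc_{col}_{stat.upper()}" for col, stat in agg.columns]
    agg["cc_count"] = cc.groupby("SK_ID_CURR").size()  # monthly rows per applicant
    return agg


cc = pd.DataFrame({
    "SK_ID_PREV": [100, 100, 200],
    "SK_ID_CURR": [1, 1, 2],
    "MONTHS_BALANCE": [-2, -1, -1],
    "AMT_BALANCE": [1000.0, 3000.0, 500.0],
    "NAME_CONTRACT_STATUS": ["Active", "Active", "Active"],
})
cc_summary = credit_card_features(cc)
print(cc_summary)
```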


>>>> end of data analysis <<<<

•••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••••

Let's Start Classification with LightGBM


LightGBM