
[Python] Time Series Data Modeling - Basic Version

우주먼지의하루 2020. 4. 5. 03:37

The data used here comes from Kaggle:

https://www.kaggle.com/ternaryrealm/airlines-passenger-data

 

In [23]:
# Tistory layout tweak (not needed for the analysis)
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
In [1]:
#! pip install keras

Keras is a high-level neural networks API that serves as an interface library for deep learning.

In [2]:
import pandas as pd
import numpy as np
import keras

pd.set_option("display.max_columns",500) #생략없이 출력 가능
pd.set_option("display.max_rows",999)
Using TensorFlow backend.
In [3]:
dataframe = pd.read_csv("C://Users//82106//Desktop//data//airlines-passenger-data//international-airline-passengers.csv",
                        usecols=[1], engine='python', skipfooter=2)  # skipfooter requires the python engine
In [4]:
dataframe
Out[4]:
International airline passengers: monthly totals in thousands. Jan 49 – Dec 60
0 112
1 118
2 132
3 129
4 121
5 135
6 148
7 148
8 136
9 119
10 104
11 118
12 115
13 126
14 141
15 135
16 125
17 149
18 170
19 170
20 158
21 133
22 114
23 140
24 145
25 150
26 178
27 163
28 172
29 178
30 199
31 199
32 184
33 162
34 146
35 166
36 171
37 180
38 193
39 181
40 183
41 218
42 230
43 242
44 209
45 191
46 172
47 194
48 196
49 196
50 236
51 235
52 229
53 243
54 264
55 272
56 237
57 211
58 180
59 201
60 204
61 188
62 235
63 227
64 234
65 264
66 302
67 293
68 259
69 229
70 203
71 229
72 242
73 233
74 267
75 269
76 270
77 315
78 364
79 347
80 312
81 274
82 237
83 278
84 284
85 277
86 317
87 313
88 318
89 374
90 413
91 405
92 355
93 306
94 271
95 306
96 315
97 301
98 356
99 348
100 355
101 422
102 465
103 467
104 404
105 347
106 305
107 336
108 340
109 318
110 362
111 348
112 363
113 435
114 491
115 505
116 404
117 359
118 310
119 337
120 360
121 342
122 406
123 396
124 420
125 472
126 548
127 559
128 463
129 407
130 362
131 405
132 417
133 391
134 419
135 461
136 472
137 535
138 622
139 606
140 508
141 461
142 390
143 432

LOAD DATASET

In [5]:
# Preprocessing: MinMaxScaler rescales each feature so its minimum maps to 0 and its maximum to 1
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
from sklearn.preprocessing import MinMaxScaler
In [6]:
#scale dataset
dataset = dataframe.values
dataset = dataset.astype('float32')
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(dataset)
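
Under the hood, MinMaxScaler applies x_scaled = (x - x_min) / (x_max - x_min) per column. A minimal sketch of that formula (illustration only, using the first value plus the series minimum 104 and maximum 622):

# Sketch of the formula MinMaxScaler applies (illustration only)
x = np.array([[112.0], [104.0], [622.0]])   # first value, series min, series max
x_min, x_max = x.min(axis=0), x.max(axis=0)
print((x - x_min) / (x_max - x_min))        # [[0.01544402], [0.], [1.]]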
In [7]:
#split dataset
train_size = int(len(dataset)*0.67)
test_size = len(dataset)-train_size

train,test = dataset[0:train_size,:],dataset[train_size:len(dataset),:]
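
With 144 monthly observations (Jan 1949 – Dec 1960), this yields a chronological 96/48 split; note there is no shuffling, since order matters for time series. A quick sanity check:

print(len(dataset), train.shape, test.shape)   # expected: 144 (96, 1) (48, 1)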
In [8]:
def create_dataset(dataset, look_back=1):
    # Build supervised pairs: X = a window of look_back values, y = the value right after it
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        a = dataset[i:(i + look_back), 0]        # input window
        dataX.append(a)
        dataY.append(dataset[i + look_back, 0])  # next value to predict
    return np.array(dataX), np.array(dataY)
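
To see what create_dataset produces, here is a toy run on a counting sequence (illustration only): each row of X is a window of look_back consecutive values, and y is the value that follows it.

toy = np.arange(10, dtype='float32').reshape(-1, 1)   # [[0.], [1.], ..., [9.]]
X, y = create_dataset(toy, look_back=4)
print(X[0], y[0])   # [0. 1. 2. 3.] 4.0
print(X.shape)      # (5, 4): len(toy) - look_back - 1 windows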
In [9]:
look_back = 4
x_train, y_train = create_dataset(train, look_back)
x_test, y_test = create_dataset(test, look_back)
In [10]:
# reshape input to be [samples, time steps, features]
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_test = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
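
With input_shape reshaped this way, each window is treated as look_back features observed at a single time step (a common simplification in introductory tutorials) rather than as look_back time steps of one feature. The resulting shapes:

print(x_train.shape, x_test.shape)   # expected: (91, 1, 4) (43, 1, 4)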

CREATE AND TRAIN MODEL

In [11]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras import regularizers
In [12]:
# LSTM explained (in Korean): https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr

model = Sequential()
model.add(LSTM(4, input_shape=(1, look_back)))  # 4 LSTM units over (time steps, features)
model.add(Dense(1))                             # single-value regression output
model.compile(loss='mean_squared_error', optimizer='adam')
In [13]:
# train
model.fit(x_train, y_train, epochs=100, batch_size=1, verbose=2)
Epoch 1/100
 - 0s - loss: 0.0087
Epoch 2/100
 - 0s - loss: 0.0051
Epoch 3/100
 - 0s - loss: 0.0048
Epoch 4/100
 - 0s - loss: 0.0045
Epoch 5/100
 - 0s - loss: 0.0043
Epoch 6/100
 - 0s - loss: 0.0041
Epoch 7/100
 - 0s - loss: 0.0040
Epoch 8/100
 - 0s - loss: 0.0040
Epoch 9/100
 - 0s - loss: 0.0038
Epoch 10/100
 - 0s - loss: 0.0038
Epoch 11/100
 - 0s - loss: 0.0037
Epoch 12/100
 - 0s - loss: 0.0036
Epoch 13/100
 - 0s - loss: 0.0036
Epoch 14/100
 - 0s - loss: 0.0036
Epoch 15/100
 - 0s - loss: 0.0035
Epoch 16/100
 - 0s - loss: 0.0034
Epoch 17/100
 - 0s - loss: 0.0034
Epoch 18/100
 - 0s - loss: 0.0033
Epoch 19/100
 - 0s - loss: 0.0032
Epoch 20/100
 - 0s - loss: 0.0032
Epoch 21/100
 - 0s - loss: 0.0032
Epoch 22/100
 - 0s - loss: 0.0031
Epoch 23/100
 - 0s - loss: 0.0031
Epoch 24/100
 - 0s - loss: 0.0030
Epoch 25/100
 - 0s - loss: 0.0030
Epoch 26/100
 - 0s - loss: 0.0030
Epoch 27/100
 - 0s - loss: 0.0030
Epoch 28/100
 - 0s - loss: 0.0028
Epoch 29/100
 - 0s - loss: 0.0028
Epoch 30/100
 - 0s - loss: 0.0028
Epoch 31/100
 - 0s - loss: 0.0028
Epoch 32/100
 - 0s - loss: 0.0026
Epoch 33/100
 - 0s - loss: 0.0026
Epoch 34/100
 - 0s - loss: 0.0026
Epoch 35/100
 - 0s - loss: 0.0026
Epoch 36/100
 - 0s - loss: 0.0026
Epoch 37/100
 - 0s - loss: 0.0025
Epoch 38/100
 - 0s - loss: 0.0025
Epoch 39/100
 - 0s - loss: 0.0024
Epoch 40/100
 - 0s - loss: 0.0025
Epoch 41/100
 - 0s - loss: 0.0024
Epoch 42/100
 - 0s - loss: 0.0025
Epoch 43/100
 - 0s - loss: 0.0024
Epoch 44/100
 - 0s - loss: 0.0023
Epoch 45/100
 - 0s - loss: 0.0024
Epoch 46/100
 - 0s - loss: 0.0022
Epoch 47/100
 - 0s - loss: 0.0023
Epoch 48/100
 - 0s - loss: 0.0022
Epoch 49/100
 - 0s - loss: 0.0023
Epoch 50/100
 - 0s - loss: 0.0024
Epoch 51/100
 - 0s - loss: 0.0022
Epoch 52/100
 - 0s - loss: 0.0023
Epoch 53/100
 - 0s - loss: 0.0022
Epoch 54/100
 - 0s - loss: 0.0022
Epoch 55/100
 - 0s - loss: 0.0021
Epoch 56/100
 - 0s - loss: 0.0021
Epoch 57/100
 - 0s - loss: 0.0021
Epoch 58/100
 - 0s - loss: 0.0021
Epoch 59/100
 - 0s - loss: 0.0021
Epoch 60/100
 - 0s - loss: 0.0021
Epoch 61/100
 - 0s - loss: 0.0021
Epoch 62/100
 - 0s - loss: 0.0021
Epoch 63/100
 - 0s - loss: 0.0021
Epoch 64/100
 - 0s - loss: 0.0020
Epoch 65/100
 - 0s - loss: 0.0020
Epoch 66/100
 - 0s - loss: 0.0023
Epoch 67/100
 - 0s - loss: 0.0020
Epoch 68/100
 - 0s - loss: 0.0022
Epoch 69/100
 - 0s - loss: 0.0020
Epoch 70/100
 - 0s - loss: 0.0020
Epoch 71/100
 - 0s - loss: 0.0020
Epoch 72/100
 - 0s - loss: 0.0020
Epoch 73/100
 - 0s - loss: 0.0020
Epoch 74/100
 - 0s - loss: 0.0021
Epoch 75/100
 - 0s - loss: 0.0020
Epoch 76/100
 - 0s - loss: 0.0020
Epoch 77/100
 - 0s - loss: 0.0020
Epoch 78/100
 - 0s - loss: 0.0021
Epoch 79/100
 - 0s - loss: 0.0021
Epoch 80/100
 - 0s - loss: 0.0020
Epoch 81/100
 - 0s - loss: 0.0021
Epoch 82/100
 - 0s - loss: 0.0020
Epoch 83/100
 - 0s - loss: 0.0020
Epoch 84/100
 - 0s - loss: 0.0020
Epoch 85/100
 - 0s - loss: 0.0020
Epoch 86/100
 - 0s - loss: 0.0019
Epoch 87/100
 - 0s - loss: 0.0020
Epoch 88/100
 - 0s - loss: 0.0019
Epoch 89/100
 - 0s - loss: 0.0020
Epoch 90/100
 - 0s - loss: 0.0020
Epoch 91/100
 - 0s - loss: 0.0019
Epoch 92/100
 - 0s - loss: 0.0020
Epoch 93/100
 - 0s - loss: 0.0020
Epoch 94/100
 - 0s - loss: 0.0019
Epoch 95/100
 - 0s - loss: 0.0020
Epoch 96/100
 - 0s - loss: 0.0020
Epoch 97/100
 - 0s - loss: 0.0020
Epoch 98/100
 - 0s - loss: 0.0020
Epoch 99/100
 - 0s - loss: 0.0019
Epoch 100/100
 - 0s - loss: 0.0019
Out[13]:
<keras.callbacks.callbacks.History at 0x2986338de08>
In [14]:
# model summary
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
lstm_1 (LSTM)                (None, 4)                 144       
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 5         
=================================================================
Total params: 149
Trainable params: 149
Non-trainable params: 0
_________________________________________________________________
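
The parameter counts can be verified by hand: an LSTM layer has four gates, each with a kernel, recurrent kernel, and bias (assumption: the standard Keras LSTM parameter formula).

units, features = 4, 4                      # 4 LSTM units, look_back = 4 input features
print(4 * units * (features + units + 1))   # 144, matching lstm_1
print(units * 1 + 1)                        # 5, matching dense_1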

EVALUATE MODEL

In [15]:
import math
from sklearn.metrics import mean_squared_error
In [16]:
# make predictions
train_predict = model.predict(x_train)
test_predict = model.predict(x_test)
In [17]:
# invert predictions back to the original passenger scale
# (inverse_transform expects 2D input, hence wrapping y_train/y_test in a list)
train_predict = scaler.inverse_transform(train_predict)
y_train = scaler.inverse_transform([y_train])

test_predict = scaler.inverse_transform(test_predict)
y_test = scaler.inverse_transform([y_test])
In [18]:
# calculate root mean squared error
train_score = math.sqrt(mean_squared_error(y_train[0], train_predict[:,0]))
print('Train Score: %.2f RMSE' % (train_score))

test_score = math.sqrt(mean_squared_error(y_test[0], test_predict[:,0]))
print('Test Score: %.2f RMSE' % (test_score))
Train Score: 22.27 RMSE
Test Score: 64.58 RMSE
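
The test RMSE is roughly three times the train RMSE (both are in the original units, thousands of passengers). Some gap is expected with this basic setup: the series trends upward, so the held-out later years lie in a range the model never saw during training.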

PLOT ACTUAL DATA AND TRAIN-TEST PREDICTIONS

In [19]:
%matplotlib inline
import matplotlib.pyplot as plt
In [20]:
# shift train predictions for plotting: the first prediction targets index
# look_back of the original series, so pad the front with NaN
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(train_predict)+look_back, :] = train_predict
In [21]:
# shift test predictions for plotting: offset past the train predictions plus the
# samples lost to windowing in each split (look_back per split, +1 from the loop bound)
testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(train_predict)+(look_back*2)+1:len(dataset)-1, :] = test_predict
In [22]:
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset),label = "dataset") # real values
plt.plot(trainPredictPlot,label = 'train') # train values
plt.plot(testPredictPlot,label = 'test') # test values
plt.legend(loc="upper left")
plt.show()