The data is the airline-passenger dataset from Kaggle:
https://www.kaggle.com/ternaryrealm/airlines-passenger-data
In [23]:
#Tistory layout tweak (not needed for the analysis)
from IPython.core.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
In [1]:
#! pip install keras
Keras is a high-level neural networks API, a deep-learning interface library.
In [2]:
import pandas as pd
import numpy as np
import keras
pd.set_option("display.max_columns",500) # show columns without truncation
pd.set_option("display.max_rows",999)
In [3]:
dataframe = pd.read_csv("C://Users//82106//Desktop//data//airlines-passenger-data//international-airline-passengers.csv",usecols=[1],engine='python',skipfooter=2)
In [4]:
dataframe
Out[4]:
LOAD DATASET
In [5]:
# Preprocessing: MinMaxScaler linearly rescales each feature so its minimum maps to 0 and its maximum to 1
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
# https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
from sklearn.preprocessing import MinMaxScaler
In [6]:
#scale dataset
dataset = dataframe.values
dataset = dataset.astype('float32')
scaler = MinMaxScaler(feature_range=(0,1))
dataset = scaler.fit_transform(dataset)
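How the scaling behaves can be seen on a tiny column of hypothetical values: `fit_transform` maps the column's min to 0 and max to 1, and `inverse_transform` recovers the original scale (which is used again later when evaluating predictions).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy column vector standing in for the passenger counts (hypothetical values)
toy = np.array([[100.0], [150.0], [300.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(toy)           # min -> 0, max -> 1
restored = scaler.inverse_transform(scaled)  # back to the original scale

print(scaled.ravel())    # [0.   0.25 1.  ]
print(restored.ravel())  # [100. 150. 300.]
```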
In [7]:
#split dataset
train_size = int(len(dataset)*0.67)
test_size = len(dataset)-train_size
train,test = dataset[0:train_size,:],dataset[train_size:len(dataset),:]
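The 67/33 split keeps the rows in time order (no shuffling), which matters for a time series: the model trains on the earlier portion and is tested on the later one. On a 10-point toy series:

```python
import numpy as np

data = np.arange(10).reshape(-1, 1)  # stand-in for the scaled dataset
train_size = int(len(data) * 0.67)   # int(6.7) -> 6
train, test = data[:train_size], data[train_size:]
print(train.ravel())  # first 6 points
print(test.ravel())   # last 4 points
```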
In [8]:
def create_dataset(dataset, look_back = 1):
    dataX, dataY = [],[]
    for i in range(len(dataset)-look_back-1):
        a = dataset[i:(i+look_back),0]
        dataX.append(a)
        dataY.append(dataset[i+look_back,0])
    return np.array(dataX),np.array(dataY)
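On a toy sequence, the windowing turns the series into supervised pairs: each X row holds `look_back` consecutive values and y holds the value that comes right after the window. (Note the loop runs to `len - look_back - 1`, so the last possible sample is dropped.)

```python
import numpy as np

def create_dataset(dataset, look_back=1):
    dataX, dataY = [], []
    for i in range(len(dataset) - look_back - 1):
        dataX.append(dataset[i:(i + look_back), 0])
        dataY.append(dataset[i + look_back, 0])
    return np.array(dataX), np.array(dataY)

series = np.arange(8, dtype='float32').reshape(-1, 1)  # 0..7
X, y = create_dataset(series, look_back=4)
print(X)  # [[0 1 2 3], [1 2 3 4], [2 3 4 5]]
print(y)  # [4 5 6] -- the value right after each window
```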
In [9]:
look_back = 4
x_train, y_train = create_dataset(train, look_back)
x_test, y_test = create_dataset(test, look_back)
In [10]:
# reshape input to be [samples, time steps, features]
x_train = np.reshape(x_train, (x_train.shape[0], 1, x_train.shape[1]))
x_test = np.reshape(x_test, (x_test.shape[0], 1, x_test.shape[1]))
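A Keras LSTM expects a 3-D input of shape `[samples, time steps, features]`. Here each window of `look_back` past values is fed as a single time step with `look_back` features, which is why the middle dimension is 1. With hypothetical sizes:

```python
import numpy as np

windows = np.zeros((5, 4))  # 5 samples, look_back = 4 (hypothetical sizes)
lstm_input = np.reshape(windows, (windows.shape[0], 1, windows.shape[1]))
print(lstm_input.shape)  # (5, 1, 4) -> [samples, time steps, features]
```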
CREATE AND TRAIN MODEL
In [11]:
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras import regularizers
In [12]:
#LSTM explained (Korean) : https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr
model = Sequential()
model.add(LSTM(4,input_shape=(1,look_back)))
model.add(Dense(1))
model.compile(loss='mean_squared_error',optimizer='adam')
In [13]:
# train
model.fit(x_train, y_train, epochs=100, batch_size=1, verbose=2)
Out[13]:
In [14]:
# model summary
model.summary()
EVALUATE MODEL
In [15]:
import math
from sklearn.metrics import mean_squared_error
In [16]:
# make predictions
train_predict = model.predict(x_train)
test_predict = model.predict(x_test)
In [17]:
# invert predictions
train_predict = scaler.inverse_transform(train_predict)
y_train = scaler.inverse_transform([y_train])
test_predict = scaler.inverse_transform(test_predict)
y_test = scaler.inverse_transform([y_test])
In [18]:
# calculate root mean squared error
train_score = math.sqrt(mean_squared_error(y_train[0], train_predict[:,0]))
print('Train Score: %.2f RMSE' % (train_score))
test_score = math.sqrt(mean_squared_error(y_test[0], test_predict[:,0]))
print('Test Score: %.2f RMSE' % (test_score))
PLOT ACTUAL DATA AND TRAIN-TEST PREDICTIONS
In [19]:
%matplotlib inline
import matplotlib.pyplot as plt
In [20]:
# shift train predictions for plotting
trainPredictPlot = np.empty_like(dataset)
trainPredictPlot[:, :] = np.nan
trainPredictPlot[look_back:len(train_predict)+look_back, :] = train_predict
In [21]:
# shift test predictions for plotting
testPredictPlot = np.empty_like(dataset)
testPredictPlot[:, :] = np.nan
testPredictPlot[len(train_predict)+(look_back*2)+1:len(dataset)-1, :] = test_predict
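The offsets come from the windowing: the first `look_back` points have no prediction, and the test block starts after the train predictions plus another `look_back` window and the sample dropped inside `create_dataset` (hence `+ 2*look_back + 1`). A sketch of how the NaN mask lines up, with hypothetical sizes:

```python
import numpy as np

look_back = 4
n = 20             # hypothetical full-series length
n_train_pred = 8   # hypothetical number of train predictions

plot = np.full((n, 1), np.nan)                    # NaN everywhere (not drawn)
plot[look_back:look_back + n_train_pred, :] = 1.0 # slot for train predictions
# test predictions would start at index n_train_pred + 2*look_back + 1
print(np.flatnonzero(~np.isnan(plot[:, 0])))      # filled indices 4..11
```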
In [22]:
# plot baseline and predictions
plt.plot(scaler.inverse_transform(dataset),label = "dataset") # real values
plt.plot(trainPredictPlot,label = 'train') # train values
plt.plot(testPredictPlot,label = 'test') # test values
plt.legend(loc="upper left")
plt.show()