
[Python] Data visualization with matplotlib & seaborn - line plot

우주먼지의하루 2020. 3. 20. 02:42

The data comes from the Boston Marathon dataset on Kaggle:

https://www.kaggle.com/rojour/boston-results

Finishers Boston Marathon 2015, 2016 & 2017: the names, times, and general demographics of the finishers (www.kaggle.com).

Python (line plot)
In [29]:
import pandas as pd
pd.set_option('display.max_columns', 500)  # print every column instead of truncating with "..."
In [63]:
# Tistory-specific (not needed for the analysis): widen the notebook container for the blog export
from IPython.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))
In [41]:
data_dir = "C:/Users/User/Desktop/boston-results"  # folder holding the Kaggle CSV files
marathon_2015 = pd.read_csv(f"{data_dir}/marathon_results_2015.csv")
marathon_2016 = pd.read_csv(f"{data_dir}/marathon_results_2016.csv")
marathon_2017 = pd.read_csv(f"{data_dir}/marathon_results_2017.csv")
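Since the three `read_csv` calls differ only in the year, they can also be written as a loop. The sketch below is one way to do that, not the method used in this post: `load_marathon_results` and the added `Year` column are hypothetical names, and the `marathon_results_<year>.csv` naming pattern is assumed from the paths above.

```python
from pathlib import Path

import pandas as pd


def load_marathon_results(data_dir, years=(2015, 2016, 2017)):
    """Read marathon_results_<year>.csv for each year and stack them into one frame.

    Hypothetical helper: assumes every file follows the naming pattern used above.
    """
    frames = []
    for year in years:
        path = Path(data_dir) / f"marathon_results_{year}.csv"
        df = pd.read_csv(path)
        df["Year"] = year  # remember which race each row came from
        frames.append(df)
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)
```

For example, `load_marathon_results("C:/Users/User/Desktop/boston-results")` would return all three years in one frame. If the yearly files do not share exactly the same columns, `pd.concat` keeps the union of the columns and fills the missing cells with NaN.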

line plot

matplotlib

In [31]:
from matplotlib import pyplot as plt
In [32]:
pd.concat([marathon_2017.head(), marathon_2017.tail()])  # first and last 5 rows; DataFrame.append was removed in pandas 2.0
Out[32]:
(index) | Unnamed: 0 | Bib | Name | Age | M/F | City | State | Country | Citizen | Unnamed: 9 | 5K | 10K | 15K | 20K | Half | 25K | 30K | 35K | 40K | Pace | Proj Time | Official Time | Overall | Gender | Division
0 | 0 | 11 | Kirui, Geoffrey | 24 | M | Keringet | NaN | KEN | NaN | NaN | 0:15:25 | 0:30:28 | 0:45:44 | 1:01:15 | 1:04:35 | 1:16:59 | 1:33:01 | 1:48:19 | 2:02:53 | 0:04:57 | - | 2:09:37 | 1 | 1 | 1
1 | 1 | 17 | Rupp, Galen | 30 | M | Portland | OR | USA | NaN | NaN | 0:15:24 | 0:30:27 | 0:45:44 | 1:01:15 | 1:04:35 | 1:16:59 | 1:33:01 | 1:48:19 | 2:03:14 | 0:04:58 | - | 2:09:58 | 2 | 2 | 2
2 | 2 | 23 | Osako, Suguru | 25 | M | Machida-City | NaN | JPN | NaN | NaN | 0:15:25 | 0:30:29 | 0:45:44 | 1:01:16 | 1:04:36 | 1:17:00 | 1:33:01 | 1:48:31 | 2:03:38 | 0:04:59 | - | 2:10:28 | 3 | 3 | 3
3 | 3 | 21 | Biwott, Shadrack | 32 | M | Mammoth Lakes | CA | USA | NaN | NaN | 0:15:25 | 0:30:29 | 0:45:44 | 1:01:19 | 1:04:45 | 1:17:00 | 1:33:01 | 1:48:58 | 2:04:35 | 0:05:03 | - | 2:12:08 | 4 | 4 | 4
4 | 4 | 9 | Chebet, Wilson | 31 | M | Marakwet | NaN | KEN | NaN | NaN | 0:15:25 | 0:30:28 | 0:45:44 | 1:01:15 | 1:04:35 | 1:16:59 | 1:33:01 | 1:48:41 | 2:05:00 | 0:05:04 | - | 2:12:35 | 5 | 5 | 5
26405 | 26405 | 25166 | Steinbach, Paula Eyvonne | 61 | F | Ontario | CA | USA | NaN | MI | 0:46:44 | 1:35:41 | 2:23:35 | 3:12:44 | 3:23:31 | 4:12:06 | 5:03:08 | 5:55:18 | 6:46:57 | 0:16:24 | - | 7:09:39 | 26407 | 11972 | 344
26406 | 26406 | 25178 | Avelino, Andrew R. | 25 | M | Fayetteville | NC | USA | NaN | MI | 0:32:03 | 1:05:33 | 1:52:17 | 2:49:41 | 3:00:26 | 3:50:19 | 4:50:01 | 5:53:48 | 6:54:21 | 0:16:40 | - | 7:16:59 | 26408 | 14436 | 4774
26407 | 26407 | 27086 | Hantel, Johanna | 57 | F | Malvern | PA | USA | NaN | NaN | 0:53:11 | 1:43:36 | 2:32:36 | - | 3:36:24 | 4:15:21 | 5:06:37 | 6:00:33 | 6:54:38 | 0:16:47 | - | 7:19:37 | 26409 | 11973 | 698
26408 | 26408 | 25268 | Reilly, Bill | 64 | M | New York | NY | USA | NaN | MI | 0:40:34 | 1:27:19 | 2:17:17 | 3:11:40 | 3:22:30 | 4:06:10 | 5:07:09 | 6:06:07 | 6:56:08 | 0:16:49 | - | 7:20:44 | 26410 | 14437 | 1043
26409 | 26409 | 25266 | Rigsby, Scott | 48 | M | Alpharetta | GA | USA | NaN | MI | 0:39:36 | 1:17:12 | 2:00:10 | 2:58:55 | 3:08:16 | 4:27:14 | 5:37:13 | 6:39:07 | 7:41:23 | 0:18:15 | - | 7:58:14 | 26411 | 14438 | 2553
In [42]:
# Columns stored as "h:mm:ss" strings; "-" marks a missing split (e.g. 20K in row 26407 above)
time_cols = ['5K', '10K', '15K', '20K', 'Half', '25K', '30K', '35K', '40K',
             'Pace', 'Official Time']

# Convert using pandas to_timedelta. Mask "-" first so a missing split
# becomes NaT instead of being parsed as a time value.
for col in time_cols:
    marathon_2017[col] = pd.to_timedelta(marathon_2017[col].where(marathon_2017[col] != '-'))
In [26]:
import numpy as np
In [43]:
# Convert each timedelta to seconds (float); missing splits (NaT) become NaN,
# which matplotlib and seaborn leave as gaps in the line
for col in time_cols:
    marathon_2017[col] = marathon_2017[col].dt.total_seconds()
In [44]:
marathon_2017.head()
Out[44]:
(index) | Unnamed: 0 | Bib | Name | Age | M/F | City | State | Country | Citizen | Unnamed: 9 | 5K | 10K | 15K | 20K | Half | 25K | 30K | 35K | 40K | Pace | Proj Time | Official Time | Overall | Gender | Division
0 | 0 | 11 | Kirui, Geoffrey | 24 | M | Keringet | NaN | KEN | NaN | NaN | 925.0 | 1828.0 | 2744.0 | 3675.0 | 3875.0 | 4619.0 | 5581.0 | 6499.0 | 7373.0 | 297.0 | - | 7777.0 | 1 | 1 | 1
1 | 1 | 17 | Rupp, Galen | 30 | M | Portland | OR | USA | NaN | NaN | 924.0 | 1827.0 | 2744.0 | 3675.0 | 3875.0 | 4619.0 | 5581.0 | 6499.0 | 7394.0 | 298.0 | - | 7798.0 | 2 | 2 | 2
2 | 2 | 23 | Osako, Suguru | 25 | M | Machida-City | NaN | JPN | NaN | NaN | 925.0 | 1829.0 | 2744.0 | 3676.0 | 3876.0 | 4620.0 | 5581.0 | 6511.0 | 7418.0 | 299.0 | - | 7828.0 | 3 | 3 | 3
3 | 3 | 21 | Biwott, Shadrack | 32 | M | Mammoth Lakes | CA | USA | NaN | NaN | 925.0 | 1829.0 | 2744.0 | 3679.0 | 3885.0 | 4620.0 | 5581.0 | 6538.0 | 7475.0 | 303.0 | - | 7928.0 | 4 | 4 | 4
4 | 4 | 9 | Chebet, Wilson | 31 | M | Marakwet | NaN | KEN | NaN | NaN | 925.0 | 1828.0 | 2744.0 | 3675.0 | 3875.0 | 4619.0 | 5581.0 | 6521.0 | 7500.0 | 304.0 | - | 7955.0 | 5 | 5 | 5
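To see what the string-to-seconds conversion does to individual values, here is a small check on a made-up toy Series. The two times are Kirui's 5K and 20K splits, and the `-` stands in for a missing split like the 20K entry of row 26407 in Out[32]. `0:15:25` is 15 × 60 + 25 = 925 seconds, and `1:01:15` is 3600 + 60 + 15 = 3675 seconds; the `-` is masked to NaT before conversion, so it ends up as NaN rather than a number.

```python
import pandas as pd

# Toy column: two split times in h:mm:ss form plus "-" for a missing split
splits = pd.Series(["0:15:25", "1:01:15", "-"])

# Mask "-" so it becomes NaT instead of being parsed as a duration
as_timedelta = pd.to_timedelta(splits.where(splits != "-"))

# Timedelta -> float seconds; NaT -> NaN
as_seconds = as_timedelta.dt.total_seconds()
print(as_seconds.tolist())  # [925.0, 3675.0, nan]
```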

linestyle (a linestyle argument can also be passed to plt.plot)

linestyle='-'    # same as linestyle='solid' (the default); see https://matplotlib.org/gallery/lines_bars_and_markers/line_styles_reference.html
linestyle='--'   # same as linestyle='dashed'
linestyle='-.'   # same as linestyle='dashdot'
linestyle=':'    # same as linestyle='dotted'
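A minimal sketch of the four styles side by side, on made-up data (`x` and the vertical offsets are only there to keep the lines apart). It passes the long names, and matplotlib stores each one as the matching short code listed above.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(10)
fig, ax = plt.subplots(figsize=(8, 4))

# One line per named style; each line is shifted up so they don't overlap
for offset, style in enumerate(["solid", "dashed", "dashdot", "dotted"]):
    ax.plot(x, x + offset * 3, linestyle=style, label=f"linestyle='{style}'")

ax.legend(loc="upper left")
plt.show()
```

Calling `line.get_linestyle()` on the resulting lines returns `'-'`, `'--'`, `'-.'` and `':'`, which is why the short and long spellings can be used interchangeably.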
In [56]:
plt.figure(figsize=(30, 10))
plt.plot(marathon_2017.index, marathon_2017['5K'], label='5K')
plt.plot(marathon_2017.index, marathon_2017['10K'], label='10K')
plt.plot(marathon_2017.index, marathon_2017['Half'], label='Half')
plt.title("Boston Marathon 2017: cumulative split time by finisher")
plt.xlabel('Finisher (row index)', fontsize=14)
plt.ylabel('Elapsed time (s)', fontsize=14)
plt.legend(loc='upper left')  # slower finishers sit at the right, so the upper-left corner is emptier
plt.show()
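The same chart can also be drawn through pandas' own wrapper, `DataFrame.plot`, which calls matplotlib under the hood and takes the legend labels from the column names. This is an alternative, not what the cell above uses. A minimal sketch on a small made-up frame (`toy`) standing in for `marathon_2017`; with the real data you would select the same three columns.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up cumulative split times in seconds for four runners (stand-in data)
toy = pd.DataFrame({
    "5K": [925, 924, 1200, 1500],
    "10K": [1828, 1827, 2450, 3100],
    "Half": [3875, 3875, 5300, 6900],
})

# One line per column, x = row index; pandas labels each line with its column name
ax = toy[["5K", "10K", "Half"]].plot(figsize=(10, 4), title="Cumulative split time by finisher")
ax.set_xlabel("Finisher (row index)")
ax.set_ylabel("Elapsed time (s)")
plt.show()
```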

seaborn

In [59]:
import seaborn as sns
In [61]:
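plt.figure(figsize=(30, 10))
# estimator=None draws the raw values; each x (row index) is unique here,
# so there is nothing to aggregate and the confidence-interval bootstrap is skipped
ax = sns.lineplot(x=marathon_2017.index, y=marathon_2017['5K'], estimator=None)
ax.set_xlabel('Finisher (row index)')
ax.set_ylabel('5K split time (s)')
plt.show()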