
[Python] Data Visualization with matplotlib & seaborn - Bar Plot

우주먼지의하루 2020. 3. 11. 09:40
Data Visualization
In [1]:
import pandas as pd
pd.set_option('display.max_columns',500) # show up to 500 columns instead of truncating wide frames
In [2]:
marathon_2015 = pd.read_csv("C://Users//82106//Desktop//boston-results//marathon_results_2015.csv")
marathon_2016 = pd.read_csv("C://Users//82106//Desktop//boston-results//marathon_results_2016.csv")
marathon_2017 = pd.read_csv("C://Users//82106//Desktop//boston-results//marathon_results_2017.csv")
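
The three `read_csv` calls differ only in the year, so they could also be written as one small helper. This is only a sketch: the function name `load_results` and its parameters are hypothetical, and it assumes every file follows the `marathon_results_<year>.csv` naming seen above.

```python
from pathlib import Path

import pandas as pd


def load_results(base_dir, years):
    """Return {year: DataFrame} for files named marathon_results_<year>.csv."""
    base = Path(base_dir)
    return {
        year: pd.read_csv(base / f"marathon_results_{year}.csv")
        for year in years
    }
```

With the folder used in this post, `load_results("C:/Users/82106/Desktop/boston-results", [2015, 2016, 2017])[2017]` would play the role of `marathon_2017`.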

bar plot

matplotlib

In [3]:
from matplotlib import pyplot as plt
In [4]:
pd.concat([marathon_2017.head(), marathon_2017.tail()]) # first and last five rows
Out[4]:
Unnamed: 0 Bib Name Age M/F City State Country Citizen Unnamed: 9 5K 10K 15K 20K Half 25K 30K 35K 40K Pace Proj Time Official Time Overall Gender Division
0 0 11 Kirui, Geoffrey 24 M Keringet NaN KEN NaN NaN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:02:53 0:04:57 - 2:09:37 1 1 1
1 1 17 Rupp, Galen 30 M Portland OR USA NaN NaN 0:15:24 0:30:27 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:03:14 0:04:58 - 2:09:58 2 2 2
2 2 23 Osako, Suguru 25 M Machida-City NaN JPN NaN NaN 0:15:25 0:30:29 0:45:44 1:01:16 1:04:36 1:17:00 1:33:01 1:48:31 2:03:38 0:04:59 - 2:10:28 3 3 3
3 3 21 Biwott, Shadrack 32 M Mammoth Lakes CA USA NaN NaN 0:15:25 0:30:29 0:45:44 1:01:19 1:04:45 1:17:00 1:33:01 1:48:58 2:04:35 0:05:03 - 2:12:08 4 4 4
4 4 9 Chebet, Wilson 31 M Marakwet NaN KEN NaN NaN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:41 2:05:00 0:05:04 - 2:12:35 5 5 5
26405 26405 25166 Steinbach, Paula Eyvonne 61 F Ontario CA USA NaN MI 0:46:44 1:35:41 2:23:35 3:12:44 3:23:31 4:12:06 5:03:08 5:55:18 6:46:57 0:16:24 - 7:09:39 26407 11972 344
26406 26406 25178 Avelino, Andrew R. 25 M Fayetteville NC USA NaN MI 0:32:03 1:05:33 1:52:17 2:49:41 3:00:26 3:50:19 4:50:01 5:53:48 6:54:21 0:16:40 - 7:16:59 26408 14436 4774
26407 26407 27086 Hantel, Johanna 57 F Malvern PA USA NaN NaN 0:53:11 1:43:36 2:32:36 - 3:36:24 4:15:21 5:06:37 6:00:33 6:54:38 0:16:47 - 7:19:37 26409 11973 698
26408 26408 25268 Reilly, Bill 64 M New York NY USA NaN MI 0:40:34 1:27:19 2:17:17 3:11:40 3:22:30 4:06:10 5:07:09 6:06:07 6:56:08 0:16:49 - 7:20:44 26410 14437 1043
26409 26409 25266 Rigsby, Scott 48 M Alpharetta GA USA NaN MI 0:39:36 1:17:12 2:00:10 2:58:55 3:08:16 4:27:14 5:37:13 6:39:07 7:41:23 0:18:15 - 7:58:14 26411 14438 2553

Male and female participants

In [5]:
marathon_2017['M/F'].value_counts()
Out[5]:
M    14438
F    11972
Name: M/F, dtype: int64
In [6]:
#plt.figure(figsize = (20,10))
plt.bar(marathon_2017['M/F'].value_counts().index,marathon_2017['M/F'].value_counts(),color = 'red', 
        alpha = 0.4,width = 0.8, align = "edge") 
# alpha sets the transparency; align="edge" lines up the left edge of each bar with its x position

plt.title("marathon 2017 male VS female")
plt.xlabel('sex', fontsize = 15, rotation = 40) # rotation tilts the label (in degrees)
plt.ylabel("counts", fontsize = 15)
#plt.xticks([1,2]) # change the x-axis tick positions
plt.show()
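
With only two bars, it can help to print the count on top of each one. The sketch below is not part of the original notebook: it rebuilds the counts from Out[5] by hand so that it runs without the CSV file, and it selects the non-interactive `Agg` backend only so that it also works as a plain script (drop that line inside a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # assumption: script use; omit this line in a notebook
from matplotlib import pyplot as plt
import pandas as pd

# Counts copied from Out[5] above instead of re-reading the CSV.
sex_counts = pd.Series({"M": 14438, "F": 11972}, name="M/F")

fig, ax = plt.subplots()
bars = ax.bar(sex_counts.index, sex_counts.values, color="red", alpha=0.4, width=0.8)

# Write each count just above its bar, centered horizontally.
for bar, count in zip(bars, sex_counts.values):
    ax.text(bar.get_x() + bar.get_width() / 2, count, f"{int(count):,}",
            ha="center", va="bottom")

ax.set_title("marathon 2017 male VS female")
ax.set_xlabel("sex", fontsize=15)
ax.set_ylabel("counts", fontsize=15)

labels = [t.get_text() for t in ax.texts]  # the annotation strings that were drawn
```

Calling `value_counts()` once and reusing the result, as here, also avoids computing it twice as in the `plt.bar` call above.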

Participant age

In [7]:
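# Bin ages into categories. The labels are Korean: "0세" is the (0, 10] bin, "10대" means teens, "20대" means 20s, and so on.
# pd.cut intervals are right-closed by default, e.g. (20, 30], so an age of exactly 30 is labeled "20대";
# pass right=False to get left-closed intervals such as [30, 40) instead.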
bins = [0,10,20,30,40,50,60,70,80,90]
bins_names = ["0세","10대",'20대',"30대",'40대',"50대",'60대',"70대",'80대']
age_categories = pd.cut(marathon_2017['Age'], bins, labels = bins_names)
In [8]:
age_categories=pd.DataFrame(age_categories)
In [9]:
marathon_2017['age_categories'] = age_categories
In [10]:
marathon_2017['age_categories'].value_counts()
Out[10]:
40대    8147
30대    6805
50대    5080
20대    4595
60대    1510
10대     146
70대     123
80대       4
Name: age_categories, dtype: int64
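
One detail of `pd.cut` is worth checking here: by default each interval includes its right edge, so with these bins an age of exactly 30 or 40 lands in the lower decade. The sketch below compares the default with `right=False` on a few made-up ages (the `ages` Series is hypothetical, not taken from the marathon data); the bins and labels are the ones from In [7].

```python
import pandas as pd

ages = pd.Series([24, 30, 31, 40, 61])  # hypothetical sample ages

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
bins_names = ["0세", "10대", "20대", "30대", "40대", "50대", "60대", "70대", "80대"]

# Default: right-closed intervals, (20, 30], (30, 40], ...
default_cut = pd.cut(ages, bins, labels=bins_names)

# right=False: left-closed intervals, [20, 30), [30, 40), ...
left_closed = pd.cut(ages, bins, labels=bins_names, right=False)

comparison = pd.DataFrame({
    "Age": ages,
    "right=True": default_cut.astype(str),
    "right=False": left_closed.astype(str),
})
print(comparison)
```

Only the rows for ages 30 and 40 differ between the two columns. Switching the notebook to `right=False` would therefore change the counts in Out[10].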
In [11]:
plt.bar(marathon_2017['age_categories'].value_counts().index,marathon_2017['age_categories'].value_counts())
plt.title("age category")
plt.show()
C:\Users\82106\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 45824 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\82106\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:180: RuntimeWarning: Glyph 45824 missing from current font.
  font.set_text(s, 0, flags=flags)
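
The warning above means that matplotlib's default font has no glyph for the Korean character "대" (code point 45824), so the tick labels are drawn as empty boxes. One possible fix is to switch to a font that covers Hangul. The sketch below is an assumption rather than part of the original notebook: the candidate font names are common Korean-capable fonts that may or may not be installed on a given machine, so the code leaves the settings untouched if none is found.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: script use; omit this line in a notebook
from matplotlib import font_manager, pyplot as plt

# Candidate fonts with Hangul coverage (assumed names; availability depends on the OS).
candidates = ["Malgun Gothic", "AppleGothic", "NanumGothic", "Noto Sans CJK KR"]

# Names of all fonts matplotlib has discovered on this machine.
installed = {font.name for font in font_manager.fontManager.ttflist}

# Pick the first candidate that is actually installed, if any.
korean_font = next((name for name in candidates if name in installed), None)

if korean_font is not None:
    plt.rcParams["font.family"] = korean_font

# Draw minus signs with the ASCII hyphen, which every font provides.
plt.rcParams["axes.unicode_minus"] = False
```

Run before plotting, this should make labels such as "20대" render properly wherever one of the candidate fonts exists.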

seaborn

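seaborn is conventionally imported as sns, and matplotlib is normally imported alongside it. seaborn is built on top of matplotlib, much like an expansion pack: it draws onto matplotlib figures, so calls such as plt.figure, plt.title, and plt.show are still needed to size, label, and display the plots.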

In [12]:
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = [10, 5] # [width, height] (inches)
import seaborn as sns
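
A quick way to see the relationship between the two libraries is to look at what a seaborn function returns. The sketch below uses a tiny made-up frame (`toy` is hypothetical) and the `Agg` backend so it can run as a plain script.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: script use; omit this line in a notebook
from matplotlib import pyplot as plt
from matplotlib.axes import Axes
import pandas as pd
import seaborn as sns

toy = pd.DataFrame({"M/F": ["M", "F", "M", "M", "F"]})  # hypothetical data

# seaborn draws the bars and hands back the matplotlib Axes it drew on.
ax = sns.countplot(x="M/F", data=toy)
is_mpl_axes = isinstance(ax, Axes)

# Because it is an ordinary Axes, the usual matplotlib methods apply.
ax.set_title("toy counts")
plt.close(ax.figure)
```

Since the return value is a regular matplotlib `Axes`, titles, labels, and figure sizes are all handled through matplotlib.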

Participating countries (excluding USA)

In [13]:
# keep every runner whose Country is not 'USA'
marathon_2017_country = marathon_2017[~marathon_2017['Country'].isin(['USA'])]
In [14]:
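plt.figure(figsize = (100,30)) # very wide figure so that every country label fits
sns.countplot(x = 'Country', data = marathon_2017_country) # pass the column as x=; recent seaborn versions take data as the first argument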
plt.title("Runner Country (drop USA)", fontsize = 20)
plt.show()
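
With this many countries the chart is hard to read even at that size. One option is to show only the most frequent countries, sorted by count, using the `order` parameter of `sns.countplot`. The sketch below runs on a small made-up frame (`toy` and `top_n` are hypothetical), not on the marathon data.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: script use; omit this line in a notebook
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

toy = pd.DataFrame(
    {"Country": ["CAN", "GBR", "CAN", "MEX", "CAN", "GBR", "JPN"]}  # hypothetical rows
)

top_n = 2
# value_counts sorts by frequency, so the first top_n index entries are the top countries.
order = toy["Country"].value_counts().index[:top_n].tolist()

fig, ax = plt.subplots(figsize=(10, 5))
sns.countplot(x="Country", data=toy, order=order, ax=ax)  # only the listed categories are drawn
ax.tick_params(axis="x", labelrotation=90)  # vertical labels stay readable when there are many bars
ax.set_title("Runner Country (top %d)" % top_n, fontsize=20)
```

On the real data, `toy` would be replaced by `marathon_2017_country`, with a larger `top_n` such as 20.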
In [15]:
marathon_2017.head()
Out[15]:
Unnamed: 0 Bib Name Age M/F City State Country Citizen Unnamed: 9 5K 10K 15K 20K Half 25K 30K 35K 40K Pace Proj Time Official Time Overall Gender Division age_categories
0 0 11 Kirui, Geoffrey 24 M Keringet NaN KEN NaN NaN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:02:53 0:04:57 - 2:09:37 1 1 1 20대
1 1 17 Rupp, Galen 30 M Portland OR USA NaN NaN 0:15:24 0:30:27 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:19 2:03:14 0:04:58 - 2:09:58 2 2 2 20대
2 2 23 Osako, Suguru 25 M Machida-City NaN JPN NaN NaN 0:15:25 0:30:29 0:45:44 1:01:16 1:04:36 1:17:00 1:33:01 1:48:31 2:03:38 0:04:59 - 2:10:28 3 3 3 20대
3 3 21 Biwott, Shadrack 32 M Mammoth Lakes CA USA NaN NaN 0:15:25 0:30:29 0:45:44 1:01:19 1:04:45 1:17:00 1:33:01 1:48:58 2:04:35 0:05:03 - 2:12:08 4 4 4 30대
4 4 9 Chebet, Wilson 31 M Marakwet NaN KEN NaN NaN 0:15:25 0:30:28 0:45:44 1:01:15 1:04:35 1:16:59 1:33:01 1:48:41 2:05:00 0:05:04 - 2:12:35 5 5 5 30대

Participants by sex and age

In [16]:
age_runner=marathon_2017.sort_values(by=['age_categories'])
In [17]:
sns.countplot(x = 'age_categories', data = age_runner, hue = 'M/F') # hue splits each age bar by sex
plt.show()
C:\Users\82106\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:211: RuntimeWarning: Glyph 45824 missing from current font.
  font.set_text(s, 0.0, flags=flags)
C:\Users\82106\Anaconda3\lib\site-packages\matplotlib\backends\backend_agg.py:180: RuntimeWarning: Glyph 45824 missing from current font.
  font.set_text(s, 0, flags=flags)
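
To read the exact numbers behind grouped bars like these, a cross-tabulation is a handy companion. The sketch below uses made-up rows (the `toy` frame is hypothetical, not the marathon data), so the counts it prints are illustrative only.

```python
import pandas as pd

toy = pd.DataFrame({
    "age_categories": ["20대", "20대", "30대", "30대", "30대", "40대"],
    "M/F": ["M", "F", "M", "M", "F", "F"],
})  # hypothetical rows

# Rows are age groups, columns are sexes, and each cell counts the matching runners.
counts = pd.crosstab(toy["age_categories"], toy["M/F"])
print(counts)
```

On the real data, `pd.crosstab(marathon_2017['age_categories'], marathon_2017['M/F'])` would give the table behind the plot, and `counts.plot(kind="bar")` draws the same kind of grouped bar chart directly from it.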