import pandas as pd

# Tistory-specific display tweak (widens the notebook container); not needed for the analysis.
from IPython.display import display, HTML
display(HTML("<style>.container {width:90% !important;}</style>"))

# Load the data: https://www.kaggle.com/rojour/boston-results
marathon_2017 = pd.read_csv("C://Users//82106//Desktop//boston-results//marathon_results_2017.csv")

marathon_2017.head()

Binning (categorizing) data

pandas.cut(x, bins, right: bool = True, labels=None, retbins: bool = False, precision: int = 3, include_lowest: bool = False, duplicates: str = 'raise')

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.cut.html
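The following sketch shows what `right`, `include_lowest` and `retbins` change. It uses a handful of made-up values: `ages` and `edges` are hypothetical, not taken from the marathon data, and were chosen so that several values sit exactly on bin edges.

```python
import pandas as pd

# Hypothetical toy ages, chosen so that several sit exactly on bin edges.
ages = pd.Series([0, 10, 15, 20])
edges = [0, 10, 20]

# Default right=True: intervals are (0, 10] and (10, 20], so 0 falls outside -> NaN.
print(pd.cut(ages, edges).tolist())

# include_lowest=True extends the first interval to cover its left edge, so 0 is kept.
print(pd.cut(ages, edges, include_lowest=True).tolist())

# right=False: intervals are [0, 10) and [10, 20), so now 20 falls outside -> NaN.
print(pd.cut(ages, edges, right=False).tolist())

# retbins=True returns the bin edges alongside the binned values.
binned, returned_edges = pd.cut(ages, edges, retbins=True)
print(returned_edges)
```

Values outside every interval come back as NaN rather than raising an error, so it is worth checking for missing values after binning.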

1. Define the bin edges you want to split the data on

# 10 edges -> 9 bins: (0, 10], (10, 20], ..., (80, 90]
bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
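The ten edges in `bins` define nine intervals, one between each pair of neighbouring edges. To check this, the sketch below bins a few sample ages without labels and inspects the resulting categories. It reuses the same `bins` list, and the ages in `sample_ages` are hypothetical.

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

# Hypothetical sample ages, used only to inspect the intervals pd.cut creates.
sample_ages = pd.Series([18, 24, 45, 67, 85])

# Without labels, each value is mapped to an Interval category.
binned = pd.cut(sample_ages, bins)

# Ten edges -> nine right-closed intervals: (0, 10], (10, 20], ..., (80, 90].
print(len(binned.cat.categories))  # 9
print(binned.cat.categories)
```

With these edges, any age at or below 0 or above 90 would come out as NaN.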

2. Define a label for each bin

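# One label per bin: 10 edges give 9 bins, so 9 labels ("10대" = teens, "20대" = twenties, ...).
bins_names = ["0세", "10대", "20대", "30대", "40대", "50대", "60대", "70대", "80대"]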
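`pd.cut` expects exactly one label per interval, which means `len(bins) - 1` labels. The sketch below reuses the post's `bins` and `bins_names` and drops one label on purpose to show that a mismatch raises `ValueError` rather than silently misaligning labels. The single age 25 is a hypothetical value.

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
bins_names = ["0세", "10대", "20대", "30대", "40대", "50대", "60대", "70대", "80대"]

# pd.cut needs exactly one label per interval, i.e. len(bins) - 1 labels.
assert len(bins_names) == len(bins) - 1

# Dropping one label makes the counts disagree, and pd.cut raises ValueError.
raised = False
try:
    pd.cut(pd.Series([25]), bins, labels=bins_names[:-1])
except ValueError as err:
    raised = True
    print("ValueError:", err)
```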

3. Select the column to bin and apply pd.cut

age_categories = pd.cut(marathon_2017['Age'], bins, labels=bins_names)

age_categories

0        20대
1        20대
2        20대
3        30대
4        30대
        ... 
26405    60대
26406    20대
26407    50대
26408    60대
26409    40대
Name: Age, Length: 26410, dtype: category
Categories (9, object): [0세 < 10대 < 20대 < 30대 ... 50대 < 60대 < 70대 < 80대]
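With the default `right=True`, each bin is closed on the right, so `(20, 30]` contains 30. This explains why row 1 (Galen Rupp, age 30 in the table below) is labelled "20대" rather than "30대". If each label should mean "ages 20 to 29", one option is `right=False`, which makes the bins `[20, 30)`, `[30, 40)` and so on. The sketch below compares both settings on a few hypothetical ages that sit on or near bin edges.

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
bins_names = ["0세", "10대", "20대", "30대", "40대", "50대", "60대", "70대", "80대"]

# Hypothetical ages sitting on or just below bin edges.
ages = pd.Series([20, 29, 30, 39])

# Default right=True: bins are (10, 20], (20, 30], ... so 20 -> "10대" and 30 -> "20대".
default_labels = pd.cut(ages, bins, labels=bins_names)

# right=False: bins are [20, 30), [30, 40), ... so 20 -> "20대" and 30 -> "30대".
decade_labels = pd.cut(ages, bins, labels=bins_names, right=False)

print(pd.DataFrame({"age": ages, "right=True": default_labels, "right=False": decade_labels}))
```

With `right=False`, an age of exactly 90 would fall outside the last bin `[80, 90)` and become NaN, so the top edge may need raising if such ages occur in the data.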

# Add the binned ages to marathon_2017 as a new column.
# pd.cut returns a Series with the same index as marathon_2017['Age'], so it can be assigned directly.
marathon_2017['age_categories'] = age_categories

marathon_2017.head()

	Unnamed: 0	Bib	Name	Age	M/F	City	State	Country	Citizen	Unnamed: 9	...	25K	30K	35K	40K	Pace	Proj Time	Official Time	Overall	Gender	Division
0	0	11	Kirui, Geoffrey	24	M	Keringet	NaN	KEN	NaN	NaN	...	1:16:59	1:33:01	1:48:19	2:02:53	0:04:57	-	2:09:37	1	1	1
1	1	17	Rupp, Galen	30	M	Portland	OR	USA	NaN	NaN	...	1:16:59	1:33:01	1:48:19	2:03:14	0:04:58	-	2:09:58	2	2	2
2	2	23	Osako, Suguru	25	M	Machida-City	NaN	JPN	NaN	NaN	...	1:17:00	1:33:01	1:48:31	2:03:38	0:04:59	-	2:10:28	3	3	3
3	3	21	Biwott, Shadrack	32	M	Mammoth Lakes	CA	USA	NaN	NaN	...	1:17:00	1:33:01	1:48:58	2:04:35	0:05:03	-	2:12:08	4	4	4
4	4	9	Chebet, Wilson	31	M	Marakwet	NaN	KEN	NaN	NaN	...	1:16:59	1:33:01	1:48:41	2:05:00	0:05:04	-	2:12:35	5	5	5

	Unnamed: 0	Bib	Name	Age	M/F	City	State	Country	Citizen	Unnamed: 9	...	30K	35K	40K	Pace	Proj Time	Official Time	Overall	Gender	Division	age_categories
0	0	11	Kirui, Geoffrey	24	M	Keringet	NaN	KEN	NaN	NaN	...	1:33:01	1:48:19	2:02:53	0:04:57	-	2:09:37	1	1	1	20대
1	1	17	Rupp, Galen	30	M	Portland	OR	USA	NaN	NaN	...	1:33:01	1:48:19	2:03:14	0:04:58	-	2:09:58	2	2	2	20대
2	2	23	Osako, Suguru	25	M	Machida-City	NaN	JPN	NaN	NaN	...	1:33:01	1:48:31	2:03:38	0:04:59	-	2:10:28	3	3	3	20대
3	3	21	Biwott, Shadrack	32	M	Mammoth Lakes	CA	USA	NaN	NaN	...	1:33:01	1:48:58	2:04:35	0:05:03	-	2:12:08	4	4	4	30대
4	4	9	Chebet, Wilson	31	M	Marakwet	NaN	KEN	NaN	NaN	...	1:33:01	1:48:41	2:05:00	0:05:04	-	2:12:35	5	5	5	30대
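The post is about splitting data without `if` statements. For comparison, the sketch below shows roughly what the same labelling takes with an explicit `if` test applied row by row. The helper `age_group` is hypothetical and exists only for this comparison. It mimics the default `right=True` bins (lower < age <= upper), and the sample ages are also hypothetical.

```python
import pandas as pd

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
bins_names = ["0세", "10대", "20대", "30대", "40대", "50대", "60대", "70대", "80대"]


def age_group(age):
    """Hypothetical if-based equivalent of pd.cut(age, bins, labels=bins_names)."""
    # Walk through consecutive edge pairs and return the first matching label.
    for lower, upper, name in zip(bins[:-1], bins[1:], bins_names):
        if lower < age <= upper:  # right-closed, like pd.cut's default
            return name
    return None  # outside every bin; pd.cut would give NaN here


# Hypothetical sample ages.
ages = pd.Series([18, 24, 30, 45, 67, 85])

with_if = ages.apply(age_group)                    # one Python call per row
with_cut = pd.cut(ages, bins, labels=bins_names)   # single vectorised call

print(with_if.tolist() == with_cut.tolist())  # True
```

Both approaches give the same labels here. `pd.cut` produces them in one vectorised call and returns an ordered categorical (0세 < 10대 < ..., as in the output above), which keeps the age groups in a sensible order for sorting and grouping.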


우주먼지의 하루

[Python] pandas.cut - Binning data / Splitting data without if statements
