[Python] Iterating over DataFrame rows - iterrows, itertuples

weweGH 2025. 4. 24. 09:00


Introduction

This post introduces iterrows and itertuples, the two pandas methods to reach for when the rows of a DataFrame need to be processed one at a time in Python. The examples use the iris dataset from seaborn.

import seaborn as sns
df = sns.load_dataset('iris')
df.head()

iterrows
itertuples

iterrows

iterrows is a pandas DataFrame method that iterates over the rows. It is useful when rows must be processed sequentially, and on each iteration it yields the row's index together with the row's data as a Series.

for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"species: {row['species']}, sepal_length: {row['sepal_length']}, sepal_width: {row['sepal_width']}\n")
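One thing worth knowing about iterrows is that each row arrives as a single Series, so the values of a row are coerced to one common dtype. The sketch below is only an illustration: df_small and its column names are made up for this example and are not part of the iris data.

```python
import pandas as pd

# Hypothetical two-column frame (not the iris data): one integer column, one float column.
df_small = pd.DataFrame({"count": [1, 2], "ratio": [1.5, 2.5]})

# iterrows yields (index, Series) pairs; take the first pair.
index, row = next(df_small.iterrows())

print(type(row).__name__)       # the row is a pandas Series
print(df_small["count"].dtype)  # integer dtype inside the DataFrame
print(row.dtype)                # the row Series has a single float dtype
print(row["count"])             # the integer 1 comes back as a float
```

If the original column dtypes matter, itertuples is usually the safer choice because it returns the values as they are stored in each column.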

itertuples

itertuples is a pandas DataFrame method that iterates over the rows as namedtuples. It resembles iterrows but is faster, so it is generally preferred when working with large DataFrames.

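The following produces the same output as the iterrows example. Because itertuples returns each row as a namedtuple, fields are accessed with dot notation. For example, the species value of a row is written row.species, and the index is available as row.Index.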

for row in df.itertuples():
    print(f"Index: {row.Index}")
    print(f"species: {row.species}, sepal_length: {row.sepal_length}, sepal_width: {row.sepal_width}\n")
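To get a feel for the speed difference, here is a rough timing sketch. It assumes a synthetic DataFrame (df_big, with random values and iris-like column names) and two hypothetical helper functions. Actual timings depend on the machine and the pandas version, so treat the printed numbers as indicative only.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data: column names mirror iris, values are random.
rng = np.random.default_rng(0)
df_big = pd.DataFrame({
    "sepal_length": rng.uniform(4.0, 8.0, 10_000),
    "sepal_width": rng.uniform(2.0, 4.5, 10_000),
})


def total_with_iterrows(frame):
    """Sum sepal_width row by row using iterrows."""
    total = 0.0
    for _, row in frame.iterrows():
        total += row["sepal_width"]
    return total


def total_with_itertuples(frame):
    """Sum sepal_width row by row using itertuples."""
    total = 0.0
    for row in frame.itertuples():
        total += row.sepal_width
    return total


start = time.perf_counter()
sum_rows = total_with_iterrows(df_big)
t_rows = time.perf_counter() - start

start = time.perf_counter()
sum_tuples = total_with_itertuples(df_big)
t_tuples = time.perf_counter() - start

print(f"iterrows:   {t_rows:.3f} s")
print(f"itertuples: {t_tuples:.3f} s")
```

Both loops compute the same total. The gap comes mainly from iterrows building a new Series object for every row.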

To leave the index out of the result, pass the index=False option:

for row in df.itertuples(index=False):
    print(row)
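Besides index, itertuples also accepts a name parameter that controls the namedtuple type. The sketch below uses a small hand-made frame (df_demo is hypothetical, not the iris data) to show the field names of a row, the plain tuples returned with name=None, and a custom type name.

```python
import pandas as pd

# Hypothetical two-row frame with iris-like column names.
df_demo = pd.DataFrame({
    "sepal_length": [5.1, 4.9],
    "species": ["setosa", "setosa"],
})

# Without the index, the namedtuple fields are just the column names.
first = next(df_demo.itertuples(index=False))
print(first._fields)    # ('sepal_length', 'species')
print(first._asdict())  # the row as a dict keyed by column name

# name=None returns regular tuples instead of namedtuples.
plain = next(df_demo.itertuples(index=False, name=None))
print(plain)            # (5.1, 'setosa')

# A custom name changes the namedtuple's type name; Index is the first field.
named = next(df_demo.itertuples(name="Flower"))
print(type(named).__name__, named.Index)  # Flower 0
```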

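When you need to check the quality of a DataFrame, itertuples makes it quick and convenient to print the problem rows. For example, supposing that a sepal_width of 3.0 counts as anomalous in the iris data, the anomalous rows can be reported as follows.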

for row in df.itertuples():
    if row.sepal_width == 3.0:
        print(row.Index, ': sepal width check required')
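If you only need the indices of the flagged rows rather than a printed message per row, the same check can also be written with a boolean mask. This is a different technique from row iteration, shown here only for comparison. The frame df_check is a hypothetical stand-in for the iris data.

```python
import pandas as pd

# Hypothetical four-row frame standing in for the iris data.
df_check = pd.DataFrame({
    "sepal_width": [3.5, 3.0, 3.2, 3.0],
    "species": ["setosa", "setosa", "versicolor", "virginica"],
})

# Row-by-row check with itertuples, collecting indices instead of printing.
flagged_loop = [row.Index for row in df_check.itertuples() if row.sepal_width == 3.0]

# Equivalent check with a boolean mask over the whole column.
flagged_mask = df_check.index[df_check["sepal_width"] == 3.0].tolist()

print(flagged_loop)  # [1, 3]
print(flagged_mask)  # [1, 3]
```

The loop version remains handy when each flagged row needs custom handling, such as a per-row log message.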

Full code

## iris dataset

import seaborn as sns
df = sns.load_dataset('iris')
df.head()

# --------------------------------------------------------------------------------- */

## iterrows

# print rows
for index, row in df.iterrows():
    print(f"Index: {index}")
    print(f"species: {row['species']}, sepal_length: {row['sepal_length']}, sepal_width: {row['sepal_width']}\n")

# --------------------------------------------------------------------------------- */

## itertuples

# print rows
for row in df.itertuples():
    print(f"Index: {row.Index}")
    print(f"species: {row.species}, sepal_length: {row.sepal_length}, sepal_width: {row.sepal_width}\n")


# print rows - without the index
for row in df.itertuples(index=False):
    print(row)


# print rows - anomalous data only
for row in df.itertuples():
    if row.sepal_width == 3.0:
        print(row.Index, ': sepal width check required')

# --------------------------------------------------------------------------------- */
