This article shows the most common ways to select data from a pandas DataFrame.
First, let's create a DataFrame.
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(5, 4), index=['a', 'b', 'c', 'd', 'e'], columns=['A', 'B', 'C', 'D'])
df
|   | A | B | C | D |
|---|---|---|---|---|
| a | 0.122248 | 0.581777 | 0.972888 | 0.869366 |
| b | 0.476451 | 0.453979 | 0.004705 | 0.644530 |
| c | 0.954790 | 0.747131 | 0.652936 | 0.758767 |
| d | 0.077122 | 0.407514 | 0.019102 | 0.553546 |
| e | 0.921520 | 0.157199 | 0.371028 | 0.825792 |
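Because `np.random.rand` is called without a seed, rerunning the code gives different numbers from the table above. The sketch below is one way to get a stable frame: it copies the displayed values into an array, and also shows a seeded generator as an alternative. The seed `0` and the name `df_seeded` are illustrative choices, not part of the original article.

```python
import numpy as np
import pandas as pd

# Values copied from the table above so later selections match the article.
values = np.array([
    [0.122248, 0.581777, 0.972888, 0.869366],
    [0.476451, 0.453979, 0.004705, 0.644530],
    [0.954790, 0.747131, 0.652936, 0.758767],
    [0.077122, 0.407514, 0.019102, 0.553546],
    [0.921520, 0.157199, 0.371028, 0.825792],
])
df = pd.DataFrame(values, index=list("abcde"), columns=list("ABCD"))

# Alternative: keep random data but make it repeatable.
# The seed 0 is an arbitrary assumption; it will not reproduce the table above.
rng = np.random.default_rng(0)
df_seeded = pd.DataFrame(rng.random((5, 4)), index=list("abcde"), columns=list("ABCD"))
```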
Use [] for column selection
|   | A | B |
|---|---|---|
| a | 0.122248 | 0.581777 |
| b | 0.476451 | 0.453979 |
| c | 0.954790 | 0.747131 |
| d | 0.077122 | 0.407514 |
| e | 0.921520 | 0.157199 |
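The statement that produced this output is not shown. Passing a list of column labels to `[]` returns exactly these two columns, so the call was probably something like the sketch below. The frame is rebuilt from the values displayed earlier, and the names `cols` and `single` are illustrative.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# A list of labels inside [] selects several columns and returns a DataFrame.
cols = df[['A', 'B']]

# A single label inside [] returns a Series instead.
single = df['A']
```

Note the double brackets: the outer pair is the indexing operator and the inner pair is the Python list of labels.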
Select a range of rows
|   | A | B | C | D |
|---|---|---|---|---|
| b | 0.476451 | 0.453979 | 0.004705 | 0.644530 |
| c | 0.954790 | 0.747131 | 0.652936 | 0.758767 |
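The original slicing statement is missing here as well. A slice inside `[]` selects rows rather than columns, and either form in the sketch below would return rows `b` and `c`; which one the author used is an assumption. The frame is rebuilt from the values displayed earlier.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# Label slice: both endpoints ('b' and 'c') are included.
by_label = df['b':'c']

# Integer slice: positional, and the stop position (3) is excluded.
by_position = df[1:3]
```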
Use .loc
df.loc['a':'c', ['A','B']]
|   | A | B |
|---|---|---|
| a | 0.122248 | 0.581777 |
| b | 0.476451 | 0.453979 |
| c | 0.954790 | 0.747131 |
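Row `c` appears in the result because `.loc` slices by label and includes both endpoints. As a rough comparison, the sketch below gets the same block with the position-based `.iloc`, where the stop position is excluded. The frame is rebuilt from the values displayed earlier, and the variable names are illustrative.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# Label-based: rows 'a' through 'c' inclusive, columns 'A' and 'B'.
by_label = df.loc['a':'c', ['A', 'B']]

# Position-based: rows 0, 1, 2 (stop 3 excluded), columns 0 and 1.
by_position = df.iloc[0:3, [0, 1]]

# A single row label and column label returns a scalar.
cell = df.loc['b', 'A']
```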
Select all columns
|   | A | B | C | D |
|---|---|---|---|---|
| a | 0.122248 | 0.581777 | 0.972888 | 0.869366 |
| b | 0.476451 | 0.453979 | 0.004705 | 0.644530 |
| c | 0.954790 | 0.747131 | 0.652936 | 0.758767 |
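The call for this output is not shown either. With `.loc`, a bare colon in the column position means "every column", so the statement was presumably close to the sketch below; that is an inference from the displayed rows `a` to `c`, not the author's confirmed code. The frame is rebuilt from the values displayed earlier.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# Rows 'a' through 'c', with ':' selecting every column.
all_columns = df.loc['a':'c', :]

# Omitting the column selector gives the same result.
rows_only = df.loc['a':'c']
```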
Select with a boolean mask
|   | A | B | C | D |
|---|---|---|---|---|
| b | 0.476451 | 0.453979 | 0.004705 | 0.644530 |
| c | 0.954790 | 0.747131 | 0.652936 | 0.758767 |
| e | 0.921520 | 0.157199 | 0.371028 | 0.825792 |
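The rows kept here (`b`, `c`, `e`) are the ones where column `A` exceeds 0.2, which matches the condition used in the lambda example, so the missing statement was most likely a mask of that form. The sketch below assumes so; the second condition on column `B` is an added illustration, not from the article. The frame is rebuilt from the values displayed earlier.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# The comparison yields a boolean Series aligned with the row index.
mask = df['A'] > 0.2

# Rows where the mask is True are kept.
filtered = df[mask]

# Combine conditions with & (and) or | (or); each one needs parentheses.
combined = df[(df['A'] > 0.2) & (df['B'] < 0.5)]
```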
Select with a callable (lambda)
df[lambda df: df['A']>0.2]
|   | A | B | C | D |
|---|---|---|---|---|
| b | 0.476451 | 0.453979 | 0.004705 | 0.644530 |
| c | 0.954790 | 0.747131 | 0.652936 | 0.758767 |
| e | 0.921520 | 0.157199 | 0.371028 | 0.825792 |
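A callable gives the same result as a plain boolean mask in this case. It becomes more useful in a method chain, where the intermediate frame has no variable name to refer to. The sketch below illustrates that under a few assumptions: the `total` column and the 1.0 threshold are hypothetical, and the frame is rebuilt from the values displayed earlier.

```python
import pandas as pd

# Frame rebuilt from the values shown at the top of the article.
df = pd.DataFrame(
    [[0.122248, 0.581777, 0.972888, 0.869366],
     [0.476451, 0.453979, 0.004705, 0.644530],
     [0.954790, 0.747131, 0.652936, 0.758767],
     [0.077122, 0.407514, 0.019102, 0.553546],
     [0.921520, 0.157199, 0.371028, 0.825792]],
    index=list("abcde"), columns=list("ABCD"),
)

# The callable receives the frame being indexed and returns a boolean mask.
with_callable = df.loc[lambda d: d['A'] > 0.2]

# In a chain, the lambda sees the frame produced by the previous step,
# so it can filter on the hypothetical 'total' column added by assign().
chained = (
    df.assign(total=lambda d: d['A'] + d['B'])
      .loc[lambda d: d['total'] > 1.0]
)
```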