12.3.10.4.10. Selection of data

import numpy as np
import pandas as pd


dates = pd.date_range("20220501", periods=6)
dataFrame = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))

Getting data

dataFrame["A"]
2022-05-01    0.973799
2022-05-02    0.155131
2022-05-03    0.445839
2022-05-04   -0.613454
2022-05-05   -0.776714
2022-05-06   -0.600595
Freq: D, Name: A, dtype: float64

dataFrame[0:3]
                   A         B         C         D
2022-05-01  0.973799 -1.856910  1.102405  1.448620
2022-05-02  0.155131 -0.582155 -0.230280  0.691659
2022-05-03  0.445839 -0.988124  0.265889 -1.048844


dataFrame["20220501":"20220502"]
                   A         B         C         D
2022-05-01  0.973799 -1.856910  1.102405  1.448620
2022-05-02  0.155131 -0.582155 -0.230280  0.691659
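The three forms above behave differently: a column label returns a Series, an integer slice selects rows by position and excludes the stop, and a slice of date strings selects rows by label and includes both ends. The sketch below illustrates this under stated assumptions: the name `frame` and the fixed `np.arange` values are introduced here only so the results are reproducible, and are not part of the original script.

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: fixed values instead of the section's random ones.
dates = pd.date_range("20220501", periods=6)
frame = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

column_a = frame["A"]                      # column label -> Series
first_three = frame[0:3]                   # integer slice -> rows by position, stop excluded
by_label = frame["20220501":"20220502"]    # label slice -> rows by label, end included

print(type(column_a).__name__)
print(len(first_three), len(by_label))
```

Note that the positional slice `0:3` yields three rows, while the label slice yields two rows because both end dates are kept.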


Selection by label

dataFrame.loc[dates[0]]
A    0.973799
B   -1.856910
C    1.102405
D    1.448620
Name: 2022-05-01 00:00:00, dtype: float64

dataFrame.loc[:, ["A", "B"]]
                   A         B
2022-05-01  0.973799 -1.856910
2022-05-02  0.155131 -0.582155
2022-05-03  0.445839 -0.988124
2022-05-04 -0.613454 -1.217677
2022-05-05 -0.776714  0.459753
2022-05-06 -0.600595 -0.796993


dataFrame.loc["20220501":"20220502", ["A", "B"]]
                   A         B
2022-05-01  0.973799 -1.856910
2022-05-02  0.155131 -0.582155


dataFrame.loc["20220501", ["A", "B"]]
A    0.973799
B   -1.856910
Name: 2022-05-01 00:00:00, dtype: float64

dataFrame.loc[dates[0], "A"]
0.9737991449933912

dataFrame.at[dates[0], "A"]
0.9737991449933912
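A few properties of label-based selection are easy to miss in the output above: label slices include both end points, `.loc` and `.at` return the same scalar, and a label that is absent from the index raises `KeyError`. The following is a minimal sketch of those points; `frame`, `block`, and the fixed values are assumptions introduced for illustration.

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: fixed values instead of the section's random ones.
dates = pd.date_range("20220501", periods=6)
frame = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

# Label slices include both end points, so this block is 2 rows by 2 columns.
block = frame.loc["20220501":"20220502", ["A", "B"]]

# .loc and .at return the same scalar; .at accepts only one row/column pair.
scalar_loc = frame.loc[dates[0], "A"]
scalar_at = frame.at[dates[0], "A"]

# A label that is not in the index raises KeyError instead of returning NaN.
try:
    frame.loc[pd.Timestamp("2022-06-01")]
    missing_raises = False
except KeyError:
    missing_raises = True

print(block.shape, scalar_loc == scalar_at, missing_raises)
```

`.at` is the narrower tool: it only handles a single cell, which is why it is usually preferred for scalar reads and writes.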

Selection by position

dataFrame.iloc[3]
A   -0.613454
B   -1.217677
C    0.144906
D    0.719465
Name: 2022-05-04 00:00:00, dtype: float64

dataFrame.iloc[3:5, 0:2]
                   A         B
2022-05-04 -0.613454 -1.217677
2022-05-05 -0.776714  0.459753


dataFrame.iloc[[1, 2, 4], [0, 2]]
                   A         C
2022-05-02  0.155131 -0.230280
2022-05-03  0.445839  0.265889
2022-05-05 -0.776714  0.131384


dataFrame.iloc[1:3, :]
                   A         B         C         D
2022-05-02  0.155131 -0.582155 -0.230280  0.691659
2022-05-03  0.445839 -0.988124  0.265889 -1.048844


dataFrame.iloc[:, 1:3]
                   B         C
2022-05-01 -1.856910  1.102405
2022-05-02 -0.582155 -0.230280
2022-05-03 -0.988124  0.265889
2022-05-04 -1.217677  0.144906
2022-05-05  0.459753  0.131384
2022-05-06 -0.796993 -0.133263


dataFrame.iloc[1, 1]
-0.5821550971365247

dataFrame.iat[1, 1]
-0.5821550971365247
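Positional selection follows Python slicing rules: the stop position is excluded, negative positions count from the end, and slices that run past the end are truncated, whereas a single out-of-range position raises `IndexError`. The sketch below shows this with assumed, fixed data; the names `frame`, `picked`, and `tail` are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: fixed values instead of the section's random ones.
dates = pd.date_range("20220501", periods=6)
frame = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=list("ABCD"))

rows_3_4 = frame.iloc[3:5, 0:2]           # rows 3 and 4, columns 0 and 1 (stop excluded)
picked = frame.iloc[[1, 2, 4], [0, 2]]    # explicit lists of positions
last_row = frame.iloc[-1]                 # negative positions count from the end

# A slice past the end is truncated, but a single bad position raises IndexError.
tail = frame.iloc[4:10]
try:
    frame.iloc[10]
    out_of_range_raises = False
except IndexError:
    out_of_range_raises = True

print(rows_3_4.shape, picked.shape, len(tail), out_of_range_raises)
print(frame.iloc[1, 1] == frame.iat[1, 1])
```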

Boolean indexing

dataFrame[dataFrame["A"] > 0]
                   A         B         C         D
2022-05-01  0.973799 -1.856910  1.102405  1.448620
2022-05-02  0.155131 -0.582155 -0.230280  0.691659
2022-05-03  0.445839 -0.988124  0.265889 -1.048844


dataFrame[dataFrame > 0]
                   A         B         C         D
2022-05-01  0.973799       NaN  1.102405  1.448620
2022-05-02  0.155131       NaN       NaN  0.691659
2022-05-03  0.445839       NaN  0.265889       NaN
2022-05-04       NaN       NaN  0.144906  0.719465
2022-05-05       NaN  0.459753  0.131384  0.513463
2022-05-06       NaN       NaN       NaN  0.640226


dataFrame2 = dataFrame.copy()
dataFrame2["E"] = ["one", "one", "two", "three", "four", "three"]
dataFrame2[dataFrame2["E"].isin(["two", "four"])]
                   A         B         C         D     E
2022-05-03  0.445839 -0.988124  0.265889 -1.048844   two
2022-05-05 -0.776714  0.459753  0.131384  0.513463  four
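The two mask shapes above act differently: a boolean Series drops whole rows, while a boolean DataFrame keeps the shape and replaces non-matching cells with NaN, which gives the same result as `DataFrame.where`. Conditions can also be combined with `&` or `|`, provided each comparison is parenthesized. A minimal sketch, assuming fixed values and the illustrative names `frame`, `masked`, and `both`:

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: the values -12.0 .. 11.0 laid out row by row.
dates = pd.date_range("20220501", periods=6)
frame = pd.DataFrame(np.arange(-12.0, 12.0).reshape(6, 4), index=dates, columns=list("ABCD"))

positive_a = frame[frame["A"] > 0]      # boolean Series: keeps only matching rows
masked = frame[frame > 0]               # boolean DataFrame: same shape, NaN elsewhere
same_as_where = masked.equals(frame.where(frame > 0))

# Combine conditions with & (and) or | (or); each comparison needs parentheses.
both = frame[(frame["A"] > 0) & (frame["B"] < 9)]

print(len(positive_a), int(masked.isna().sum().sum()), same_as_where, len(both))
```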


Setting data

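# Assigning a Series aligns it to the frame's index; labels missing from the Series become NaN.
series = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20220502", periods=6))
dataFrame["F"] = series
# Set a single value by label, then by position.
dataFrame.at[dates[0], "A"] = 0
dataFrame.iat[0, 1] = 0
# Overwrite a whole column with a NumPy array.
dataFrame.loc[:, "D"] = np.array([5] * len(dataFrame))
# Negate every positive value in a copy, leaving the other values unchanged.
dataFrame2 = dataFrame.copy()
dataFrame2[dataFrame2 > 0] = -dataFrame2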
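Assigning a Series as a new column aligns it on the index rather than copying values by position, so the result depends on how the two indexes overlap. The sketch below contrasts a Series shifted by one day with one that shares no dates with the frame; `frame`, `shifted`, and `disjoint` are hypothetical names used only for this illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20220501", periods=6)
frame = pd.DataFrame(np.zeros((6, 2)), index=dates, columns=["A", "B"])

# Index shifted by one day: 2022-05-01 has no match (NaN), 2022-05-07 is dropped.
shifted = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20220502", periods=6))
frame["F"] = shifted

# Index with no dates in common: every row of the new column is NaN.
disjoint = pd.Series([1, 2, 3, 4, 5, 6], index=pd.date_range("20130102", periods=6))
frame["G"] = disjoint

print(frame[["F", "G"]])
```

If positional assignment is what is wanted, passing the underlying values (for example `shifted.to_numpy()`) sidesteps alignment, at the cost of requiring the lengths to match.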
