12.3.10.4.3. Grouping data

import pandas as pd
import numpy as np

Concat

Create a DataFrame with 10 rows and 4 columns of random numbers:

dataFrame = pd.DataFrame(np.random.randn(10, 4))

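Break it into three row slices, then concatenate them back together (the values differ on every run because the data is random):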

pieces = [dataFrame[:3], dataFrame[3:7], dataFrame[7:]]
pd.concat(pieces)
          0         1         2         3
0  2.005605  0.122321  0.314603  0.318486
1 -1.015193  0.074791  0.816450 -1.115685
2 -0.997419 -0.064948  0.973292  0.969946
3  1.281346  0.154516 -3.354396 -0.940462
4 -0.063201  0.225435  0.350171  1.603465
5  0.654310  0.085126  0.021388 -0.631416
6  1.328035  0.053975  0.672767 -1.708392
7 -0.512168 -0.044290  0.993882 -0.127941
8  1.000896 -0.096002 -0.327849  2.242147
9  0.407832 -0.135254  1.369629  0.255078
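As a quick check, here is a minimal sketch confirming that concatenating consecutive row slices rebuilds the original frame. It also illustrates two pd.concat parameters the example above does not use: keys, which adds an outer index level naming each piece, and ignore_index, which renumbers the rows. It assumes a deterministic frame df built with np.arange in place of the random data so the results are repeatable; the piece names "first", "middle" and "last" are arbitrary.

```python
import numpy as np
import pandas as pd

# Illustrative deterministic stand-in for the random frame: 10 rows, 4 columns.
df = pd.DataFrame(np.arange(40).reshape(10, 4))
pieces = [df[:3], df[3:7], df[7:]]

# Consecutive row slices concatenated in order reproduce the original frame.
rebuilt = pd.concat(pieces)
print(rebuilt.equals(df))

# keys= adds an outer index level recording which piece each row came from.
labelled = pd.concat(pieces, keys=["first", "middle", "last"])
print(labelled.loc["middle"])  # the rows labelled 3..6 in df

# ignore_index=True discards the slice labels and renumbers rows from 0.
renumbered = pd.concat([df[7:], df[:3]], ignore_index=True)
print(renumbered.index.tolist())
```

Use keys when you later need to trace rows back to their source piece. Use ignore_index when the original row labels carry no meaning and would otherwise collide.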


Join

SQL-style join on a shared key column with pd.merge:

left = pd.DataFrame({"key": ["foo", "foo"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "foo"], "rval": [4, 5]})
# "foo" occurs twice on each side, so every left row pairs with every right row (2 x 2 = 4 rows).
pd.merge(left, right, on="key")
   key  lval  rval
0  foo     1     4
1  foo     1     5
2  foo     2     4
3  foo     2     5
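By contrast, when each key value occurs at most once per side, the merge is one-to-one. The following is a minimal sketch with hypothetical frames (the keys "bar" and "baz" are made up for illustration). It also shows two documented pd.merge parameters: how="left", which keeps every row of the left frame, and indicator=True, which adds a _merge column recording where each row was found.

```python
import pandas as pd

# Hypothetical frames where each key occurs once per side.
left = pd.DataFrame({"key": ["foo", "bar"], "lval": [1, 2]})
right = pd.DataFrame({"key": ["foo", "baz"], "rval": [4, 5]})

# Default how="inner": only keys present in both frames survive.
inner = pd.merge(left, right, on="key")
print(inner)

# how="left" keeps every row of `left`; a missing rval becomes NaN.
# indicator=True adds "_merge" with values "both", "left_only" or "right_only".
left_join = pd.merge(left, right, on="key", how="left", indicator=True)
print(left_join)
```

Because NaN is a float, rval becomes a float column in the left join even though it held integers on the right.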


Grouping

Split the rows into groups by the values in one or more columns, then sum each group:

dataFrame = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": np.random.randn(8),
        "D": np.random.randn(8),
    }
)
# Sum only the numeric columns; summing the string column B would just concatenate its values.
dataFrame.groupby("A")[["C", "D"]].sum()
            C         D
A
bar  2.097917  0.999664
foo  2.141605  0.976511
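A group-by proceeds in three steps: split the rows by key, apply a function to each group, then combine the results. The following is a minimal sketch showing agg, which computes several statistics in one pass, and size, which counts rows per group. It assumes the same A keys as above but replaces the random C column with made-up fixed values so the totals can be checked by hand.

```python
import pandas as pd

# Same A keys as above; C holds illustrative fixed values instead of random ones.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

# Split by A, apply three reductions to C, combine into one frame (rows: bar, foo).
stats = df.groupby("A")["C"].agg(["sum", "mean", "count"])
print(stats)

# size() counts rows per group; unlike count, it also includes rows holding NaN.
sizes = df.groupby("A").size()
print(sizes)

# The same "foo" total by hand: filter the group's rows, then sum.
manual_foo = df.loc[df["A"] == "foo", "C"].sum()
print(manual_foo)
```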


dataFrame.groupby(["A", "B"]).sum()
                  C         D
A   B
bar one    0.048469  0.013294
    three  0.631859 -0.183019
    two    1.417589  1.169390
foo one    1.114664  1.950852
    three -0.457640 -2.412908
    two    1.484580  1.438567
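Grouping by two keys produces a result indexed by an (A, B) MultiIndex. The following is a minimal sketch of two common ways to reshape it. unstack() pivots the inner level B into columns, and reset_index() turns both levels back into ordinary columns (passing as_index=False to groupby is a documented alternative). The fixed C values are illustrative stand-ins for the random data.

```python
import pandas as pd

# Same A and B keys as above; C holds illustrative fixed values.
df = pd.DataFrame(
    {
        "A": ["foo", "bar", "foo", "bar", "foo", "bar", "foo", "foo"],
        "B": ["one", "one", "two", "three", "two", "two", "one", "three"],
        "C": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

# A Series indexed by the sorted (A, B) pairs.
by_ab = df.groupby(["A", "B"])["C"].sum()

# unstack() moves B into columns: one row per A value, one column per B value.
table = by_ab.unstack()
print(table)

# reset_index() turns the A and B index levels back into ordinary columns.
flat = by_ab.reset_index()
print(flat)
```

The grid from unstack is easier to read when comparing groups. The flat form is easier to merge with other frames or write out.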

