Tutorial

This example shows the simplest way to use pint with pandas and introduces the underlying objects. Setting up units by hand, as done here, is slightly fiddlier than reading data and units from a file; a more typical use case is given in Reading from csv.

Imports

First, some imports:

In [1]: import pandas as pd

In [2]: import pint

In [3]: import pint_pandas

In [4]: pint_pandas.show_versions()
{'numpy': '2.3.5',
 'pandas': '2.3.3',
 'pint': '0.25.2',
 'pint_pandas': '0.1.dev114+gfdb109299'}

Create a DataFrame

Next, we create a DataFrame with PintArrays as columns.

In [5]: df = pd.DataFrame(
   ...:    {
   ...:       "torque": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="pint[lbf ft]"),
   ...:       "angular_velocity": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="pint[rpm]"),
   ...:    }
   ...: )
   ...: 

In [6]: df
Out[6]: 
   torque  angular_velocity
0     1.0               1.0
1     2.0               2.0
2     2.0               2.0
3     3.0               3.0

DataFrame Operations

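Operations on columns are unit-aware, so they behave as we would intuitively expect: multiplying torque by angular velocity gives a power column whose units are the product of the two input units.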

In [7]: df["power"] = df["torque"] * df["angular_velocity"]

In [8]: df
Out[8]: 
   torque  angular_velocity  power
0     1.0               1.0    1.0
1     2.0               2.0    4.0
2     2.0               2.0    4.0
3     3.0               3.0    9.0

Note

Notice that the units are not displayed in the cells of the DataFrame. If you ever see units in the cells, something isn’t right: it usually means the column has object dtype and holds individual Quantity objects rather than a PintArray. See Units in Cells for more information.

We can see the columns’ units in the dtypes attribute:

In [9]: df.dtypes
Out[9]: 
torque                              pint[foot * force_pound][Float64]
angular_velocity                pint[revolutions_per_minute][Float64]
power               pint[foot * force_pound * revolutions_per_minu...
dtype: object

Each column can be accessed as a pandas Series:

In [10]: df.power
Out[10]: 
0    1.0
1    4.0
2    4.0
3    9.0
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute][Float64]

The Series stores its values in a PintArray:

In [11]: df.power.values
Out[11]: 
<PintArray>
[1.0, 4.0, 4.0, 9.0]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute][Float64]

The PintArray, in turn, wraps a pint Quantity:

In [12]: df.power.values.quantity
Out[12]: <Quantity([1. 4. 4. 9.], 'force_pound * foot * revolutions_per_minute')>

DataFrame Index

PintArrays can be used as the DataFrame’s index.

In [13]: time = pd.Series([1, 2, 2, 3], dtype="pint[second]")

In [14]: df.index = time

In [15]: df.index
Out[15]: Index([1.0 second, 2.0 second, 2.0 second, 3.0 second], dtype='pint[second][Float64]')

Pandas Series Accessors

The .pint Series accessor exposes most Quantity properties and methods. Methods that return arrays are converted back to Series, keeping the original index and name.

In [16]: df.power.pint.units
Out[16]: <Unit('force_pound * foot * revolutions_per_minute')>

In [17]: df.power.pint.to("kW")
Out[17]: 
1.0    0.00014198092353610376
2.0      0.000567923694144415
2.0      0.000567923694144415
3.0     0.0012778283118249339
Name: power, dtype: pint[kilowatt][Float64]

That’s the basics! More examples are given in Reading from csv.