Tutorial
This example shows the simplest way to use pandas with pint, along with the underlying objects. Setting up units by hand is slightly fiddly compared with reading data and units from a file; a more typical use case is given in Reading from csv.
Imports
First, some imports:
In [1]: import pandas as pd
In [2]: import pint
In [3]: import pint_pandas
In [4]: pint_pandas.show_versions()
{'numpy': '2.3.5',
'pandas': '2.3.3',
'pint': '0.25.2',
'pint_pandas': '0.1.dev114+gfdb109299'}
Create a DataFrame
Next, we create a DataFrame with PintArrays as columns.
In [5]: df = pd.DataFrame(
   ...:     {
   ...:         "torque": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="pint[lbf ft]"),
   ...:         "angular_velocity": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="pint[rpm]"),
   ...:     }
   ...: )
   ...:
In [6]: df
Out[6]:
   torque  angular_velocity
0     1.0               1.0
1     2.0               2.0
2     2.0               2.0
3     3.0               3.0
DataFrame Operations
Operations on columns are unit-aware, so they behave as we would intuitively expect.
In [7]: df["power"] = df["torque"] * df["angular_velocity"]
In [8]: df
Out[8]:
   torque  angular_velocity  power
0     1.0               1.0    1.0
1     2.0               2.0    4.0
2     2.0               2.0    4.0
3     3.0               3.0    9.0
Note
Notice that the units are not displayed in the cells of the DataFrame. If you ever see units in the cells of the DataFrame, something isn’t right. See Units in Cells for more information.
We can see the columns’ units in the dtypes attribute:
In [9]: df.dtypes
Out[9]:
torque pint[foot * force_pound][Float64]
angular_velocity pint[revolutions_per_minute][Float64]
power pint[foot * force_pound * revolutions_per_minu...
dtype: object
Each column can be accessed as a pandas Series:
In [10]: df.power
Out[10]:
0 1.0
1 4.0
2 4.0
3 9.0
Name: power, dtype: pint[foot * force_pound * revolutions_per_minute][Float64]
The Series contains a PintArray:
In [11]: df.power.values
Out[11]:
<PintArray>
[1.0, 4.0, 4.0, 9.0]
Length: 4, dtype: pint[foot * force_pound * revolutions_per_minute][Float64]
The PintArray in turn contains a Quantity:
In [12]: df.power.values.quantity
Out[12]: <Quantity([1. 4. 4. 9.], 'force_pound * foot * revolutions_per_minute')>
DataFrame Index
PintArrays can be used as the DataFrame’s index.
In [13]: time = pd.Series([1, 2, 2, 3], dtype="pint[second]")
In [14]: df.index = time
In [15]: df.index
Out[15]: Index([1.0 second, 2.0 second, 2.0 second, 3.0 second], dtype='pint[second][Float64]')
Pandas Series Accessors
Pandas Series accessors are provided for most Quantity properties and methods. The results of methods that return arrays are converted to Series.
In [16]: df.power.pint.units
Out[16]: <Unit('force_pound * foot * revolutions_per_minute')>
In [17]: df.power.pint.to("kW")
Out[17]:
1.0 0.00014198092353610376
2.0 0.000567923694144415
2.0 0.000567923694144415
3.0 0.0012778283118249339
Name: power, dtype: pint[kilowatt][Float64]
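The converted values can be checked by hand. The sketch below uses only the standard library and the standard definitions of the pound-force, the foot, and the revolution (2π radians, with the radian treated as dimensionless); the constant names are illustrative.

```python
import math

# Pound-force in newtons: one pound of mass under standard gravity.
LBF_IN_NEWTON = 0.45359237 * 9.80665
FOOT_IN_METRE = 0.3048
# One revolution per minute in radians per second.
RPM_IN_RAD_PER_S = 2 * math.pi / 60

# 1 lbf * ft * rpm expressed in watts, then scaled to kilowatts.
factor_kw = LBF_IN_NEWTON * FOOT_IN_METRE * RPM_IN_RAD_PER_S / 1000

# Magnitudes of the power column before conversion.
for magnitude in [1.0, 4.0, 4.0, 9.0]:
    print(magnitude * factor_kw)
```

The printed values should agree with the kilowatt column above up to floating-point rounding.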
That’s the basics! More examples are given in Reading from csv.