Reading data and plotting
Reading data from files is by far the most common way to use pandas.
To facilitate this, DataFrame accessors are provided that make it easy to convert columns into PintArray objects.
Read data from csv
First, some imports:
In [1]: import pandas as pd
In [2]: import pint
In [3]: import pint_pandas
In [4]: import io
Here are the contents of the csv file:
In [5]: test_data = """ShaftSpeedIndex,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
...: pump,,A,B,C,A,B,C,A,B,C
...: TestDate,No Unit,01/01,01/01,01/01,01/01,01/01,01/01,01/02,01/02,01/02
...: ShaftSpeed,rpm,1200,1200,1200,1600,1600,1600,2300,2300,2300
...: FlowRate,m^3 h^-1,8.72,9.28,9.31,11.61,12.78,13.51,18.32,17.90,19.23
...: DifferentialPressure,kPa,162.03,144.16,136.47,286.86,241.41,204.21,533.17,526.74,440.76
...: ShaftPower,kW,1.32,1.23,1.18,3.09,2.78,2.50,8.59,8.51,7.61
...: Efficiency,dimensionless,30.60,31.16,30.70,30.72,31.83,31.81,32.52,31.67,32.05"""
...:
Let’s read that into a DataFrame. Here io.StringIO stands in for a file on disk; normally a csv file path would be passed instead, as shown in the commented line.
In [6]: df = pd.read_csv(io.StringIO(test_data), header=[0, 1], index_col=[0, 1]).T
# df = pd.read_csv("/path/to/test_data.csv", header=[0, 1], index_col=[0, 1]).T
In [7]: for col in df.columns:
...:     try:
...:         df[col] = pd.to_numeric(df[col])
...:     except ValueError:
...:         pass
...:
In [8]: df.dtypes
Out[8]:
TestDate No Unit object
ShaftSpeed rpm int64
FlowRate m^3 h^-1 float64
DifferentialPressure kPa float64
ShaftPower kW float64
Efficiency dimensionless float64
dtype: object
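The read, transpose and convert steps above can be bundled into one helper. The following is a minimal sketch, not part of pint-pandas: the function name `read_transposed_units_csv` and the shortened csv text are hypothetical, and it assumes the same layout as the example file (one variable per row, name in the first column, unit in the second).

```python
import io

import pandas as pd

# Hypothetical, shortened file in the same layout as the example above:
# one variable per row, name in column 0, unit in column 1.
csv_text = """ShaftSpeedIndex,rpm,1200,1600
pump,,A,A
TestDate,No Unit,01/01,01/02
ShaftSpeed,rpm,1200,1600
ShaftPower,kW,1.32,3.09
"""


def read_transposed_units_csv(source):
    """Read a row-per-variable csv and return one column per (name, unit)."""
    wide = pd.read_csv(source, header=[0, 1], index_col=[0, 1])
    tall = wide.T  # columns become a (name, unit) MultiIndex
    for col in tall.columns:
        try:
            tall[col] = pd.to_numeric(tall[col])
        except ValueError:
            pass  # leave text columns such as TestDate untouched
    return tall


df = read_transposed_units_csv(io.StringIO(csv_text))
print(df.dtypes)
```

The numeric conversion is needed because every column of the untransposed frame mixes text and numbers, so the values arrive as strings.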
Pandas DataFrame Accessors
Then use the quantify method of the DataFrame’s pint accessor to convert the columns from ndarray to PintArray, taking the units from the bottom column level.
Using ‘No Unit’ as the unit prevents quantify from converting a column to a PintArray. This marker can be changed by setting pint_pandas.pint_array.NO_UNIT.
In [9]: df_ = df.pint.quantify(level=-1)
In [10]: df_
Out[10]:
TestDate ShaftSpeed ... ShaftPower Efficiency
ShaftSpeedIndex pump ...
1200 A 01/01 1200 ... 1.32 30.6
B 01/01 1200 ... 1.23 31.16
C 01/01 1200 ... 1.18 30.7
1600 A 01/01 1600 ... 3.09 30.72
B 01/01 1600 ... 2.78 31.83
C 01/01 1600 ... 2.5 31.81
2300 A 01/02 2300 ... 8.59 32.52
B 01/02 2300 ... 8.51 31.67
C 01/02 2300 ... 7.61 32.05
[9 rows x 6 columns]
Let’s confirm the units have been parsed correctly by looking at the dtypes.
In [11]: df_.dtypes
Out[11]:
TestDate object
ShaftSpeed pint[revolutions_per_minute][int64]
FlowRate pint[meter ** 3 / hour][float64]
DifferentialPressure pint[kilopascal][float64]
ShaftPower pint[kilowatt][float64]
Efficiency pint[dimensionless][float64]
dtype: object
Here the Efficiency has been parsed as dimensionless. Let’s change it to percent.
In [12]: df_["Efficiency"] = pint_pandas.PintArray(
....:     df_["Efficiency"].values.quantity.m, dtype="pint[percent]"
....: )
....:
In [13]: df_.dtypes
Out[13]:
TestDate object
ShaftSpeed pint[revolutions_per_minute][int64]
FlowRate pint[meter ** 3 / hour][float64]
DifferentialPressure pint[kilopascal][float64]
ShaftPower pint[kilowatt][float64]
Efficiency pint[percent][Float64]
dtype: object
As before, operations between DataFrame columns are unit-aware:
In [14]: df_.ShaftPower / df_.ShaftSpeed
Out[14]:
ShaftSpeedIndex pump
1200 A 0.0011
B 0.001025
C 0.0009833333333333332
1600 A 0.0019312499999999998
B 0.0017375
C 0.0015625
2300 A 0.003734782608695652
B 0.0036999999999999997
C 0.0033086956521739133
dtype: pint[kilowatt / revolutions_per_minute][float64]
In [15]: df_["ShaftTorque"] = df_.ShaftPower / df_.ShaftSpeed
In [16]: df_["FluidPower"] = df_["FlowRate"] * df_["DifferentialPressure"]
In [17]: df_
Out[17]:
TestDate ... FluidPower
ShaftSpeedIndex pump ...
1200 A 01/01 ... 1412.9016000000001
B 01/01 ... 1337.8048
C 01/01 ... 1270.5357000000001
1600 A 01/01 ... 3330.4446
B 01/01 ... 3085.2198
C 01/01 ... 2758.8771
2300 A 01/02 ... 9767.6744
B 01/02 ... 9428.645999999999
C 01/02 ... 8475.8148
[9 rows x 8 columns]
In [18]: df_.groupby(by=["ShaftSpeedIndex"])[['FlowRate', 'DifferentialPressure', 'ShaftPower', 'Efficiency']].mean()
Out[18]:
FlowRate ... Efficiency
ShaftSpeedIndex ...
1200 9.103333333333333 ... 30.820000000000004
1600 12.633333333333333 ... 31.453333333333333
2300 18.483333333333334 ... 32.080000000000005
[3 rows x 4 columns]
The DataFrame’s pint.dequantify method then allows us to retrieve the units information as a header row once again.
In [19]: df_.pint.dequantify()
Out[19]:
TestDate ... FluidPower
unit No Unit ... kilopascal * meter ** 3 / hour
ShaftSpeedIndex pump ...
1200 A 01/01 ... 1412.9016
B 01/01 ... 1337.8048
C 01/01 ... 1270.5357
1600 A 01/01 ... 3330.4446
B 01/01 ... 3085.2198
C 01/01 ... 2758.8771
2300 A 01/02 ... 9767.6744
B 01/02 ... 9428.6460
C 01/02 ... 8475.8148
[9 rows x 8 columns]
This enables some rather powerful operations. For example, to change a column’s units:
In [20]: df_["FluidPower"] = df_["FluidPower"].pint.to("kW")
In [21]: df_["FlowRate"] = df_["FlowRate"].pint.to("L/s")
In [22]: df_["ShaftTorque"] = df_["ShaftTorque"].pint.to("N m")
In [23]: df_.pint.dequantify()
Out[23]:
TestDate ShaftSpeed ... ShaftTorque FluidPower
unit No Unit revolutions_per_minute ... meter * newton kilowatt
ShaftSpeedIndex pump ...
1200 A 01/01 1200 ... 10.504226 0.392473
B 01/01 1200 ... 9.788029 0.371612
C 01/01 1200 ... 9.390142 0.352927
1600 A 01/01 1600 ... 18.442079 0.925123
B 01/01 1600 ... 16.591903 0.857005
C 01/01 1600 ... 14.920776 0.766355
2300 A 01/02 2300 ... 35.664547 2.713243
B 01/02 2300 ... 35.332397 2.619068
C 01/02 2300 ... 31.595716 2.354393
[9 rows x 8 columns]
The units are harder to read than they need to be, so let’s change pint’s default format for displaying units.
In [24]: pint_pandas.PintType.ureg.formatter.default_format = "P~"
In [25]: df_.pint.dequantify()
Out[25]:
TestDate ShaftSpeed ... ShaftTorque FluidPower
unit No Unit rpm ... m·N kW
ShaftSpeedIndex pump ...
1200 A 01/01 1200 ... 10.504226 0.392473
B 01/01 1200 ... 9.788029 0.371612
C 01/01 1200 ... 9.390142 0.352927
1600 A 01/01 1600 ... 18.442079 0.925123
B 01/01 1600 ... 16.591903 0.857005
C 01/01 1600 ... 14.920776 0.766355
2300 A 01/02 2300 ... 35.664547 2.713243
B 01/02 2300 ... 35.332397 2.619068
C 01/02 2300 ... 31.595716 2.354393
[9 rows x 8 columns]
Or convert the entire table’s units, here to base units:
In [26]: df_.pint.to_base_units().pint.dequantify()
Out[26]:
TestDate ShaftSpeed ... ShaftTorque FluidPower
unit No Unit rad/s ... kg·m²/s² kg·m²/s³
ShaftSpeedIndex pump ...
1200 A 01/01 125 ... 10.504226 392.472667
B 01/01 125 ... 9.788029 371.612444
C 01/01 125 ... 9.390142 352.926583
1600 A 01/01 167 ... 18.442079 925.123500
B 01/01 167 ... 16.591903 857.005500
C 01/01 167 ... 14.920776 766.354750
2300 A 01/02 240 ... 35.664547 2713.242889
B 01/02 240 ... 35.332397 2619.068333
C 01/02 240 ... 31.595716 2354.393000
[9 rows x 8 columns]
Plotting
Pint’s matplotlib support allows columns with the same dimensionality to be plotted. First, set up matplotlib to use pint’s units.
In [27]: import matplotlib.pyplot as plt
In [28]: pint_pandas.PintType.ureg.setup_matplotlib()
Let’s convert a column to a different unit and plot two columns with different units. Pint’s matplotlib support will automatically convert the data to the units of the first column plotted and add those units to the axis label.
In [29]: df_['FluidPower'] = df_['FluidPower'].pint.to('W')
In [30]: df_[["ShaftPower", "FluidPower"]].dtypes
Out[30]:
ShaftPower pint[kW][float64]
FluidPower pint[W][float64]
dtype: object
In [31]: fig, ax = plt.subplots()
In [32]: ax = df_[["ShaftPower", "FluidPower"]].unstack("pump").plot(ax=ax)
In [33]: ax.yaxis.units
Out[33]: <Unit('kilowatt')>
In [34]: ax.yaxis.label
Out[34]: Text(55.847222222222214, 0.5, 'kilowatt')
Single row headers
A parsing function can be passed into df.pint.quantify to handle single row headers.
In [35]: df = pd.DataFrame(
....:     {
....:         "no_unit_column": pd.Series([i for i in range(4)], dtype="Float64"),
....:         "torque [lbf ft]": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="Float64"),
....:     }
....: )
....:
In [36]: def parsing_function(column_name):
....:     if "[" in column_name:
....:         return column_name.split("]")[0].split(" [")
....:     return column_name, pint_pandas.pint_array.NO_UNIT
....:
In [37]: df.pint.quantify(parsing_function=parsing_function)
Out[37]:
no_unit_column torque
0 0.0 1.0
1 1.0 2.0
2 2.0 2.0
3 3.0 3.0
Alternatively df.pint.quantify() will attempt to parse single row headers that adhere to the following formats:
{column_name} [{unit}]
{column_name} ({unit})
{column_name} / {unit}
In [38]: df = pd.DataFrame(
....:     {
....:         "no_unit_column": pd.Series([i for i in range(4)], dtype="Float64"),
....:         "torque [lbf ft]": pd.Series([1.0, 2.0, 2.0, 3.0], dtype="Float64"),
....:     }
....: )
....:
In [39]: df_ = df.pint.quantify()
In [40]: df_
Out[40]:
no_unit_column torque
0 0.0 1.0
1 1.0 2.0
2 2.0 2.0
3 3.0 3.0
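For headers that need custom handling, a parsing function covering all three layouts could look like the sketch below. It is an illustration only and not the parser pint-pandas uses internally: `parse_header`, the regular expressions and the local `NO_UNIT` constant (set to the documented default marker, "No Unit") are assumptions.

```python
import re

# Assumption: mirrors the documented default marker, "No Unit".
NO_UNIT = "No Unit"

# One pattern per supported header layout; names are matched lazily so that
# the unit is everything inside the final bracket pair or after the first " / ".
_HEADER_PATTERNS = [
    re.compile(r"^(?P<name>.+?)\s*\[(?P<unit>[^\]]+)\]$"),
    re.compile(r"^(?P<name>.+?)\s*\((?P<unit>[^)]+)\)$"),
    re.compile(r"^(?P<name>.+?) / (?P<unit>.+)$"),
]


def parse_header(column_name):
    """Split 'name [unit]', 'name (unit)' or 'name / unit' into (name, unit)."""
    for pattern in _HEADER_PATTERNS:
        match = pattern.match(column_name)
        if match:
            return match.group("name"), match.group("unit").strip()
    return column_name, NO_UNIT


for header in ["torque [lbf ft]", "speed (rpm)", "flow / m^3/h", "no_unit_column"]:
    print(parse_header(header))
```

Such a function would be passed as `df.pint.quantify(parsing_function=parse_header)`, in the same way as the `parsing_function` example earlier in this section.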
The reverse operation can be done with df.pint.dequantify() and the writing_function argument.
In [41]: df_.pint.dequantify()
Out[41]:
no_unit_column torque [ft·lbf]
0 0.0 1.0
1 1.0 2.0
2 2.0 2.0
3 3.0 3.0
In [42]: def writing_function(column_name, unit):
....:     if unit == pint_pandas.pint_array.NO_UNIT:
....:         return column_name
....:     return f"{column_name} [{unit}]"
....:
In [43]: df_.pint.dequantify(writing_function=writing_function)
Out[43]:
no_unit_column torque [ft·lbf]
0 0.0 1.0
1 1.0 2.0
2 2.0 2.0
3 3.0 3.0