Initializing data

There are several ways to initialize a PintArray in a DataFrame. Here are the most common methods.

In [1]: df = pd.DataFrame(
   ...:     {
   ...:         "Ser1": pd.Series([1, 2], dtype="pint[m]"),
   ...:         "Ser2": pd.Series([1, 2]).astype("pint[m]"),
   ...:         "Ser3": pd.Series([1, 2], dtype="pint[m][Int64]"),
   ...:         "Ser4": pd.Series([1, 2]).astype("pint[m][Int64]"),
   ...:         "PArr1": PintArray([1, 2], dtype="pint[m]"),
   ...:         "PArr2": PintArray([1, 2], dtype="pint[m][Int64]"),
   ...:         "PArr3": PintArray([1, 2], dtype="m"),
   ...:         "PArr4": PintArray([1, 2], dtype=ureg.m),
   ...:         "PArr5": PintArray(Quantity([1, 2], ureg.m)),
   ...:         "PArr6": PintArray([1, 2], "m"),
   ...:     }
   ...: )
   ...: 

In [2]: df
Out[2]: 
   Ser1  Ser2  Ser3  Ser4  PArr1  PArr2  PArr3  PArr4  PArr5  PArr6
0   1.0   1.0     1     1      1      1      1      1      1      1
1   2.0   2.0     2     2      2      2      2      2      2      2

In the first two Series examples above (Ser1 and Ser2), the integer data was converted to Float64, as the dtypes show.

In [3]: df.dtypes
Out[3]: 
Ser1     pint[meter][Float64]
Ser2     pint[meter][Float64]
Ser3       pint[meter][Int64]
Ser4       pint[meter][Int64]
PArr1      pint[meter][Int64]
PArr2      pint[meter][Int64]
PArr3      pint[meter][Int64]
PArr4      pint[meter][Int64]
PArr5      pint[meter][Int64]
PArr6      pint[meter][Int64]
dtype: object

To avoid this conversion when constructing from a Series, include the subdtype (the dtype of the magnitudes) in the dtype string, for example "pint[m][Int64]". The default subdtype that pint-pandas converts to can be changed by modifying pint_pandas.pint_array.DEFAULT_SUBDTYPE.

When the dtype does not specify a subdtype, PintArray infers it from the data passed in. PintArray also accepts a pint Unit or a unit string as the dtype.

Note

"pint[unit]" or "pint[unit][subdtype]" must be used for the Series or DataFrame constructor.

Non-native pandas dtypes

PintArray uses an ExtensionArray to hold its data, including ExtensionArrays from other libraries that extend pandas. For example, an UncertaintyArray can be used.

In [4]: from uncertainties_pandas import UncertaintyArray, UncertaintyDtype

In [5]: from uncertainties import ufloat, umath, unumpy

In [6]: ufloats = [ufloat(i, abs(i) / 100) for i in [4.0, np.nan, -5.0]]

In [7]: uarr = UncertaintyArray(ufloats)

In [8]: uarr
Out[8]: 
<UncertaintyArray>
[4.0+/-0.04, <NA>, -5.0+/-0.05]
Length: 3, dtype: UncertaintyDtype

In [9]: PintArray(uarr, "m")
Out[9]: 
<PintArray>
[4.00+/-0.04, <NA>, -5.00+/-0.05]
Length: 3, dtype: pint[meter][UncertaintyDtype]

In [10]: pd.Series(PintArray(uarr, "m") * 2)
Out[10]: 
0      8.00+/-0.08
1              nan
2    -10.00+/-0.10
dtype: pint[meter][UncertaintyDtype]