Introduction
HydroGeoChemistry - A python package for correction, validation and analysis of ground water quality samples
Getting Started
HGC (HydroGeoChemistry) is a Python package for correction, validation and analysis of ground water quality samples.
Main features:
- Handle common errors and peculiarities in hydrochemical data, such as negative concentrations, detection limits and non-numerical placeholders
- Calculate common ratios, such as MONC, SUVA and HCO3 to the sum of all anions
- Classify groundwater samples according to the Stuyfzand water types
- Calculate missing concentrations through the ion balance, with PhreeqPython
Philosophy
HGC is an extension of the Pandas DataFrame, giving your DataFrame hydrochemistry superpowers. You can thus mix HGC with your regular Pandas/Numpy workflows.
A DataFrame does not need to conform to a specific format to work with HGC; however, it is required that:
- Each row in the DataFrame represents a groundwater quality sample
- Each column represents a groundwater quality parameter
HGC checks whether column names in the DataFrame match chemical parameters that it recognizes. Such columns should be in the units that HGC expects. In addition to ‘HGC-enabled’ columns, the DataFrame can contain an arbitrary number of non-hydrochemistry columns (such as XY-locations, comments, or other measured quantities); HGC simply ignores those columns.
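This page does not describe how HGC attaches itself to a DataFrame, but the `df.hgc` namespace and the `SamplesFrame(pandas_obj)` constructor in the API reference match pandas' standard accessor pattern. The toy sketch below assumes that pattern; the accessor name `toyhgc`, the class `ToySamplesFrame` and its small column list are hypothetical and not part of HGC.

```python
import pandas as pd

# Hypothetical, tiny subset of recognized hydrochemistry columns
# (HGC's real lists live in hgc.constants).
KNOWN_COLUMNS = {"Cl", "Na", "Ca", "Mg", "SO4", "NO3", "ph", "ec"}


@pd.api.extensions.register_dataframe_accessor("toyhgc")
class ToySamplesFrame:
    """Toy accessor illustrating how a namespace like df.hgc can be attached."""

    def __init__(self, pandas_obj: pd.DataFrame):
        # pandas passes in the DataFrame on which the accessor is used
        self._obj = pandas_obj

    @property
    def hgc_cols(self) -> list:
        # Recognized columns, in their original order; all others are ignored
        return [col for col in self._obj.columns if col in KNOWN_COLUMNS]


df = pd.DataFrame({"Cl": [19.0], "Na": [15.0], "x_coord": [123.4], "comment": ["well 1"]})
print(df.toyhgc.hgc_cols)  # ['Cl', 'Na']
```

The non-hydrochemistry columns stay in the DataFrame untouched; the accessor simply skips them.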
Conventions
Column names follow a naming convention: they are all lower case, with an underscore between separate words. E.g. the EC measured in the lab is indicated with ec_lab. The only exception is the notation of chemical structures and atoms, for which standard capitalization is used. E.g. the column name for total nitrogen is N and for ortho-phosphate it is PO4_ortho.
Tutorial
Practical examples
We always start by importing HGC:
In [1]: import pandas as pd
In [2]: import hgc
Creating an HGC-enabled DataFrame
A hydrochemical groundwater analysis typically starts with a ‘normal’ Pandas DataFrame, in which each row contains a groundwater quality sample and each column represents a water quality parameter. The DataFrame may contain concentrations of different chemical compounds, some of which lie beyond a detection limit (denoted with a ‘<’ or ‘>’ prefix). There may also be errors in the data, such as negative concentrations or text placeholders.
Note
Please refer to the excellent WaDI package to get your Excel or CSV file with measurements into a format that HGC understands. In this tutorial, we create our own DataFrame for clarity.
In [3]: testdata = {'alkalinity': [0.0], 'Al': [2600], 'Ba': [44.0],
...: 'Br': [0.0], 'Ca': [2.0], 'Cl': [19.0],
...: 'Co': [1.2], 'Cu': [4.0], 'doc': [4.4],
...: 'F': [0.08], 'Fe': [0.29], 'K': [1.1],
...: 'Li': [5.0], 'Mg': [1.6], 'Mn': ['< 0.05'],
...: 'Na': [15.0], 'Ni': [7.0], 'NH4': ['< 0.05'],
...: 'NO2': [0.0], 'NO3': [22.6], 'Pb': [2.7],
...: 'PO4': ['0.04'], 'ph': [4.3], 'SO4': [16.0],
...: 'Sr': [50], 'Zn': [60.0] }
...:
In [4]: df = pd.DataFrame.from_dict(testdata)
In [5]: df
Out[5]:
alkalinity Al Ba Br Ca Cl ... Pb PO4 ph SO4 Sr Zn
0 0.0 2600 44.0 0.0 2.0 19.0 ... 2.7 0.04 4.3 16.0 50 60.0
[1 rows x 26 columns]
Since the data in this DataFrame is messy, we cannot use it yet for hydrochemical calculations. HGC can check if the data contains obvious errors:
In [6]: is_valid = df.hgc.is_valid
In [7]: is_valid
Out[7]: False
The DataFrame may contain any kind of columns and column names. However, HGC will only recognize a specific set of columns with names of hydrochemical parameters.
In [8]: from hgc.constants import constants
In [9]: print([*constants.atoms])
['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F', 'Ne', 'Na', 'Mg', 'Al', 'Si', 'P', 'S', 'Cl', 'Ar', 'K', 'Ca', 'Sc', 'Ti', 'V', 'Cr', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Ga', 'Ge', 'As', 'Se', 'Br', 'Kr', 'Rb', 'Sr', 'Y', 'Zr', 'Nb', 'Mo', 'Tc', 'Ru', 'Rh', 'Pd', 'Ag', 'Cd', 'In', 'Sn', 'Sb', 'Te', 'I', 'Xe', 'Cs', 'Ba', 'La', 'Ce', 'Pr', 'Nd', 'Pm', 'Sm', 'Eu', 'Gd', 'Tb', 'Dy', 'Ho', 'Er', 'Tm', 'Yb', 'Lu', 'Hf', 'Ta', 'W', 'Re', 'Os', 'Ir', 'Pt', 'Au', 'Hg', 'Tl', 'Pb', 'Bi', 'Po', 'At', 'Rn', 'Fr', 'Ra', 'Ac', 'Th', 'Pa', 'U', 'Np', 'Pu', 'Am', 'Cm', 'Bk', 'Cf', 'Es', 'Fm', 'Md', 'No', 'Lr', 'Rf', 'Db', 'Sg', 'Bh', 'Hs', 'Mt', 'Ds', 'Rg', 'Cn', 'Nh', 'Fl', 'Mc', 'Lv', 'Ts', 'Og']
In [10]: print([*constants.ions])
['CH4', 'H2S', 'S', 'CO2', 'alkalinity', 'O2_field', 'O2_lab', 'O2', 'KMnO4', 'NH4', 'NO2', 'NO3', 'N_kj', 'N', 'PO4', 'PO4_ortho', 'P', 'SiO2', 'SO4_ic', 'SO4', 'doc', 'toc', 'cod']
In [11]: print([*constants.properties])
['ec_field', 'ec_lab', 'ec', 'ph_field', 'ph_lab', 'ph', 'temp_field', 'temp_lab', 'temp', 'eh_field', 'turb', 'uva254']
You can also retrieve the details of each compound, such as the expected unit, full name or molar weight:
In [12]: constants.atoms['H']
Out[12]: Atom(feature='H', name='Hydrogen', unit='mg/L', mw=1.00794, oxidized=1.0, reduced=0.0, SMOW=0.0)
In [13]: constants.properties['ec']
Out[13]: Properties(feature='ec', name='EC converted to 20°C', example='read', unit='μS/cm', phreeq_name=None)
For your convenience, the units of all allowed atom, ion and property columns are listed in the Units section.
Since our DataFrame contains detection limits (values such as ‘< 0.05’) and incorrect data types (e.g. numbers stored as strings, such as ‘0.04’ for PO4), HGC initially reports that the DataFrame is invalid. HGC can automatically resolve such inconsistencies with the ‘make_valid’ method: negative concentrations are replaced by 0, concentrations below the detection limit are replaced by half the limit, and concentrations above the upper detection limit are replaced by 1.5 times that limit.
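The replacement rules described for make_valid can be illustrated in plain pandas for a single column. This is a sketch of the stated rules only, assuming values are written like ‘< 0.05’ or ‘> 10’; the helper `clean_value` is hypothetical and is not HGC's implementation.

```python
import numpy as np
import pandas as pd


def clean_value(value):
    """Apply the make_valid rules described above to one raw value (sketch)."""
    if isinstance(value, str):
        text = value.strip()
        if text.startswith("<"):
            # Below the detection limit: use half the limit
            return float(text[1:]) / 2
        if text.startswith(">"):
            # Above the upper detection limit: use 1.5 times the limit
            return float(text[1:]) * 1.5
        try:
            # Numbers stored as text, e.g. '0.04'
            value = float(text)
        except ValueError:
            # Non-numerical placeholders such as 'error' become NaN
            return np.nan
    if value < 0:
        # Negative concentrations are replaced by 0
        return 0.0
    return float(value)


raw = pd.Series(["< 0.05", "> 10", "0.04", -1.2, "error", 3])
out = raw.map(clean_value)
print(out.tolist())
```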
In [14]: df.hgc.make_valid()
In [15]: is_valid = df.hgc.is_valid
In [16]: is_valid
Out[16]: True
In [17]: df
Out[17]:
alkalinity Al Ba Br Ca Cl ... Pb PO4 ph SO4 Sr Zn
0 0.0 2600 44.0 0.0 2.0 19.0 ... 2.7 0.04 4.3 16.0 50 60.0
[1 rows x 26 columns]
# Recognized HGC columns
In [18]: hgc_cols = df.hgc.hgc_cols
In [19]: print(hgc_cols)
['Li', 'F', 'Na', 'Mg', 'Al', 'Cl', 'K', 'Ca', 'Mn', 'Fe', 'Co', 'Ni', 'Cu', 'Zn', 'Br', 'Sr', 'Ba', 'Pb', 'alkalinity', 'NH4', 'NO2', 'NO3', 'PO4', 'SO4', 'doc', 'ph']
Calculations
Now that our DataFrame is valid, we can use all HGC methods, such as calculating the Base Exchange Index of each row; the result is added as a column to df:
In [20]: df.hgc.get_bex()
In [21]: df.bex
Out[21]:
0 0.238022
Name: bex, dtype: float64
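For intuition, the value above can be approximately reproduced by hand. This sketch assumes Stuyfzand's definition of the BEX without dolomite, BEX = Na + K + Mg - 1.0716 * Cl with all concentrations in meq/L, and standard molar masses; this page does not state HGC's exact formula or constants, so small differences in the last decimals are expected. The helper names are hypothetical.

```python
# Molar masses in g/mol (standard atomic weights) and ionic charges
MOLAR_MASS = {"Na": 22.98977, "K": 39.0983, "Mg": 24.305, "Cl": 35.453}
CHARGE = {"Na": 1, "K": 1, "Mg": 2, "Cl": 1}


def meq_per_l(element: str, mg_per_l: float) -> float:
    """Convert mg/L to meq/L: (mg/L / molar mass) * |charge|."""
    return mg_per_l / MOLAR_MASS[element] * CHARGE[element]


def bex_without_dolomite(na: float, k: float, mg: float, cl: float) -> float:
    """Base Exchange Index in meq/L, assuming BEX = Na + K + Mg - 1.0716 * Cl."""
    return (meq_per_l("Na", na) + meq_per_l("K", k)
            + meq_per_l("Mg", mg) - 1.0716 * meq_per_l("Cl", cl))


# Concentrations (mg/L) of the tutorial sample
print(round(bex_without_dolomite(na=15.0, k=1.1, mg=1.6, cl=19.0), 3))  # 0.238
```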
We can also classify each sample into the Stuyfzand water type:
In [22]: df.hgc.get_stuyfzand_water_type()
In [23]: df.water_type
Out[23]:
0 g*NaNO3o
Name: water_type, dtype: object
Or get the sum of all anions (using the Stuyfzand method):
In [24]: df.hgc.get_sum_anions()
In [25]: df.sum_anions
Out[25]:
0 1.282127
Name: sum_anions, dtype: float64
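The sums of anions and cations are the ingredients of the ion balance, which HGC computes with get_ion_balance (see the API reference). A common textbook definition, which may differ in detail from HGC's implementation (for instance in which ions are included), is 100 * (sum of cations - sum of anions) / (sum of cations + sum of anions), with all sums in meq/L. A minimal sketch with hypothetical, precomputed sums:

```python
import pandas as pd


def ion_balance_percent(sum_cations: pd.Series, sum_anions: pd.Series) -> pd.Series:
    """Ion balance error in %, from sums of cations and anions in meq/L."""
    return 100 * (sum_cations - sum_anions) / (sum_cations + sum_anions)


# Hypothetical sums in meq/L for three samples
cations = pd.Series([1.30, 2.10, 4.00])
anions = pd.Series([1.28, 2.50, 4.00])
result = ion_balance_percent(cations, anions)
print(result.round(2).tolist())
```

A positive value means a surplus of cations, a negative value a surplus of anions.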
It is also possible to compute common hydrochemical ratios between different compounds. HGC calculates ratios for all columns that are available and ignores any missing columns.
In [26]: df.hgc.get_ratios()
In [27]: df.cl_to_na
Out[27]:
0 0.82138
Name: cl_to_na, dtype: float64
For all the get methods above, the results are added as columns to the DataFrame. Most of the time this is convenient, but sometimes you only want the result returned without modifying the DataFrame. In that case, use the inplace argument, which works the same as in native pandas methods that have this argument. With inplace=True (the default), the columns are added to the DataFrame, as in the examples above. With inplace=False, the columns are not added to the DataFrame but returned as a pandas Series or DataFrame. E.g., for the Stuyfzand water type (a Series) or the ratios (a DataFrame):
In [28]: water_type = df.hgc.get_stuyfzand_water_type(inplace=False)
In [29]: water_type
Out[29]:
0 g*NaNO3o
Name: swt, dtype: object
In [30]: ratios = df.hgc.get_ratios(inplace=False)
In [31]: ratios
Out[31]:
cl_to_br cl_to_na ... hco3_to_sum_anions hco3_to_ca_and_mg
0 inf 0.82138 ... 0.0 0.0
[1 rows x 8 columns]
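The cl_to_na value above matches the molar (mmol/L) ratio of Cl to Na, (19.0 / 35.453) / (15.0 / 22.98977) ≈ 0.82138, and cl_to_br is inf because Br is 0. A plain-pandas sketch of such ratios, assuming molar ratios and the units from the Units section (Br in μg/L); the variable names are illustrative only.

```python
import pandas as pd

MOLAR_MASS = {"Cl": 35.453, "Na": 22.98977, "Br": 79.904}  # g/mol

# One sample in HGC units: Cl and Na in mg/L, Br in μg/L
df = pd.DataFrame({"Cl": [19.0], "Na": [15.0], "Br": [0.0]})

# Convert to mmol/L; Br is first converted from μg/L to mg/L
mmol = pd.DataFrame({
    "Cl": df["Cl"] / MOLAR_MASS["Cl"],
    "Na": df["Na"] / MOLAR_MASS["Na"],
    "Br": df["Br"] / 1000 / MOLAR_MASS["Br"],
})

ratios = pd.DataFrame({
    "cl_to_na": mmol["Cl"] / mmol["Na"],
    # Division by zero in pandas yields inf instead of raising an error
    "cl_to_br": mmol["Cl"] / mmol["Br"],
})
print(ratios.round(5))
```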
Consolidation
A common situation is that a single parameter of a sample is measured with several methods or in different places. Parameters such as EC and pH are frequently measured both in the lab and in the field, and SO4 and PO4 are frequently measured both by IC and by ICP-OES. Normally we prefer the field data for EC and pH, but poorly calibrated sensors or tough field conditions may make these readings inferior to the lab measurements. In such cases we want to select, from multiple columns, the one to use for subsequent calculations, by consolidating them into one single column containing the best measurements, possibly filling gaps with measurements from the inferior method. Let’s consider this example:
In [32]: testdata = {
....: 'ph_lab': [4.3, 6.3, 5.4], 'ph_field': [4.4, 6.1, 5.7],
....: 'ec_lab': [304, 401, 340], 'ec_field': [290, 'error', 334.6],
....: }
....:
In [33]: df = pd.DataFrame.from_dict(testdata)
In [34]: df
Out[34]:
ph_lab ph_field ec_lab ec_field
0 4.3 4.4 304 290
1 6.3 6.1 401 error
2 5.4 5.7 340 334.6
In [35]: df.hgc.make_valid()
In [36]: df
Out[36]:
ph_lab ph_field ec_lab ec_field
0 4.3 4.4 304 290.0
1 6.3 6.1 401 NaN
2 5.4 5.7 340 334.6
In [37]: df.hgc.consolidate(use_ph='field', use_ec='lab', use_temp=None,
....: use_so4=None, use_o2=None)
....:
In [38]: df
Out[38]:
ph ec
0 4.4 304.0
1 6.1 401.0
2 5.7 340.0
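The selection logic of consolidate, including the merge_on_na option from the API reference, can be approximated in plain pandas. The helper `consolidate_param` is hypothetical: it only mirrors the behaviour shown above (the preferred column becomes the consolidated one and the source columns are dropped) and may differ from HGC's implementation. Here EC is taken from the field with gaps filled from the lab, to show merge_on_na.

```python
import numpy as np
import pandas as pd


def consolidate_param(df: pd.DataFrame, param: str, use: str,
                      merge_on_na: bool = False) -> pd.DataFrame:
    """Sketch: merge '<param>_lab' and '<param>_field' into one '<param>' column."""
    other = "lab" if use == "field" else "field"
    preferred = df[f"{param}_{use}"]
    if merge_on_na:
        # Fill gaps in the preferred method with the other method
        preferred = preferred.fillna(df[f"{param}_{other}"])
    out = df.drop(columns=[f"{param}_lab", f"{param}_field"])
    out[param] = preferred
    return out


df = pd.DataFrame({"ph_lab": [4.3, 6.3, 5.4], "ph_field": [4.4, 6.1, 5.7],
                   "ec_lab": [304, 401, 340], "ec_field": [290.0, np.nan, 334.6]})
df = consolidate_param(df, "ph", use="field")
df = consolidate_param(df, "ec", use="field", merge_on_na=True)
print(df)
```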
Warning
Note that omitting use_so4=None in the function call would make the function fall back to the default, which is ‘ic’. Because the column SO4_ic is not in the DataFrame, this raises an error. The same holds for use_temp and use_o2.
In [39]: df.hgc.consolidate(use_ph='field', use_ec='lab', use_temp=None,)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-39-2cbb7cd1e5e6> in <module>
----> 1 df.hgc.consolidate(use_ph='field', use_ec='lab', use_temp=None,)
~/checkouts/readthedocs.org/user_builds/hgc/checkouts/latest/hgc/samples_frame.py in consolidate(self, use_ph, use_ec, use_so4, use_o2, use_temp, use_alkalinity, merge_on_na, inplace)
346 raise ValueError(f"Column {source} not present in DataFrame. Use " +
347 f"use_{param.lower()}=None to explicitly ignore consolidating " +
--> 348 f"this column.")
349
350
ValueError: Column SO4_ic not present in DataFrame. Use use_so4=None to explicitly ignore consolidating this column.
Visualizing and exporting
The great thing about HGC is that your DataFrame gets hydrochemical superpowers, yet all functions that you expect from a regular Pandas DataFrame are still available, allowing you to easily import/export and visualize data.
In [40]: df.std()
Out[40]:
ph 0.888819
ec 49.034002
dtype: float64
In [41]: df.plot()
Out[41]: <AxesSubplot:>

Coupling to PHREEQC
Another great superpower of HGC is that it allows easy geochemistry directly on your DataFrame! It is currently coupled to the popular geochemistry software PHREEQC via its Python wrapper, as implemented in the phreeqpython package.
Let’s extend the above DataFrame a little to make it more meaningful in the context of this coupling:
In [42]: testdata = {
....: 'ph_lab': [4.5, 5.5, 7.6], 'ph_field': [4.4, 6.1, 7.7],
....: 'ec_lab': [304, 401, 340], 'ec_field': [290, 'error', 334.6],
....: 'temp': [10, 10, 10],
....: 'alkalinity': [0, 7, 121],
....: 'O2': [11, 0, 0],
....: 'Na': [9,20,31], 'K':[0.4, 2.1, 2.0],
....: 'Ca':[1,3,47],
....: 'Fe': [0.10, 2.33, 0.4],
....: 'Mn': [0.02, 0.06, 0.13],
....: 'NH4': [1.29, 0.08, 0.34],
....: 'SiO2': [0.2, 15.4, 13.3],
....: 'SO4': [7,19,35],
....: 'NO3': [3.4,0.1,0],
....: }
....:
In [43]: df = pd.DataFrame.from_dict(testdata)
In [44]: df.hgc.make_valid()
In [45]: df.hgc.consolidate(use_ph='lab', use_ec='lab', use_temp=None,
....: use_so4=None, use_o2=None)
....:
With this DataFrame, we can do some PHREEQC calculations. For example, we can calculate the saturation index of different minerals like Calcite:
In [46]: df.hgc.get_saturation_index('Calcite')
In [47]: df['si_calcite'] # or df.si_calcite
Out[47]:
0 -999.000000
1 -4.722956
2 -0.288641
Name: si_calcite, dtype: float64
The saturation index is added as a column named si_<mineral_name>, where <mineral_name> is the name of the mineral as given to PHREEQC, but in lower case (note the underscore). The saturation index (SI) of a mineral can only be retrieved if the mineral is defined in the PHREEQC database used by phreeqpython; if it is not defined, an SI of -999 is returned. An SI of -999 is also returned for sample 0 above, which has zero alkalinity and hence no dissolved carbonate from which a calcite SI can be computed.
This also works for the partial pressure of gases, because in PHREEQC both minerals and gases are defined as PHASES (see below for an explanation of the coupling to PHREEQC). However, the output is clearer with the alias get_partial_pressure, which returns the same values under a different column name (prefixed with pp instead of si, since it is a partial pressure rather than a saturation index).
In [48]: df.hgc.get_saturation_index('CO2(g)')
In [49]: df['si_co2(g)']
Out[49]:
0 -999.000000
1 -1.715479
2 -2.609626
Name: si_co2(g), dtype: float64
In [50]: df.hgc.get_partial_pressure('CO2(g)')
In [51]: df['pp_co2(g)']
Out[51]:
0 -999.000000
1 -1.715479
2 -2.609626
Name: pp_co2(g), dtype: float64
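Two details of these columns are worth handling before further analysis: -999 is a placeholder rather than a real value, and, as in PHREEQC, the saturation index of a gas phase is the base-10 logarithm of its partial pressure in atm. A plain-pandas sketch using the values printed above:

```python
import numpy as np
import pandas as pd

# Values copied from df['pp_co2(g)'] above; -999 marks a sample without a result
log_pp = pd.Series([-999.0, -1.715479, -2.609626], name="pp_co2(g)")

# Replace the -999 placeholder by NaN so it does not distort statistics
log_pp = log_pp.replace(-999.0, np.nan)

# Convert log10(partial pressure) to partial pressure in atm
pp_atm = 10 ** log_pp
print(pp_atm)
```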
Similar to the SI, the specific conductance (SC), also known as electrical conductivity (EC) or EGV, is retrieved by calling:
In [52]: df.hgc.get_specific_conductance()
In [53]: df.sc
Out[53]:
0 37.193926
1 64.734047
2 218.187466
Name: sc, dtype: float64
Internally, these methods call the method get_phreeqpython_solutions to retrieve instances of the PhreeqPython Solution class. PhreeqPython is a Python package that allows the use of the geochemical modeling package PHREEQC from within Python. HGC leverages this package to create a PhreeqC solution (more precisely, a PhreeqPython solution) for every row of the SamplesFrame. These are available to the user by calling:
In [54]: df.hgc.get_phreeqpython_solutions()
In [55]: df.pp_solutions
Because all elements in this column are PhreeqPython Solution objects, PHREEQC can be used to calculate all kinds of properties of the water sample in each row of the SamplesFrame; these are described in the PhreeqPython documentation. For example, one can derive the specific conductance or pH of the first sample:
In [56]: df.pp_solutions[0].sc
Out[56]: 37.19392619168676
In [57]: df.pp_solutions[0].pH
Out[57]: 4.5
or for all the samples:
In [58]: [s.sc for s in df.pp_solutions]
Out[58]: [37.19392619168676, 64.73404717149515, 218.18746625402372]
Note that these are the exact same results as above when df.hgc.get_specific_conductance() was called.
More involved operations are also possible, for example inspecting the speciation of the first sample in the original SamplesFrame df:
In [59]: df.pp_solutions[0].species
Out[59]:
{'Amm': 4.107427346238537e-10,
'AmmH+': 7.145332058236736e-05,
'AmmHSO4-': 6.038481976861183e-08,
'Ca+2': 2.4719556914911714e-05,
'CaHSO4+': 4.107750059503443e-11,
'CaOH+': 1.206793597413229e-13,
'CaSO4': 2.3258121349793236e-07,
'Fe(OH)2': 3.377294151086896e-19,
'Fe(OH)2+': 3.0790467275285987e-13,
'Fe(OH)3': 6.15008335198951e-16,
'Fe(OH)3-': 3.481413027742945e-25,
'Fe(OH)4-': 9.631965209582988e-21,
'Fe(SO4)2-': 4.549314587256489e-19,
'Fe+2': 1.7766173469538303e-06,
'Fe+3': 8.017708354854963e-16,
'Fe2(OH)2+4': 2.084732141128855e-25,
'Fe3(OH)4+5': 7.00155604671684e-35,
'FeHSO4+': 2.9536979682997056e-12,
'FeHSO4+2': 3.2081256063657047e-20,
'FeOH+': 5.078860843975143e-12,
'FeOH+2': 5.756579318382429e-14,
'FeSO4': 1.4104744825322718e-08,
'FeSO4+': 3.3701491576993005e-16,
'H+': 3.237684629013282e-05,
'H2': 8.284273096100859e-21,
'H2O': 55.50929780738274,
'H2SiO4-2': 7.095512381986744e-21,
'H3SiO4-': 8.93438650865223e-12,
'H4SiO4': 3.3288309795364082e-06,
'HSO4-': 1.5224767993236832e-07,
'K+': 1.022746595129961e-05,
'KSO4-': 3.578529154040609e-09,
'Mn(NO3)2': 3.873920247618094e-15,
'Mn(OH)3-': 1.6845954285204035e-28,
'Mn+2': 3.612266976206157e-07,
'Mn+3': 1.2482278567749458e-29,
'MnOH+': 7.539734502001295e-14,
'MnSO4': 2.8321415294557053e-09,
'NO3-': 5.483700575514323e-05,
'Na+': 0.0003913712170485659,
'NaOH': 3.552855616723773e-24,
'NaSO4-': 1.1642908789175805e-07,
'O2': 0.00034378298237867114,
'OH-': 9.532511716174231e-11,
'SO4-2': 7.229439162542111e-05}
Note that these species concentrations are molalities, in mol/kg of water (e.g. Na+ ≈ 3.9e-4 mol/kgw corresponds to the 9 mg/L of Na in this sample).
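To find the dominant species quickly, the speciation dict can be turned into a pandas Series and sorted. A short sketch on a few of the values printed above; in practice one would pass df.pp_solutions[0].species directly instead of the hand-copied subset.

```python
import pandas as pd

# Subset of df.pp_solutions[0].species printed above
species = {
    "H2O": 55.50929780738274,
    "Na+": 0.0003913712170485659,
    "O2": 0.00034378298237867114,
    "SO4-2": 7.229439162542111e-05,
    "AmmH+": 7.145332058236736e-05,
    "NO3-": 5.483700575514323e-05,
}

# Drop the solvent and list the most abundant dissolved species first
dominant = (pd.Series(species, name="concentration")
            .drop("H2O")
            .sort_values(ascending=False))
print(dominant.head(3))
```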
One can even manipulate the solutions, for example by letting calcite precipitate, and see how this changes the pH:
In [60]: desaturated_solutions = [s.desaturate('Calcite') for s in df.pp_solutions]
In [61]: pd.DataFrame(dict(
....: original=df.ph,
....: desaturated=[s.pH for s in desaturated_solutions],)
....: ).round(2)
....:
Out[61]:
original desaturated
0 4.5 4.48
1 5.5 5.47
2 7.6 7.60
For more examples, please visit the examples on the GitHub page of PhreeqPython.
FAQ
Frequently Asked Questions
Where are total P and total N?
These are stored as the columns P and N respectively.
Which redox state is used for the phreeqc simulations?
It uses the default values used by phreeqc itself, that is, Fe(2), As(3) and Mn(2).
Why does ammonium not contribute to the redox equilibrium?
This is by design, as its kinetics are generally too slow. It is added as a separate species in phreeqc (Amm instead of NH4+).
How do I report alkalinity and/or bicarbonate (HCO3-)?
It is assumed everywhere that the HCO3- concentration equals the alkalinity.
Why is my pH, temperature or other column not added to the SamplesFrame and/or recognized by HGC?
A common mistake is to add the temperature with
df.temp = 10
This is not a valid way of adding a column to a DataFrame: it sets an attribute on the object instead, so no column is created and HGC does not recognize it. Instead, use
df['temp'] = 10
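The difference can be checked with plain pandas: assigning to a new attribute name sets an ordinary Python attribute on the object, whereas item assignment creates a column.

```python
import pandas as pd

df = pd.DataFrame({"ph": [7.0, 6.5]})

df.temp = 10  # sets a Python attribute; no column is created
print("temp" in df.columns)  # False

df["temp"] = 10  # creates a column, broadcasting 10 to every row
print("temp" in df.columns)  # True
print(df["temp"].tolist())  # [10, 10]
```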
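Glossary
Key definitions and abbreviations
- Base Exchange Index (BEX)
- Indicator of cation exchange between water and sediment. Can be negative (salinized water), zero (no base exchange) or positive (freshened water).
- Chemical Oxygen Demand (COD)
- Amount of oxygen consumed when the oxidizable (mainly organic) matter in a water sample is chemically oxidized, usually expressed in mg/L O2.
- Detection limit
- Lowest (for ‘<’) or highest (for ‘>’) concentration that an analytical method can reliably report, denoted by ‘<’ or ‘>’ in the data.
- Electrical Conductivity (EC)
- Ability of water to conduct an electric current; a proxy for the dissolved ion content, reported in μS/cm.
- Mean Oxidation Number of Carbon (MONC)
- Average oxidation state of the carbon atoms in dissolved organic matter.
- Stuyfzand Water Type
- Classification of water samples according to the system of Pieter Stuyfzand, based on chloride content, alkalinity, dominant cation and anion, and base exchange.
- Total Dissolved Solids (TDS)
- Total mass of dissolved constituents per volume of water.
- Total Inorganic Carbon (TIC)
- Total dissolved inorganic carbon, i.e. the sum of dissolved CO2, HCO3- and CO3-2.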
Units
An overview of the used units of all columns in a SamplesFrame.
Atoms
In [1]: import pandas as pd
In [2]: from hgc.constants import constants
In [3]: {a: constants.units(a) for a in constants.atoms}
Out[3]:
{'H': 'mg/L',
'He': 'mg/L',
'Li': 'μg/L',
'Be': 'μg/L',
'B': 'μg/L',
'C': 'mg/L',
'N': 'mg/L',
'O': 'mg/L',
'F': 'mg/L',
'Ne': 'mg/L',
'Na': 'mg/L',
'Mg': 'mg/L',
'Al': 'μg/L',
'Si': 'mg/L',
'P': 'mg/L',
'S': 'mg/L',
'Cl': 'mg/L',
'Ar': 'mg/L',
'K': 'mg/L',
'Ca': 'mg/L',
'Sc': 'μg/L',
'Ti': 'μg/L',
'V': 'μg/L',
'Cr': 'μg/L',
'Mn': 'mg/L',
'Fe': 'mg/L',
'Co': 'μg/L',
'Ni': 'μg/L',
'Cu': 'μg/L',
'Zn': 'μg/L',
'Ga': 'μg/L',
'Ge': 'μg/L',
'As': 'μg/L',
'Se': 'μg/L',
'Br': 'μg/L',
'Kr': 'mg/L',
'Rb': 'μg/L',
'Sr': 'μg/L',
'Y': 'μg/L',
'Zr': 'μg/L',
'Nb': 'μg/L',
'Mo': 'μg/L',
'Tc': 'mg/L',
'Ru': 'μg/L',
'Rh': 'μg/L',
'Pd': 'mg/L',
'Ag': 'μg/L',
'Cd': 'μg/L',
'In': 'μg/L',
'Sn': 'μg/L',
'Sb': 'μg/L',
'Te': 'μg/L',
'I': 'μg/L',
'Xe': 'mg/L',
'Cs': 'μg/L',
'Ba': 'μg/L',
'La': 'μg/L',
'Ce': 'μg/L',
'Pr': 'μg/L',
'Nd': 'μg/L',
'Pm': 'mg/L',
'Sm': 'μg/L',
'Eu': 'μg/L',
'Gd': 'μg/L',
'Tb': 'μg/L',
'Dy': 'μg/L',
'Ho': 'μg/L',
'Er': 'μg/L',
'Tm': 'μg/L',
'Yb': 'μg/L',
'Lu': 'μg/L',
'Hf': 'μg/L',
'Ta': 'μg/L',
'W': 'μg/L',
'Re': 'μg/L',
'Os': 'μg/L',
'Ir': 'μg/L',
'Pt': 'μg/L',
'Au': 'μg/L',
'Hg': 'μg/L',
'Tl': 'μg/L',
'Pb': 'μg/L',
'Bi': 'μg/L',
'Po': 'mg/L',
'At': 'mg/L',
'Rn': 'mg/L',
'Fr': 'mg/L',
'Ra': 'mg/L',
'Ac': 'mg/L',
'Th': 'μg/L',
'Pa': 'mg/L',
'U': 'μg/L',
'Np': 'mg/L',
'Pu': 'mg/L',
'Am': 'mg/L',
'Cm': 'mg/L',
'Bk': 'mg/L',
'Cf': 'mg/L',
'Es': 'mg/L',
'Fm': 'mg/L',
'Md': 'mg/L',
'No': 'mg/L',
'Lr': 'mg/L',
'Rf': 'mg/L',
'Db': 'mg/L',
'Sg': 'mg/L',
'Bh': 'mg/L',
'Hs': 'mg/L',
'Mt': 'mg/L',
'Ds': 'mg/L',
'Rg': 'mg/L',
'Cn': 'mg/L',
'Nh': 'mg/L',
'Fl': 'mg/L',
'Mc': 'mg/L',
'Lv': 'mg/L',
'Ts': 'mg/L',
'Og': 'mg/L'}
Ions
In [4]: import pandas as pd
In [5]: from hgc.constants import constants
In [6]: {a: constants.units(a) for a in constants.ions}
Out[6]:
{'CH4': 'mg/L as CH4',
'H2S': 'mg/L as S',
'S': 'mg/L',
'CO2': 'mg/L as CO2',
'alkalinity': 'mg/L as HCO3',
'O2_field': 'mg/L',
'O2_lab': 'mg/L',
'O2': 'mg/L',
'KMnO4': ' mg/L as KMnO4',
'NH4': 'mg/L',
'NO2': 'mg/L as NO2',
'NO3': 'mg/L as NO3',
'N_kj': 'mg/L',
'N': 'mg/L',
'PO4': 'mg/L as PO4',
'PO4_ortho': 'mg/L',
'P': 'mg/L',
'SiO2': 'mg/L as SiO2',
'SO4_ic': 'mg/L as SO4',
'SO4': 'mg/L as SO4',
'doc': 'mg/L',
'toc': 'mg/L',
'cod': 'mg/L'}
Other properties
In [7]: import pandas as pd
In [8]: from hgc.constants import constants
In [9]: {a: constants.units(a) for a in constants.properties}
Out[9]:
{'ec_field': 'μS/cm',
'ec_lab': 'μS/cm',
'ec': 'μS/cm',
'ph_field': '-',
'ph_lab': '-',
'ph': '-',
'temp_field': '°C',
'temp_lab': '°C',
'temp': '°C',
'eh_field': 'mV',
'turb': 'FTU',
'uva254': 'E/m'}
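Because atoms are stored in either mg/L or μg/L, it can help to convert them to a common molar unit before comparing compounds. A sketch, assuming the unit strings shown above and molar masses in g/mol (within HGC these appear to be available as, e.g., constants.atoms['Na'].mw, as shown in the tutorial); the helper `to_mmol_per_l` is hypothetical and deliberately rejects the "mg/L as ..." ion units.

```python
# Scale factors from the unit strings used above to mg/L
TO_MG_PER_L = {"mg/L": 1.0, "μg/L": 1e-3}


def to_mmol_per_l(value: float, unit: str, molar_mass: float) -> float:
    """Convert a concentration in 'mg/L' or 'μg/L' to mmol/L.

    mg/L divided by g/mol gives mmol/L.
    """
    if unit not in TO_MG_PER_L:
        raise ValueError(f"Unsupported unit: {unit!r}")
    return value * TO_MG_PER_L[unit] / molar_mass


# Na (mg/L) and Al (μg/L) from the tutorial sample
print(round(to_mmol_per_l(15.0, "mg/L", 22.98977), 4))
print(round(to_mmol_per_l(2600, "μg/L", 26.98154), 4))
```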
hgc.samples_frame module
The SamplesFrame class is an extended Pandas DataFrame, offering additional methods for validation of hydrochemical data, calculation of relevant ratios and classifications.
- class hgc.samples_frame.SamplesFrame(pandas_obj)
Bases: object
DataFrame with additional hydrochemistry-specific methods. All HGC methods and attributes defined in this class are available in the namespace ‘hgc’ of the DataFrame.
Examples
To use HGC methods, we always start from a Pandas DataFrame:
import pandas as pd
import hgc
# We start off with an ordinary DataFrame
df = pd.DataFrame({'Cl': [1, 2, 3], 'Mg': [11, 12, 13]})
# Since we imported hgc, the HGC methods become available
# on the DataFrame. This allows, for instance, the use of HGC's
# validation function
df.hgc.is_valid  # returns False
df.hgc.make_valid()
- allowed_hgc_columns
Returns the allowed columns of the HGC SamplesFrame.
- consolidate(use_ph='field', use_ec='lab', use_so4='ic', use_o2='field', use_temp='field', use_alkalinity='alkalinity', merge_on_na=False, inplace=True)
Consolidate parameters measured with different methods into one single parameter.
Parameters such as EC and pH are frequently measured both in the lab and in the field, and SO4 and PO4 are frequently measured both by IC and by ICP-OES. Normally we prefer the field data for EC and pH, but poorly calibrated sensors or tough field conditions may make these readings inferior to the lab measurements. This method allows for quick selection of the preferred measurement method for each parameter, which is then used for further analysis.
For each consolidated parameter, HGC adds a new column that is filled with either the lab measurements or the field measurements. It is also possible to fill it with the preferred method and fill the remaining NaN’s with measurements gathered with the other method.
Parameters: - use_ph ({'lab', 'field', None}, default 'field') – Which pH to use? Ignored if None.
- use_ec ({'lab', 'field', None}, default 'lab') – Which EC to use?
- use_so4 ({'ic', 'field', None}, default 'ic') – Which SO4 to use?
- use_o2 ({'lab', 'field', None}, default 'field') – Which O2 to use?
- use_alkalinity (str, default 'alkalinity') – Name of the column to use for alkalinity.
- merge_on_na (bool, default False) – Fill NaN’s from one measurement method with measurements from the other method.
- inplace (bool, default True) – Modify the SamplesFrame in place. inplace=False is not implemented (yet).
Raises: ValueError – if one of the use_ parameters is set to a column that is not in the DataFrame, or if the column for one of the default values is not in the DataFrame while the parameter is not set to None.
- fillna_concentrations(how='phreeqc')
Calculate missing concentrations based on the charge balance.
Parameters: how ({'phreeqc', 'analytic'}, default 'phreeqc') – Method to compute missing concentrations.
- fillna_ec(use_phreeqc=True)
Calculate missing Electrical Conductivity measurements using known anions and cations.
- get_bex(watertype='G', inplace=True)
Get the Base Exchange Index (meq/L). By default this is the BEX without dolomite.
Parameters: - watertype ({'G', 'P'}, default 'G') – Watertype (Groundwater or Precipitation)
- inplace (bool, optional, default True) – whether the base exchange index should be added to the pd.DataFrame (inplace=True) as column bex or returned as a pd.Series (inplace=False).
Returns: Returns None if inplace=True or pd.Series with base exchange index for each row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_dominant_anions(inplace=True)
Calculates the dominant anion of each row in the SamplesFrame as used by the Stuyfzand water type classification (see http://www.hydrology-amsterdam.nl/valorisation/HGCmanual_v2_1.pdf, chapter 5, for the definitions).
Parameters: inplace (bool, optional, default True) – whether the dominant anion should be added to the pd.DataFrame as column dominant_anion (inplace=True) or returned as a pd.Series (inplace=False).
Returns: Returns None if inplace=True or pd.Series with dominant anion for each row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_dominant_cations(*args, **kwargs)
- get_ion_balance(inplace=True)
Calculate the balance between anions and cations and add it as a percentage [%] to the column ‘ion_balance’ of the SamplesFrame.
Parameters: inplace (bool, optional, default True) – whether the ion balance should be added to the SamplesFrame (inplace=True) as column ion_balance or returned as a pd.Series (inplace=False).
Returns: Returns None if inplace=True or pd.Series with ion balance for each row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_partial_pressure(gas, use_phreeqc=True, inplace=True, **kwargs)
Adds or returns the partial pressure of a gas using phreeqc. It is an alias for get_saturation_index, so see that method for details. The result column is named pp_<gas_name>.
- get_phreeqpython_solutions(equilibrate_with='none', inplace=True)
Return a series of phreeqpython solutions derived from the (row) data in the SamplesFrame.
Parameters: - equilibrate_with (str, default 'none') – Ion to add for achieving charge equilibrium in the solutions.
- inplace (bool, default True) – Whether the result is added to the pd.DataFrame as column pp_solutions (inplace=True) or returned as a pd.Series (inplace=False).
Returns: Returns None if inplace=True or pd.Series with PhreeqPython.Solution instances for every row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_ratios(*args, **kwargs)
- get_saturation_index(mineral_or_gas, use_phreeqc=True, inplace=True, **kwargs)
Adds or returns the saturation index (SI) of a mineral or the partial pressure of a gas using phreeqc. The column name of the result is si_<mineral_name> in lower case (if inplace=True).
Parameters: - mineral_or_gas (str) – the name of the mineral or gas for which the SI needs to be calculated
- use_phreeqc (bool) – whether to use phreeqc as backend or fall back on internal hgc routines to calculate the SI or partial pressure
- inplace (bool, optional, default True) – whether the saturation index should be added to the pd.DataFrame (inplace=True) as column si_<mineral_name> or returned as a pd.Series (inplace=False).
Returns: Returns None if inplace=True or pd.Series with the saturation index of the mineral for each row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_specific_conductance(use_phreeqc=True, inplace=True, **kwargs)
Returns the specific conductance (sc) of a water sample using phreeqc. sc is also known as electrical conductivity (ec) or egv.
Parameters: - use_phreeqc (bool, optional) – whether to use phreeqc as backend or fall back on internal hgc routines to calculate the specific conductance
- inplace (bool, optional, default True) – whether the specific conductance should be added to the pd.DataFrame (inplace=True) as column sc or returned as a pd.Series (inplace=False).
- **kwargs – are passed to the method get_phreeqpython_solutions
Returns: Returns None if inplace=True or pd.Series with specific conductance for each row in SamplesFrame if inplace=False.
Return type: pandas.Series or None
- get_stuyfzand_water_type(*args, **kwargs)
- get_sum_anions(*args, **kwargs)
- get_sum_cations(*args, **kwargs)
- hgc_cols
Return the columns that are used by hgc.
- is_valid
Returns a boolean indicating whether the columns used by hgc have valid values.
- make_valid()
Try to convert the DataFrame into a valid HGC SamplesFrame.
- select_phreeq_columns(*args, **kwargs)
- hgc.samples_frame.requires_ph(func)
Decorator function for methods in the SamplesFrame class that require a column ph with valid values (non-zero and non-NaN).
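A decorator of this kind can be sketched as follows. This is a hypothetical illustration of the described check (a ph column with non-zero, non-NaN values), assuming the accessor keeps its DataFrame in an attribute named _obj; the class `ToyFrame` is invented for the example, and HGC's actual requires_ph may behave differently, for example in the exception it raises.

```python
import functools

import numpy as np
import pandas as pd


def requires_ph(func):
    """Sketch: refuse to run `func` unless self._obj has a valid 'ph' column."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        df = self._obj
        if "ph" not in df.columns:
            raise ValueError("Column 'ph' is required but missing.")
        ph = pd.to_numeric(df["ph"], errors="coerce")
        if ph.isna().any() or (ph == 0).any():
            raise ValueError("Column 'ph' must contain non-zero, non-NaN values.")
        return func(self, *args, **kwargs)
    return wrapper


class ToyFrame:
    """Hypothetical stand-in for an accessor that wraps a DataFrame."""

    def __init__(self, df: pd.DataFrame):
        self._obj = df

    @requires_ph
    def mean_h_concentration(self) -> float:
        # Approximate [H+] in mol/L from pH, averaged over the samples
        return float((10 ** -self._obj["ph"]).mean())


print(ToyFrame(pd.DataFrame({"ph": [7.0, 8.0]})).mean_h_concentration())
try:
    ToyFrame(pd.DataFrame({"ph": [7.0, np.nan]})).mean_h_concentration()
except ValueError as err:
    print(err)
```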
About
HGC is developed by KWR Water Research Institute. The code is based on the original Excel-based HGC program by Pieter Stuyfzand.
License
This package is MIT licensed. See the License file.