Aggregating Tables
Since one of the main use cases for pyrolite-meltsutil is executing, interrogating and visualising multiple experiments, one of its core functionalities is importing and integrating alphaMELTS results. One of the key functions for this is aggregate_tables(). It enables you to load in all the results from an array of experiments within a single folder, enabling subsequent analysis and visualization.
First let’s find a folder with some results. In this case we’ll use one of the pyrolite-meltsutil example folders which already contains some batch experiment results:
from pyrolite_meltsutil.util.general import get_data_example
experiment_dir = get_data_example("batch")
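To make the folder layout concrete: each experiment lives in its own subdirectory of the batch folder. The sketch below mimics that structure with a temporary directory (the layout is illustrative; only the two hash-named folders match the example output later on):

```python
import tempfile
from pathlib import Path

# Mock stand-in for a batch experiment folder: each alphaMELTS run
# sits in its own hash-named subdirectory.
root = Path(tempfile.mkdtemp())
for name in ["4689ca6fc3", "363f3d0a0b"]:
    (root / name).mkdir()

# Each subdirectory corresponds to a single experiment:
experiment_folders = sorted(p.name for p in root.iterdir() if p.is_dir())
print(experiment_folders)  # ['363f3d0a0b', '4689ca6fc3']
```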
Now we can import the table files from each of the experiments. Note that in the same fashion as import_tables(), aggregate_tables() returns two tables - one for system variables and one for phases, which contains information pertaining to individual phases or aggregates (e.g. 'olivine_0', 'bulk', 'liquid' etc.).
from pyrolite_meltsutil.tables.load import aggregate_tables
system, phases = aggregate_tables(experiment_dir)
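Once aggregated, the phases table can be filtered like any pandas DataFrame. The sketch below uses a small mock frame standing in for the aggregated output (the column names and values here are illustrative, not taken from the real tables):

```python
import pandas as pd

# Mock stand-in for the aggregated `phases` table.
phases = pd.DataFrame(
    {
        "experiment": ["4689ca6fc3", "4689ca6fc3", "363f3d0a0b"],
        "phase": ["liquid", "olivine_0", "liquid"],
        "temperature": [1350.0, 1350.0, 1300.0],
        "mass": [95.2, 4.8, 90.1],
    }
)

# Restrict to liquid compositions across all experiments:
liquid = phases[phases.phase == "liquid"]
print(len(liquid))  # 2
```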
In addition to the variables you'd expect from the tables, the returned dataframes also include an 'experiment' column containing the hash-index of each experiment, so the experiments can be easily distinguished. For example, the unique experiment identifiers (as would be obtained via phases.experiment.unique()) are:
array(['4689ca6fc3', '363f3d0a0b'], dtype=object)
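The 'experiment' column makes it straightforward to split the aggregate back into per-run tables, e.g. with a groupby. A minimal sketch on a mock frame (values illustrative):

```python
import pandas as pd

# Mock stand-in for the aggregated `phases` table.
phases = pd.DataFrame(
    {
        "experiment": ["4689ca6fc3", "363f3d0a0b", "4689ca6fc3"],
        "mass": [95.2, 90.1, 4.8],
    }
)

# Split the aggregated table back into individual experiments:
for exp_id, df in phases.groupby("experiment"):
    print(exp_id, len(df))
```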
As this aggregation process can take a while for larger arrays of experiments, it’s generally a good idea to save these results to disk such that they can be loaded faster:
import pandas as pd
system.to_csv(experiment_dir / "system.csv")
phases.to_csv(experiment_dir / "phases.csv")
Then next time you wish to access the data, you could simply load the tables back in with:
system, phases = (
pd.read_csv(experiment_dir / "system.csv"),
pd.read_csv(experiment_dir / "phases.csv"),
)
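One small gotcha with this round trip: to_csv() writes the DataFrame index as the first column, so reading the file back without index_col=0 produces a spurious 'Unnamed: 0' column. A self-contained sketch with a mock frame and a temporary directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Mock stand-in for the aggregated `system` table.
system = pd.DataFrame({"experiment": ["4689ca6fc3"], "temperature": [1350.0]})

outdir = Path(tempfile.mkdtemp())
system.to_csv(outdir / "system.csv")  # the index is written as the first column

# index_col=0 restores the saved index instead of an 'Unnamed: 0' column:
reloaded = pd.read_csv(outdir / "system.csv", index_col=0)
print(list(reloaded.columns))  # ['experiment', 'temperature']
```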
Total running time of the script: (0 minutes 0.413 seconds)