Aggregating Tables
Since one of the main use cases for pyrolite-meltsutil is executing, interrogating and visualising multiple experiments, one of its core functionalities is importing and integrating alphaMELTS results. One of the key functions for this is aggregate_tables(). It enables you to load in all the results from an array of experiments within a single folder, enabling subsequent analysis and visualization.
First let’s find a folder with some results. In this case we’ll use one of the pyrolite-meltsutil example folders which already contains some batch experiment results:
from pyrolite_meltsutil.util.general import get_data_example
experiment_dir = get_data_example("batch")
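To make the folder layout concrete: each experiment lives in its own subdirectory of the batch folder. The sketch below mimics that structure with a temporary directory (the layout is illustrative; only the two hash-named folders match the example output later on):

```python
import tempfile
from pathlib import Path

# Mock stand-in for a batch experiment folder: each alphaMELTS run
# sits in its own hash-named subdirectory.
root = Path(tempfile.mkdtemp())
for name in ["4689ca6fc3", "363f3d0a0b"]:
    (root / name).mkdir()

# Each subdirectory corresponds to a single experiment:
experiment_folders = sorted(p.name for p in root.iterdir() if p.is_dir())
print(experiment_folders)  # ['363f3d0a0b', '4689ca6fc3']
```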
Now we can import the table files from each of the experiments. Note that in the same fashion as import_tables(), aggregate_tables() returns two tables - one for system variables and one for phases, which contains information pertaining to individual phases or aggregates (e.g. 'olivine_0', 'bulk', 'liquid' etc.).
from pyrolite_meltsutil.tables.load import aggregate_tables
system, phases = aggregate_tables(experiment_dir)
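Once aggregated, the phases table can be filtered like any pandas DataFrame. The sketch below uses a small mock frame standing in for the aggregated output (the column names and values here are illustrative, not taken from the real tables):

```python
import pandas as pd

# Mock stand-in for the aggregated `phases` table.
phases = pd.DataFrame(
    {
        "experiment": ["4689ca6fc3", "4689ca6fc3", "363f3d0a0b"],
        "phase": ["liquid", "olivine_0", "liquid"],
        "temperature": [1350.0, 1350.0, 1300.0],
        "mass": [95.2, 4.8, 90.1],
    }
)

# Restrict to liquid compositions across all experiments:
liquid = phases[phases.phase == "liquid"]
print(len(liquid))  # 2
```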
In addition to the variables you'd expect from the tables, the returned dataframes also include an 'experiment' column containing the hash-index of each experiment, so the experiments can be easily distinguished. For example, the unique experiment identifiers (as would be obtained via phases.experiment.unique()) are:
array(['4689ca6fc3', '363f3d0a0b'], dtype=object)
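The 'experiment' column makes it straightforward to split the aggregate back into per-run tables, e.g. with a groupby. A minimal sketch on a mock frame (values illustrative):

```python
import pandas as pd

# Mock stand-in for the aggregated `phases` table.
phases = pd.DataFrame(
    {
        "experiment": ["4689ca6fc3", "363f3d0a0b", "4689ca6fc3"],
        "mass": [95.2, 90.1, 4.8],
    }
)

# Split the aggregated table back into individual experiments:
for exp_id, df in phases.groupby("experiment"):
    print(exp_id, len(df))
```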
As this aggregation process can take a while for larger arrays of experiments, it’s generally a good idea to save these results to disk such that they can be loaded faster:
import pandas as pd
system.to_csv(experiment_dir / "system.csv")
phases.to_csv(experiment_dir / "phases.csv")
Then next time you wish to access the data, you could simply load the tables back in with:
system, phases = (
pd.read_csv(experiment_dir / "system.csv"),
pd.read_csv(experiment_dir / "phases.csv"),
)
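One small gotcha with this round trip: to_csv() writes the DataFrame index as the first column, so reading the file back without index_col=0 produces a spurious 'Unnamed: 0' column. A self-contained sketch with a mock frame and a temporary directory:

```python
import tempfile
from pathlib import Path

import pandas as pd

# Mock stand-in for the aggregated `system` table.
system = pd.DataFrame({"experiment": ["4689ca6fc3"], "temperature": [1350.0]})

outdir = Path(tempfile.mkdtemp())
system.to_csv(outdir / "system.csv")  # the index is written as the first column

# index_col=0 restores the saved index instead of an 'Unnamed: 0' column:
reloaded = pd.read_csv(outdir / "system.csv", index_col=0)
print(list(reloaded.columns))  # ['experiment', 'temperature']
```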
Total running time of the script: (0 minutes 0.413 seconds)