load_expansionrx_splits

skfp.datasets.expansionrx.load_expansionrx_splits(dataset_name: str, data_dir: str | PathLike | None = None, as_dict: bool = False, verbose: bool = False, force_update: bool = False) → tuple[list[int], list[int]] | dict[str, list[int]]

Load pre-generated dataset splits for the ExpansionRx-OpenADMET challenge.

The ExpansionRx-OpenADMET challenge [1] provides a time (chronological) split, based on the experiment order during late-stage ADMET optimization. A 70/30 train/test split is used, and no validation data is provided. However, in the Pandas DataFrame output, the IDs in the “Molecule name” column are meaningful and indicate experiment order.
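To illustrate what a chronological 70/30 split means, here is a minimal sketch in plain Python. It is not the challenge's actual procedure: the helper name `chronological_split` and the toy ordering keys are made up for illustration. The only assumption is that each record carries a sortable key reflecting experiment order.

```python
def chronological_split(order_keys, train_percent=70):
    """Return (train_idx, test_idx): earliest records train, latest test."""
    # Positions sorted by the ordering key (e.g. experiment order).
    by_time = sorted(range(len(order_keys)), key=lambda i: order_keys[i])
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(by_time) * train_percent // 100
    return by_time[:n_train], by_time[n_train:]


# Toy ordering keys, invented for illustration; higher means later.
toy_order = [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]
train_idx, test_idx = chronological_split(toy_order)
```

Every training record precedes every test record in time, which is the property that distinguishes this from a random split.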

Dataset names are the same as those returned by load_expansionrx_benchmark() and are case-sensitive.

Parameters:

dataset_name (str) – Name of the dataset to load splits for.
data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, which defaults to $HOME/scikit_learn_data.
as_dict (bool, default=False) – If True, returns the splits as a dictionary with keys “train” and “test”, and index lists as values. Otherwise, returns two lists of split indexes.
verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.
force_update (bool, default=False) – If True, always re-download the dataset from HuggingFace Hub, even if it is already present locally. If False, the dataset is downloaded only if it is not yet available locally.

Returns:

data – Depending on the as_dict argument, one of:
- two lists of integer indexes
- dictionary with “train” and “test” keys, and values as lists with splits indexes

Return type:

tuple(list[int], list[int]) or dict
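Because the return value has two possible shapes, downstream code may want to normalize it. The sketch below uses hypothetical helpers (`unpack_splits`, `take`) and toy stand-in values rather than a real download. It also assumes that the indexes are positional row indexes into the corresponding dataset, which this page does not state explicitly.

```python
def unpack_splits(splits):
    """Normalize either return form to a (train, test) pair of index lists."""
    if isinstance(splits, dict):
        # as_dict=True form: keys "train" and "test".
        return list(splits["train"]), list(splits["test"])
    # as_dict=False form: a tuple of two lists.
    train_idx, test_idx = splits
    return list(train_idx), list(test_idx)


def take(items, indexes):
    """Select items by position, in the order given by indexes."""
    return [items[i] for i in indexes]


# Toy stand-ins for illustration only; real values would come from
# load_expansionrx_splits(dataset_name, as_dict=...).
toy_tuple = ([0, 1, 2], [3, 4])
toy_dict = {"train": [0, 1, 2], "test": [3, 4]}
molecules = ["m0", "m1", "m2", "m3", "m4"]

train_idx, test_idx = unpack_splits(toy_dict)
train_mols = take(molecules, train_idx)
test_mols = take(molecules, test_idx)
```

With scikit-fingerprints installed, the toy values would be replaced by the result of `load_expansionrx_splits(dataset_name)` or `load_expansionrx_splits(dataset_name, as_dict=True)`, imported from `skfp.datasets.expansionrx` as shown in the signature above.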

References