load_moleculeace_benchmark

skfp.datasets.moleculeace.load_moleculeace_benchmark(subset: list[str] | None = None, data_dir: str | PathLike | None = None, as_frames: bool = False, verbose: bool = False, force_update: bool = False) → Iterator[tuple[str, DataFrame]] | Iterator[tuple[str, list[str], ndarray]]

Load the MoleculeACE benchmark datasets.

The MoleculeACE [1] datasets cover a variety of targets from ChEMBL [2], with activity measured either as an inhibition constant (Ki) or as a half-maximal effective concentration (EC50). The activity cliffs split is recommended for all of them.

For more details, see the loading functions for the individual datasets. The allowed dataset names are listed below. Names are case-sensitive, and each dataset's name is returned together with its data.

chembl204_ki
chembl214_ki
chembl218_ec50
chembl219_ki
chembl228_ki
chembl231_ki
chembl233_ki
chembl234_ki
chembl235_ec50
chembl236_ki
chembl237_ec50
chembl237_ki
chembl238_ki
chembl239_ec50
chembl244_ki
chembl262_ki
chembl264_ki
chembl287_ki
chembl1862_ki
chembl1871_ki
chembl2034_ki
chembl2047_ec50
chembl2147_ki
chembl2835_ki
chembl2971_ki
chembl3979_ec50
chembl4005_ki
chembl4203_ki
chembl4616_ec50
chembl4792_ki
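Because the names encode the measurement type in their suffix, a subset can be built by filtering the list rather than typing names by hand. The following is a minimal sketch: `MOLECULEACE_NAMES` and `check_subset` are hypothetical helpers written for this example, not part of skfp, and how the library itself reacts to an unknown name is not specified here.

```python
# Dataset names copied from the list above (hypothetical constant, not a skfp export).
MOLECULEACE_NAMES = [
    "chembl204_ki", "chembl214_ki", "chembl218_ec50", "chembl219_ki",
    "chembl228_ki", "chembl231_ki", "chembl233_ki", "chembl234_ki",
    "chembl235_ec50", "chembl236_ki", "chembl237_ec50", "chembl237_ki",
    "chembl238_ki", "chembl239_ec50", "chembl244_ki", "chembl262_ki",
    "chembl264_ki", "chembl287_ki", "chembl1862_ki", "chembl1871_ki",
    "chembl2034_ki", "chembl2047_ec50", "chembl2147_ki", "chembl2835_ki",
    "chembl2971_ki", "chembl3979_ec50", "chembl4005_ki", "chembl4203_ki",
    "chembl4616_ec50", "chembl4792_ki",
]


def check_subset(names: list[str]) -> list[str]:
    """Return names unchanged, or raise ValueError if any is not in the list.

    The comparison is exact, so it respects case-sensitivity.
    """
    unknown = [name for name in names if name not in MOLECULEACE_NAMES]
    if unknown:
        raise ValueError(f"Unknown MoleculeACE dataset names: {unknown}")
    return names


# Split the benchmark by measurement type using the name suffix.
ec50_subset = [name for name in MOLECULEACE_NAMES if name.endswith("_ec50")]
ki_subset = [name for name in MOLECULEACE_NAMES if name.endswith("_ki")]
```

Either list can then be passed as the `subset` argument of `load_moleculeace_benchmark`.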

Parameters:

subset (None or list of strings, default=None) – If None, all datasets are loaded. If a list of strings, only the datasets with the given names are loaded.
data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently configured scikit-learn data directory is used, by default $HOME/scikit_learn_data.
as_frames (bool, default=False) – If True, returns the raw DataFrame for each dataset. Otherwise, returns SMILES as a list of strings, and labels as a NumPy array for each dataset.
verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.
force_update (bool, default=False) – If True, always re-download the dataset from HuggingFace Hub, even if it is already present locally. If False, the dataset is downloaded only if it is not yet available locally.

Returns:

data – Datasets are loaded and yielded one at a time by a generator. Each item starts with the dataset name. The remaining contents depend on the as_frames parameter, either:
- Pandas DataFrame with columns: “SMILES”, “label”
- list of strings (SMILES) and NumPy array (labels)

Return type:

generator of tuples (str, pd.DataFrame) or tuples (str, list[str], np.ndarray)
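Since the function returns a generator, the datasets are consumed in a loop. The sketch below shows one way to do that for the default `as_frames=False` shape given in the signature, `(name, smiles, labels)`. To keep it runnable without a download, it uses `fake_benchmark`, a hypothetical stand-in generator with placeholder molecules and labels that are not real MoleculeACE data. `summarize_benchmark` is likewise a helper invented for this example.

```python
from collections.abc import Iterable, Iterator


def summarize_benchmark(datasets: Iterable[tuple]) -> dict[str, int]:
    """Map each dataset name to its number of molecules.

    Expects (name, smiles, labels) tuples, i.e. the as_frames=False shape.
    """
    return {name: len(smiles) for name, smiles, labels in datasets}


def fake_benchmark() -> Iterator[tuple[str, list[str], list[float]]]:
    # Hypothetical stand-in for load_moleculeace_benchmark(); the SMILES and
    # labels are placeholders, and labels are plain lists rather than ndarrays.
    yield "chembl204_ki", ["CCO", "CCN"], [1.0, 2.0]
    yield "chembl214_ki", ["c1ccccc1"], [3.0]


summary = summarize_benchmark(fake_benchmark())
print(summary)

# With scikit-fingerprints installed and network access, the real call would be:
# from skfp.datasets.moleculeace import load_moleculeace_benchmark
# summary = summarize_benchmark(
#     load_moleculeace_benchmark(subset=["chembl204_ki", "chembl214_ki"])
# )
```

A generator can be iterated only once, so call the loader again if the datasets are needed a second time.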

References