load_moleculeace_benchmark

skfp.datasets.moleculeace.load_moleculeace_benchmark(subset: list[str] | None = None, data_dir: str | PathLike | None = None, as_frames: bool = False, verbose: bool = False, force_update: bool = False) → Iterator[tuple[str, DataFrame]] | Iterator[tuple[str, list[str], ndarray]]

Load the MoleculeACE benchmark datasets.

MoleculeACE [1] datasets cover a variety of targets from ChEMBL [2], with inhibition (Ki) or effective concentration (EC50) labels. The activity cliffs split is recommended for all of them.

For more details, see the loading functions for the particular datasets. Allowed individual dataset names are listed below. Each returned item also includes the dataset name (case-sensitive).

  • chembl204_ki

  • chembl214_ki

  • chembl218_ec50

  • chembl219_ki

  • chembl228_ki

  • chembl231_ki

  • chembl233_ki

  • chembl234_ki

  • chembl235_ec50

  • chembl236_ki

  • chembl237_ec50

  • chembl237_ki

  • chembl238_ki

  • chembl239_ec50

  • chembl244_ki

  • chembl262_ki

  • chembl264_ki

  • chembl287_ki

  • chembl1862_ki

  • chembl1871_ki

  • chembl2034_ki

  • chembl2047_ec50

  • chembl2147_ki

  • chembl2835_ki

  • chembl2971_ki

  • chembl3979_ec50

  • chembl4005_ki

  • chembl4203_ki

  • chembl4616_ec50

  • chembl4792_ki

Parameters:
  • subset (None or list of strings, default=None) – If None, all datasets are loaded. If a list of strings, only the datasets with the given names are loaded.

  • data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, which defaults to $HOME/scikit_learn_data.

  • as_frames (bool, default=False) – If True, returns the raw DataFrame for each dataset. Otherwise, returns SMILES as a list of strings, and labels as a NumPy array for each dataset.

  • verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.

  • force_update (bool, default=False) – If True, always re-download the dataset from HuggingFace Hub, even if it is already present locally. If False, the dataset is downloaded only if it is not yet available locally.
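As a minimal usage sketch, assuming the default as_frames=False and the (name, SMILES, labels) tuples given in the signature, the generator can be consumed in a plain loop. The helper summarize_benchmark and the function main are hypothetical names introduced here for illustration and are not part of skfp; calling main() downloads the two selected datasets on first use.

```python
from collections.abc import Iterable, Sequence


def summarize_benchmark(
    datasets: Iterable[tuple[str, Sequence[str], Sequence[float]]],
) -> dict[str, int]:
    """Map each dataset name to its number of molecules (hypothetical helper)."""
    sizes: dict[str, int] = {}
    for name, smiles, labels in datasets:
        # Each SMILES string is expected to have exactly one label.
        if len(smiles) != len(labels):
            raise ValueError(f"{name}: SMILES and labels differ in length")
        sizes[name] = len(smiles)
    return sizes


def main() -> None:
    # Imported here so the helper above stays usable without skfp installed.
    from skfp.datasets.moleculeace import load_moleculeace_benchmark

    # Load only two of the allowed dataset names (case-sensitive).
    benchmark = load_moleculeace_benchmark(subset=["chembl204_ki", "chembl218_ec50"])
    for name, n_molecules in summarize_benchmark(benchmark).items():
        print(f"{name}: {n_molecules} molecules")


if __name__ == "__main__":
    main()
```

Because the function returns an iterator, it can be traversed only once; call it again to iterate over the datasets a second time.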

Returns:

data – A generator that loads and returns the datasets. Each returned item starts with the dataset name, followed by data whose type depends on the as_frames parameter:

  • if True, a Pandas DataFrame with columns: “SMILES”, “label”

  • if False, a list of strings (SMILES) and a NumPy array (labels)

Return type:

generator of tuples (str, pd.DataFrame) or tuples (str, list[str], np.ndarray)
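With as_frames=True, each item is a (name, DataFrame) pair according to the signature. The sketch below, which relies on the documented “SMILES” and “label” columns, unpacks such a pair back into plain Python lists. The names frame_to_lists and main are hypothetical and introduced only for illustration.

```python
from collections.abc import Mapping
from typing import Any


def frame_to_lists(name: str, frame: Mapping[str, Any]) -> tuple[str, list, list]:
    """Split a (name, frame) pair into name, SMILES list, and label list.

    `frame` only needs column access by key, so a Pandas DataFrame works here.
    """
    smiles = list(frame["SMILES"])
    labels = list(frame["label"])
    return name, smiles, labels


def main() -> None:
    # Imported here so the helper above stays usable without skfp installed.
    from skfp.datasets.moleculeace import load_moleculeace_benchmark

    # A single-dataset subset keeps the download small.
    for name, df in load_moleculeace_benchmark(subset=["chembl204_ki"], as_frames=True):
        _, smiles, labels = frame_to_lists(name, df)
        print(name, len(smiles), len(labels))


if __name__ == "__main__":
    main()
```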

References