load_asap_benchmark

skfp.datasets.asap.load_asap_benchmark(subset: list[str] | None = None, data_dir: str | PathLike | None = None, as_frames: bool = False, verbose: bool = False, force_update: bool = False) → Iterator[tuple[str, DataFrame]] | Iterator[tuple[str, list[str], ndarray]]

Load the ASAP Discovery-Polaris-OpenADMET challenge datasets.

The ASAP Discovery-Polaris-OpenADMET challenge [1] [2] [3] datasets come from antiviral drug discovery campaigns by the ASAP Discovery consortium, which sought inhibitors of the SARS-CoV-2 and MERS-CoV main protease (Mpro). The challenge included both ADMET and potency endpoints.

For more details, see the loading functions for the individual datasets. The allowed dataset names are listed below; they are case-sensitive and are also returned alongside each dataset.

  • HLM

  • KSOL

  • LogD

  • MDR1-MDCKII

  • MLM

  • pIC50 SARS-CoV-2

  • pIC50 MERS-CoV
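
Because the names are case-sensitive, it can help to check a requested subset before calling the loader. The following is a minimal sketch of such a pre-check; `ASAP_DATASET_NAMES` and `check_subset` are hypothetical names introduced here for illustration and are not part of scikit-fingerprints, and the behavior of the library itself on an unknown name is not described in this page.

```python
# Names copied from the list above; matching is case-sensitive.
ASAP_DATASET_NAMES = (
    "HLM",
    "KSOL",
    "LogD",
    "MDR1-MDCKII",
    "MLM",
    "pIC50 SARS-CoV-2",
    "pIC50 MERS-CoV",
)


def check_subset(names):
    """Return the names as a list, or raise ValueError on an unlisted name.

    Hypothetical helper, not a scikit-fingerprints function.
    """
    unknown = [name for name in names if name not in ASAP_DATASET_NAMES]
    if unknown:
        raise ValueError(f"Unknown ASAP dataset name(s): {unknown}")
    return list(names)
```

A checked list such as `check_subset(["HLM", "LogD"])` could then be passed as the `subset` argument, whereas `check_subset(["logd"])` fails early because of the different capitalization.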

Parameters:
  • subset (None or list of strings, default=None) – If None, all datasets are returned. If a list of strings, only the datasets with the given names are loaded.

  • data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, by default $HOME/scikit_learn_data.

  • as_frames (bool, default=False) – If True, returns the raw DataFrame for each dataset. Otherwise, returns SMILES as a list of strings, and labels as a NumPy array for each dataset.

  • verbose (bool, default=False) – If True, a progress bar is shown while downloading or loading files.

  • force_update (bool, default=False) – If True, always re-download the dataset from HuggingFace Hub, even if it is already present locally. If False, the dataset is downloaded only if it is not yet available locally.

Returns:

data – Datasets are loaded and yielded one at a time by a generator. Each yielded item starts with the dataset name, followed, depending on the as_frames parameter, by either:

  • a Pandas DataFrame with columns: “SMILES”, “label”

  • a list of strings (SMILES) and a NumPy array (labels)

Return type:

generator of tuples (str, pd.DataFrame) or (str, list[str], np.ndarray)
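
As a minimal sketch of consuming the generator with as_frames=False, the code below assumes each yielded item is a (name, SMILES, labels) tuple, as in the signature above, and that there is one label entry per SMILES string. `summarize_datasets` and `load_and_summarize` are hypothetical helper names, not part of scikit-fingerprints. The import is deferred so that the summarizing helper can be tried on any iterable of the same shape without downloading data.

```python
from collections.abc import Iterable, Sequence
from typing import Optional


def summarize_datasets(
    datasets: Iterable[tuple[str, Sequence[str], Sequence[float]]],
) -> dict[str, int]:
    """Map each dataset name to its number of molecules.

    Hypothetical helper: accepts any iterable of (name, smiles, labels)
    tuples, the shape documented for as_frames=False.
    """
    sizes: dict[str, int] = {}
    for name, smiles, labels in datasets:
        # Assumption: one label entry per SMILES string.
        if len(smiles) != len(labels):
            raise ValueError(
                f"{name}: {len(smiles)} SMILES but {len(labels)} labels"
            )
        sizes[name] = len(smiles)
    return sizes


def load_and_summarize(subset: Optional[list[str]] = None) -> dict[str, int]:
    """Load the benchmark (may download data) and summarize it."""
    # Deferred import: scikit-fingerprints is only needed for the real call.
    from skfp.datasets.asap import load_asap_benchmark

    return summarize_datasets(load_asap_benchmark(subset=subset))
```

Calling `load_and_summarize(["HLM", "MLM"])` would iterate over just those two datasets. Since the result of `load_asap_benchmark` is a generator, it can be consumed only once; call the function again to iterate a second time.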

References