load_bioavailability_ma
- skfp.datasets.tdc.adme.load_bioavailability_ma(data_dir: str | PathLike | None = None, as_frame: bool = False, verbose: bool = False, force_update: bool = False) → DataFrame | tuple[list[str], ndarray]
Load the Bioavailability dataset.
The task is to predict oral bioavailability, defined as “the rate and extent to which the active ingredient or active moiety is absorbed from a drug product and becomes available at the site of action” [1] [2].
This dataset is part of the “absorption” subset of ADME tasks.
Tasks: 1
Task type: classification
Total samples: 640
Recommended split: scaffold
Recommended metric: AUROC
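The recommended evaluation protocol (scaffold split, AUROC) can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's own splitter or metric: `scaffold_split` and `auroc` are hypothetical names, the scaffold keys are assumed to be precomputed (in practice typically Bemis-Murcko scaffolds computed from the SMILES with RDKit), and only a train/test split is shown. Whole scaffold groups are assigned to one side, largest groups first, so no scaffold appears in both sets; AUROC uses the rank-sum (Mann-Whitney U) formula.

```python
from collections import defaultdict
from typing import Sequence


def scaffold_split(scaffolds: Sequence[str], test_frac: float = 0.2) -> tuple[list[int], list[int]]:
    """Hypothetical helper: split indices so no scaffold key is shared by train and test.

    `scaffolds` holds one precomputed scaffold key per molecule (assumed input).
    """
    groups: dict[str, list[int]] = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)

    # Largest scaffold groups first; ties broken by the first index in the group.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_cutoff = (1.0 - test_frac) * len(scaffolds)

    train: list[int] = []
    test: list[int] = []
    for group in ordered:
        # A whole group goes to train only if it still fits under the cutoff.
        if len(train) + len(group) <= train_cutoff:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Hypothetical helper: ROC AUC via the rank-sum formula, tied scores get average ranks."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the block of equal scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs both positive and negative labels")
    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 8 molecules with made-up scaffold keys.
train_idx, test_idx = scaffold_split(["A", "A", "A", "B", "B", "C", "D", "E"], test_frac=0.25)
print(train_idx, test_idx)  # [0, 1, 2, 3, 4, 5] [6, 7]
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

With real data, the scaffold keys would be derived from the returned SMILES and `y_score` would come from a fitted classifier; `sklearn.metrics.roc_auc_score` computes the same quantity for binary labels.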
- Parameters:
data_dir ({None, str, path-like}, default=None) – Path to the root data directory. If None, the currently set scikit-learn data directory is used, by default $HOME/scikit_learn_data.
as_frame (bool, default=False) – If True, returns the raw DataFrame with columns: “SMILES”, “label”. Otherwise, returns SMILES as a list of strings and labels as a NumPy array (1D integer binary vector).
verbose (bool, default=False) – If True, a progress bar is shown when downloading or loading files.
force_update (bool, default=False) – If True, always re-downloads the dataset from the HuggingFace Hub, even if it is already present locally. If False, the dataset is downloaded only if it is not yet available locally.
- Returns:
data – Depending on the as_frame argument, one of:
  - pandas DataFrame with columns: “SMILES”, “label”
  - tuple of: list of strings (SMILES), NumPy array (labels)
- Return type:
pd.DataFrame or tuple(list[str], np.ndarray)
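The two return forms carry the same information. A minimal sketch, assuming pandas and NumPy are installed, converts an `as_frame=True` DataFrame into the `as_frame=False` tuple layout; `frame_to_tuple` is a hypothetical helper and the two rows are toy placeholders, not records from this dataset. With the real loader, `smiles, y = load_bioavailability_ma()` returns the tuple form directly (downloading the data first if it is not available locally).

```python
import numpy as np
import pandas as pd


def frame_to_tuple(df: pd.DataFrame) -> tuple[list[str], np.ndarray]:
    """Hypothetical helper: turn the DataFrame form into (SMILES list, label array)."""
    smiles = df["SMILES"].tolist()
    labels = df["label"].to_numpy(dtype=int)  # 1D integer binary vector
    return smiles, labels


# Toy placeholder rows, not taken from the Bioavailability dataset.
toy = pd.DataFrame({"SMILES": ["CCO", "c1ccccc1"], "label": [1, 0]})
smiles, labels = frame_to_tuple(toy)
print(smiles, labels)  # ['CCO', 'c1ccccc1'] [1 0]
```

The tuple form is usually the more convenient one to pass to fingerprint transformers and scikit-learn estimators, while the DataFrame form is handy for inspection.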
References