run_in_parallel
- skfp.utils.run_in_parallel(func: Callable, data: Sequence, n_jobs: int | None = None, batch_size: int | None = None, single_element_func: bool = False, flatten_results: bool = False, verbose: int | dict = 0, **kwargs) -> list
Run a function in parallel on provided data in batches, using joblib.
Results are returned in the same order as input data.
The func function must take a batch of data, e.g. a list of integers, not a single integer. If func returns lists, the result will be a list of lists. To get a flat list of results, use flatten_results=True.
Note that the progress bar for the verbose option tracks the processing of data batches, not individual data points.
- Parameters:
func (Callable) – The function to run in parallel. It must take only a single argument, a batch of data.
data ({sequence, array-like} of shape (n_samples,)) – Sequence containing data to process.
n_jobs (int, default=None) – The number of jobs to run in parallel. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See scikit-learn documentation on n_jobs for more details.
batch_size (int, default=None) – Number of inputs processed in each batch. None divides input data into equal-sized parts, as many as n_jobs.
single_element_func (bool, default=False) – If True, a single element will be passed to func, rather than batches of data. This way, non-batched functions can be parallelized. When this option is set, batch_size will be ignored.
flatten_results (bool, default=False) – Whether to flatten the results, e.g. to change a list of lists of integers into a list of integers.
verbose (int or dict, default=0) – Controls the verbosity. If higher than zero, a progress bar will be shown, tracking the processing of batches. If a dict object is provided, it will be used to configure the tqdm progress bar.
**kwargs (dict) – Parameters specific to the function passed in the func parameter.
- Returns:
X – The processed data. If the processing function returns lists, this will be a list of lists.
- Return type:
list of length (n_samples,)
Examples
>>> from skfp.utils import run_in_parallel
>>> func = lambda X: [x + 1 for x in X]
>>> data = list(range(10))
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1)
[[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]
>>> run_in_parallel(func, data, n_jobs=-1, batch_size=1, flatten_results=True)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
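The batching and flattening behavior shown above can be illustrated with a small standard-library sketch. This is not the skfp implementation: the names split_into_batches and run_batched are hypothetical, a thread pool stands in for joblib, and rounding the default batch size up is an assumption about how "equal-sized parts" might be computed.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def split_into_batches(data, n_parts=None, batch_size=None):
    """Split a sequence into consecutive batches, preserving order."""
    if batch_size is None:
        # Assumption: with no batch_size, aim for n_parts roughly equal parts.
        batch_size = max(ceil(len(data) / n_parts), 1)
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]


def run_batched(func, data, n_workers=2, batch_size=None, flatten_results=False):
    """Hypothetical stand-in for illustration only, not skfp's code."""
    batches = split_into_batches(data, n_parts=n_workers, batch_size=batch_size)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the order of the input batches.
        results = list(pool.map(func, batches))
    if flatten_results:
        # Turn a list of per-batch lists into one flat list.
        results = [item for batch_result in results for item in batch_result]
    return results


func = lambda X: [x + 1 for x in X]
data = list(range(10))

nested = run_batched(func, data, n_workers=2)
flat = run_batched(func, data, n_workers=2, flatten_results=True)
```

In this sketch, nested holds one list per batch (two lists of five elements), while flat is a single list of ten elements in the original input order.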
>>> func_single = lambda x: x + 1
>>> run_in_parallel(func_single, data, n_jobs=-1, single_element_func=True)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
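For intuition, a per-element function can also be adapted to the batch interface by hand. The helper as_batch_func below is hypothetical and not part of skfp; it only sketches the relationship between single-element and batched functions.

```python
def as_batch_func(single_func):
    """Wrap a per-element function so that it accepts a batch (hypothetical helper)."""
    def batch_func(batch):
        # Apply the per-element function to every item, keeping the order.
        return [single_func(x) for x in batch]
    return batch_func


func_single = lambda x: x + 1
func_batch = as_batch_func(func_single)

result = func_batch([0, 1, 2])
```

Based on the parameter descriptions above, passing such a wrapped function with flatten_results=True should give the same flat result as single_element_func=True, while still letting batch_size control how much work each job receives.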