bulk_sokal_sneath_2_binary_similarity

skfp.distances.bulk_sokal_sneath_2_binary_similarity(X: list | ndarray | csr_array, Y: list | ndarray | csr_array | None = None) → ndarray

Bulk Sokal-Sneath similarity 2 for binary matrices.

Computes the pairwise Sokal-Sneath similarity 2 between the rows of binary matrices. If only one array is passed, similarities are computed between all pairs of its rows. If two arrays are passed, similarities are computed between every row of the first array and every row of the second: the output entry at row i, column j is the similarity between the i-th row of the first array and the j-th row of the second array.

The formula is:

\[sim(a, b) = \frac{|a \cap b|}{2|a| + 2|b| - 3|a \cap b|}\]
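The formula can be vectorized over all row pairs with a single matrix product. The block below is a minimal dense NumPy sketch of that idea, not the skfp implementation: the function name `sokal_sneath_2_pairwise` is hypothetical, and assigning a similarity of 1.0 to a pair of all-zero rows (where the denominator is zero) is an assumption of this sketch rather than something stated here.

```python
import numpy as np


def sokal_sneath_2_pairwise(X, Y=None):
    """Hypothetical dense reference for the formula above (not the skfp code)."""
    X = np.asarray(X, dtype=np.int64)
    # With a single input, compare the rows of X against themselves.
    Y = X if Y is None else np.asarray(Y, dtype=np.int64)

    intersection = X @ Y.T               # |a ∩ b| for every row pair, shape (m, n)
    x_counts = X.sum(axis=1)[:, None]    # |a| per row of X, shape (m, 1)
    y_counts = Y.sum(axis=1)[None, :]    # |b| per row of Y, shape (1, n)
    denominator = 2 * x_counts + 2 * y_counts - 3 * intersection

    # Assumption of this sketch: a pair of all-zero rows (denominator == 0)
    # keeps the pre-filled similarity of 1.0.
    similarities = np.ones(denominator.shape, dtype=float)
    np.divide(intersection, denominator, out=similarities, where=denominator != 0)
    return similarities
```

Called with the `X` and `Y` arrays from the Examples section, this sketch returns the same values as shown there. Called with `X` alone, it returns a symmetric \(m \times m\) matrix with ones on the diagonal.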
Parameters:
  • X (ndarray or CSR sparse array) – First binary input array, of shape \(m \times d\).

  • Y (ndarray or CSR sparse array, default=None) – Second binary input array, of shape \(n \times d\). If not passed, similarities are computed between rows of X.

Returns:

similarities – Array with pairwise Sokal-Sneath similarity 2 values. Shape is \(m \times n\) if two arrays are passed, or \(m \times m\) otherwise.

Return type:

ndarray

Examples

>>> from skfp.distances import bulk_sokal_sneath_2_binary_similarity
>>> import numpy as np
>>> X = np.array([[1, 1, 1], [0, 0, 1]])
>>> Y = np.array([[1, 0, 1], [0, 1, 1]])
>>> bulk_sokal_sneath_2_binary_similarity(X, Y)
array([[0.5       , 0.5       ],
       [0.33333333, 0.33333333]])
>>> from scipy.sparse import csr_array
>>> X = csr_array([[1, 1, 1], [0, 0, 1]])
>>> Y = csr_array([[1, 0, 1], [0, 1, 1]])
>>> bulk_sokal_sneath_2_binary_similarity(X, Y)
array([[0.5       , 0.5       ],
       [0.33333333, 0.33333333]])
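
As a check on the outputs above, the formula can be applied by hand to two of the entries. The symbols \(X_i\) and \(Y_j\) are introduced here only as shorthand for the i-th row of X and the j-th row of Y. For \(X_0 = [1, 1, 1]\) and \(Y_0 = [1, 0, 1]\), we have \(|a| = 3\), \(|b| = 2\) and \(|a \cap b| = 2\):

\[sim(X_0, Y_0) = \frac{2}{2 \cdot 3 + 2 \cdot 2 - 3 \cdot 2} = \frac{2}{4} = 0.5\]

For \(X_1 = [0, 0, 1]\) and \(Y_0 = [1, 0, 1]\), we have \(|a| = 1\), \(|b| = 2\) and \(|a \cap b| = 1\):

\[sim(X_1, Y_0) = \frac{1}{2 \cdot 1 + 2 \cdot 2 - 3 \cdot 1} = \frac{1}{3} \approx 0.3333\]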