[ACCEPTED]-Why NUMPY correlate and corrcoef return different values and how to "normalize" a correlate in "full" mode?-correlation

Accepted answer
Score: 28

You are looking for normalized cross-correlation. This 21 option isn't available yet in Numpy, but 20 a patch is waiting for review that does just what 19 you want. It shouldn't be too hard to apply 18 it I would think. Most of the patch is just 17 doc string stuff. The only lines of code 16 that it adds are

if normalize:
    a = (a - mean(a)) / (std(a) * len(a))
    v = (v - mean(v)) /  std(v)

where a and v are the inputted 15 numpy arrays of which you are finding the 14 cross-correlation. It shouldn't be hard 13 to either add them into your own distribution 12 of Numpy or just make a copy of the correlate 11 function and add the lines there. I would 10 do the latter personally if I chose to go 9 this route.

Another, quite possibly better, alternative 8 is to just do the normalization to the input 7 vectors before you send it to correlate. It's 6 up to you which way you would like to do 5 it.

By the way, this does appear to be the 4 correct normalization as per the Wikipedia page on cross-correlation except 3 for dividing by len(a) rather than (len(a)-1). I feel that 2 the discrepancy is akin to the standard deviation of the sample vs. sample standard deviation and really 1 won't make much of a difference in my opinion.

Score: 0

According to this slides, I would suggest to do 1 it this way:

def cross_correlation(a1, a2):
        lags = range(-len(a1)+1, len(a2))
        cs = []
        for lag in lags:
            idx_lower_a1 = max(lag, 0)
            idx_lower_a2 = max(-lag, 0)
            idx_upper_a1 = min(len(a1), len(a1)+lag)
            idx_upper_a2 = min(len(a2), len(a2)-lag)
            b1 = a1[idx_lower_a1:idx_upper_a1]
            b2 = a2[idx_lower_a2:idx_upper_a2]
            c = np.correlate(b1, b2)[0]
            c = c / np.sqrt((b1**2).sum() * (b2**2).sum())
        return cs
Score: 0

For a full mode, would it make sense to compute 1 corrcoef directly on the lagged signal/feature? Code

from dataclasses import dataclass
from typing import Any, Optional, Sequence

import numpy as np

ArrayLike = Any

class XCorr:
    cross_correlation: np.ndarray
    lags: np.ndarray

def cross_correlation(
    signal: ArrayLike, feature: ArrayLike, lags: Optional[Sequence[int]] = None
) -> XCorr:
    Computes normalized cross correlation between the `signal` and the `feature`.
    Current implementation assumes the `feature` can't be longer than the `signal`.
    You can optionally provide specific lags, if not provided `signal` is padded
    with the length of the `feature` - 1, and the `feature` is slid/padded (creating lags)
    with 0 padding to match the length of the new signal. Pearson product-moment
    correlation coefficients is computed for each lag.

    See: https://en.wikipedia.org/wiki/Cross-correlation

    :param signal: observed signal
    :param feature: feature you are looking for
    :param lags: optional lags, if not provided equals to (-len(feature), len(signal))
    signal_ar = np.asarray(signal)
    feature_ar = np.asarray(feature)
    if np.count_nonzero(feature_ar) == 0:
        raise ValueError("Unsupported - feature contains only zeros")
    assert (
        signal_ar.ndim == feature_ar.ndim == 1
    ), "Unsupported - only 1d signal/feature supported"
    assert len(feature_ar) <= len(
    ), "Unsupported - signal should be at least as long as the feature"
    padding_sz = len(feature_ar) - 1
    padded_signal = np.pad(
        signal_ar, (padding_sz, padding_sz), "constant", constant_values=0
    lags = lags if lags is not None else range(-padding_sz, len(signal_ar), 1)
    if np.max(lags) >= len(signal_ar):
        raise ValueError("max positive lag must be shorter than the signal")
    if np.min(lags) <= -len(feature_ar):
        raise ValueError("max negative lag can't be longer than the feature")
    assert np.max(lags) < len(signal_ar), ""
    lagged_patterns = np.asarray(
                (padding_sz + lag, len(signal_ar) - lag - 1),
            for lag in lags
    return XCorr(
        cross_correlation=np.corrcoef(padded_signal, lagged_patterns)[0, 1:],


signal = [0, 0, 1, 0.5, 1, 0, 0, 1]
feature = [1, 0, 0, 1]
xcorr = cross_correlation(signal, feature)
assert xcorr.lags[xcorr.cross_correlation.argmax()] == 4

More Related questions