Python (or NumPy) equivalent of match in R
Accepted answer
>>> a = [5,4,3,2,1]
>>> b = [2,3]
>>> [ b.index(x) if x in b else None for x in a ]
[None, None, 1, 0, None]
Add 1 if you really need one-based positions (as R returns them) instead of zero-based ones.
>>> [ b.index(x)+1 if x in b else None for x in a ]
[None, None, 2, 1, None]
You can make this one-liner reusable if you are going to repeat it a lot:
>>> match = lambda a, b: [ b.index(x)+1 if x in b else None for x in a ]
>>> match
<function <lambda> at 0x04E77B70>
>>> match(a, b)
[None, None, 2, 1, None]
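Binding a lambda to a name works, but PEP 8 recommends a `def` for named functions. The sketch below is one way to write that; the name `r_match` and the `nomatch` argument (modeled loosely on R's `match(x, table, nomatch = NA_integer_)`, with `None` as an assumed default) are illustrative, not part of the answer above. Using `try`/`except ValueError` scans `table` once per element instead of twice (`in` followed by `.index`).

```python
from typing import Hashable, List, Optional, Sequence


def r_match(
    x: Sequence[Hashable],
    table: Sequence[Hashable],
    nomatch: Optional[int] = None,
) -> List[Optional[int]]:
    """One-based position of the first match of each element of x in table."""
    positions: List[Optional[int]] = []
    for value in x:
        try:
            # index() returns the first occurrence, as R's match does; +1 makes it one-based
            positions.append(table.index(value) + 1)
        except ValueError:
            # value is not present in table
            positions.append(nomatch)
    return positions


print(r_match([5, 4, 3, 2, 1], [2, 3]))  # [None, None, 2, 1, None]
```

Passing e.g. `nomatch=0` gives an all-integer result, which can be convenient for later indexing.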
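A faster approach building on Paulo Scardine's answer: a dict lookup takes constant time on average, whereas `x in b` and `b.index(x)` both scan the list, so the difference becomes more meaningful as the arrays grow. If you don't mind losing the one-liner:
from typing import Hashable, List, Optional

def match_list(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # list-based version: `x in b` and `b.index(x)` each scan b
    return [b.index(x) if x in b else None for x in a]

def match(a: List[Hashable], b: List[Hashable]) -> List[Optional[int]]:
    # map each value to the index of its FIRST occurrence, as R's match does
    # (iterating in reverse lets earlier indices overwrite later duplicates)
    b_dict = {x: i for i, x in reversed(list(enumerate(b)))}
    return [b_dict.get(x) for x in a]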
import random
a = [random.randint(0, 100) for _ in range(10000)]
b = [i for i in range(100) if i % 2 == 0]

# IPython/Jupyter session: %timeit is an IPython magic command
%timeit match(a, b)
580 µs ± 15.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit match_list(a, b)
6.13 ms ± 146 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
match(a, b) == match_list(a, b)
True
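Since the question also mentions NumPy, here is a sketch of a vectorized variant for arrays. The helper name `np_match` is hypothetical and does not come from the answers. It relies on `np.unique(..., return_index=True)`, which reports the index of the first occurrence of each unique value, together with `np.searchsorted`. Because an integer array cannot hold `None`, missing values are marked with `-1` by default (an assumption, playing the role of R's `nomatch`).

```python
import numpy as np


def np_match(a, b, nomatch=-1):
    """Zero-based index of the first occurrence in b of each element of a."""
    a = np.asarray(a)
    b = np.asarray(b)
    if b.size == 0:
        # nothing can match an empty table
        return np.full(a.shape, nomatch)
    # sorted unique values of b and the index of each value's first occurrence in b
    uniq, first_idx = np.unique(b, return_index=True)
    # position at which each element of a would be inserted to keep uniq sorted
    pos = np.searchsorted(uniq, a)
    # elements larger than every value in b get len(uniq); clip so indexing stays valid
    pos = np.clip(pos, 0, len(uniq) - 1)
    found = uniq[pos] == a
    return np.where(found, first_idx[pos], nomatch)


print(np_match([5, 4, 3, 2, 1], [2, 3]).tolist())  # [-1, -1, 1, 0, -1]
```

This avoids a Python-level loop, but unlike the dict version it requires the values to be sortable and mutually comparable.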
One can reproduce the match functionality of R in Python and return the matched index labels as a pandas Index (useful for further subsetting):
import numpy as np
import pandas as pd

def match(ser1, ser2):
    """
    return index labels of ser2 matching elements of ser1 (or np.nan)
    equivalent to match function of R
    """
    # membership test computed once instead of once per element
    found = ser1.isin(ser2)
    # iterate over values positionally, so ser1 may carry any index labels
    idx = [ser2.index[ser2 == value][0] if is_found else np.nan
           for value, is_found in zip(ser1, found)]
    return pd.Index(idx)
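A vectorized alternative can be sketched with `Series.map`, assuming R's first-match rule should apply when `ser2` contains duplicates: drop duplicated values of `ser2` (keeping the first), build a lookup Series from value to index label, and let `map` turn unmatched values into NaN. The name `match_map` is hypothetical.

```python
import pandas as pd


def match_map(ser1: pd.Series, ser2: pd.Series) -> pd.Index:
    """Index labels of ser2 matching each element of ser1 (NaN where there is no match)."""
    # keep only the first occurrence of each value, as R's match does
    first = ser2.drop_duplicates(keep="first")
    # lookup table: value in ser2 -> its index label
    lookup = pd.Series(first.index, index=first.values)
    # Series.map with a Series looks values up in lookup's index; misses become NaN
    return pd.Index(ser1.map(lookup))


ser1 = pd.Series([5, 4, 3, 2, 1])
ser2 = pd.Series([2, 3, 3], index=["x", "y", "z"])
print(match_map(ser1, ser2))
```

Here the label for 3 is "y" rather than "z", because only the first occurrence of each value is kept.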
Source: stackoverflow.com