Pandas: Efficient way to get first row with element that is smaller than a given value

Georgios Bitzes

I'm wondering if there's an efficient way to do this in pandas: given a dataframe, what is the first row whose value is smaller than a given value, i.e. the row holding the largest value that is still below it? For example, given:

      addr
0  4196656
1  4197034
2  4197075
3  4197082
4  4197134

What is the first value that is smaller than 4197080? I want it to return just the row with 4197075. One solution is to filter with < 4197080 first and then take the last row, but that looks like an O(N) operation (it first builds a whole filtered dataframe and then takes its last row), whereas a binary search would take O(log N).

df.addr[ df.addr < 4197080].tail(1)

I timed it: creating df.addr[df.addr < 4197080] takes about as long as df.addr[df.addr < 4197080].tail(1), which strongly suggests that the entire filtered series is built first.
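To make the O(N) vs. O(log N) contrast concrete, here is a minimal sketch that counts how much work each approach does, assuming a sorted addr column as in the question's setup below. The helper last_below_counting is a hypothetical name written for illustration, not a pandas function, and the seed and sizes are arbitrary. The boolean filter evaluates the comparison for every row, whereas a textbook binary search only probes about log2(N) elements.

```python
import numpy as np
import pandas as pd


def last_below_counting(arr, x):
    """Hypothetical helper: textbook binary search over a sorted array.

    Returns (index of the last element < x, or -1; number of elements probed).
    """
    lo, hi = 0, len(arr)  # invariant: arr[:lo] < x and arr[hi:] >= x
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if arr[mid] < x:
            lo = mid + 1
        else:
            hi = mid
    return lo - 1, probes


# Sorted data, like the question's setup (fixed seed so runs are repeatable).
num = np.sort(np.random.default_rng(0).integers(0, 10**8, 10**6))
s = pd.Series(num, name="addr")
threshold = 57830391

mask = s < threshold          # evaluates the comparison for all 10**6 rows
via_filter = s[mask].tail(1)  # then builds the filtered series just to keep one row

idx, probes = last_below_counting(num, threshold)
# Full-length mask vs. a handful of probes; both pick the same value.
print(len(mask), probes, num[idx] == via_filter.iloc[0])
```

The loop halves the candidate range on every probe, so for 10**6 sorted elements it inspects at most 20 of them.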

import numpy as np
import pandas as pd

num = np.random.randint(0, 10**8, 10**6)
num.sort()
df = pd.DataFrame({'addr':num})
df = df.set_index('addr', drop=False)
df = df.sort_index()

Getting the first smaller value is very slow:

%timeit df.addr[ df.addr < 57830391].tail(1)
100 loops, best of 3: 7.9 ms per loop

Using lt improves things a bit:

%timeit df.lt(57830391)[-1:]
1000 loops, best of 3: 853 µs per loop

But still nowhere near as fast as a binary search:

from bisect import bisect
%timeit bisect(num, 57830391, 0, len(num))
100000 loops, best of 3: 6.53 µs per loop
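It may help to check what the timed expressions actually return. A sketch on the five-row example from the top of the question: df.lt(...)[-1:] gives the last row of a boolean mask rather than the wanted value, and the standard-library bisect is an alias for bisect_right, so the element just before its insertion point may equal the threshold; bisect_left gives a strictly smaller one. The helpers last_below and last_at_most are hypothetical names, not library functions.

```python
from bisect import bisect, bisect_left

import pandas as pd

data = [4196656, 4197034, 4197075, 4197082, 4197134]  # sorted, from the question
df = pd.DataFrame({"addr": data})

# df.lt(x) is an element-wise comparison, so [-1:] is the last row of a
# boolean frame (here False), not the row holding the answer.
lt_tail = df.lt(4197080)[-1:]


def last_below(seq, x):
    """Hypothetical helper: last element of sorted seq strictly smaller than x, or None."""
    i = bisect_left(seq, x)  # first index with seq[i] >= x
    return seq[i - 1] if i > 0 else None


def last_at_most(seq, x):
    """Hypothetical helper: last element of sorted seq that is <= x, or None."""
    i = bisect(seq, x)  # bisect is bisect_right: first index with seq[i] > x
    return seq[i - 1] if i > 0 else None


print(lt_tail)
print(last_below(data, 4197082))    # 4197075: strictly smaller
print(last_at_most(data, 4197082))  # 4197082: an equal value is accepted
print(last_below(data, 4196656))    # None: nothing is smaller
```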

Is there any better way?

Jeff

This requires pandas 0.14.0.

Note that here the frame IS NOT SORTED.

In [16]: s = df['addr']

Find the largest value lower than the threshold:

In [18]: %timeit s[s<5783091]
100 loops, best of 3: 9.01 ms per loop

In [19]: %timeit s[s<5783091].nlargest(1)
100 loops, best of 3: 11 ms per loop

So this is faster than actually performing a full sort and then indexing. The .copy() is there so that the in-place sort does not bias the timing: without it, every run after the first would sort an already-sorted array.

In [32]: x = np.random.randint(0, 10**8, 10**6)

In [33]: def f(x):
   ....:     x.copy().sort()
   ....:     

In [35]: %timeit f(x)
10 loops, best of 3: 67.2 ms per loop
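The trade-off can be sketched as follows, assuming NumPy 1.17+ (for default_rng) and a recent pandas; the seed, sizes, query values and the -1 "not found" sentinel are arbitrary choices for illustration. For a single threshold on unsorted data, one O(N) filter followed by nlargest(1) avoids the O(N log N) sort. If many thresholds are queried against the same data, sorting once and running a vectorised np.searchsorted over all of them may amortise the cost of the sort.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
x = rng.integers(0, 10**8, 10**6)  # unsorted data, as in the answer
s = pd.Series(x, name="addr")

# One query on unsorted data: a single O(N) pass, no sort needed.
threshold = 57830391
best = s[s < threshold].nlargest(1)  # one-row Series: label and value of the match

# Many queries: sort once (O(N log N)), then O(log N) per threshold.
queries = np.array([5783091, 57830391, 0])
xs = np.sort(x)  # sorted copy; x itself is left untouched
pos = np.searchsorted(xs, queries, side="left") - 1  # -1 means nothing is smaller
result = np.where(pos >= 0, xs[np.maximum(pos, 0)], -1)  # -1: arbitrary sentinel

print(best)
print(result)
```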

If you are simply searching an ALREADY SORTED series, use searchsorted. Note that you must use the NumPy version (i.e. operate on .values); the Series version will be available in 0.14.1.

In [41]: %timeit  s.values.searchsorted(5783091)
100000 loops, best of 3: 2.5 µs per loop
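searchsorted returns an insertion point rather than the value itself. A minimal sketch of turning that position into the wanted row, assuming a current pandas where Series.searchsorted exists (on 0.14.0, use s.values.searchsorted as described above); the data size and seed are arbitrary. With side="left" the result is the first position whose value is >= the threshold, so the row just before it holds the largest value strictly below.

```python
import numpy as np
import pandas as pd

# Sorted column, built like the question's setup (smaller size for a quick run).
num = np.sort(np.random.default_rng(2).integers(0, 10**8, 10**5))
df = pd.DataFrame({"addr": num})
threshold = 57830391

# First position whose value is >= threshold; the row before it is the answer.
pos = int(df["addr"].searchsorted(threshold, side="left")) - 1

# Keep a one-row DataFrame; an empty frame if no value is below the threshold.
row = df.iloc[[pos]] if pos >= 0 else df.iloc[[]]
print(row)
```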


Related

Pandas - Get first row value of a given column

Most efficient way to find the smallest index where its value minus the value of a previous index is smaller than a given x?

Maximal subset sum smaller than a given value

What is an efficient way to find the first occurrence of an element after a given position in an array?

Find number of elements smaller than a given element in BST

First images inside a row smaller than other ones html/css

Efficient way to do pandas operation and skip row

Optimization - return first value in array that is smaller then cell value (for each row)

What is the most efficient way to get rid off border-right of the first element of table header?

Get the column name for the first non-zero value in that row with pandas

Efficient Way to Get All Values From Hash With Key Less Than a Certain Value

More efficient way of replacing dates with first day of a given week

The Ruby way to test if multiple variables are smaller than a value

Most efficient way to get the last element of a stream

Efficient way to get a random element in Scala?

How to find the first smaller value compared to the current row in subsequent rows?

Efficient way of counting number of elements smaller (larger) than cutoff in a sorted list

Given a jQuery, how to get the first result element?

Efficient way to check if a traversable has more than 1 element in Scala

Efficient way of reading only first row of a table as a vector

How to get a number higher than a given value

Whats the most efficient way to remove the first element from a large vector?

Ruby: efficient way to sum elements grouped by the first element of each subarray

Can a row with a bigger auto-increment value appear sooner than a row with a smaller one?

Pandas select last row value grater than

Search given value in table and return value from first row
