Python - speed up finding percentile of set which is greater than threshold

kilojoules

I need to find which percentile of a group of numbers is over a threshold value. Is there a way that this can be speed up? My implementation is much too slow for the intended application. In case this changes anything, I am running my program using mpirun -np 100 python program.py. I cannot use numba, as the rest of this program uses try/except statements.

import numpy as np
my_vals = []
threshold_val = 0.065
for i in range(60000):
    my_vals.append(np.random.normal(0.05, 0.02))

for i in np.arange(0,100,0.001):
    if np.percentile(my_vals,i) > threshold_val:
        perc = 1*i
        break
else: perc = 100
Matt Jordan

Since the Gaussian (Normal) distribution produces a bell-curve, you should be able to calculate the percentile with the highest probability of being optimal and then write your code to check there first, and then use a modified binary search to find the optimal lowest threshold.

For example, if you determine that your parameters are most likely to favor e.g. 17.951 (this is an example only, I didn't actually bother computing it), then begin near that point rather than starting at 0. Treat this like a binary search - start your lower limit at 0 and your upper limit at 100.0, and set the point to bisect the list as the optimal percentile for the distribution.

If your current upper limit is over threshold_val, bisect the lower half to find the lowest such value that matches; if it is not over the threshold, bisect the upper half, etc. So, e.g. in the range 0.000 to 100.000, if you start at 17.951 and find that it is not above the threshold, adjust to bounds to 17.952 to 100.000 and try 58.976 (halfway between). As soon as you find a value that is above the threshold, then use that value as the upper bound (since it is a non-optimal answer). Continue this process until the lower and upper bounds are 0.001 apart, which gives you the optimal answer. On average, you should have to run about 17 tests rather than 100,000.

You may also be able to automate the computation of the optimal value in case your normal distribution will change, since the distribution produces a bell-curve, and you will know the statistics of that bell-curve based on the parameters anyway.

Your solution only needs to find the lowest value for which the percentile is above your threshold, so this approach should minimize the number of samples you need to check.

One more hint: np.percentile has to sorts the my_vals 100,000 times in your code; I do not know whether a pre-sorted list would help, but it may be worth checking (you'll probably have to test several possible sort parameters since it doesn't appear to be documented in which direction it sorts).

Collected from the Internet

Please contact [email protected] to delete if infringement.

edited at
0

Comments

0 comments
Login to comment

Related

From Dev

Python - speed up finding percentile of set which is greater than threshold

From Dev

Finding first samples greater than a threshold value efficiently in Python (and MATLAB comparison)

From Dev

Python: Find largest array index along a specific dimension which is greater than a threshold

From Dev

MATLAB: how to find indices of cells which have length greater than threshold?

From Dev

Count values which are different from NA and discard column if result is greater than threshold

From Dev

MATLAB: how to find indices of cells which have length greater than threshold?

From Dev

How to parse a line which has a number after a certain string greater than a threshold?

From Dev

numpy.argmin for elements greater than a threshold

From Dev

MySQL select records with sum greater than threshold

From Dev

Substitute those values greater than the 95% percentile with NA

From Dev

Finding a number greater than x in a range

From Dev

finding prime numbers greater than 10^12

From Dev

finding prime numbers greater than 10^12

From Dev

Finding values greater than * in a map list

From Dev

Finding values in elements of a list greater than X

From Dev

Method for finding strictly greater-than-zero solution for Python's Scipy Linear Programing

From Dev

Split list of tuples where second elment is greater than threshold

From Dev

How to extract only values greater than a threshold from a file?

From Dev

Split list of tuples where second elment is greater than threshold

From Dev

SQL Command return user greater than threshold value

From Dev

How to extract only values greater than a threshold from a file?

From Dev

Can I speed up a large data set operation in SQLite / Python?

From Dev

How to set up charging threshold in ubuntu 14.04

From Dev

Speed up code in python

From Dev

Python - speed up pathfinding

From Dev

Python : How to find in set of rows, one element is less than 1 and other is greater than 1?

From Dev

Python : How to find in set of rows, one element is less than 1 and other is greater than 1?

From Dev

Filtering the lines which are greater than a percentage value

From Dev

Calculate percentile per set interval in python array

Related Related

  1. 1

    Python - speed up finding percentile of set which is greater than threshold

  2. 2

    Finding first samples greater than a threshold value efficiently in Python (and MATLAB comparison)

  3. 3

    Python: Find largest array index along a specific dimension which is greater than a threshold

  4. 4

    MATLAB: how to find indices of cells which have length greater than threshold?

  5. 5

    Count values which are different from NA and discard column if result is greater than threshold

  6. 6

    MATLAB: how to find indices of cells which have length greater than threshold?

  7. 7

    How to parse a line which has a number after a certain string greater than a threshold?

  8. 8

    numpy.argmin for elements greater than a threshold

  9. 9

    MySQL select records with sum greater than threshold

  10. 10

    Substitute those values greater than the 95% percentile with NA

  11. 11

    Finding a number greater than x in a range

  12. 12

    finding prime numbers greater than 10^12

  13. 13

    finding prime numbers greater than 10^12

  14. 14

    Finding values greater than * in a map list

  15. 15

    Finding values in elements of a list greater than X

  16. 16

    Method for finding strictly greater-than-zero solution for Python's Scipy Linear Programing

  17. 17

    Split list of tuples where second elment is greater than threshold

  18. 18

    How to extract only values greater than a threshold from a file?

  19. 19

    Split list of tuples where second elment is greater than threshold

  20. 20

    SQL Command return user greater than threshold value

  21. 21

    How to extract only values greater than a threshold from a file?

  22. 22

    Can I speed up a large data set operation in SQLite / Python?

  23. 23

    How to set up charging threshold in ubuntu 14.04

  24. 24

    Speed up code in python

  25. 25

    Python - speed up pathfinding

  26. 26

    Python : How to find in set of rows, one element is less than 1 and other is greater than 1?

  27. 27

    Python : How to find in set of rows, one element is less than 1 and other is greater than 1?

  28. 28

    Filtering the lines which are greater than a percentage value

  29. 29

    Calculate percentile per set interval in python array

HotTag

Archive