While trying to overlay PDF (probability density function) values on a histogram, I run into significant scaling issues: the histogram is barely visible on my chart. This may be due to the scaling factors used in the code below, but without them the density curves would be tiny. Is there a better way to achieve this without resorting to ad hoc scaling factors?
bmin=min(b);
bmax=max(b);
nrow=length(b);
nbin=sqrt(nrow);
pd = fitdist(b,'Normal');
p = stblfit(b,'ecf');
x_pdf=[bmin:0.0025:bmax];
y=pdf(pd,x_pdf);
hist(b,nrow);
h = findobj(gca, 'Type','patch');
h.FaceColor=[0 0 0];
hold on;
scale = 0.156*max(y);
plot(x_pdf,y.*scale,'or');
hold on;
scale2 = 0.24*max(y);
plot(x_pdf,stblpdf(x_pdf,p(1),p(2),p(3),p(4)).*scale2,'k-');
legend('P&L distribution','Normal fit', 'ecf fit')
Thanks
As you are plotting the histogram, the y-axis represents the number of counts for a specific interval. If you have a longer or shorter input vector, the values of the histogram will be very different. The histogram can be used as an approximation to the probability density (PDF), but for that you need to scale it correctly. The integral of a PDF from -infinity to +infinity has to result in 1, so we need to scale the histogram accordingly.
You can still use the hist command, but instead of using it to generate the histogram plot, we get the count values from it. This vector can then be scaled to have an integral of 1 by calculating its integral and dividing the vector by that value.
% Generate some arbitrary gaussian distribution
b = randi(10) + randi(10) .* randn(10000,1);
bmin = min(b);
bmax = max(b);
% Calculate histogram
[counts,bins] = hist(b,100);
% Scale histogram to get the pdf
est_pdf = counts / sum(counts * mean(diff(bins)));
% Estimate pdf using fitdist
pd = fitdist(b,'Normal');
x_pdf = linspace(bmin,bmax,1000);
y_pdf = pdf(pd,x_pdf);
% Plot everything
figure;
hold on;
bar(bins,est_pdf);
plot(x_pdf, y_pdf, '-r');
hold off;
Note: I calculate the integral of counts by multiplying the counts by the mean bin width of the histogram. When hist is called with a scalar number of bins, the bins are equally spaced, so the widths returned by diff(bins) differ only by floating-point rounding and the approximation is accurate enough for this purpose.
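As an alternative to scaling the counts by hand, here is a minimal sketch that lets MATLAB do the normalization. It assumes R2014b or later (for the histogram function and its 'Normalization' option) and the Statistics and Machine Learning Toolbox (for fitdist and pdf). The sample data, the seed, and the variable name total_area are illustrative assumptions, not part of the original question.

```matlab
% Illustrative data standing in for the P&L vector b (assumed values)
rng(0);
b = 2 + 3 .* randn(10000,1);

% Fit a normal distribution and evaluate its density on a fine grid
pd = fitdist(b,'Normal');
x_pdf = linspace(min(b),max(b),1000);
y_pdf = pdf(pd,x_pdf);

% With 'Normalization','pdf' each bar height is count / (N * bin width),
% so the histogram is on the same scale as the density curve
figure;
h = histogram(b,100,'Normalization','pdf');
hold on;
plot(x_pdf,y_pdf,'-r');
hold off;
legend('P&L distribution','Normal fit');

% Sanity check: the bar areas should sum to 1 (up to rounding)
total_area = sum(h.Values .* h.BinWidth);
```

Once the histogram is on a density scale, the stblpdf curve from the question should be plottable on the same axes without the scale and scale2 factors, provided it returns a properly normalized density.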