I have the following MySQL table setup that logs site visits:
id timestamp ip tracking
The purpose of this tracking is various forwarded domains are sent to this one site with this script running. The script logs the ip and timestamp, and grabs the tracking code from the URL.
What we are tying to do is have a rollup count of all tracking codes used, however, there are a LOT of spam requests (bots) hitting the site. I'm trying to figure out the best way to filter out results that I think are bots, I don't want them counted in the final results.
My initial thought was to first filter out all ips that occur more than once. My problem is, how do I then use the results of that query to go back and count the tracking codes?
My code to filter ips that only show once is:
SELECT tracking, ip, COUNT( * )
FROM tracking
GROUP BY ip
HAVING COUNT( * ) =1
ORDER BY COUNT( * ) DESC
How do I then take those results and run another query to then count and sum up the tracking codes?
-Kevin
EDIT:
Sorry, first post here and I rushed a little. In the end, what I'm looking for is to get the count of all the tracking codes used.
Let's assume I have the following table data:
id timestamp ip tracking
-- --------- -- --------
1 2014-01-10 23:43:10 192.168.1.1 100
2 2014-01-10 23:43:10 192.168.1.1 200
3 2014-01-10 23:43:10 192.168.1.2 100
4 2014-01-10 23:43:10 192.168.1.1 999
5 2014-01-10 23:43:10 192.168.1.1 100
6 2014-01-12 23:43:10 192.168.1.1 100
7 2014-01-12 23:43:10 192.168.1.3 100
8 2014-01-12 23:43:10 192.168.1.4 100
9 2014-01-12 23:43:10 192.168.1.5 600
10 2014-01-12 23:43:10 192.168.1.1 888
11 2014-01-12 23:43:10 192.168.1.1 888
12 2014-01-12 23:43:10 192.168.1.8 200
13 2014-01-12 23:43:10 192.168.1.9 300
14 2014-01-12 23:43:10 192.168.1.10 100
15 2014-01-12 23:43:10 192.168.1.11 400
16 2014-01-12 23:43:10 192.168.1.1 888
17 2014-01-12 23:43:10 192.168.1.12 200
18 2014-01-12 23:43:10 192.168.1.2 777
19 2014-01-12 23:43:10 192.168.1.2 100
20 2014-01-12 23:43:10 192.168.1.1 200
21 2014-01-12 23:43:10 192.168.1.4 789
In the end I want to display a count of all tracking codes used, but to ignore any rows where an ip address looks to be from a bot. Because of the nature of this setup, we kind of assume that ip addresses would only hit the site once, maybe twice. So I figure, if I can get the count of tracking codes, excluding any row where the ip address is greater than 1 (or maybe 2).
So the final result from that data set would be
tracking count
-------- -----
100 3
200 2
300 1
400 1
600 1
789 1
Basically from the results we are not counting anything from 192.168.1.1 and 192.168.1.2 because those ips visited more than 1 time.
EDIT - I added a row #21 to have one of the IPs visit twice, therefore both of their visits should count if we are using <3 in the query. It looks like the below answer isn't working correctly. When I add the row #21, the code 789 doesn't get counted
Hope this helps understand it better?
I know how to get the overall count of either ips or tracking, but I can't figure out how to put the two together in one query.
-Kevin
EDIT 2/4/14 - So I what I think is happening is the query below is only counting the tracking code of the first instance of the IP. So in the case let's change the table to have a better set of data
id timestamp ip tracking
-- --------- -- --------
1 2014-01-10 23:43:10 192.168.1.1 100
2 2014-01-10 23:43:10 192.168.1.222 100
3 2014-01-10 23:43:10 192.168.1.1 200
4 2014-01-10 23:43:10 192.168.1.2 100
5 2014-01-10 23:43:10 192.168.1.1 999
6 2014-01-12 23:43:10 192.168.1.1 100
7 2014-01-12 23:43:10 192.168.1.2 100
8 2014-01-12 23:43:10 192.168.1.3 100
9 2014-01-12 23:43:10 192.168.1.4 100
10 2014-01-12 23:43:10 192.168.1.5 600
11 2014-01-12 23:43:10 192.168.1.1 888
12 2014-01-12 23:43:10 192.168.1.1 888
13 2014-01-12 23:43:10 192.168.1.8 200
14 2014-01-12 23:43:10 192.168.1.9 300
15 2014-01-12 23:43:10 192.168.1.10 100
16 2014-01-12 23:43:10 192.168.1.11 400
17 2014-01-12 23:43:10 192.168.1.1 888
18 2014-01-12 23:43:10 192.168.1.12 200
19 2014-01-12 23:43:10 192.168.1.222 777
20 2014-01-12 23:43:10 192.168.1.2 100
21 2014-01-12 23:43:10 192.168.1.1 200
22 2014-01-12 23:43:10 192.168.1.4 789
In this case, I would want the query to be where any IP appears 2 or less times. So the results SHOULD be:
tracking count
-------- -----
100 4
200 2
300 1
400 1
600 1
777 1
789 1
Basically, 192.168.1.1 and .2 are the only ones that appear more than 2 times, so they should be excluded. Some IPs, like .4 and .222 appear twice, which is fine, but each time they use a different code.
Using the query below:
select xyz.tracking,count(xyz.tracking) as `count` from (select ip,count(ip),tracking from tracking group by ip having count(ip)<3) xyz group by xyz.tracking;
I seems to only pick up the code for the first instance of each IP. So the results I get are:
tracking count
-------- -----
100 4
200 2
300 1
400 1
600 1
So in this case it's picking up the code 100 for IP .222 but not the code 777 for IP .222 It's picking up code 100 for IP .4 but not the code 789 for IP .4.
Anyone have any ideas how to resolve this?
EDIT: So I think I have a solution. It's returning the correct values. Can someone verify?
SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t
JOIN (
SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d
ON d.ip = t.ip
GROUP BY t.tracking
I believe I found the answer in case anyone else needs a query like this.
SELECT t.tracking, count(t.tracking) as COUNT FROM tracking t
JOIN (
SELECT s.ip, count(s.ip) FROM tracking s GROUP BY s.ip HAVING COUNT(s.ip)<=2) d
ON d.ip = t.ip
GROUP BY t.tracking
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments