I really need help with this one and couldn't find related answers after hours of searching.
MySQL, Entity Framework 6, database with a few millions of records, record looks like:
Indexint(11) NOT NULL
TaskIDint(11) NOT NULL
DeviceIDbigint(20) NOT NULL
Commentslongtext NULL
ExtendedResultslongtext NULL
RunResultint(11) NOT NULL
JobResultint(11) NOT NULL
JobResultValuedouble NOT NULL
ReporterIDbigint(20) NOT NULL
FieldIDbigint(20) NOT NULL
TimeOfRundatetime NOT NULL
What I need is to get all the records for a specific taskID, then group by DeviceID and sort by TimeOfRun in order to get the latest data for each deviceID in a specific taskID.
This is my code:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).
Select(x => x.OrderByDescending(y => y.TimeOfRun).FirstOrDefault()).ToList();
But this is the generated query:
{SELECT
`Apply1`.`Index`,
`Apply1`.`TaskID`,
`Apply1`.`DEVICEID1` AS `DeviceID`,
`Apply1`.`RunResult`,
`Apply1`.`JobResult`,
`Apply1`.`JobResultValue`,
`Apply1`.`ExtendedResults`,
`Apply1`.`Comments`,
`Apply1`.`ReporterID`,
`Apply1`.`FieldID`,
`Apply1`.`TimeOfRun`
FROM (SELECT
`Project2`.`p__linq__0`,
`Project2`.`DeviceID`,
(SELECT
`Project3`.`Index`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Index`,
(SELECT
`Project3`.`TaskID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TaskID`,
(SELECT
`Project3`.`DeviceID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `DEVICEID1`,
(SELECT
`Project3`.`RunResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `RunResult`,
(SELECT
`Project3`.`JobResult`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResult`,
(SELECT
`Project3`.`JobResultValue`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `JobResultValue`,
(SELECT
`Project3`.`ExtendedResults`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ExtendedResults`,
(SELECT
`Project3`.`Comments`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `Comments`,
(SELECT
`Project3`.`ReporterID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `ReporterID`,
(SELECT
`Project3`.`FieldID`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `FieldID`,
(SELECT
`Project3`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Project3`
WHERE (`Project3`.`TaskID` = @p__linq__0) AND (`Project2`.`DeviceID` = `Project3`.`DeviceID`)
ORDER BY
`Project3`.`TimeOfRun` DESC LIMIT 1) AS `TimeOfRun`
FROM (SELECT
@p__linq__0 AS `p__linq__0`,
`Distinct1`.`DeviceID`
FROM (SELECT DISTINCT
`Extent1`.`DeviceID`
FROM `JobsRecordHistories` AS `Extent1`
WHERE `Extent1`.`TaskID` = @p__linq__0) AS `Distinct1`) AS `Project2`) AS `Apply1`}
Which takes way too long.
I don't know SQL good enough, I admit, but if insert a ToList() after the WHERE statement, then I get the results much quicker, though it's still not the right thing to do since there's a lot of un-needed data that the database passes to my app in this situation, and it's still slow = 30 seconds for 40k records.
I also tried this:
Dictionary<long, DateTime> DeviceIDAndTime = db.JobsRecordHistories.AsNoTracking().Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID)
.Select(g => new DeviceIDaAndTime { deviceID = g.Key, timeOfRun = g.Max(gi => gi.TimeOfRun) }).ToDictionary(x => x.deviceID, x => x.timeOfRun);
In order to use the dictionary this way:
List<JobsRecordHistory> newH = db.JobsRecordHistories.AsNoTracking().Where(x => DeviceIDAndTime.Keys.Contains(x.DeviceID) && x.TimeOfRun == DeviceIDAndTime[x.DeviceID]).ToList();
But I get this error:
Additional information: LINQ to Entities does not recognize the method 'System.DateTime get_Item(Int64)' method, and this method cannot be translated into a store expression.
Which makes sense cause from what I understand, when comparing the timeOfRun to the dictionary value, LINQ needs a specific value and not a collection when composing the query.
It's weird to me that I didn't find any related post and that other people didn't encounter this problem. I guess I missed something.
Appreciate any help, Thanks
Finally figured it out and improved performance.
I needed a query and a sub query, And I needed the MAX function instead of ORDER because I don't care about the order of the results, I only care about the biggest (timeOfRun) one.
Additionally, things were simplified once I noticed that bigger Index column (my PK, auto increment) means more recent data, so I didn't need MAX(timeOfRun), instead I used MAX(Index), though I'm quite sure it would have worked the same way.
This is my LINQ:
var historyQuery = db.JobsRecordHistories.AsNoTracking().Where(y => y.TaskID == taskID &&
db.JobsRecordHistories.Where(x => x.TaskID == taskID).GroupBy(x => x.DeviceID).Select(g => g.Max(i => i.Index)).Contains<int>(y.Index));
And this is the generated SQL:
{SELECT
`Extent1`.`Index`,
`Extent1`.`TaskID`,
`Extent1`.`DeviceID`,
`Extent1`.`RunResult`,
`Extent1`.`JobResult`,
`Extent1`.`JobResultValue`,
`Extent1`.`ExtendedResults`,
`Extent1`.`Comments`,
`Extent1`.`ReporterID`,
`Extent1`.`FieldID`,
`Extent1`.`TimeOfRun`
FROM `JobsRecordHistories` AS `Extent1`
WHERE (`Extent1`.`TaskID` = @p__linq__0) AND (EXISTS(SELECT
1 AS `C1`
FROM (SELECT
`Extent2`.`DeviceID` AS `K1`,
MAX(`Extent2`.`Index`) AS `A1`
FROM `JobsRecordHistories` AS `Extent2`
WHERE `Extent2`.`TaskID` = @p__linq__1
GROUP BY
`Extent2`.`DeviceID`) AS `GroupBy1`
WHERE `GroupBy1`.`A1` = `Extent1`.`Index`))}
I hope this will help somebody since It took me 1.5 days of googling, looking at SQL queries, LINQ, debugging and optimizing
Collected from the Internet
Please contact [email protected] to delete if infringement.
Comments