I recently ran a Whole School Mailing List in Query Builder and discovered an unusual anomaly. There were two students in my system with two entries on the returned query. Both students have a complete record (with all personal information and family attachments) and a record that only contains their name. When I search my list of users, though (in User Admin or student profiles), I only see one entry for each student.
Is there a way for me to clear this up? I believe the error came about from an extra user profile that got created through the application process and deleted improperly, but I’m only guessing. It’s not causing a problem right now, but I’m concerned that something will get messed up down the road.
Hi Kevin, it does sound like some possible orphaned records in your database. But, in a query, it’s easy to end up with duplicates listed based on how table joins work, even if there is really only one copy of the record. So, to find the cause you’d need to do a bit of database investigation. One way to do this is to modify the query so that it includes the gibbonPersonID column, as this would tell you if there is truly more than one user record. If there isn’t, you could then check that gibbonPersonID on tables like gibbonStudentEnrolment, to see if there is perhaps a duplicate there. Hope this helps!
I was able to confirm that both students showing up twice in my query each have a single gibbonPersonID, so good news there. I was also able to pull them up and isolate their records on the table gibbonStudentEnrolment, but I didn’t see anything that would explain the two records in the query. The student who had been enrolled in our programs for two years had two entries for gibbonStudentEnrolmentID, while the student who was new only had one entry. Nothing else about the data in the table seemed off to me, but I will admit it gets a little confusing at times.
What other tables should I check? (You mentioned tables like gibbonStudentEnrolment, but I’m not sure which others to explore.) If I find something that’s off, how do I decide what to delete so I don’t break the database further?
Duplicates often happen in a query if you’re joining tables with a “one to many” relationship rather than a “one to one.” In this case, the lines from the first table will be multiplied by the lines in the second table when the query set is returned. This is the default behavior for table joins and often the desired result. However, to solve for this, you’ll likely want to use a GROUP BY statement, such as GROUP BY gibbonPerson.gibbonPersonID to return only one line per person. In this case, the student doesn’t actually exist in the database twice, the query is returning them twice, in context to the tables that have been queries (joining tables often multiplies the results).
Use grouping carefully though, as grouping things means that the results in the “one to many” relationship get flattened down into a single record, which can cause unexpected results. In the case of student enrollment, if you have gibbonPerson and join gibbonStudentEnrolment to it, there could be multiple enrollments per person. If you GROUP BY gibbonPerson.gibbonPersonID, it will flatten the gibbonStudentEnrolment results, but not necessarily in and specific order.