-
-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(#9601): prototype duplicate prevention #9609
base: master
Are you sure you want to change the base?
fix(#9601): prototype duplicate prevention #9609
Conversation
48d32cc
to
e266863
Compare
webapp/src/ts/main.ts
Outdated
queryParams: { | ||
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'], | ||
// eslint-disable-next-line eqeqeq | ||
query: (duplicate, action) => duplicate === 'yes' && action != null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester @ChinHairSaintClair My feeling here is that this would need to end up with some kind of operator syntax support that lets the user do logic, maybe something like (though doesn't have to be) JsonLogic: https://jsonlogic.com/
The way I see it there are two point of config -- the excel files and the configuration json -- and probably neither should take serialised JS (?).
If we return the key => { key, value }
as a tuple or object on key lookup, we could let the user express logic maybe like this (with the value found at the key tested against the value provided (with
) using the operator):
let andOnly = {
logic: [
[
// array bracket denotes a logical grouping
{
key1: { with: value, op: $in },
key2: { with: value, op: $contains },
key3: { with: value, op: $eq },
},
{
key1: { with: value, op: $in },
key2: { with: value, op: $ne },
key3: { with: value, op: $startsWith },
},
],
],
};
let orWithNestedAnds = {
logic: [
[
{
key1: { with: value, op: $in },
},
{
key2: { with: value, op: $eq },
},
],
// each grouping in its own bracket
[
{
key1: { with: value, op: $in },
},
{
key2: { with: value, op: $in },
},
],
],
};
let orWithNestedOrWithNestedAnds = {
logic: [
[
{
key1: { with: value, op: $in },
},
{
key2: { with: value, op: $in },
},
],
// OR
[
[
// nested groupings acceptable
{
key1: { with: value, op: $ne },
},
// AND
{
key2: { with: value, op: $ne },
},
],
// OR
[
{
key1: { with: value, op: $contains },
},
// AND
{
key1: { with: value, op: $startsWith },
},
],
],
],
};
The other stuff seems easier to move to config.
webapp/src/ts/polyfills.ts
Outdated
@@ -88,7 +88,14 @@ declare global { | |||
CHTCore: any; | |||
angular: any; | |||
EnketoForm:any; | |||
_phdcChanges: { // Additional namespace | |||
// Specify your own contact_types here | |||
hierarchyDuplicatePrevention: Partial<{[key in 'person' | 'health_center']: Strategy;}>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester @ChinHairSaintClair Following on from my above comment:
The way I see it there are two point of config -- the excel files and the configuration json.
We probably couldn't expect the user to do config here. I agree the type safety is lovely and I'm not sure whether it's possible to generate a type from the config, but if not I don't see how we can keep the type safety and still maintain the existing configuration contract.
@@ -299,6 +376,152 @@ export class ContactsEditComponent implements OnInit, OnDestroy, AfterViewInit { | |||
}; | |||
} | |||
|
|||
private parseXmlForm(form): Document | undefined { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move as much of this sort of stuff as is possible to a lib/file so it can be unit tested easily and then just called internally. Not sure if private
is needed here.
(I know this was just a conceptual prototype not productionised, but for forward looking feedback.)
} | ||
} | ||
|
||
return count > 0 ? totalScore / count : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the alternative return type need to be null
or is there an appropriate default value of type int
?
} | ||
|
||
// Promise.allSettled is not available due to the app's javascript version | ||
private allSettledFallback(promises: Promise<Exclude<any, null | undefined>>[]): Promise<ReturnType[]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester @ChinHairSaintClair Would promise allSettled
not be transpiled to a target version?
const $duplicateInfoElement = $('#contact-form').find('#duplicate_info'); | ||
$duplicateInfoElement.empty(); // Remove all child nodes | ||
$duplicateInfoElement.show(); | ||
// TODO: create a template component where these values are fed into. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need DOM mutation? Can the angular template not handle it?
count++; | ||
} | ||
$duplicateInfoElement.append(content); | ||
$duplicateInfoElement.on('click', '.duplicate-navigate-link', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No anon functions on event listeners please. They don't get properly cleaned up by the browser. They need to be named.
webapp/src/ts/polyfills.ts
Outdated
|
||
export const NormalizedLevenshtein: Strategy = { | ||
type: 'NormalizedLevenshtein', | ||
threshold: 0.334, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You'll probably want to set this in config
}; | ||
formService.render.resolves(form); | ||
|
||
await createComponent(); | ||
await fixture.whenStable(); | ||
|
||
formService.saveContact.resolves({ docId: 'new_clinic_id' }); | ||
|
||
// TODO: figure out why this test's dbLookupRef is null despite being set in the beforeEach | ||
component.dbLookupRef = Promise.resolve({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want sinon.resolves
webapp/src/ts/main.ts
Outdated
{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'}, | ||
{form_prop_path: '/data/health_center/external_id', db_doc_ref: 'external_id'} | ||
], | ||
queryParams: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe formQuestion
? Name is still bothering me.
webapp/src/ts/main.ts
Outdated
health_center: { | ||
...Levenshtein, | ||
props: [ | ||
{form_prop_path: `/data/health_center/name`, db_doc_ref: 'name'}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
camelCase is the js standard.
webapp/src/ts/main.ts
Outdated
], | ||
queryParams: { | ||
valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'], | ||
// eslint-disable-next-line eqeqeq |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkuester Continuing to lobby for !=
as the most excellent exception to the strict equality lint rule. Checks for both null
and undefined
.
e266863
to
c9e1ddf
Compare
5442b21
to
9eb471f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, this is amazing! I think we are 100% on the right track here and I am excited to get this functionality released and in the hands of users.
I have not had a chance to look at all the changes yet, but I wanted to go ahead and share my comments so far (mostly focused on the internal service stuff and have not looked much yet at the UI code). Mostly just a bunch of minor questions/suggestions.
I provided some suggestions regarding the Sonar complaints for hasOwnProperty
, but I have not had a chance to dig into the circular navigation issue yet.
Thank you for the great PR and the patience to work with us and get the across the finish line!
(FYI, I am following conventional comments for my PR comments, so that is where the comment prefixes are coming from...)
@@ -31,4 +32,12 @@ export class XmlFormsContextUtilsService { | |||
return this.getDateDiff(contact, 'years'); | |||
} | |||
|
|||
levenshteinEq(threshold: number, current: string, existing: string){ | |||
return current && existing ? levenshteinEq(current, existing) < threshold : false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: should this be checking if the distance is less than the threshold or less than or equal to the threshold? If current
and existing
have a distance of 3
and I call levenshteinEq(3, current, existing)
, I would expect it to return true
. However, with the way you have it now, it would return false
since the threshold is exclusive.
I am fine leaving it how it is if that makes more sense to you (maybe I am just weird to expect the threshold to be inclusive....) Either way, we can be sure to document the behavior in the docs to make sure folks know what to expect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Originally, I thought of the distance calculation and threshold as defining a "range" (0–3), where the upper bound is typically exclusive in programming.
I can see how that might be confusing, so I've updated the code to make the threshold inclusive. Good catch!
@@ -31,4 +32,12 @@ export class XmlFormsContextUtilsService { | |||
return this.getDateDiff(contact, 'years'); | |||
} | |||
|
|||
levenshteinEq(threshold: number, current: string, existing: string){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: in your experience, is there a threshold
value that would make a sensible default? If so, maybe we should move threshold
to be the last parameter, so we can bake in a default value here (but one that can still be overridden when desired).
I have not done enough work with levenshtein in practice to understand what a good value would be. I assume it is highly dependent on the domain of the value being measured (e.g. the threshold for names might be different from a threshold for address strings). Maybe we pick a default value that is good for comparing names?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see now that you used 3
as the threshold for the default name comparison. Does it make sense to just use that as a general default here for these functions? I just expect that other app devs are going to struggle to know where to start with a threshold value and it seems convenient to provide a default. Then they can customize up or down as necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved the threshold parameter as suggested. You raise a good point. While I’m hesitant to define a universal threshold since it depends heavily on the domain, as noted, a default of 3
seems reasonable for names based on:
- Name length considerations – Shorter names (e.g., Amy) tolerate fewer edits than longer names (e.g., Christopher).
- Common typos and variations – A threshold of
3
allows for small spelling differences (Jon vs. John) without excessive false positives. - Phonetic similarity – Names with different but phonetically similar spellings (Katherine vs. Catherine) would still be matched.
A stricter threshold (2
) could be useful for high-precision matching as data normalizes over time, whereas a threshold of 4+
might introduce too many false positives.
One thing to keep in mind is that these considerations are based primarily on English names. If we want a more general approach across different languages, we may need to revisit this in the future.
That said, I don’t have extensive experience working with Levenshtein distance, so I’m open to adjustments if we find a better general default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 My vote is we start with 3 as the default and we can adjust later if needed.
normalizedLevenshteinEq, | ||
levenshteinEq, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: IMHO these functions can just live in the xml-forms-context-utils.service
since that is the only place they are being used.
const siblings = await requestSiblings(this.dbService, parentId, contactType); | ||
const expression = extractExpression(duplicateCheck); | ||
const isCanonical = doc.is_canonical ? doc.is_canonical === 'true' : false; | ||
acknowledged = acknowledged ?? false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question: I had expected when I checked "Acknowledge" in the form that it would set is_canonical
to true
on my contact doc. That way, if I go back and edit the same contact later, it does not hit me with the dupe error again since I already confirmed the contact was real. How were you planning to have is_canonical
get set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, these are two separate concepts.
- Acknowledge: This allows the user to review and compare possible duplicates, then proceed with saving the record—knowing a duplicate may exist—without blocking progress. It does not change how future duplicate checks behave.
is_canonical
: This explicitly marks the contact as the true record, signalling to the backend that this is the authoritative version and that other duplicate records should be dealt with (either merged, deleted, or archived).
Acknowledgment is available to all users and primarily serves to let them continue working despite potential duplicates. In contrast, is_canonical
is an opt-in mechanism that bypasses duplicate checks entirely for that record.
While we're on the subject, have we decided on a prefix standard for xlsx questions? I have not seen any additional comments on the forum thread.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, so conceptually your distinction here makes sense. Can you help me understand a bit more about the is_canonical
workflow? How will the value get set? Who has permission to mark a contact as canonical? If two contacts match as duplicates, but are both real people (e.g. a father and son with the exact same name) should both be marked as is_canonical
?
The main reason I expected the "Acknowledge" to set is_canonical
was because I was hoping to avoid the user being warned about duplicates multiple times for the same contact. (Each time the contact was edited it would trigger the same dupe warnings if is_canonical
was not set....)
Regarding the prefix standard, I have bumped my forum question to some of the other devs that have done a lot of CHT work.
This comment was marked as outdated.
This comment was marked as outdated.
Sorry, something went wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After thinking through it again we've made some adjustments. Please note that some of this still needs to be developed.
Who Sets is_canonical
?
CHWs, who have the most contextual knowledge, are responsible for marking records as canonical.
Captured in the following ways:
- No duplicate found during submission → Set
is_canonical=true
automatically on save. - Duplicate found → Prompt the user to acknowledge that the current record is canonical before proceeding.
- Editing an existing record → Display the current
is_canonical
status (if any) and allow the user to reset it. Additionally, provide a "Mark for Deletion" option for records that need cleanup (particularly useful for users without delete permissions).
For scenario 3, consider a user reviewing records after this feature is introduced. They may initially mark a record as canonical but later decide to investigate further. The ability to reset the status allows for flexibility. Meanwhile, marking a record for deletion signals that it should be removed from the system.
Duplicate Checks and Warnings
The goal is to eventually remove non-canonical records, so they no longer trigger duplicate warnings or remain available for editing.
Until then, we should not suppress duplicate warnings. Doing so would prevent users from revisiting and resolving potential issues later.
- For non-canonical records → Duplicates will continue to be displayed as usual.
- For canonical records → Introduce a "Check for Possible Duplicates" button to allow users to manually trigger a duplicate check. Useful if you want to fact check as mentioned in the previous section.
Handling Similar but Distinct Contacts (e.g., Father and Son with the Same Name)
Even if two contacts appear similar, they should both be marked as is_canonical
if they are distinct. For instance, in the father-son scenario, a key differentiator like date of birth should be used. Implementers may need to fine-tune duplicate detection criteria to reduce false positives.
Detail on duplicate items
We're of the opinion that duplicate items should contain enough data, not all, to defer navigating until absolutely necessary - as losing the current form data could cause frustration.
If we don't give the user enough info ahead of time, we're worried users might feel forced to submit incorrect data just to bypass the duplicate screen and not lose form data. Leading to bad records in the system.
The is_canonical
property should be displayed in the duplicate summary.
What are your thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the additional context here. This is super helpful!
For instance, in the father-son scenario, a key differentiator like date
of birth should be used. Implementers may need to fine-tune duplicate
detection criteria to reduce false positives.
This statement has got me to thinking that perhaps we do not actually need embedded CHT logic for handling false positives. Instead, I think most of your needed is_canonical
workflows could be supported by custom config. This would give you (and other projects) total control over the specifics of the is_canonical
logic without adding additional data-model overhead in the CHT to other projects that do not need it.
You can already include custom logic in your contact
forms for adding/editing an is_canonical
field on your contact docs. Then, in your custom duplicate_check.expression
logic, you can always include logic for checking the is_canonical
field. This would let you avoid duplicate warnings on a contact marked as is_canonical
. For these workflows, I don't see a clear need/benefit to have extra functionality built into the CHT.
Does this sound reasonable? Am I missing a workflow here that requires some kind of specific support via CHT logic?
Additionally, provide a "Mark for Deletion" option for records that need
cleanup (particularly useful for users without delete permissions).
The same goes for this logic. I guess your current contact
forms can support marking contacts for review/deletion, right? (Thinking that this would basically just be a custom field set on contact docs that your custom server-side scripting could detect.)
There is definitely a lot more possible functionality that could be built into the CHT around this to make managing contacts more reasonable and less of a manual/custom process, but ultimately, that probably needs its own project and dedicated design discussions... (One idea that would be interesting to pursue in a future project would be a Sentinel transition that could perhaps be responsible for flagging duplicate contacts. However, I think it is important to recognize that detecting/handling duplicate contacts on the server is somewhat distinct from the main goal of this PR, which is simply to reduce the number of duplicates contacts that get created in the first place (by warning the user)).
Detail on duplicate items
I really appreciate all the great discussion on this! Forcing the user to navigate away from the form and loose everything they entered just to discover that the contact is not actually a duplicate is a big issue. Lets definitely continue the conversation in the forum thread and I am confident we can find the optimal balance between data and simplicity for this MVP!
async checkForDuplicates(doc, duplicateCheck, acknowledged) { | ||
const parentId = doc ? doc.parent?._id : undefined; | ||
const contactType = doc ? doc.contact_type ?? doc.type : undefined; | ||
const siblings = await requestSiblings(this.dbService, parentId, contactType); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
issue: we should definitely move this down after the if-check below since we do not want to query the contact siblings unless we are really going to be doing the dupe check.
if (!isCanonical && expression && !acknowledged){ | ||
const duplicates = getDuplicates( | ||
doc, | ||
siblings, | ||
{ | ||
expression, | ||
parseProvider: this.parseProvider, | ||
xmlFormsContextUtilsService: this.xmlFormsDuplicateUtilsService | ||
} | ||
); | ||
return duplicates; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: we can simplify/flatten this a bit by reversing the if-check:
if (!isCanonical && expression && !acknowledged){ | |
const duplicates = getDuplicates( | |
doc, | |
siblings, | |
{ | |
expression, | |
parseProvider: this.parseProvider, | |
xmlFormsContextUtilsService: this.xmlFormsDuplicateUtilsService | |
} | |
); | |
return duplicates; | |
} | |
if (isCanonical || !expression || acknowledged) { | |
return []; | |
} | |
return getDuplicates( | |
doc, | |
siblings, | |
{ | |
expression, | |
parseProvider: this.parseProvider, | |
xmlFormsContextUtilsService: this.xmlFormsDuplicateUtilsService | |
} | |
); |
duplicateCheck, | ||
acknowledged | ||
); | ||
if (duplicates && duplicates.length > 0){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: if we keep the return type for checkForDuplicates
consistent, then we can simplify this to just:
if (duplicates && duplicates.length > 0){ | |
if (duplicates.length){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I made it through the rest of the files. Added a few more comments, but all in all this is good stuff! I really appreciate the quality unit tests!
I am going to make a post out on the forum that demos the current "Duplicates Found" UI we have here. The idea would be to maybe crowd-source some UX design in case other folks have input on the look/feel/function.
Finally, before we merge this code (but probably after we finalize the UX) we will need to add some e2e tests. Since there are a lot of moving parts here (config + db docs + forms), we should probably be thorough in validating the main flows. My current thought is to add some new tests cases to tests/e2e/default/contacts/edit.wdio-spec.js
and maybe create a new tests/e2e/default/contacts/create.wdio-spec.js
.
this.router.navigate(['/contacts', _id, 'edit']); | ||
} | ||
|
||
duplicates: Duplicate[] = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: WebStorm is complaining to me that the reported_date: number
value for the Duplicate
interface does not match the reported_date: string | Date
value for the DuplicateInfoComponent.duplicates
property. I don't think it is a functional issue probably since JS can auto-box a number into a string, but I wonder if we should be type-matching ContactsEditComponent.duplicates
and DuplicateInfoComponent.duplicates
?
extractExpression, | ||
DEFAULT_CONTACT_DUPLICATE_EXPRESSION, | ||
getDuplicates, | ||
} from '../../../../../src/ts/services/utils/deduplicate'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: can shorten with an alias
} from '@mm-services/utils/deduplicate';
}); | ||
}); | ||
|
||
describe('extractExpression', () => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can you also include test cases for:
- When the
duplicateCheck
param is an object with anexpression
string - When the
duplicateCheck
param is an object with anexpression
string anddisabled: true
- When the
duplicateCheck
param is an object with noexpression
ordisabled
properties (make sure the default expression is returned)
if (duplicateCheck != null) { | ||
if (Object.prototype.hasOwnProperty.call(duplicateCheck, 'expression')) { | ||
return duplicateCheck.expression as string; | ||
} else if (Object.prototype.hasOwnProperty.call(duplicateCheck, 'disabled') && duplicateCheck.disabled) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at this again, I think we should probably do the disabled
check first, before the expression
check. If I was writing some properties for a form and I added disabled: true
, I would expect it to skip the duplicate check even if I also had expression
configured.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@@ -193,6 +203,8 @@ describe('Form service', () => { | |||
{ provide: ExtractLineageService, useValue: extractLineageService }, | |||
{ provide: TargetAggregatesService, useValue: targetAggregatesService }, | |||
{ provide: ContactViewModelGeneratorService, useValue: contactViewModelGeneratorService }, | |||
{ provide: ParseProvider, useValue: parserProvider}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: If we do make the deduplicate.ts
into its own service, it should simplify these tests since we can just mock out the service instead of needing to leak its dependencies in here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: can we get a couple unit tests in here around calling the onNavigateToDuplicate
and onAcknowledgeChange
functions? And a test for the save
flow where calling formService.saveContact
throws a DuplicatesFoundError
?
Okay, I created a forum post demoing some of the functionality we are building here. Would love any UX feedback from your team on the look/feel/functionality! 🙏 |
Description
This feature prevents duplicate hierarchy contact siblings from being created, as discussed with @jkuester and @mrjones-plip in a "technical working session" and through interactions on the linked issue thread.
To achieve this, we hook into duplicate detection strategies through "configuration", updated the
saveContact
method in theform.service.ts
file to include an additional check, and display potential duplicates in aduplicate_info
section added to theenketo.component.html
file. Each duplicate is displayed as a card with expandible/collapsible segments, accompanied by an "acknowledgement" prompt to allow form submission.Configuration:
Currently two strategies are supported:
levenshteinEq
andnormalizedLevenshteinEq
, with the ability to customize properties based on implementation needs.Example implementation:
Here, the
duplicate_check.expression
defines the logic for comparing the current record with its sibling. If noduplicate_check
is provided, the system defaults to evaluating thename
field. E.g:street_number
,street_name
, andpostal_code
:name
,sex
, anddate_of_birth
:In these expressions,
current
refers to the created/edited form, whileexisting
refers to a "sibling" loaded from the database.Conditional Duplicate Check:
This is a key requirement. For now, the
is_canonical
form question will be used to control conditional duplicate checking. When the backend flags a record as a duplicate, the CHW can mark the record accordingly. Downstream it will allow us tomerge
ordelete
the record based on the specified action.Opt-out:
Use the following configuration to disable the duplicate-checking functionality for a specific form:
Misc:
We use the CHT provided
medic-client/contacts_by_parent
view to query for siblings.@kennsippell, since we've touched on the duplicate topic before, it would be great to get your thoughts as this as well.
#Issue
#9601
Code review checklist
License
The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.