Objects H
Use this page to Register Objects.
To access this page:
- Click Common > Analyze in the Navigation pane.
- Click the Duplicates icon for a data source.
Objects V (All Tabs)
Use this page to Register Objects.
This page contains the following tabs:
General tab
Field | Description |
Dictionaries | Click to open the Dictionaries page in System Administration to add, edit and delete dictionary entries. Users must have access to System Administration to manage dictionaries. |
Stop List | Click to open the Stop Lists page in System Administration to add, edit and delete stop list entries. Users must have access to System Administration to manage stop lists. |
BDD | Click to open the Bulk Duplicate Detection page in System Administration to run the bulk duplicate detection administration process for the object. NOTE: Users must have access to System Administration to run duplicate detection. |
Object Settings
Object | Displays name of object being analyzed for duplicate records. |
View Name | Displays name of object. Click to view details about the object. |
View | Click to open the Objects page to view the object’s data. |
Advanced Settings
Search ID | Displays ID of the search table that controls which record pairs in the source data are stored as duplicates. The DSP® is delivered with a default search table, DSPCommon.ttDuplicate, that has been set up for the BDD process. NOTE: If a search table other than DSPCommon.ttDuplicate is to be used, it must be set up in System Administration. |
Non Searchable Characters | Displays characters excluded from the duplicate detection search. |
Stop List ID | Displays ID for list of words ignored during the duplicate detection search. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Search Threshold | Displays threshold level used to ignore false positives. |
Duplicate Detection Threshold | Displays weight percent of the calculated value for matched words. Words that match carry more weight than words that sound alike. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Synonym Weight | Displays weight value of synonym matches. |
Sound Ex Weight | Displays percentage of combined calculated value for words found within the search (the number of words found plus the number of words that sound alike, divided by the total number of words). Words that match carry more weight than words that sound alike. |
Custom Sound Ex Function ID | Displays ID for custom SQL Server Sound Ex function. Selecting a custom function improves the accuracy of duplicate detection but decreases performance. |
Index Batch Size | Displays number of records to process in one pass through the data. Default value is 1000. |
Duplicate Detection Batch Size | Displays number of records queued up in the duplicate detection process. This field allows a subset of a large file to be processed while limiting the resources required. |
Word Ratio Threshold | Displays threshold for the ratio of matching words in each duplicate pair. A value less than 50% marks a duplicate pair for removal. For example, if A has 10 words and B has 1 word, which matches one of A’s words, then A-B matches 10%, but B-A matches 100%. This 100% is the false positive; the Word Ratio removes the pair as a potential match. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Remove Blank Lines | Click to remove blank lines from the HTML-formatted output in the Candidates page. This action makes each object block smaller because white space is removed. When comparing objects with multiple lines, such as address data, removing blank lines may cause the data not to line up on the page. If needed, the Remove Blank Lines check box can be disabled and the object can be rebuilt. Default value is managed in Configuration > Modules > Parameters-Duplicates. |
Unicode Separate Characters | If enabled, Unicode characters (double-byte) are included in the duplicate detection process. |
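The Word Ratio Threshold example (A has 10 words, B has 1 matching word) can be worked through numerically. The following Python snippet is only an illustrative sketch, not the DSP implementation: the function names, the sample words, and the rule of comparing the lower of the two directional ratios against a 50% threshold are assumptions made for this example.

```python
def match_ratio(source_words, target_words):
    """Return the fraction of source_words that also appear in target_words."""
    if not source_words:
        return 0.0
    target = set(target_words)
    hits = sum(1 for word in source_words if word in target)
    return hits / len(source_words)


def passes_word_ratio(a_words, b_words, threshold=0.5):
    """Hypothetical check: keep the pair only if the lower of the two
    directional ratios reaches the threshold (assumed rule, not DSP's)."""
    lowest = min(match_ratio(a_words, b_words), match_ratio(b_words, a_words))
    return lowest >= threshold


# A has 10 words; B has 1 word that matches one of A's words.
a = "acme industrial supply company of north america east coast division".split()
b = ["acme"]

a_to_b = match_ratio(a, b)   # 1 of A's 10 words is found in B
b_to_a = match_ratio(b, a)   # B's only word is found in A
keep_pair = passes_word_ratio(a, b)
```

Here `a_to_b` is 10% and `b_to_a` is 100%, so a check that looks only at B-A would report a strong match. Comparing against the lower ratio is one way to discard such a pair as a false positive.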
Actions tab
Field | Description |
Build | Click to find duplicates. |
History | Click to open the Object History page to view details on previous builds (i.e., duplicate searches) for the object. |
Reset Status | Click to change the status of the build from Processing to Procedures Completed. Only reset the status if the build process was aborted and the status is still Processing. |
Reset Results | Click to remove all previous duplicate and non-duplicate results for the object. Button is only available for objects that returned duplicates (which display on the Results page). This action is not reversible; it is recommended to back up the drResultsDuplicate table before the results are reset. |
Post Process | Click to continue running a stopped process. |
Duplicate Detection Results tab
Field | Description |
Duplicate Detection Status | Displays current status of the duplicate detection build process. |
Duplicate Detection Records | Displays number of records to be processed. |
Duplicate Detection Processed | Displays actual number of records processed at this point in time; value changes during processing. |
Duplicate Detection Queued | Displays number of records that still need to be processed. |
Duplicate Detection Duplicates | Displays number of records marked as duplicates. |
Duplicate Detection Execution Time | Displays time to run the duplicate detection process. |
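The relationship between a batch size and the record counters above can be pictured with a short simulation. This Python sketch is hypothetical and does not reflect how the DSP computes these values: it assumes that records are processed in fixed-size passes and that the queued count is simply the total minus the processed count. The function name and the sample total of 2500 records are invented for illustration.

```python
def progress_snapshots(total_records, batch_size=1000):
    """Yield (processed, queued) after each simulated pass through the data.

    Assumption for this sketch: queued = total_records - processed.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    processed = 0
    while processed < total_records:
        # The final pass may be smaller than batch_size.
        processed = min(processed + batch_size, total_records)
        yield processed, total_records - processed


# 2500 records with a batch size of 1000 take three passes.
snapshots = list(progress_snapshots(2500, batch_size=1000))
for processed, queued in snapshots:
    print(f"processed={processed} queued={queued}")
```

Under these assumptions, the processed count rises and the queued count falls by one batch per pass until no records remain.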