Associate Data Automatically
When using the Associate Data Automatically option, associations are created based on matching the name of the selected data elements with the information in the key column of a metadata table previously saved in the Navigation Area. Matching is done according to three possible schemes: Exact, Prefix and Suffix (see Matching schemes).
Note: This option is to be used carefully when data elements already have associations with the metadata table. In addition to adding any new associations, the already existing associations will be updated to reflect the current information in the metadata table. This means associations will be deleted for a selected data element if there are no rows in the metadata table that match the name of that data element. This could happen if, for example, you changed the name of a data element with a metadata association, and did not change the corresponding key entry in the metadata table.
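The update behavior described in the note can be sketched in a few lines of Python. This is an illustration only, not the Workbench's implementation or API: the function name `refresh_associations`, the dictionary representation of associations, and the use of exact matching are all assumptions made for the example.

```python
def refresh_associations(selected_names, existing, keys):
    """Return the associations after an automatic re-association (sketch).

    selected_names: names of the data elements selected for the run
    existing:       hypothetical dict {data element name: key} of current associations
    keys:           entries in the key column of the metadata table
    Exact matching is assumed here to keep the example short.
    """
    key_set = set(keys)
    updated = dict(existing)  # unselected elements keep their associations
    for name in selected_names:
        if name in key_set:
            updated[name] = name      # add or refresh the association
        else:
            updated.pop(name, None)   # no matching row: association is removed
    return updated


# A data element once named "ETC-001" was renamed to "ETC-1",
# but the key column of the metadata table was not changed.
existing = {"ETC-1": "ETC-001", "ETC-002": "ETC-002"}
result = refresh_associations(["ETC-1", "ETC-002"], existing, ["ETC-001", "ETC-002"])
print(result)
```

In this sketch the renamed element loses its association because no key matches its new name, which is the situation the note warns about.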
To associate data automatically, click the Associate Data button at the bottom of the Metadata Table view, and select Associate Data Automatically.
Select the data the tool should consider when setting up metadata associations. This can be done by selecting individual files, or the content of an entire folder, as shown in figure 11.10.
Figure 11.10: Selecting all data elements in a folder.
Specify a role that should be assigned to each data element that is associated to a metadata row (figure 11.11). The role can be anything that describes the data element best.
Figure 11.11: Provide a role for the data elements. The default role provided is "Sample data".
Select whether the matching of the data element names to the entries in the key column should be based on exact or partial matching.
Figure 11.12: Data element names can be matched either exactly or partially to the entries in the key column.
Choose to Save the outputs. Data associations and roles will be saved for data elements where the name matches a key column entry according to the selected matching scheme.
Matching schemes
A data element name must match an entry in the key column of a metadata table for an association to be set up between that data element and the corresponding row of the metadata table. Three schemes are available in Associate Data Automatically for matching up names with key entries:
- Exact - data element names must match a key exactly to be associated. If any aspect of the key entry differs from the name of a selected data element, no association will be created.
- Prefix - data elements with names partially matching a key will be associated: here the first whole part(s) of a name must match a key entry in the metadata table for an association to be established. This option is explained in more detail below.
- Suffix - data elements with names partially matching a key will be associated: here the last whole part(s) of a name must match a key entry in the metadata table for an association to be established. This option is explained in more detail below.
Partial matching rules
For each data element being considered, the partial matching scheme involves breaking a data element name into components and searching for the best match from the key entries in the metadata table. In general terms, the best match means the longest key that matches entire components of the name.
The following describes the matching process in detail:
- Break the data element name into its component parts based on the presence of delimiters. It is these parts that are used for matching to the key entries of the metadata table.
Delimiters are any non-alphanumeric characters. That is, anything that is not a letter (a-z or A-Z) or number (0-9). So, for example, characters like hyphens (-), plus symbols (+), spaces, brackets, and so on, would be used as delimiters.
If partial matching was chosen, a data element called Sample234-1 (mapped) (trimmed) would be split into 4 parts: Sample234, -1, (mapped) and (trimmed).
- Matches are made at the component level. A whole key entry must match perfectly to at least the first (with the Prefix option) or the last (with the Suffix option) complete component of a data element name.
For example, a key entry Sample234 would be a match to the data element with name Sample234-1 (mapped) (trimmed) because the whole key entry matches the whole of the first component of the data element name. Conversely, if the key entry had been Sample23, no match would be identified, because the whole key entry does not match at least the whole of the first component of the data element name.
In cases where a data element could be matched to more than one key, the longest key matched determines the metadata row the data will be associated with.
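The partial matching rules above can be approximated with the following Python sketch. It is not the Workbench's own code. The function names `split_components` and `best_key` are hypothetical, matching is assumed to be case-sensitive, and a key is treated as matching whole components when it is followed (Prefix) or preceded (Suffix) by a delimiter or by the end of the name. Note that `split_components` returns only the alphanumeric runs, whereas the manual lists the parts together with their adjacent delimiters.

```python
import re
from typing import Iterable, List, Optional


def split_components(name: str) -> List[str]:
    """Split a name on delimiters (anything that is not a-z, A-Z or 0-9)."""
    return re.findall(r"[A-Za-z0-9]+", name)


def _is_delimiter(ch: str) -> bool:
    # A delimiter is any character that is not an ASCII letter or digit.
    return not (ch.isascii() and ch.isalnum())


def best_key(name: str, keys: Iterable[str], scheme: str = "prefix") -> Optional[str]:
    """Return the longest key matching `name` under `scheme`, or None."""
    matches = []
    for key in keys:
        if not key:
            continue  # ignore empty key entries
        if scheme == "exact":
            ok = name == key
        elif scheme == "prefix":
            # The key must end exactly where a component of the name ends.
            ok = name.startswith(key) and (
                len(name) == len(key) or _is_delimiter(name[len(key)])
            )
        elif scheme == "suffix":
            # The key must start exactly where a component of the name starts.
            ok = name.endswith(key) and (
                len(name) == len(key) or _is_delimiter(name[-len(key) - 1])
            )
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        if ok:
            matches.append(key)
    # When several keys match, the longest one wins.
    return max(matches, key=len, default=None)


name = "Sample234-1 (mapped) (trimmed)"
print(split_components(name))           # ['Sample234', '1', 'mapped', 'trimmed']
print(best_key(name, ["Sample234"]))    # Sample234
print(best_key(name, ["Sample23"]))     # None
```

Checking the character next to the key, rather than comparing lists of components, lets keys that themselves contain delimiters (such as ETC-001) be compared against the name as plain text.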
The table below provides examples to illustrate the partial matching system, using a metadata table whose keys are sample IDs like those in figure 11.13 (i.e., ETC-001, ETC-002, ..., ETC-013).
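| Data Element | Key | Reason for association |
|---|---|---|
| ETC-001 (Reads) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-001 un-m...(single) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-001 un-m...(paired) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-002 | ETC-002 | Key ETC-002 matches the whole name |
| ETC-003 | None | No keys match this data element name |
| ETC-005 | ETC-005 | Key ETC-005 matches the whole name |
| ETC-005-1 | ETC-005 | Key ETC-005 matches the first part of the name |
| ETC-006-5 | ETC-006 | Key ETC-006 matches the first part of the name |
| ETC-007 | None | No keys match this data element name |
| ETC-007 (mapped) | None | No keys match this data element name |
| ETC-008 | None | No keys match this data element name |
| ETC-008 (report) | None | No keys match this data element name |
| ETC-009 | ETC-009 | Key ETC-009 matches the whole name |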