Associate Data Automatically
The main characteristics of the Associate Data Automatically tool are:
- Suited to working with large metadata tables or associating many data elements.
- Well suited for use with newly imported data, where no associations already exist.
- Associations are created by matching the information in the key column of the metadata table with the names of the selected data elements.
- Two matching schemes are available: Exact and Partial (see section 3.2.2).
- A key column must be identified for the metadata table for this option to be available.
- Use with care with data elements that already have associations with the metadata table being worked with. As well as adding any new associations, existing associations will be updated to reflect the current information in the metadata table. This means associations will be deleted for a selected data element if there are no rows in the metadata table that match the name of that data element. See also the warning at the end of this section about this.
To run the Associate Data Automatically tool:
- Click the Associate Data button at the bottom of the Metadata Table view, and select Associate Data Automatically.
Your metadata table must be saved and a key column designated for the metadata table for this option to be available.
- Select the data the tool should consider when setting up metadata associations in the window that appears. An example of this is shown in figure 3.16. You can select an item or sets of items in the navigation area on the left and move these into the selected elements list. Alternatively, you can right click on a folder and specify that all elements in the folder should be put in the selected elements list. This is illustrated in figure 3.17.
Figure 3.16: Select data for automatic metadata association.
Figure 3.17: Selecting all data elements in a folder.
- Click on the button labeled Next.
- Set the role that should be assigned to each data element that is associated to a metadata row (figure 3.18).
Figure 3.18: Provide a role for the data elements. The default role provided is "Sample data".
- Select whether the matching of the data element names to the entries in the key column should be based on exact or partial matching. These options are explained further below.
Figure 3.19: Data element names can be matched either exactly or partially to the entries in the key column.
- Click on the button labeled Next and then choose to Save the outputs.
Data associations and roles will be saved for data elements where the name matches a key column entry according to the selected matching scheme.
Warning: It is safest to select only data elements that have no existing association to the metadata table being worked with, or to select carefully those data elements with an existing association that you wish to update. All selected data that has an association with the metadata table being worked with will be updated by the automatic association tool. This means that any new or updated information in a metadata row can be added, but it also means that if no rows in the metadata table match such a data element anymore, then the data association will be removed. This could happen if, for example, you changed the name of a data element with a metadata association, and did not change the corresponding key entry in the metadata table.
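The update behavior described in this warning can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the function name update_associations, the dictionary representation of associations, and the use of exact name matching are all assumptions made for the example.

```python
def update_associations(existing, selected_names, keys):
    """Model one run of automatic association (hypothetical sketch).

    existing:       dict mapping data element name -> key of its metadata row
    selected_names: names of the data elements selected for this run
    keys:           entries in the key column of the metadata table
    Exact matching is assumed here for simplicity.
    """
    key_set = set(keys)
    updated = dict(existing)  # elements that are not selected are left alone
    for name in selected_names:
        if name in key_set:
            updated[name] = name      # association added or refreshed
        else:
            updated.pop(name, None)   # no matching row: association removed
    return updated


# A data element was renamed after it had been associated, but the key
# column entry "ETC-002" was not changed to match the new name.
existing = {"ETC-002-renamed": "ETC-002", "ETC-005": "ETC-005"}
keys = ["ETC-001", "ETC-002", "ETC-005"]

result = update_associations(existing, ["ETC-002-renamed", "ETC-001"], keys)
print(result)
```

In this sketch the renamed element loses its association because it was selected and no key matches its new name, the newly selected ETC-001 gains one, and ETC-005 is untouched because it was not selected.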
Matching schemes
A data element name must match an entry in the key column of a metadata table for an association to be set up between that data element and the corresponding row of the metadata table. Two schemes are available in the Associate Data Automatically tool for matching up names with key entries:
- Exact - data element names must match a key exactly to be associated. If any aspect of the key entry differs from the name of a selected data element, no association will be created.
- Partial - data elements with names partially matching a key will be associated. Here, data element names are broken into parts using common delimiters. The first whole part(s) must match a key entry in the metadata table for an association to be established. This option is explained in more detail below.
Partial matching rules
For each data element being considered, the partial matching scheme involves breaking a data element name into components and searching for the best match from the key entries in the metadata table. In general terms, the best match means the longest key that matches entire components of the name.
The following describes the matching process in detail:
- Break the data element name into its component parts based on the presence of delimiters. It is these parts that are used for matching to the key entries of the metadata table.
Delimiters are any non-alphanumeric characters. That is, anything that is not a letter (a-z or A-Z) or number (0-9). So, for example, characters like hyphens (-), plus symbols (+), spaces, brackets, and so on, would be used as delimiters.
For example, if partial matching is chosen, a data element called Sample234-1 (mapped) (trimmed) would be split into 4 parts: Sample234, -1, (mapped) and (trimmed).
- Matches are made at the component level. A whole key entry must match perfectly to at least the first complete component of a data element name.
For example, a key entry Sample234 would be a match to the data element with name Sample234-1 (mapped) (trimmed) because the whole key entry matches the whole of the first component of the data element name. Conversely, if the key entry had been Sample23, no match would be identified, because the whole key entry does not match at least the whole of the first component of the data element name.
In cases where a data element could be matched to more than one key, the longest key matched determines the metadata row the data will be associated with.
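The partial matching rules above can be approximated with a short Python sketch. It is not the tool's implementation, and the function names are invented for the example. It assumes that "matching whole leading components" can be tested by checking that the key is a prefix of the name and that the character following the prefix, if any, is a delimiter. Edge cases, such as a key that stops partway through a bracketed component, may be handled differently by the tool itself.

```python
import string

# Per the rules above, only a-z, A-Z and 0-9 are non-delimiters.
ALPHANUMERIC = set(string.ascii_letters + string.digits)


def ends_on_component_boundary(name, key):
    """True if `key` is a prefix of `name` that stops at the end of a component."""
    if not key or not name.startswith(key):
        return False
    rest = name[len(key):]
    # Either the key is the whole name, or the next character is a delimiter.
    return rest == "" or rest[0] not in ALPHANUMERIC


def best_partial_match(name, keys):
    """Return the longest key matching whole leading components, or None."""
    candidates = [k for k in keys if ends_on_component_boundary(name, k)]
    return max(candidates, key=len, default=None)


name = "Sample234-1 (mapped) (trimmed)"
print(best_partial_match(name, ["Sample234"]))                 # Sample234
print(best_partial_match(name, ["Sample23"]))                  # None
print(best_partial_match(name, ["Sample234", "Sample234-1"]))  # Sample234-1
```

The last call shows the tie-breaking rule: both keys match whole leading components, so the longer one is chosen.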
The table below provides examples to illustrate the partial matching system, using a metadata table with sample IDs as keys, like those in figure 3.20 (i.e. ETC-001, ETC-002, ..., ETC-013).
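| Data Element | Key | Reason for association |
|---|---|---|
| ETC-001 (Reads) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-001 un-m...(single) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-001 un-m...(paired) | ETC-001 | Key ETC-001 matches the first part of the name |
| ETC-002 | ETC-002 | Key ETC-002 matches the whole name |
| ETC-003 | None | No keys match this data element name |
| ETC-005 | ETC-005 | Key ETC-005 matches the whole name |
| ETC-005-1 | ETC-005 | Key ETC-005 matches the first part of the name |
| ETC-006-5 | ETC-006 | Key ETC-006 matches the first part of the name |
| ETC-007 | None | No keys match this data element name |
| ETC-007 (mapped) | None | No keys match this data element name |
| ETC-008 | None | No keys match this data element name |
| ETC-008 (report) | None | No keys match this data element name |
| ETC-009 | ETC-009 | Key ETC-009 matches the whole name |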