CONSolidated REcommendations for sharing Individual participant Data (CONSIDER)

1 Format

1.2 Group data and data elements into relevant data domains (e.g., medication history, laboratory results history, medical procedure history)

Consider integration of research and routine healthcare data (e.g., from an Electronic Health Record system or healthcare billing data). The emergence of several common data models (CDMs) for healthcare data shows that there is a common way to organize clinical data. For example, rather than treating the numerical value of a test result and the unit in which it is expressed as two separate data elements, several CDMs group them into a single data row.
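The grouping of a value with its unit can be sketched in Python as follows. This is a minimal illustration and is not taken from any particular CDM: the Measurement class, the "_value" and "_unit" column suffixes, and the sample record are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Measurement:
    """One row of a laboratory-results domain: value and unit stay together."""
    participant_id: str
    test_name: str
    value: float
    unit: str


def to_measurement_rows(record: Dict[str, str]) -> List[Measurement]:
    """Pair each '<test>_value' element with its '<test>_unit' element."""
    rows = []
    for key, raw_value in record.items():
        if key.endswith("_value"):
            test_name = key[: -len("_value")]
            rows.append(
                Measurement(
                    participant_id=record["participant_id"],
                    test_name=test_name,
                    value=float(raw_value),
                    unit=record[f"{test_name}_unit"],
                )
            )
    return rows


# Hypothetical source record with value and unit as separate data elements.
record = {
    "participant_id": "P001",
    "glucose_value": "5.4",
    "glucose_unit": "mmol/L",
}
rows = to_measurement_rows(record)
```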

Positive Example:

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if data elements are grouped into relevant domains, 0 if data elements are not grouped.

1.3 Follow a convention when using relative time.

The recommendation applies to a context where absolute dates in raw data were replaced with relative dates. The convention is to start counting at day 1, not 0. Assume that the index event (e.g., the day when the patient consented to the trial or the day of the first visit) has been specified at datetime granularity. Refer to the first day as relative day 1. Do not use day 0 as a relative date. For example, if the index event is the signing of informed consent and it was signed at 10:31am on March 10, 2011, the index date-time is midnight at the start of March 10, 2011. An event on the next day (March 11) at 11:15am would have a relative time of Day 2, 11:15am. (For an analogous discussion in astronomical data, see https://en.wikipedia.org/wiki/Sol_(day_on_Mars)#Usage_in_Mars_landers)
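The day-1 convention can be sketched in Python as follows. The function names are hypothetical. The recommendation does not say how to number events that precede the index date, so this sketch simply rejects them; that choice is an assumption.

```python
from datetime import datetime


def relative_day(index_event: datetime, event: datetime) -> int:
    """Return the relative day of `event`, counting the index date as day 1."""
    offset_days = (event.date() - index_event.date()).days
    if offset_days < 0:
        # Assumption: numbering of pre-index events is not covered here.
        raise ValueError("event precedes the index date")
    return offset_days + 1


def relative_timestamp(index_event: datetime, event: datetime) -> str:
    """Format an event as 'Day N, HH:MM', keeping the original clock time."""
    return f"Day {relative_day(index_event, event)}, {event:%H:%M}"


consent = datetime(2011, 3, 10, 10, 31)   # index event: consent signed
next_day = datetime(2011, 3, 11, 11, 15)  # event on the following day
```

Here relative_day(consent, consent) is 1 and relative_day(consent, next_day) is 2, so day 0 never occurs.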

Positive Example: NCT00262522 uses relative time: day 1 for each patient is the date of their first visit, and all events are referenced as days relative to day 1.

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if a convention for relative time is used, 0 if not.

1.4 Utilize previously defined Common Data Elements and reference them by their identifiers

If you considered formally defined research Common Data Elements (CDEs) at study design (more common for studies initiated after 2015), provide a spreadsheet file that lists all CDEs utilized by your study. Include unique CDE identifiers (e.g., PhenX VariableID). This recommendation promotes two aspects. The first is to adopt established CDEs. The second is that, if the study did adopt CDEs, it must clearly indicate which data elements (DEs) are CDEs and which are not (such DEs would be unique to the study).

Positive Example: Data elements in the AllofUs study are linked to LOINC codes. Elements are listed here and an example of mapping can be seen for this weight (http://athena.ohdsi.org/search-terms/terms/903121) data element (mapped to LOINC CDE 29463-7 (https://loinc.org/29463-7/)).

Challenging Example: AllofUs study provides the source for each data element (http://athena.ohdsi.org/search-terms/terms?vocabulary=PPI&conceptClass=Clinical+Observation&page=1&pageSize=50&query=). Identifiers for individual CDEs are not provided, although in many cases the instrument does not have a formal identifier that could be listed.

Score: For studies initiated after Jan 1, 2015, 1 if common data elements are used, and 0 if no defined common data elements are used.

1.5 Use formats that can be natively loaded (without highly specialized add-ons) into multiple statistical platforms

The preferred file types are comma- or tab-separated values files (e.g., .CSV) rather than SAS XPT or XLS/XLSX, which require add-ons or conversions to be read in and used in different statistical platforms (e.g., SAS, STATA, R, etc.).
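As a minimal sketch of what loading without add-ons means, the following Python snippet reads a CSV extract using only the standard library. The column names and values are hypothetical and only illustrate the idea.

```python
import csv
import io

# Hypothetical IPD extract; column names are illustrative only.
csv_text = "participant_id,relative_day,weight_kg\nP001,1,70.5\nP002,1,82.0\n"

# The standard-library csv module is enough; no add-on package is needed.
rows = list(csv.DictReader(io.StringIO(csv_text)))
weights = [float(row["weight_kg"]) for row in rows]
```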

Positive Example: NCT01751646 provides IPD in CSV files easily usable in a multitude of statistical platforms. Trial https://clinicaltrials.gov/ct2/show/study/NCT00005159 provides 3 formats

Challenging Example: NCT00951249 provides IPD as SAS XPT files, which require processing and conversion before they can be viewed and analyzed in any non-SAS platform.

Score: Score 1 or 0. No Partial score. 1 if a format that can be natively loaded into multiple platforms is used, 0 if a format is used that cannot be natively loaded into multiple platforms and requires conversion, add-ons, or specialty software.

2 Data Sharing

2.1 Register your study at ClinicalTrials.gov registry

Trial titled RC-HIVMAB060-00-AB (VRC01) in People With Chronic HIV Infection Undergoing Analytical Treatment Interruption is registered at ClinicalTrials.gov under NCT02471326 https://clinicaltrials.gov/ct2/show/study/NCT02471326. Registration allows retrieval of study metadata that is unified across various platforms.

Positive Example:

Challenging Example: dbGaP repository contains datasets that originate from a clinical trial but the trial reference at ClinicalTrials.gov is not provided.

Score: Score 1 or 0. No partial score. 1 if study is registered, 0 if study is not registered.

2.5 Provide basic summary results using results registry component of ClinicalTrials.gov

If the clinical trial registry allows it (as ClinicalTrials.gov does), upload basic summary results to the registry at the completion of the study or when first available.

Positive Example: NCT00962780 (https://clinicaltrials.gov/ct2/show/results/NCT00962780) has basic summary results of the study posted on ClinicalTrials.gov using the results registry component

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if results are included, 0 if no results are uploaded. If a registry does not support posting results, score is 0.

2.6 Utilize ClinicalTrials.gov fields for uploading study protocol, empty case report forms, statistical analysis plan and study URL link

ClinicalTrials.gov allows the upload of documents related to the study, including protocols, analysis plans and other related documents. It also allows links to relevant materials such as data dictionaries and IPD.

Positive Example: NCT02755818 (https://clinicaltrials.gov/ct2/show/NCT02755818) provides documents, such as protocol, informed consent form and results as both links to external sites and uploads to ClinicalTrials.gov.

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if study documents are uploaded to the clinical trial registry, 0 if no study documents are uploaded.

2.7 Provide de-identified Individual Participant Data

Several sponsors require sharing of IPD to allow for external validation of results and to facilitate secondary research. This recommendation is linked to Data Sharing: Registry: Link IPD.

Positive Example: The ClinicalTrials.gov record for NCT00933595 provides a link to request IPD through the data sharing platform https://biolincc.nhlbi.nih.gov/studies/lung_hiv/

Challenging Example: Among studies registered on ClinicalTrials.gov during 2019 that answered whether they plan to share IPD, 68.2% answered ‘No’.

Score: Score 1 or 0. No Partial score. 1 if study shared IPD, 0 if no IPD were shared.

3 Study Design

3.1 Adopt previously defined applicable Common Data Elements

This recommendation assumes there are significant resources (financial or staff) that can be used for this goal. Common Data Elements initiatives in various domains (e.g., PhenX, PROMIS) aim to standardize data collection.

Positive Example: AllofUs study adopted LOINC terminology to document body measurements.

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if CDEs are used, 0 if no CDEs are used.

4 Case Report Forms

4.3 List all CRFs

Provide a machine readable list of forms.

Positive Example:

Challenging Example:

Score:

5 Data Dictionary

5.1 Provide data dictionary

Data cannot be interpreted if necessary metadata is missing. If data is provided in spreadsheet format, a data dictionary explains and describes what each data column contains.

Positive Example: On ClinicalTrials.gov approximately 1260 studies provided uploaded data dictionaries or links to the data dictionary

Challenging Example: NCT00711009 provided IPD and other documents in the shared data package, but excluded a data dictionary.

Score:

5.2 Provide data dictionary in machine readable format

A data dictionary (DD) provided as a PDF is not fully machine readable: processing it raises issues that require human attention, e.g., removal of header and footer text.

Positive Example: NCT01751646 (https://dash.nichd.nih.gov/study/18343) provides data dictionary as a single CSV file.

Challenging Example:

Score:

5.3 Separate data dictionary from de-identified individual participant data. Since it contains no participant level data, do not require local ethical approval as a condition of releasing the data dictionary (avoid a requestwall for data dictionary).

If DD does contain important intellectual property (IP), consider creating a smaller list of DEs that do not contain any IP and release this limited subset of DEs without employing a request-wall.

Positive Example: NCT01769456 has the data dictionary on the data sharing platform and is available for download without requiring any request, approval or the filling out of any documents. Another example for study NCT00005159 is here (https://biolincc.nhlbi.nih.gov/media/studies/nlms/Code_Manuals_and_Forms.pdf).

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if the data dictionary is provided separately and in advance of any request for de-identified data, 0 if the data dictionary is only provided as part of the de-identified data package.

5.5 Provide data dictionary in a single, machine-readable file.

This simplifies machine processing of available study data. Using a single file also ensures one consistent structure (e.g., DE label, DE data type, DE permissible values [for categorical DEs]), which is not guaranteed when the dictionary is scattered across multiple files.
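A single-file dictionary with one consistent structure could look like the following Python sketch. The column names, the "code=meaning" notation with "|" as separator, and the three sample data elements are all assumptions made for illustration; they are not a format prescribed by this recommendation.

```python
import csv
import io

# Assumed column layout for the single dictionary file.
REQUIRED_COLUMNS = ["name", "label", "data_type", "permissible_values"]

dictionary_csv = """name,label,data_type,permissible_values
age,Age at enrollment,numeric,
sex,Sex at birth,categorical,1=Female|2=Male
death_date,Date of death,date,
"""


def load_dictionary(text):
    """Parse a single-file data dictionary and check its column structure."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    entries = {}
    for row in reader:
        codes = {}
        if row["permissible_values"]:
            # Each pair is written as code=meaning, pairs separated by "|".
            for pair in row["permissible_values"].split("|"):
                code, meaning = pair.split("=", 1)
                codes[code] = meaning
        entries[row["name"]] = {
            "label": row["label"],
            "data_type": row["data_type"],
            "permissible_values": codes,
        }
    return entries


entries = load_dictionary(dictionary_csv)
```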

Positive Example: NCT01772823 provides a single data dictionary document as a CSV containing all data elements and information pertaining to the data elements such as data type and description

Challenging Example: Trials NCT00005274 and NCT00005274 provided the DD in several files. In order to relate those to CDEs, manual processing is required. NCT01233531 has 17 data dictionary files, which include documents in different formats and document types. It also includes identical file names that represent dictionaries for different data based on visit type and study population group. A NIDA Data Share trial had 65 data files but only 63 data dictionary PDF files. Matching data files with data dictionary files requires manual work.

Score: Score 1 or 0. No Partial score. 1 if the data dictionary is included in a single machine-readable file, 0 if the data dictionary is in multiple files or in a non-machine-readable format.

5.6 For each data element, provide a data type (such as numeric, date, string, categorical)

Specifying data type helps computers to process the information properly. Data type also helps with semantic matching to corresponding CDEs. For example, date of death data type is stated as date (not as character). Most studies collect categorical data types.

Positive Example: NCT00491556 provides the data type for each of the data elements in the DD. UK Biobank uses a comprehensive set of data types, available at http://biobank.ctsu.ox.ac.uk/showcase/help.cgi?cd=value_type and quoted below.

The Value Type of a Data-Field describes the type of variable corresponding to it. There are 10 categories:

1) Integer - whole numbers, for example the age of a participant on a particular date;
2) Categorical (single) - a single answer selected from a coded list or tree of mutually exclusive options, for example a yes/no choice;
3) Categorical (multiple) - sets of answers selected from a coded list or tree of options, for instance concurrent medications;
4) Continuous - floating-point numbers, for example the height of a participant;
5) Text - data composed of alphanumeric characters, for example the first line of an address;
6) Date - a calendar date, for example 14th October 2010;
7) Time - a time, for example 13:38:05 on 14th October 2010;
8) Compound - a set of values required as a whole to describe some compound property, for example an ECG trace;
9) Binary object - a complex dataset (blob), for example an image;
10) Records - a summary showing the volume of records data available via the secure portal.

Challenging Example: NCT00046280 does not provide data type at all.

Score: Score is a percent. 1 if all data elements have a data type, 0 if no elements have a type. A score of 0.5 means 50% of data elements include a data type.
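The fractional score can be computed as in the following Python sketch. The dictionary layout and element names are hypothetical, and returning 0.0 for an empty dictionary is an assumption that the scoring rule itself does not address.

```python
def data_type_score(dictionary):
    """Fraction of data elements whose data type is stated (0.0 to 1.0)."""
    if not dictionary:
        # Assumption: an empty dictionary scores 0.
        return 0.0
    typed = sum(1 for entry in dictionary.values() if entry.get("data_type"))
    return typed / len(dictionary)


# Hypothetical dictionary: two of four elements state a data type.
dictionary = {
    "age": {"data_type": "numeric"},
    "sex": {"data_type": "categorical"},
    "visit_note": {"data_type": ""},
    "site": {},
}
score = data_type_score(dictionary)
```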

5.7 For categorical data elements, provide a list of permissible values and distinguish when numerical code or string code is a code for a permissible value (versus actual number or string)

For example, for an educational level data element, it is important to know what possible values were considered during data collection. While it is possible to discover those permissible values from IPD, if some values were never applicable to any of the subjects, the reverse-engineered permissible value list will be incomplete. In terms of standards, CDISC ODM and REDCap provide a mechanism to list permissible values.
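The following Python sketch illustrates why a reverse-engineered list can be incomplete and how a permissible value list lets a numerical code be decoded. The codes, meanings, and observed values are hypothetical.

```python
# Permissible values as defined at data collection (hypothetical codes).
education_level = {
    "1": "No formal education",
    "2": "Primary",
    "3": "Secondary",
    "4": "University degree",
}

# Values that actually occur in the shared IPD (hypothetical).
observed = ["2", "3", "3", "2", "3"]

# Reverse engineering from IPD recovers only the codes that were used.
reverse_engineered = set(observed)
never_observed = sorted(set(education_level) - reverse_engineered)

# With the dictionary, each code is decoded rather than read as a number.
decoded = [education_level[code] for code in observed]
```

In this sample, codes 1 and 4 never occur in the data, so a list built from the IPD alone would miss them.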

Positive Example: NCT00683579 provides permissible values and definitions for the values associated with categorical data elements.

Challenging Example: NCT00000590 does not provide permissible values for categorical variables in the data dictionary. Another example is providing permissible values in the same document that describes the data elements, but in a non-machine-readable way.

Score: Score 1 or 0. No Partial score. 1 if a list of permissible values is included, 0 if no list is included.

5.8 Distinguish categorical string data elements from free-text string data elements

The categorical data type is often not properly assigned to string and numerical data elements. If this is the case, the data dictionary must separate true free text strings from strings that are picked from a list of enumerated possible values. The same problem applies to numerical data elements. The data dictionary should distinguish proper numerical data elements from numerical-categorical ones.
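One way to make this distinction explicit is sketched below in Python. It assumes a dictionary layout in which each entry has a base type of "string" or "numeric" and an optional list of permissible values; the layout and the element names are hypothetical.

```python
def element_kind(entry):
    """Classify a data element using its base type and permissible values."""
    base = entry["data_type"]
    if base not in ("string", "numeric"):
        raise ValueError(f"unsupported base type: {base}")
    # An element with an enumerated value list is categorical; otherwise
    # it holds free text or an actual number.
    suffix = "categorical" if entry.get("permissible_values") else "proper"
    return f"{base}-{suffix}"


# Hypothetical data dictionary entries.
dictionary = {
    "comment": {"data_type": "string"},
    "smoking_status": {
        "data_type": "string",
        "permissible_values": ["never", "former", "current"],
    },
    "weight_kg": {"data_type": "numeric"},
    "pain_scale": {"data_type": "numeric", "permissible_values": [0, 1, 2, 3]},
}
kinds = {name: element_kind(entry) for name, entry in dictionary.items()}
```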

Positive Example: While NCT01751646 does not provide a separate permissible values dictionary or have a categorical data type listed, it does provide the permissible values and a categorical data type flag in the primary data dictionary.

Challenging Example: NCT01233531 does not provide a permissible values dictionary or label any data element as categorical. There is also no flag indicating data elements as categorical, making it impossible to know from the provided data dictionaries which elements are string-proper versus string-categorical and which are numeric-proper versus numeric-categorical.

Score:

5.9 Link utilized Common Data Elements adopted by your study to appropriate terminologies

This recommendation does not mean that all DEs (and permissible values) are linked to a terminology or applicable standard. This link is recommended only where a relevant code exists (or is easy to find and reference). This is linked to another item that deals with external terminologies.

Positive Example: AllOfUs study links DEs to SNOMED CT and LOINC codes. For example, Body Mass Index data element is clearly linked to LOINC code of 39156-5 (for BMI)

Challenging Example: NCT01751646 has no link to any standardized vocabulary

Score: Score 1 or 0. No Partial score. 1 if used CDEs are linked to used terminologies, 0 if used CDEs are not linked.

5.10 Link data elements or permissible values to applicable routine healthcare terminologies (either because you designed them to be linked or post-hoc, they can be semantically linked as equivalent)

Positive Example: NCT00963235 states in the data dictionary the use of LOINC for the coding of lab tests in the study

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if data elements are linked to routine healthcare terminologies, 0 if data elements are not linked to routine terminologies and custom values are used.

5.11 Provide complete data dictionary (all elements in data are listed in a dictionary) and all types of applicable dictionaries (data elements, forms [or groupings], and permissible values)

Note: NIDA trial had 5 data files that were missing a data dictionary file.

Positive Example: NCT00000590 (https://biolincc.nhlbi.nih.gov/studies/pactg/) provides 100% of the data elements found in the IPD, in the data dictionary

Challenging Example: NCT01751646 (https://dash.nichd.nih.gov/study/18343) includes fewer than 50% of the data elements in the data dictionary

Score: Score is a percent. 1 if all data elements are included in dictionary, 0 if no data elements are included. A score of 0.5 means 50% of data elements are included in the data dictionary.
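The completeness score can be computed by comparing the column names found in the IPD files with the names listed in the data dictionary, as in the following Python sketch. The function name and the sample column names are hypothetical, and returning 0.0 when there are no data columns is an assumption.

```python
def dictionary_coverage(data_columns, dictionary_names):
    """Return (score, missing): the fraction of data columns that are
    described in the dictionary, and the columns that are not."""
    columns = set(data_columns)
    if not columns:
        # Assumption: no data columns means there is nothing to score.
        return 0.0, []
    missing = sorted(columns - set(dictionary_names))
    return (len(columns) - len(missing)) / len(columns), missing


# Hypothetical column names from the IPD files and from the dictionary.
data_columns = ["participant_id", "age", "sex", "weight_kg"]
dictionary_names = ["participant_id", "age"]

score, missing = dictionary_coverage(data_columns, dictionary_names)
```

Reporting the missing columns alongside the score shows which data elements still need a dictionary entry.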

5.12 Include sufficient description for data elements

In some cases, a descriptive name can be sufficient to define a data element, and interpreting the data element is straightforward. However, if a study has two distinct data elements with an identical name, data can be hard to interpret.

Positive Example: All elements in UK Biobank have a description. For example, see http://biobank.ctsu.ox.ac.uk/showcase/field.cgi?id=22501 for the data element titled "Year ended full time education", which has the following detailed description: Some values have special meanings defined by Data-Coding 100306. Units of measurement are calendar year.

Challenging Example: In study NCT01751646 (Vitamin D Absorption in HIV Infected Young Adults Being Treated With Tenofovir Containing cART), the descriptions for forms C100 and B100 both state: Specimen Tracking Form. This makes it hard to tell whether the data represent the same specimen or two distinct specimens. Avoid identical descriptions for 2 separate items.

Score: Score is a percent. 1 if all data elements have adequate descriptions, 0 if no descriptions are included. A score of 0.5 means 50% of data elements include a sufficient description.

5.13 Use identifiers (unique where applicable) for data elements, forms and permissible values.

The permissible value data dictionary should be linked to the data element data dictionary. Provide annotated case report forms.
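Linking the two dictionaries through identifiers can be checked mechanically, as in the following Python sketch. The identifier scheme (DE001, PV001, F01) and the field names are hypothetical; the sample deliberately contains one permissible value that points to an unknown data element.

```python
from collections import Counter

# Hypothetical data element dictionary and permissible value dictionary.
data_elements = [
    {"element_id": "DE001", "form_id": "F01", "name": "sex"},
    {"element_id": "DE002", "form_id": "F01", "name": "smoking_status"},
]
permissible_values = [
    {"value_id": "PV001", "element_id": "DE001", "code": "1", "meaning": "Female"},
    {"value_id": "PV002", "element_id": "DE001", "code": "2", "meaning": "Male"},
    {"value_id": "PV003", "element_id": "DE009", "code": "1", "meaning": "Never"},
]


def duplicate_ids(rows, key):
    """Return identifiers that occur more than once under `key`."""
    counts = Counter(row[key] for row in rows)
    return sorted(identifier for identifier, n in counts.items() if n > 1)


def orphan_values(values, elements):
    """Return value_ids whose element_id is absent from the element dictionary."""
    known = {element["element_id"] for element in elements}
    return [v["value_id"] for v in values if v["element_id"] not in known]


duplicates = duplicate_ids(data_elements, "element_id")
orphans = orphan_values(permissible_values, data_elements)
```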

Positive Example:

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if all identifiers for elements, forms and permissible values are unique, 0 if duplicate identifiers are used.

6 Data de-identification

6.1 Provide data de-identification notes

Data de-identification notes state how identifiers have been removed or redacted to ensure compliance with privacy regulations. This includes the removal of personal identifiers and the shifting or relativization of dates.

Positive Example: NCT00490412 (https://dash.nichd.nih.gov/study/17335) provides data de-identification notes and methodology prior to sharing IPD and in shared data packages.

Challenging Example:

Score: Score 1 or 0. No Partial score. 1 if data de-identification notes are provided, 0 if de-identification notes are not provided.

1 Format

1.1 Share person table in CDISC or OMOP format

1.2 Group data and data elements into relevant data domains (e.g., medication history, laboratory results history, medical procedure history)

1.3 Follow a convention when using relative time.

1.4 Utilize previously defined Common Data Elements and reference them by their identifiers

1.5 Use formats that can be natively loaded (without highly specialized add-ons) into multiple statistical platforms

2 Data Sharing

2.1 Register your study at ClinicalTrials.gov registry

2.2 Do not limit study metadata to the legally required elements. Also populate optional elements (such as data sharing metadata)

2.3 Fully populate data_sharing_plan text field on ClinicalTrials.gov (if sharing data)

2.4 If Individual Participant Data is shared on a data sharing platform, update the ClinicalTrials.gov record with the URL link to the data.

2.5 Provide basic summary results using results registry component of ClinicalTrials.gov

2.6 Utilize ClinicalTrials.gov fields for uploading study protocol, empty case report forms, statistical analysis plan and study URL link

2.7 Provide de-identified Individual Participant Data

3 Study Design

3.1 Adopt previously defined applicable Common Data Elements

4 Case Report Forms

4.1 Share all Case Report Forms used in a study

4.2 Share Case Report Forms in non-PDF, machine-readable format.

4.3 List all CRFs

5 Data Dictionary

5.1 Provide data dictionary

5.2 Provide data dictionary in machine readable format

5.3 Separate data dictionary from de-identified individual participant data. Since it contains no participant level data, do not require local ethical approval as a condition of releasing the data dictionary (avoid a requestwall for data dictionary).

5.4 Share a data dictionary as soon as possible. Do not wait until the data collection is complete.

5.5 Provide data dictionary in a single, machine-readable file.

5.6 For each data element, provide a data type (such as numeric, date, string, categorical)

5.7 For categorical data elements, provide a list of permissible values and distinguish when numerical code or string code is a code for a permissible value (versus actual number or string)

5.8 Distinguish categorical string data elements from free-text string data elements

5.9 Link utilized Common Data Elements adopted by your study to appropriate terminologies

5.10 Link data elements or permissible values to applicable routine healthcare terminologies (either because you designed them to be linked or post-hoc, they can be semantically linked as equivalent)

5.11 Provide complete data dictionary (all elements in data are listed in a dictionary) and all types of applicable dictionaries (data elements, forms [or groupings], and permissible values)

5.12 Include sufficient description for data elements

5.13 Use identifiers (unique where applicable) for data elements, forms and permissible values.

6 Data de-identification

6.1 Provide data de-identification notes

7 Choice of a Data Sharing platform

7.1 Use platforms that allow download of all studies available on the platform

7.2 Choose a platform that supports batch request (ability to request multiple studies with one request)