Discovery Data Repository
An analytic-ready de-identified data repository
The Discovery Data Repository (DDR) is an analytic-ready, de-identified data repository, together with a set of tools and technologies that enable back-end and front-end access to the underlying data. The DDR was developed by UCLA's Office of Health Informatics & Analytics (OHIA) to give researchers secure, compliant access to health system data assets. We offer two bundle options for using the DDR.
1. Discovery Data Repository Desktop Bundle
This bundle gives the user access to the Discovery Data Mart and to the following tools for data discovery and analysis: SQL Server Management Studio, Jupyter Notebook, and Anaconda/RStudio.
The Discovery Data Mart contains de-identified patient electronic medical records from March 2, 2013 (the CareConnect go-live date) onward, plus any legacy data that was converted into Epic before go-live. The data mart currently includes seven key domains: Allergy, Diagnosis Events, Encounter Events, Immunization Events, Lab Component Results, Medication Orders, and Procedure Orders. It is refreshed monthly, on the 10th of each month, by the EIA Data Architecture team in the Office of Health Informatics & Analytics.
To view the data dictionary, click this link: DDR Data Dictionary
Click here to request access to the DDR Data Mart
2. Discovery Genomic Integration Desktop Bundle
This bundle provides the user with the following data, tools, and software:
- Discovery Data Mart
- Genomic Files
- Linux virtual machine with the following genomics software installed: PLINK, GCTA, Eigenstrat, PyLMM
Users can integrate genomic data from the UCLA AtLAs BioBank with EHR phenotypic data from the DDR Data Mart to conduct genome-wide association studies (GWAS), principal component analyses, and other analyses using commonly used genomics command-line tools.
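To illustrate what a single GWAS test computes, the sketch below fits an additive model for one variant: a quantitative phenotype is regressed on allele dosage (0, 1, or 2 copies of the effect allele). This is a minimal, standard-library-only Python sketch under stated assumptions, not how PLINK or GCTA implement association testing: it assumes a quantitative trait, no covariates (for example, no age, sex, or principal-component adjustment), no missing genotypes, and a normal approximation for the p-value that is only reasonable for large samples. The function name `snp_association` is hypothetical.

```python
import math
from statistics import mean


def snp_association(dosages, phenotype):
    """Additive-model association test for a single variant (illustrative only).

    dosages:   allele counts (0, 1, 2) per person
    phenotype: quantitative trait values per person, in the same order
    Returns (beta, se, z, p); p uses a normal approximation (assumption:
    large n), not the exact t distribution.
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matched genotype/phenotype vectors with n >= 3")

    g_bar, y_bar = mean(dosages), mean(phenotype)
    sxx = sum((g - g_bar) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("monomorphic variant: no genotype variation")

    # Ordinary least-squares slope: effect of one extra allele copy on the trait.
    sxy = sum((g - g_bar) * (y - y_bar) for g, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = y_bar - beta * g_bar

    # Residual variance gives the standard error of the slope.
    rss = sum((y - intercept - beta * g) ** 2 for g, y in zip(dosages, phenotype))
    if rss == 0:
        raise ValueError("perfect fit: standard error is zero")
    se = math.sqrt(rss / (n - 2) / sxx)

    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal-approximation p-value
    return beta, se, z, p


if __name__ == "__main__":
    # Toy data for illustration only; not derived from DDR or biobank records.
    beta, se, z, p = snp_association([0, 1, 2, 0, 1, 2], [1.0, 2.1, 2.9, 1.2, 1.9, 3.1])
    print(f"beta={beta:.3f} se={se:.3f} z={z:.2f} p={p:.2e}")
```

In a real GWAS this test is repeated across many variants, so results are usually judged against the conventional genome-wide significance threshold of 5 × 10^-8, and models typically adjust for covariates such as the leading principal components.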
Click here to request access to the Genomics Data & Tools
How is the DDR being used at UCLA Health?
Here are some examples of how the DDR is being used at UCLA today.
1. DDR data are used to support quality improvement initiatives, which include but are not limited to:
- Examining risk factors for the diagnosis and progression of various diseases
- Studying disease trajectories
- Predicting health outcomes
- Degeneration prediction
- Developing cost efficiency indices
2. Research
- Using machine learning techniques and DDR data to explore factors associated with improved outcomes for surgical patients.
- Using machine learning methods and DDR data to identify genetic and non-genetic risk factors and predict disease progression for various eye diseases and related conditions, such as age-related macular degeneration and thyroid eye disease.
- Developing risk prediction models for various outcome measures after liver transplant.
3. DDR data are used to perform genome-wide association studies to identify genetic variants associated with several traits recorded in electronic health records. Researchers can also perform other genetic analyses, such as heritability estimation and risk prediction.
How do I access the DDR Data Mart and Genomics Data?
To request access to any data sets or tools, you must submit a data access request. If you do not already have a User Profile in Collibra, you must create one. Please follow these steps to get access to the DDR:
- Submit a ServiceNow ticket to request access to Collibra, and wait for confirmation
- Fill out DDR User Profile on Collibra: CART DDR Dashboard
- Enter your name, AD, Center of Excellence, and Director/CAO
- Select a bundle of Tools/Software/Data you would like to access
- Enter a detailed Justification and explain how you intend to use the data
- Submit the profile creation request.
- Upon receiving an email from Collibra, accept the OHIA and Compliance DDR Data Usage Agreement.
- Please allow at least 2 weeks for your request to be approved. You will receive email notifications from ServiceNow and Collibra once your request has been provisioned.
For more detailed instructions, see: CART DDR Training Material
Training Material
Please download the CART DDR training material here: CART DDR Training Material
Support
For questions about the request process or any other issues, please contact SelfServiceAnalytics@mednet.ucla.edu