6 Synthetic Data
6.1 map_dbmart_ccsr_icd.R
This R script maps ICD codes to CCSR categories to obtain the phenx used in further analysis, and generates summary statistics for the study population. The user needs to manually enter the site name and the codes for a positive COVID lab test.
Modify the corresponding lines and then run the code in the R file.
- Input:
- site (which should be your site name): in line 23 set to ‘SYNTHETIC’
- labtest_code (the codes for a positive lab test): in line 24 set to ‘U071’
- data folder (where cases.csv and dems_cases.csv are saved): run line 25 and select ‘/home/rstudio/data/syntethic_data/’
- output directory: run line 26 and select ‘/home/rstudio/output/’
- cohort (using the default cases): in line 27 set to ‘cases’
- CCSR_PASC_ICD.csv file (provided in the docker container, the path is already set)
- Output:
- cov_pats.RData
- cases_map_CCSR_site.csv
- cases_race_stat_site.csv
- cases_dems_stat_site.csv
- cases_eth_stat_site.csv
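The mapping step can be pictured as a join between patient-level diagnosis records and the CCSR lookup table. The following is a minimal sketch only, not the script's actual code: the data frames, their column names (patient_num, icd_code, ccsr_category) and the category labels are hypothetical placeholders. The output file name assumes that "site" in the names listed above is replaced by the value of site, as in cases_map_CCSR_SYNTHETIC.csv.

```r
# Values entered manually by the user (taken from the input list above).
site <- "SYNTHETIC"
labtest_code <- "U071"

# Hypothetical patient-level records; column names are assumptions.
dbmart <- data.frame(
  patient_num = c(1, 1, 2),
  icd_code = c("U071", "R05", "R0602"),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for CCSR_PASC_ICD.csv; category labels are placeholders.
ccsr_lookup <- data.frame(
  icd_code = c("R05", "R0602"),
  ccsr_category = c("CATEGORY_A", "CATEGORY_B"),
  stringsAsFactors = FALSE
)

# Rows carrying the COVID-positive code identify the cases.
covid_rows <- dbmart[dbmart$icd_code %in% labtest_code, ]

# Inner join: only ICD codes present in the lookup table are kept.
mapped <- merge(dbmart, ccsr_lookup, by = "icd_code")

# Assumed naming pattern for the output file.
out_file <- paste0("cases_map_CCSR_", site, ".csv")
print(mapped)
print(out_file)
```

An inner join is used here so that codes without a CCSR category drop out of the result; whether the real script keeps or discards unmapped codes is not stated in this guide.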
6.2 cases_incidences.R
This R script creates incident-level data from patient encounters for COVID infections. The rule is to cluster infection dates and to count a cluster as a separate infection if it is 90 days or more apart from another cluster. Again, modify the required lines.
- Input:
- site (which should be your site name): in line 28 set to ‘SYNTHETIC’
- cov_pats.RData (which is from the summarizing.R and should be saved under data folder): run line 29 and select ‘/home/rstudio/data/syntethic_data/cov_pats.RData’
- output directory: run line 30 and select ‘/home/rstudio/output/’
- Output (which should be saved under output folder):
- cov_pats.RData
- site_cov_pats_summary.RData
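The 90-day clustering rule can be illustrated with a short sketch. It is an assumed reading of the rule, not the script's implementation: the gap is measured between consecutive positive dates of one patient, and a gap of 90 days or more starts a new infection. The function name assign_infections, its min_gap_days parameter, and the example dates are hypothetical.

```r
# Assign an infection index to each positive date of a single patient.
# Assumption: a new infection starts when a date falls min_gap_days or more
# after the previous positive date.
assign_infections <- function(dates, min_gap_days = 90) {
  if (length(dates) == 0) {
    return(integer(0))
  }
  dates <- sort(as.Date(dates))
  # Gap in days to the previous date; the first date always opens a cluster.
  gaps <- c(Inf, as.numeric(diff(dates)))
  cumsum(gaps >= min_gap_days)
}

# Hypothetical positive dates for one patient.
example_dates <- c("2020-03-01", "2020-03-10", "2020-07-15",
                   "2020-07-20", "2021-01-05")
infection_index <- assign_infections(example_dates)
print(infection_index)
```

In this example the gaps between consecutive dates are 9, 127, 5 and 169 days, so the five dates fall into three infections. Measuring the gap from the start of a cluster instead of from the previous date would be an equally plausible reading and could give different results for long clusters.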
6.3 Ref_DxX.R
This R script implements the WHO definition of long COVID using the reference J thresholds from the MGB study. Run the file line by line and modify the corresponding input variables or select the corresponding files.
- Input:
- site (which should be your site name): in line 8 set to ‘SYNTHETIC’
- cov_pats.RData (which is from the cases_incidences.R and should be saved under output folder): run line 56 and select ‘/home/rstudio/output/cov_pats.RData’
- cases_map_CCSR_site.csv (which is from the map_dbmart_ccsr_icd.R and should be saved under data folder): run line 57 and select ‘/home/rstudio/data/syntethic_data/cases_map_CCSR_SYNTHETIC.csv’
- ref_corrs.RData (provided in the docker container): run line 58 and select ‘/home/rstudio/data/scripts/long_covid_ai_scripts/ref_corrs.RData’
- ref_J_thresholds.RData (provided in the docker container): run line 59 and select ‘/home/rstudio/data/scripts/long_covid_ai_scripts/ref_J_thresholds.RData’
- ref_J.RData (provided in the docker container): run line 60 and select ‘/home/rstudio/data/scripts/long_covid_ai_scripts/ref_J.RData’
- ref_phenxlookup.RData (provided in the docker container): run line 61 and select ‘/home/rstudio/data/scripts/long_covid_ai_scripts/ref_phenxlookup.RData’
- output directory: run line 63 and select ‘/home/rstudio/output/’
- Output:
- longCOVID_patients_site_ref_thresholds0.05.csv (which is the raw result)
- longCOVID_summary_site_ref_thresholds0.05.csv
- db_longhauler_chunk_x.RData
- patlookup.RData
- phenxlookup_site.RData
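Because this script needs several files selected one by one, it can help to confirm beforehand that they are all in place. The sketch below is an optional check, not part of Ref_DxX.R. It uses the paths listed above; the helper find_missing is hypothetical, and the CSV file name assumes that "site" is replaced by the value of site.

```r
site <- "SYNTHETIC"

# Folders as listed in this guide.
data_dir <- "/home/rstudio/data/syntethic_data"
output_dir <- "/home/rstudio/output"
script_dir <- "/home/rstudio/data/scripts/long_covid_ai_scripts"

# The six input files that the script asks for.
inputs <- c(
  file.path(output_dir, "cov_pats.RData"),
  file.path(data_dir, paste0("cases_map_CCSR_", site, ".csv")),
  file.path(script_dir, c("ref_corrs.RData", "ref_J_thresholds.RData",
                          "ref_J.RData", "ref_phenxlookup.RData"))
)

# Hypothetical helper: return the paths that do not exist on disk.
find_missing <- function(paths) paths[!file.exists(paths)]

missing_inputs <- find_missing(inputs)
if (length(missing_inputs) > 0) {
  message("Missing input files:\n", paste(missing_inputs, collapse = "\n"))
} else {
  message("All input files found.")
}
```

If any file is reported missing, rerun the earlier script that produces it before continuing.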
6.4 map_back_nonenact.R
This R script maps the phenx back to ICD10 descriptions and adds the organ and clinical problem for analysis of the results.
- Input:
- site (which should be your site name): in line 16 set to ‘SYNTHETIC’
- longCOVID_patients_site_ref_thresholds0.05.csv (which should be saved under output folder): run line 17 and select ‘/home/rstudio/output/longCOVID_patients_SYNTHETIC_ref_thresholds0.05.csv’
- cov_pats.RData (which should be saved under output folder): run line 18 and select ‘/home/rstudio/output/cov_pats.RData’
- cases_map_CCSR_site.csv (which should be saved under data folder): run line 19 and select ‘/home/rstudio/data/syntethic_data/cases_map_CCSR_SYNTHETIC.csv’
- ref_phenxlookup.RData (provided in the docker container): run line 20 and select ‘/home/rstudio/data/scripts/long_covid_ai_scripts/ref_phenxlookup.RData’
- combo_updated.csv (provided in the docker container): run line 21 and select ‘/home/rstudio/scripts/long_covid_ai_scripts/combo_updated.csv’
- CCSR_PASC_ICD.csv (provided in the docker container): run line 22 and select ‘/home/rstudio/scripts/long_covid_ai_scripts/CCSR_PASC_ICD.csv’
- output directory: run line 23 and select ‘/home/rstudio/output/’
- Output:
- longhaulers_site.RData (which is the final result)
- longhaulers_duration_organ_combo_count_site.RData
- longhaulers_organ_count_site.RData
- longhaulers_organ_combo_count_site.RData
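The map-back step can be sketched as a lookup join followed by a count per organ. This is an illustration under assumptions, not the script's code: the data frames, the column names (patient_num, phenx, description, organ) and all values are hypothetical placeholders.

```r
# Hypothetical long COVID results: one row per patient and phenx.
longhaulers <- data.frame(
  patient_num = c(1, 2, 2),
  phenx = c("P1", "P1", "P2"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup linking each phenx to a description and an organ.
phenx_lookup <- data.frame(
  phenx = c("P1", "P2"),
  description = c("Example description A", "Example description B"),
  organ = c("Organ A", "Organ B"),
  stringsAsFactors = FALSE
)

# Left join keeps every result row, even if a phenx has no lookup entry.
annotated <- merge(longhaulers, phenx_lookup, by = "phenx", all.x = TRUE)

# Number of distinct patients per organ.
organ_count <- aggregate(
  patient_num ~ organ,
  data = annotated,
  FUN = function(x) length(unique(x))
)
print(annotated)
print(organ_count)
```

Counting distinct patients instead of rows avoids counting a patient twice when several phenx map to the same organ.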