Research Hackathons - a path forward when Clinical "Standard of Care" and Clinical trials run out
Research Grade Data
May 2018 and March 2020 type 1 papillary kidney cancer hackathons
Note that the 2018 teams had access only to DNA data, not RNA-seq data. The joined, sorted table in the appendix therefore helps determine which of the 17 teams' theories "held up" once expression data became available in 2020. Of particular note are the most over- and under-expressed genes (at the beginning and end of the table), which were suggested by Biomarkers.ai (github, 20180520 presentation) and, to a lesser extent, studentec (github, 20180520 presentation).
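One way to make "the beginning and end of the table" concrete is to count, for each team, how many of its genes fall among the k lowest and k highest BP-Tumor values. The Python sketch below is an illustration, not the method used for this page: `teams_at_extremes` is a hypothetical helper, k = 8 is an arbitrary cutoff, and `ROWS` holds only 19 rows copied from the appendix table (the 8 at each end plus three from the middle) rather than the full join.

```python
from collections import Counter

# (gene, BP-Tumor value, team): 19 rows copied from the appendix table.
# The first 8 and last 8 are the table's true extremes; VHL, BARD1 and MTOR
# are middle-of-the-table rows included to show that they get excluded.
ROWS = [
    ("FLG2", -0.569807, "studentec"),
    ("FHL1", -0.370446, "BioMarkers.ai"),
    ("TAS2R19", -0.363179, "HelloKidney2"),
    ("TERT", -0.358329, "ExpressForce"),
    ("ANO9", -0.300735, "DamTheRiver"),
    ("TYMS", -0.287382, "HelloKidney2"),
    ("CDKN2A", -0.281222, "HSIEH"),
    ("ST6GALNAC5", -0.271928, "studentec"),
    ("VHL", -0.030325, "HSIEH"),
    ("BARD1", 0.046575, "GNOME"),
    ("MTOR", 0.115340, "codeomics"),
    ("MYCN", 0.452179, "ExpressForce"),
    ("DPP6", 0.452463, "studentec"),
    ("SFRP1", 0.499159, "HSIEH"),
    ("HPSE2", 0.567236, "trimericOGs"),
    ("PTGER3", 0.596030, "BioMarkers.ai"),
    ("DMRT2", 0.621588, "BioMarkers.ai"),
    ("UMOD", 0.657959, "BioMarkers.ai"),
    ("KNG1", 0.668831, "BioMarkers.ai"),
]


def teams_at_extremes(rows, k):
    """Count each team's genes among the k lowest and k highest BP-Tumor values."""
    ranked = sorted(rows, key=lambda row: row[1])  # ascending by BP-Tumor
    # k must be positive (ranked[-0:] would be the whole list) and the two
    # ends must not overlap, or some rows would be counted twice.
    if k < 1 or 2 * k > len(ranked):
        raise ValueError("k must be >= 1 and at most half the number of rows")
    extremes = ranked[:k] + ranked[-k:]
    return Counter(team for _gene, _value, team in extremes)


if __name__ == "__main__":
    for team, count in teams_at_extremes(ROWS, k=8).most_common():
        print(f"{team}: {count}")
```

Because `ROWS` contains the true 8 lowest and 8 highest rows of the appendix table, the k = 8 counts match what the full table would give: BioMarkers.ai accounts for 5 of the 16 extreme entries and studentec for 3. Other values of k would need the full table.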
Studentec 2018 email
Chr 1 FLG2 - https://www.ncbi.nlm.nih.gov/gene/388698 - Filaggrin family member 2. It's flagged for skin, where it has very high expression, but it's also got blips in adrenal, duodenum, prostate and small intestine. https://www.ncbi.nlm.nih.gov/gene/388698#gene-expression
Is this mutation oncogenic, or is the high expression a result of oncogenesis? I lean toward the latter.
Biomarkers.ai 2018 Presentation
Gene | Name | Note |
---|---|---|
KNG1 | kininogen 1 | Complement and coagulation cascades |
PTGER3 | prostaglandin E receptor 3 | Calcium signaling pathway |
DMRT2 | doublesex and mab-3 related transcription factor 2 | Sequence-specific DNA binding |
UMOD | uromodulin | Most abundant protein in normal urine |
FHL1 | Four And A Half LIM Domains 1 | Tumor suppressor gene on X chromosome. Points to JAK-STAT pathway |
Biomarkers.ai argued in 2018 that:
- KNG1 uses alternative splicing to generate two different proteins: high-molecular-weight kininogen (HMWK) and low-molecular-weight kininogen (LMWK). HMWK is essential for blood coagulation and for assembly of the kallikrein-kinin system. This might explain my medical history:
  1. Got warfarin (Coumadin) after a diagnosis of deep vein thrombosis (DVT).
  2. DVT symptoms returned. Went back and found a 7 cm mass in the left kidney, a cerebral meningioma, and spots in the lung.
- Uromodulin (encoded by UMOD; also known as Tamm-Horsfall protein) is the most abundant protein in mammalian urine under normal physiological conditions.
- UMOD can distinguish normal tissue from p1RCC with 100% accuracy.
- Is UMOD also a good urine-based biomarker for p1RCC?
- FHL1 is an indicator of petrochemical exposure. For a time I worked in chemical refineries and on oil rigs; this might be the source of my somatic mutation.
- Exposure to benzopyrene and several other agents enhances FHL1 expression.
Comments from 2018 and 2020 Participants
- Saed Sayad (Biomarkers.ai Team Lead): Let me share our latest finding related to kidney cancers. Using the Broad Institute's DepMap data, we detected that all of the top 100 gene deletions (lower copy number) are on chromosome 14 and all of the top 100 gene insertions (higher copy number) are on chromosome 5. A good question here is: can we use this phenomenon to find a highly specific drug that targets only the cancer cells?
- Dr. Alex Feltus (TSPG co-author): "No question it is a good biomarker, but is it a good drug target? Maybe your kidney was sensing an 'infection' and producing lots of UMOD protein? Here is a great article on the subject."
- Reed Bender (TSPG co-author): UMOD is indicated in all of those cases, so it would appear that this is a pattern common to a wider array of various kidney cancers and not only your sample. As Alex noted, we have no way to determine whether the changes to UMOD are a causal element of kidney disease or an after-effect of the transcriptional response to a pre-existing problem in the microenvironment. Regardless, I like your point that UMOD could be a target for diagnosis in either case if it is prevalent enough. I found this review of the UMOD locus that you might find relevant, as it seems to support the hypothesis that UMOD is a relevant factor for a variety of kidney diseases. I think you are definitely on the right path here, it does appear that UMOD is consistently reported as being relevant to the progression of kidney disease.
Discussion
Unknowingly, I have been running an ensemble 'machine learning' meta-algorithm to gain insight into my disease. Because the 2018 RNA-seq vendor failed to produce an RNA data set (a mixture of bad luck and poor planning on my part), I could give only my DNA data (no RNA) to 17 different classifiers (teams) to determine which genes they would predict to be important. The 2020 RNA-seq vendor (Yale) did a better job, and the 2020 Clemson team was able to create a normalized set of 6 "patients" (including me) against which to test the 2018 predictions. Biomarkers.ai's genes sort to the top and bottom of the table, not only for my data but also, as Reed points out, for the normalized data from the other 5 TCGA patients. Note that the "followup algorithm" I ran in 2018 ("genes were found by more than one team: BARD1, found by 'DeeperDrugs' and 'GNOME', and PDE4DIP, found by 'GNOME' and 'HelloKidney2'") seems less convincing in retrospect, since both genes sit well away from the extremes of the table. However, it IS a second prioritization algorithm I ran on the ensemble of 17 'machine learners'.
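The 2018 "followup algorithm" amounts to a simple vote across the ensemble: keep any gene proposed by at least two teams. A minimal Python sketch of that rule, assuming the team predictions are available as (gene, team) pairs; `genes_found_by_multiple_teams` and the `PREDICTIONS` list are illustrative (the BARD1 and PDE4DIP pairs come from the text above, KNG1 and FLG2 are single-team entries from the appendix table), not the code actually run in 2018.

```python
from collections import defaultdict

# Illustrative (gene, team) predictions. The BARD1 and PDE4DIP pairs are the
# ones named in the text; KNG1 and FLG2 are single-team examples.
PREDICTIONS = [
    ("BARD1", "DeeperDrugs"),
    ("BARD1", "GNOME"),
    ("PDE4DIP", "GNOME"),
    ("PDE4DIP", "HelloKidney2"),
    ("KNG1", "BioMarkers.ai"),
    ("FLG2", "studentec"),
]


def genes_found_by_multiple_teams(predictions, min_teams=2):
    """Return {gene: sorted team names} for genes proposed by at least
    `min_teams` distinct teams."""
    teams_by_gene = defaultdict(set)
    for gene, team in predictions:
        # A set, so a team listing the same gene twice still casts one vote.
        teams_by_gene[gene].add(team)
    return {
        gene: sorted(teams)
        for gene, teams in teams_by_gene.items()
        if len(teams) >= min_teams
    }


if __name__ == "__main__":
    for gene, teams in genes_found_by_multiple_teams(PREDICTIONS).items():
        print(gene, teams)
```

With the default `min_teams=2`, only BARD1 and PDE4DIP are returned; raising `min_teams` turns this into a stricter consensus vote.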
My next personal step will be to validate Biomarkers.ai's results against a larger cohort (e.g., KNG1 seems overexpressed in all six p1RCC patients in this cohort; do they have DVT problems too?) and to determine therapeutic implications. In a wider context, and more interestingly, re-imagining hackathons as frameworks for an ensemble of intelligent learners to solve medical research problems will allow future hackathon organizers to design other, more efficient meta-algorithms to provide solutions to hackathon patients.
An obvious point needs to be made explicitly here. Though I contributed data on the front end and did perform an SQL table join and sort on the back end, NONE of this is my work. It is the work of the many volunteer researchers who spent many hours exploring my data on my behalf. The students from Alex Feltus' lab, Reed Bender in particular, and the venues provided by Pete Kane's RTTP deserve a special shout-out, as do the 17 teams from the 2018 hackathon. Thank you.
Appendix: Teams ranked by differential expression
This table was formed by:
- joining the "Gene" column of the 18,368-row Supplemental Table S5 ("Supplemental Table S5- TSPG Perturbations for all 5 TCGA Patients and for BP Towards a Normal Target Tissue.csv") from Targonski and Bender's TSPG paper, which was generated from my 2020 RNA-seq hackathon data,
- with the "GeneSymbol" column of the 2018 hackathon teams table (gene_team_description_table), which was based on my DNA data,
- and sorting the result by the BP-Tumor differential in ascending order ('BP' was the cohort's sixth patient; the other five patients came from The Cancer Genome Atlas (TCGA)). A gene suggested by more than one team appears once per team.
Gene | BP-Tumor | TCGA-BQ-5884 | TCGA-BQ-7051 | TCGA-DZ-6131 | TCGA-GL-7966 | TCGA-Y8-A8RY | Team |
---|---|---|---|---|---|---|---|
FLG2 | -0.569807 | -0.424620 | -0.624137 | -0.371452 | -0.888344 | -0.489152 | studentec |
FHL1 | -0.370446 | 0.004442 | -0.350791 | -0.008073 | 0.132820 | -0.459530 | BioMarkers.ai |
TAS2R19 | -0.363179 | -0.106670 | -0.337550 | -0.149153 | -0.630748 | -0.370376 | HelloKidney2 |
TERT | -0.358329 | -0.517497 | -0.324755 | -0.561428 | -0.979976 | -0.506805 | ExpressForce |
ANO9 | -0.300735 | -0.092386 | -0.150459 | -0.134747 | -0.033803 | -0.326815 | DamTheRiver |
TYMS | -0.287382 | -0.221745 | -0.385608 | -0.326908 | -0.464341 | -0.266986 | HelloKidney2 |
CDKN2A | -0.281222 | -0.210282 | -0.181982 | -0.353952 | -0.260816 | -0.331856 | HSIEH |
ST6GALNAC5 | -0.271928 | 0.065584 | -0.176794 | 0.088286 | -0.036964 | -0.338715 | studentec |
KRT81 | -0.267610 | -0.099378 | -0.267459 | -0.232783 | -0.472184 | -0.353074 | DamTheRiver |
E2F2 | -0.266385 | -0.258009 | -0.400997 | -0.443134 | -0.571322 | -0.224849 | HIF1AIsNotAnOncogene |
MET | -0.248146 | -0.133741 | -0.325427 | -0.220626 | -0.264011 | -0.267857 | HSIEH |
CDK9 | -0.244613 | -0.020311 | -0.031391 | -0.136258 | -0.178657 | -0.314119 | DeeperDrugs |
ITGAM | -0.239577 | -0.220240 | -0.062832 | -0.382688 | -0.098020 | -0.265496 | HelloKidney |
HOMER3 | -0.198717 | -0.357575 | -0.135580 | -0.363529 | -0.195335 | -0.203997 | DamTheRiver |
GRIN3B | -0.190098 | 0.129151 | -0.201077 | 0.097358 | -0.288442 | -0.178411 | DamTheRiver |
PLEKHO1 | -0.178240 | -0.111378 | 0.242540 | -0.222314 | -0.233759 | -0.235705 | Aizheng |
AQP12B | -0.169196 | -0.257862 | -0.106634 | -0.407076 | -0.793035 | -0.128892 | DamTheRiver |
TRABD2B | -0.164646 | 0.019348 | -0.074464 | -0.015357 | 0.062198 | -0.120814 | trimericOGs |
FAT1 | -0.164506 | -0.090656 | -0.256427 | -0.119463 | -0.100841 | -0.139434 | HSIEH |
ACSM2A | -0.156993 | -0.167751 | -0.184301 | -0.137565 | 0.175250 | -0.093546 | DamTheRiver |
MTHFR | -0.152779 | -0.187325 | -0.152542 | -0.036041 | 0.231739 | -0.114585 | HelloKidney2 |
HEXB | -0.151017 | -0.387950 | -0.023140 | -0.408949 | -0.247012 | -0.173371 | DamTheRiver |
SMARCB1 | -0.141196 | 0.039332 | -0.051870 | -0.051003 | -0.018657 | -0.067500 | HSIEH |
ASXL1 | -0.138275 | 0.000883 | 0.016468 | 0.025706 | -0.201556 | -0.033106 | ExpressForce |
NF2 | -0.134751 | -0.130819 | -0.067939 | -0.090925 | -0.004646 | -0.083543 | HSIEH |
NF2 | -0.134751 | -0.130819 | -0.067939 | -0.090925 | -0.004646 | -0.083543 | ExpressForce |
PALB2 | -0.134495 | -0.158659 | -0.213922 | -0.124376 | -0.317513 | -0.212576 | ExpressForce |
SCYL1 | -0.128744 | -0.126796 | -0.014015 | -0.132130 | 0.032286 | -0.227148 | GNOME |
HLA-DQA1 | -0.110334 | -0.165424 | 0.095696 | -0.246602 | 0.000223 | -0.155567 | DamTheRiver |
PSPN | -0.101657 | -0.085681 | -0.144439 | -0.062210 | -0.090204 | -0.164772 | HelloKidney2 |
TP53 | -0.100091 | -0.071941 | -0.260981 | -0.117477 | -0.383892 | -0.103676 | HSIEH |
CLEC2B | -0.096949 | 0.002800 | 0.228392 | -0.124560 | -0.137228 | -0.163133 | Aizheng |
CDK4 | -0.089716 | -0.064250 | -0.054293 | -0.252693 | -0.493711 | -0.038073 | HIF1AIsNotAnOncogene |
KDM5C | -0.087041 | -0.103329 | -0.058299 | -0.133834 | -0.180234 | 0.003652 | HSIEH |
BAP1 | -0.082421 | -0.058563 | -0.117987 | -0.006955 | 0.011244 | -0.078467 | ExpressForce |
BAP1 | -0.082421 | -0.058563 | -0.117987 | -0.006955 | 0.011244 | -0.078467 | HSIEH |
MAX | -0.076246 | -0.027232 | -0.033623 | -0.170371 | -0.101038 | -0.020786 | ExpressForce |
PFKP | -0.071520 | -0.032153 | -0.082845 | -0.037480 | -0.088870 | -0.090990 | HelloKidney2 |
AKR1B10 | -0.069564 | -0.171488 | -0.119164 | -0.256313 | -0.319291 | 0.029396 | Aizheng |
ATM | -0.069434 | -0.145945 | -0.118277 | -0.082008 | -0.033498 | -0.012883 | ExpressForce |
PABPC1 | -0.069175 | -0.086619 | -0.098706 | -0.245232 | -0.475914 | 0.003399 | GNOME |
PLEKHO2 | -0.059689 | -0.263049 | 0.253779 | -0.382753 | 0.065914 | -0.024072 | Aizheng |
CYP4F11 | -0.047624 | -0.020944 | -0.099524 | -0.031584 | -0.206356 | -0.026298 | Aizheng |
ANAPC1 | -0.034538 | -0.183558 | -0.079436 | -0.106482 | -0.449320 | 0.017958 | GNOME |
VHL | -0.030325 | -0.057974 | -0.134539 | 0.027279 | -0.224792 | 0.025418 | HSIEH |
SCAP | -0.025325 | -0.044531 | -0.118038 | 0.058991 | 0.018731 | -0.048174 | DamTheRiver |
AMPD2 | -0.019986 | 0.045026 | 0.023201 | 0.024956 | -0.209049 | -0.010809 | studentec |
AHNAK | -0.019587 | 0.018133 | -0.051411 | 0.046804 | 0.039939 | -0.018200 | GNOME |
FGFR1 | -0.012715 | 0.039134 | -0.112655 | -0.096765 | -0.117174 | 0.146514 | HIF1AIsNotAnOncogene |
FGFR1 | -0.012715 | 0.039134 | -0.112655 | -0.096765 | -0.117174 | 0.146514 | ExpressForce |
ABL1 | -0.012398 | -0.235752 | -0.076963 | -0.240885 | -0.140027 | 0.126637 | codeomics |
RBMX | -0.010541 | 0.037161 | -0.078827 | -0.079234 | -0.232528 | 0.066623 | GNOME |
BRAF | -0.007729 | -0.089601 | -0.135295 | -0.129156 | -0.089351 | 0.005839 | ExpressForce |
STAT5B | -0.007008 | 0.037958 | -0.119251 | 0.023475 | 0.134777 | -0.058842 | HIF1AIsNotAnOncogene |
RASAL1 | -0.006677 | 0.137588 | 0.056787 | 0.129254 | 0.120261 | -0.005573 | DamTheRiver |
SETD2 | 0.028468 | -0.014335 | -0.134763 | 0.221765 | 0.005985 | 0.088818 | HSIEH |
SETD2 | 0.028468 | -0.014335 | -0.134763 | 0.221765 | 0.005985 | 0.088818 | ExpressForce |
HLA-DRB5 | 0.042813 | -0.050248 | 0.212950 | -0.048054 | 0.096399 | 0.015600 | DamTheRiver |
KDM6A | 0.043495 | -0.022210 | 0.026036 | 0.008206 | -0.060767 | 0.093137 | HSIEH |
KDM6A | 0.043495 | -0.022210 | 0.026036 | 0.008206 | -0.060767 | 0.093137 | ExpressForce |
TUBB8 | 0.044153 | -0.116301 | 0.046298 | -0.210237 | 0.032830 | 0.017188 | KidneyBean |
BARD1 | 0.046575 | 0.004801 | -0.046508 | -0.036159 | -0.298620 | 0.055909 | GNOME |
BARD1 | 0.046575 | 0.004801 | -0.046508 | -0.036159 | -0.298620 | 0.055909 | DeeperDrugs |
APOB | 0.048625 | -0.014961 | 0.020182 | -0.076627 | 0.133317 | 0.021670 | DeeperDrugs |
RPS4Y1 | 0.051268 | 0.428156 | 0.103052 | 0.370285 | 0.741120 | -0.051931 | Aizheng |
TNFSF4 | 0.053382 | -0.000859 | 0.016590 | -0.070763 | -0.046204 | 0.008195 | HelloKidney |
DKK1 | 0.087564 | 0.103666 | 0.265039 | 0.033729 | 0.125600 | 0.147935 | HSIEH |
MSH2 | 0.092324 | -0.082261 | -0.071915 | -0.030180 | -0.189262 | 0.210167 | HIF1AIsNotAnOncogene |
ARID1A | 0.092712 | 0.028999 | -0.090535 | 0.019229 | 0.081958 | 0.172678 | ExpressForce |
PARP1 | 0.103807 | -0.162385 | 0.009363 | -0.184518 | -0.151341 | 0.145763 | HIF1AIsNotAnOncogene |
BCLAF1 | 0.104503 | 0.006003 | 0.012686 | 0.046388 | -0.054703 | 0.188558 | GNOME |
ZNF595 | 0.104678 | -0.013910 | 0.091758 | -0.030920 | -0.231314 | 0.133547 | DamTheRiver |
MTOR | 0.115340 | 0.019068 | 0.012066 | 0.245174 | 0.276105 | 0.198810 | codeomics |
MTOR | 0.115340 | 0.019068 | 0.012066 | 0.245174 | 0.276105 | 0.198810 | HSIEH |
HIVEP3 | 0.115363 | 0.117935 | 0.163794 | 0.052332 | -0.292806 | 0.160574 | DamTheRiver |
PBRM1 | 0.122078 | -0.185186 | -0.065475 | -0.020189 | 0.059176 | 0.171033 | ExpressForce |
PBRM1 | 0.122078 | -0.185186 | -0.065475 | -0.020189 | 0.059176 | 0.171033 | HSIEH |
PIK3CA | 0.124403 | -0.111106 | -0.026734 | -0.029699 | -0.082421 | 0.179556 | HSIEH |
PIK3CA | 0.124403 | -0.111106 | -0.026734 | -0.029699 | -0.082421 | 0.179556 | codeomics |
PTEN | 0.125224 | -0.081410 | 0.075240 | -0.038048 | -0.064603 | 0.100576 | HSIEH |
NFE2L2 | 0.127931 | 0.064057 | 0.012326 | -0.003859 | 0.013883 | 0.261751 | HSIEH |
EP300 | 0.128085 | -0.146216 | -0.063585 | -0.067288 | 0.047421 | 0.145411 | ExpressForce |
STARD13 | 0.143107 | 0.111454 | 0.084183 | 0.055009 | 0.090562 | 0.178444 | GNOME |
IQSEC3 | 0.158512 | -0.066876 | 0.292704 | 0.059387 | 0.206981 | 0.106501 | DamTheRiver |
BCOR | 0.161863 | 0.100278 | 0.122183 | 0.040666 | 0.163492 | 0.145450 | ExpressForce |
PDE4DIP | 0.186372 | -0.049034 | 0.168757 | 0.047555 | 0.264347 | 0.108051 | GNOME |
PDE4DIP | 0.186372 | -0.049034 | 0.168757 | 0.047555 | 0.264347 | 0.108051 | HelloKidney2 |
RET | 0.187033 | -0.002475 | 0.209691 | -0.002045 | 0.105982 | 0.296697 | HelloKidney2 |
STAG2 | 0.191113 | 0.032079 | 0.041945 | 0.093482 | 0.041825 | 0.225187 | HSIEH |
AGBL4 | 0.220241 | 0.162339 | 0.071565 | 0.204783 | 0.403928 | 0.291984 | trimericOGs |
MAPK8 | 0.225272 | -0.030687 | 0.062145 | 0.122059 | 0.139782 | 0.305888 | HIF1AIsNotAnOncogene |
FOLH1 | 0.277281 | 0.130650 | 0.326914 | 0.139867 | 0.003966 | 0.318961 | HelloKidney2 |
PCDH11Y | 0.321786 | 0.294592 | 0.324795 | 0.402807 | 0.410289 | 0.285618 | DamTheRiver |
GDNF | 0.327962 | 0.063461 | 0.400451 | 0.135010 | 0.443830 | 0.394280 | HelloKidney2 |
MYCN | 0.452179 | 0.389882 | 0.400562 | 0.377811 | 0.425292 | 0.391151 | ExpressForce |
DPP6 | 0.452463 | 0.239977 | 0.530162 | 0.371247 | 0.419513 | 0.471681 | studentec |
SFRP1 | 0.499159 | 0.378897 | 0.491471 | 0.353222 | 0.425376 | 0.517536 | HSIEH |
HPSE2 | 0.567236 | 0.478326 | 0.559584 | 0.434282 | 0.469360 | 0.533156 | trimericOGs |
PTGER3 | 0.596030 | 0.390865 | 0.605511 | 0.345318 | 0.606393 | 0.588081 | BioMarkers.ai |
DMRT2 | 0.621588 | 0.537606 | 0.656846 | 0.553880 | 0.629735 | 0.661270 | BioMarkers.ai |
UMOD | 0.657959 | 0.764807 | 0.649663 | 0.705116 | 0.818524 | 0.625151 | BioMarkers.ai |
KNG1 | 0.668831 | 0.417882 | 0.626413 | 0.476370 | 0.756276 | 0.685329 | BioMarkers.ai |
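The join and sort described at the start of this appendix can be sketched with Python's built-in `sqlite3` module. This is a minimal, hypothetical version, not the SQL actually used: the table names `s5` and `gene_team` and the team column name `Team` are assumptions, only the "Gene", "BP-Tumor" and "GeneSymbol" column names come from the text, and just a handful of rows from the table above are loaded.

```python
import sqlite3

# A few (Gene, BP-Tumor) rows from S5 and (GeneSymbol, Team) rows from the
# 2018 teams table, copied from the table above. The real S5 has 18,368 rows
# and more columns; table and team-column names here are assumptions.
S5_ROWS = [
    ("KNG1", 0.668831),
    ("FLG2", -0.569807),
    ("NF2", -0.134751),
    ("UMOD", 0.657959),
]
TEAM_ROWS = [
    ("UMOD", "BioMarkers.ai"),
    ("NF2", "HSIEH"),
    ("FLG2", "studentec"),
    ("NF2", "ExpressForce"),
    ("KNG1", "BioMarkers.ai"),
]


def join_and_sort(s5_rows, team_rows):
    """Inner-join S5 to the team predictions on gene symbol and return
    (gene, BP-Tumor, team) rows sorted by BP-Tumor, most negative first."""
    con = sqlite3.connect(":memory:")
    try:
        # "BP-Tumor" contains a hyphen, so it must be a quoted identifier.
        con.execute('CREATE TABLE s5 (Gene TEXT, "BP-Tumor" REAL)')
        con.execute("CREATE TABLE gene_team (GeneSymbol TEXT, Team TEXT)")
        con.executemany("INSERT INTO s5 VALUES (?, ?)", s5_rows)
        con.executemany("INSERT INTO gene_team VALUES (?, ?)", team_rows)
        return con.execute(
            """
            SELECT s5.Gene, s5."BP-Tumor", gene_team.Team
            FROM s5
            JOIN gene_team ON s5.Gene = gene_team.GeneSymbol
            ORDER BY s5."BP-Tumor" ASC, gene_team.Team ASC
            """
        ).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for gene, bp_tumor, team in join_and_sort(S5_ROWS, TEAM_ROWS):
        print(f"{gene} | {bp_tumor:.6f} | {team}")
```

An inner join keeps only genes present in both tables, which is why only about a hundred of S5's 18,368 rows appear above. The secondary sort on team name is an arbitrary tie-breaker for genes such as NF2 that more than one team suggested.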