Blog Post
February 24, 2021

TSPG and the 2018 p1RCC DNA Hackathon Results

Research Hackathons: a path forward when clinical "Standard of Care" and clinical trials run out

As I describe here, after clinical "Standard of Care" and "clinical trials" run out, research is one of the few remaining sources of hope for someone with a terminal disease.  And when little research progress is being made on their particular disease, one solution is for the patient to sponsor a "hackathon".  The hackathon process starts with the patient gathering research-grade data on their condition.  Whereas "clinical grade" data is great for making decisions within the Standard of Care, "research grade" data is typically broader and more detailed, and lends itself to more novel insights.  Next, bioinformaticians, cancer biologists, doctors, data scientists and a range of other researchers are invited to work on the data together.  One draw for them is access to this state-of-the-art research data.  Another is that the "French salon"-like atmosphere of the hackathon lets the researchers freely discuss novel diagnostic and therapeutic theories.  The group then applies advanced algorithmic analysis to the patient's data.  The ultimate goal is to provide these patients with effective treatment paths that fit their unique conditions.

Research Grade Data

Beyond EHRs (Electronic Health Records), the most common forms of research data come from DNA analysis.  DNA (the famous "double helix" described in biology classes) carries the genetic instructions for the development, functioning, growth and reproduction of all known organisms.  DNA is typically extracted from the patient's blood and from the diseased organ.  DNA is especially valuable for studying a patient's genetic (inherited) diseases.  But whereas DNA is a "master blueprint", carrying instructions for every process running in every organ in your body, RNA, copied from portions of the DNA, carries the "per organ" instructions.  RNA is also useful in understanding somatic (acquired after birth) diseases.  Both DNA's and RNA's instructions are organized into "genes".

May 2018 and March 2020 type 1 papillary kidney cancer hackathons

I was the patient in two separate hackathons, in 2018 and 2020, for type 1 papillary kidney cancer, also known as type 1 papillary renal cell carcinoma (abbreviated p1RCC).  In 2018, 17 hackathon teams examined my DNA and reported "genes" they felt ought to be explored further.  Interestingly, only two genes were found by more than one team: BARD1, found by 'DeeperDrugs' and 'GNOME', and PDE4DIP, found by 'GNOME' and 'HelloKidney2'.  At the 2020 hackathon, the Clemson team created a normalized (consistent) table of differential RNA expression for 6 patients (TSPG paper).  Differential expression is the difference between how strongly a gene is expressed (how much of its RNA is present) in the tumor and in normal kidney tissue.
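This post doesn't describe the exact normalization behind the Clemson team's TSPG table, and the appendix values are on that pipeline's own scale. As a generic illustration of what a per-gene "differential" measures, one widely used quantity is the log2 fold change of normalized expression in tumor versus normal tissue. The sketch below assumes that measure and a pseudocount of 1; it is not TSPG's method, and the function name is hypothetical.

```python
import math


def log2_fold_change(tumor: float, normal: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of tumor to normal expression for one gene.

    Positive means higher in the tumor, negative means lower. The pseudocount
    keeps the ratio defined when a gene has zero reads in one sample.
    """
    return math.log2((tumor + pseudocount) / (normal + pseudocount))


# Hypothetical normalized read counts for one gene, chosen so the ratios are round.
print(log2_fold_change(tumor=799.0, normal=99.0))  # 3.0  (8x higher in tumor)
print(log2_fold_change(tumor=24.0, normal=99.0))   # -2.0 (4x lower in tumor)
```

A log scale makes a doubling and a halving symmetric (+1 and -1), which is why sorting by such a value puts the most under-expressed genes at one end and the most over-expressed at the other.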
 
In this post, we join the 2018 gene-team table with the 2020 gene-'differential expression' table and sort by differential expression to see how well the 2018 teams did.  A few 2018 hackathon genes were not in the 2020 results, namely BASP1P1, LINC00621, TTRAP, DNAJ27, PSMA, HER2, FTMT, CUL-2, LAMC-1, ARIDA1, SK3, NRF2-ARE, OR51A2, AC139425.3, MAB21L4, OR4F5, and Z95704.2.  However, most were, and they are shown in "Appendix: Teams ranked by differential expression".
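The post describes this step as an SQL table join and sort. Assuming SQLite (through Python's built-in `sqlite3` module) and hypothetical table names `gene_team` and `gene_expr`, a minimal sketch on a few rows from the appendix might look like this. The `LEFT JOIN ... IS NULL` query is one way to list 2018 genes with no 2020 match, such as HER2; the team that suggested HER2 isn't stated in this post, so it is left as a placeholder.

```python
import sqlite3

# A few rows from the appendix: (gene, differential expression in my tumor).
EXPRESSION_2020 = [
    ("KNG1", 0.668831),
    ("UMOD", 0.657959),
    ("BARD1", 0.046575),
    ("FLG2", -0.569807),
]

# A few 2018 (gene, team) suggestions. HER2 is one of the genes missing from
# the 2020 table; its team is not given in this post, hence the placeholder.
SUGGESTIONS_2018 = [
    ("KNG1", "BioMarkers.ai"),
    ("UMOD", "BioMarkers.ai"),
    ("BARD1", "GNOME"),
    ("BARD1", "DeeperDrugs"),
    ("FLG2", "studentec"),
    ("HER2", "(team not listed)"),
]


def join_and_sort(conn):
    """Load both tables, then return (ranked rows, genes with no 2020 match)."""
    conn.executescript(
        """
        CREATE TABLE gene_team (gene TEXT, team TEXT);
        CREATE TABLE gene_expr (gene TEXT PRIMARY KEY, bp_tumor REAL);
        """
    )
    conn.executemany("INSERT INTO gene_team VALUES (?, ?)", SUGGESTIONS_2018)
    conn.executemany("INSERT INTO gene_expr VALUES (?, ?)", EXPRESSION_2020)

    # Inner join keeps only suggested genes that have expression data;
    # sort from most under-expressed to most over-expressed.
    ranked = conn.execute(
        """
        SELECT t.gene, e.bp_tumor, t.team
        FROM gene_team AS t
        JOIN gene_expr AS e ON e.gene = t.gene
        ORDER BY e.bp_tumor, t.team
        """
    ).fetchall()

    # Left join + IS NULL finds suggested genes absent from the 2020 table.
    missing = [
        gene
        for (gene,) in conn.execute(
            """
            SELECT DISTINCT t.gene
            FROM gene_team AS t
            LEFT JOIN gene_expr AS e ON e.gene = t.gene
            WHERE e.gene IS NULL
            ORDER BY t.gene
            """
        )
    ]
    return ranked, missing


conn = sqlite3.connect(":memory:")
ranked, missing = join_and_sort(conn)
for gene, value, team in ranked:
    print(f"{gene:6} {value:+.6f} {team}")
print("Not in the 2020 table:", missing)
conn.close()
```

A gene suggested by two teams (BARD1 here) appears once per team after the join, which matches how the appendix lists such genes.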

Note that the 2018 teams had no access to RNA-seq data, only DNA data.  As such, the joined, sorted table helps determine which of the 17 teams' theories "held up" when expression data became available in 2020.  Of particular note are the most under- and over-expressed genes (at the beginning and end of the table, respectively), which were suggested by Biomarkers.ai (github, 20180520 presentation) and, to a lesser extent, studentec (github, 20180520 presentation).
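One way to make "held up" concrete, assuming a team is credited for its single most extreme gene, is to score each team by the largest absolute BP-Tumor value among its suggestions. This is an illustrative scoring rule, not one used at the hackathon, and the rows below are a small excerpt of the appendix table.

```python
from collections import defaultdict

# (gene, BP-Tumor differential expression, team): a few rows from the appendix.
ROWS = [
    ("FLG2", -0.569807, "studentec"),
    ("FHL1", -0.370446, "BioMarkers.ai"),
    ("BARD1", 0.046575, "GNOME"),
    ("BARD1", 0.046575, "DeeperDrugs"),
    ("APOB", 0.048625, "DeeperDrugs"),
    ("PDE4DIP", 0.186372, "GNOME"),
    ("DPP6", 0.452463, "studentec"),
    ("UMOD", 0.657959, "BioMarkers.ai"),
    ("KNG1", 0.668831, "BioMarkers.ai"),
]


def rank_teams_by_strongest_gene(rows):
    """Score each team by its most extreme gene (largest absolute value), best first."""
    best = defaultdict(float)
    for _gene, value, team in rows:
        best[team] = max(best[team], abs(value))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


for team, score in rank_teams_by_strongest_gene(ROWS):
    print(f"{team:14} {score:.6f}")
```

Taking the maximum rewards a single strong hit; averaging over a team's genes would instead reward teams whose suggestions are consistently extreme.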

Studentec 2018 email

Chr 1   FLG2 - https://www.ncbi.nlm.nih.gov/gene/388698 - Filaggrin family 2. It's flagged for skin, where it has very high expression, but it also has blips in adrenal, duodenum, prostate and small intestine.  https://www.ncbi.nlm.nih.gov/gene/388698#gene-expression 

        Is this mutation oncogenic, or is the high expression a result of oncogenesis? I lean toward the latter. 

Biomarkers.ai 2018 Presentation 

KNG1 (kininogen 1): complement and coagulation cascades
PTGER3 (prostaglandin E receptor 3): calcium signaling pathway
DMRT2 (doublesex and mab-3 related transcription factor 2): sequence-specific DNA binding
UMOD (uromodulin): most abundant protein in normal urine
FHL1 (four and a half LIM domains 1): tumor suppressor gene on the X chromosome; points to the JAK-STAT pathway

In 2018, Biomarkers.ai argued that:

  • KNG1 uses alternative splicing to generate two different proteins: high-molecular-weight kininogen (HMWK) and low-molecular-weight kininogen (LMWK). HMWK is essential for blood coagulation and for assembly of the kallikrein-kinin system. This might explain my medical history:
    • 1. I was given warfarin (Coumadin) after a diagnosis of deep vein thrombosis (DVT).
    • 2. The DVT symptoms returned. When I went back, doctors found a 7 cm mass in my left kidney, a cerebral meningioma and spots in my lung.
  • Uromodulin (encoded by UMOD; also known as Tamm-Horsfall protein) is the most abundant protein in mammalian urine under normal physiological conditions.
    • UMOD can distinguish Normal Tissue from p1RCC with 100% accuracy. 
    • Is UMOD also a good urine-based biomarker for p1RCC?
  • FHL1 is an indicator of petrochemical exposure.  For a time I worked in chemical refineries and on oil rigs.  This might be the source of my somatic mutation.
    • Exposure to benzopyrene and several other agents enhances FHL1 expression.

Comments from 2018 and 2020 Participants 

  • Saed Sayad (Biomarkers.ai Team Lead): "Let me share our latest finding related to kidney cancers. Using the Broad Institute's DepMap data, we detected that all of the top 100 gene deletions (reduced copy number) are on chromosome 14 and all of the top 100 gene insertions (increased copy number) are on chromosome 5. A good question here is: can we use this phenomenon to find a highly specific drug that targets only the cancer cells?"
  • Dr. Alex Feltus (TSPG co-author): "No question it is a good biomarker, but is it a good drug target?  Maybe your kidney was sensing an “infection” and producing lots of UMOD protein?  Here is a great article on the subject."
  • Reed Bender (TSPG co-author): UMOD is indicated in all of those cases, so it would appear that this is a pattern common to a wider array of various kidney cancers and not only your sample. As Alex noted, we have no way to determine whether the changes to UMOD are a causal element of kidney disease or an after-effect of the transcriptional response to a pre-existing problem in the microenvironment. Regardless, I like your point that UMOD could be a target for diagnosis in either case if it is prevalent enough. I found this review of the UMOD locus that you might find relevant, as it seems to support the hypothesis that UMOD is a relevant factor for a variety of kidney diseases. I think you are definitely on the right path here, it does appear that UMOD is consistently reported as being relevant to the progression of kidney disease.

Discussion

Unknowingly, I have been running an ensemble 'machine learning' meta-algorithm to gain insight into my disease.  Because the 2018 RNA-seq vendor failed to produce an RNA data set (a mixture of bad luck and poor planning on my part), I was only able to give my DNA data (no RNA) to 17 different classifiers (teams) to determine which genes they would predict to be important.  The 2020 RNA-seq vendor (Yale) did a better job, and the 2020 Clemson team was able to create a normalized set of 6 "patients" (including me) to test the 2018 predictions against.  Biomarkers.ai's genes sort to the top and bottom of the table, not only for my data but also, as Reed points out, for the normalized data from the other 5 TCGA patients.  Note that the "follow-up algorithm" I ran in 2018 (keep only genes found by more than one team, which yielded BARD1, found by 'DeeperDrugs' and 'GNOME', and PDE4DIP, found by 'GNOME' and 'HelloKidney2') seems less convincing in retrospect: both genes show only modest differential expression in my tumor (0.046575 and 0.186372).  However, it IS a second prioritization algorithm I ran on the ensemble of 17 'machine learners'.
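The 2018 follow-up rule is a simple vote across the ensemble: keep a gene only if at least two teams proposed it. A minimal sketch, with a hypothetical `consensus_genes` helper and a small excerpt of the 2018 suggestions (the two multi-team genes named above plus a few single-team ones):

```python
from collections import Counter

# Excerpt of 2018 (gene, team) suggestions.
SUGGESTIONS = [
    ("BARD1", "DeeperDrugs"),
    ("BARD1", "GNOME"),
    ("PDE4DIP", "GNOME"),
    ("PDE4DIP", "HelloKidney2"),
    ("KNG1", "BioMarkers.ai"),
    ("UMOD", "BioMarkers.ai"),
    ("FLG2", "studentec"),
]


def consensus_genes(suggestions, min_teams=2):
    """Return genes proposed by at least `min_teams` distinct teams, sorted by name."""
    # set() drops repeated (gene, team) pairs so a team can't vote twice for one gene.
    votes = Counter(gene for gene, _team in set(suggestions))
    return sorted(gene for gene, count in votes.items() if count >= min_teams)


print(consensus_genes(SUGGESTIONS))  # ['BARD1', 'PDE4DIP']
```

Unlike sorting by expression, this vote ignores what the genes actually do in the tumor, which is one reason it can favor genes that later show only modest expression changes.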

My next personal step will be to validate Biomarkers.ai's results against a larger cohort (e.g. KNG1 seems over-expressed in all p1RCC patients; do they have DVT problems too?) and to determine the therapeutic implications.  In a wider context, though, what is more interesting is that re-imagining hackathons as frameworks for an ensemble of intelligent learners to solve medical research problems will allow future hackathon organizers to design other, more efficient meta-algorithms to provide solutions to hackathon patients.
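Checking whether a gene like KNG1 is over-expressed in every patient amounts to testing the sign of its differential expression in each sample. Here it is applied to four appendix rows (my tumor plus the five TCGA samples); a larger cohort would simply add more values per gene. The helper name and the default threshold of zero are assumptions for this sketch.

```python
# Appendix rows: BP-Tumor followed by the five TCGA samples, in table order.
TABLE = {
    "KNG1": [0.668831, 0.417882, 0.626413, 0.476370, 0.756276, 0.685329],
    "UMOD": [0.657959, 0.764807, 0.649663, 0.705116, 0.818524, 0.625151],
    "FHL1": [-0.370446, 0.004442, -0.350791, -0.008073, 0.132820, -0.459530],
    "BARD1": [0.046575, 0.004801, -0.046508, -0.036159, -0.298620, 0.055909],
}


def consistently_over_expressed(table, threshold=0.0):
    """Genes whose differential expression exceeds `threshold` in every sample."""
    return sorted(
        gene for gene, values in table.items() if all(v > threshold for v in values)
    )


print(consistently_over_expressed(TABLE))  # ['KNG1', 'UMOD']
```

Raising the threshold makes the check stricter: at 0.5, KNG1 drops out because one TCGA sample is at 0.417882, while UMOD stays.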

An obvious point needs to be made explicitly here.  Though I contributed data on the front end and did an SQL table join and sort on the back end, NONE of this is my work.  It is the work of the many volunteer researchers who spent many hours exploring my data on my behalf.  The students from Alex Feltus' lab, Reed Bender in particular, and the venues provided by Pete Kane's RTTP deserve a special shoutout, as do the 17 teams from the 2018 hackathon.  Thank you.

Appendix: Teams ranked by differential expression

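This table was formed by joining the 2018 gene-team table with the 2020 differential expression table and sorting by the BP-Tumor column (my tumor), from most under-expressed to most over-expressed. The five TCGA columns are the other patients in the normalized 2020 set, and a gene suggested by more than one team appears once per team.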

Gene BP-Tumor TCGA-BQ-5884 TCGA-BQ-7051 TCGA-DZ-6131 TCGA-GL-7966 TCGA-Y8-A8RY Team
FLG2 -0.569807 -0.424620 -0.624137 -0.371452 -0.888344 -0.489152 studentec
FHL1 -0.370446 0.004442 -0.350791 -0.008073 0.132820 -0.459530 BioMarkers.ai
TAS2R19 -0.363179 -0.106670 -0.337550 -0.149153 -0.630748 -0.370376 HelloKidney2
TERT -0.358329 -0.517497 -0.324755 -0.561428 -0.979976 -0.506805 ExpressForce
ANO9 -0.300735 -0.092386 -0.150459 -0.134747 -0.033803 -0.326815 DamTheRiver
TYMS -0.287382 -0.221745 -0.385608 -0.326908 -0.464341 -0.266986 HelloKidney2
CDKN2A -0.281222 -0.210282 -0.181982 -0.353952 -0.260816 -0.331856 HSIEH
ST6GALNAC5 -0.271928 0.065584 -0.176794 0.088286 -0.036964 -0.338715 studentec
KRT81 -0.267610 -0.099378 -0.267459 -0.232783 -0.472184 -0.353074 DamTheRiver
E2F2 -0.266385 -0.258009 -0.400997 -0.443134 -0.571322 -0.224849 HIF1AIsNotAnOncogene
MET -0.248146 -0.133741 -0.325427 -0.220626 -0.264011 -0.267857 HSIEH
CDK9 -0.244613 -0.020311 -0.031391 -0.136258 -0.178657 -0.314119 DeeperDrugs
ITGAM -0.239577 -0.220240 -0.062832 -0.382688 -0.098020 -0.265496 HelloKidney
HOMER3 -0.198717 -0.357575 -0.135580 -0.363529 -0.195335 -0.203997 DamTheRiver
GRIN3B -0.190098 0.129151 -0.201077 0.097358 -0.288442 -0.178411 DamTheRiver
PLEKHO1 -0.178240 -0.111378 0.242540 -0.222314 -0.233759 -0.235705 Aizheng
AQP12B -0.169196 -0.257862 -0.106634 -0.407076 -0.793035 -0.128892 DamTheRiver
TRABD2B -0.164646 0.019348 -0.074464 -0.015357 0.062198 -0.120814 trimericOGs
FAT1 -0.164506 -0.090656 -0.256427 -0.119463 -0.100841 -0.139434 HSIEH
ACSM2A -0.156993 -0.167751 -0.184301 -0.137565 0.175250 -0.093546 DamTheRiver
MTHFR -0.152779 -0.187325 -0.152542 -0.036041 0.231739 -0.114585 HelloKidney2
HEXB -0.151017 -0.387950 -0.023140 -0.408949 -0.247012 -0.173371 DamTheRiver
SMARCB1 -0.141196 0.039332 -0.051870 -0.051003 -0.018657 -0.067500 HSIEH
ASXL1 -0.138275 0.000883 0.016468 0.025706 -0.201556 -0.033106 ExpressForce
NF2 -0.134751 -0.130819 -0.067939 -0.090925 -0.004646 -0.083543 HSIEH
NF2 -0.134751 -0.130819 -0.067939 -0.090925 -0.004646 -0.083543 ExpressForce
PALB2 -0.134495 -0.158659 -0.213922 -0.124376 -0.317513 -0.212576 ExpressForce
SCYL1 -0.128744 -0.126796 -0.014015 -0.132130 0.032286 -0.227148 GNOME
HLA-DQA1 -0.110334 -0.165424 0.095696 -0.246602 0.000223 -0.155567 DamTheRiver
PSPN -0.101657 -0.085681 -0.144439 -0.062210 -0.090204 -0.164772 HelloKidney2
TP53 -0.100091 -0.071941 -0.260981 -0.117477 -0.383892 -0.103676 HSIEH
CLEC2B -0.096949 0.002800 0.228392 -0.124560 -0.137228 -0.163133 Aizheng
CDK4 -0.089716 -0.064250 -0.054293 -0.252693 -0.493711 -0.038073 HIF1AIsNotAnOncogene
KDM5C -0.087041 -0.103329 -0.058299 -0.133834 -0.180234 0.003652 HSIEH
BAP1 -0.082421 -0.058563 -0.117987 -0.006955 0.011244 -0.078467 ExpressForce
BAP1 -0.082421 -0.058563 -0.117987 -0.006955 0.011244 -0.078467 HSIEH
MAX -0.076246 -0.027232 -0.033623 -0.170371 -0.101038 -0.020786 ExpressForce
PFKP -0.071520 -0.032153 -0.082845 -0.037480 -0.088870 -0.090990 HelloKidney2
AKR1B10 -0.069564 -0.171488 -0.119164 -0.256313 -0.319291 0.029396 Aizheng
ATM -0.069434 -0.145945 -0.118277 -0.082008 -0.033498 -0.012883 ExpressForce
PABPC1 -0.069175 -0.086619 -0.098706 -0.245232 -0.475914 0.003399 GNOME
PLEKHO2 -0.059689 -0.263049 0.253779 -0.382753 0.065914 -0.024072 Aizheng
CYP4F11 -0.047624 -0.020944 -0.099524 -0.031584 -0.206356 -0.026298 Aizheng
ANAPC1 -0.034538 -0.183558 -0.079436 -0.106482 -0.449320 0.017958 GNOME
VHL -0.030325 -0.057974 -0.134539 0.027279 -0.224792 0.025418 HSIEH
SCAP -0.025325 -0.044531 -0.118038 0.058991 0.018731 -0.048174 DamTheRiver
AMPD2 -0.019986 0.045026 0.023201 0.024956 -0.209049 -0.010809 studentec
AHNAK -0.019587 0.018133 -0.051411 0.046804 0.039939 -0.018200 GNOME
FGFR1 -0.012715 0.039134 -0.112655 -0.096765 -0.117174 0.146514 HIF1AIsNotAnOncogene
FGFR1 -0.012715 0.039134 -0.112655 -0.096765 -0.117174 0.146514 ExpressForce
ABL1 -0.012398 -0.235752 -0.076963 -0.240885 -0.140027 0.126637 codeomics
RBMX -0.010541 0.037161 -0.078827 -0.079234 -0.232528 0.066623 GNOME
BRAF -0.007729 -0.089601 -0.135295 -0.129156 -0.089351 0.005839 ExpressForce
STAT5B -0.007008 0.037958 -0.119251 0.023475 0.134777 -0.058842 HIF1AIsNotAnOncogene
RASAL1 -0.006677 0.137588 0.056787 0.129254 0.120261 -0.005573 DamTheRiver
SETD2 0.028468 -0.014335 -0.134763 0.221765 0.005985 0.088818 HSIEH
SETD2 0.028468 -0.014335 -0.134763 0.221765 0.005985 0.088818 ExpressForce
HLA-DRB5 0.042813 -0.050248 0.212950 -0.048054 0.096399 0.015600 DamTheRiver
KDM6A 0.043495 -0.022210 0.026036 0.008206 -0.060767 0.093137 HSIEH
KDM6A 0.043495 -0.022210 0.026036 0.008206 -0.060767 0.093137 ExpressForce
TUBB8 0.044153 -0.116301 0.046298 -0.210237 0.032830 0.017188 KidneyBean
BARD1 0.046575 0.004801 -0.046508 -0.036159 -0.298620 0.055909 GNOME
BARD1 0.046575 0.004801 -0.046508 -0.036159 -0.298620 0.055909 DeeperDrugs
APOB 0.048625 -0.014961 0.020182 -0.076627 0.133317 0.021670 DeeperDrugs
RPS4Y1 0.051268 0.428156 0.103052 0.370285 0.741120 -0.051931 Aizheng
TNFSF4 0.053382 -0.000859 0.016590 -0.070763 -0.046204 0.008195 HelloKidney
DKK1 0.087564 0.103666 0.265039 0.033729 0.125600 0.147935 HSIEH
MSH2 0.092324 -0.082261 -0.071915 -0.030180 -0.189262 0.210167 HIF1AIsNotAnOncogene
ARID1A 0.092712 0.028999 -0.090535 0.019229 0.081958 0.172678 ExpressForce
PARP1 0.103807 -0.162385 0.009363 -0.184518 -0.151341 0.145763 HIF1AIsNotAnOncogene
BCLAF1 0.104503 0.006003 0.012686 0.046388 -0.054703 0.188558 GNOME
ZNF595 0.104678 -0.013910 0.091758 -0.030920 -0.231314 0.133547 DamTheRiver
MTOR 0.115340 0.019068 0.012066 0.245174 0.276105 0.198810 codeomics
MTOR 0.115340 0.019068 0.012066 0.245174 0.276105 0.198810 HSIEH
HIVEP3 0.115363 0.117935 0.163794 0.052332 -0.292806 0.160574 DamTheRiver
PBRM1 0.122078 -0.185186 -0.065475 -0.020189 0.059176 0.171033 ExpressForce
PBRM1 0.122078 -0.185186 -0.065475 -0.020189 0.059176 0.171033 HSIEH
PIK3CA 0.124403 -0.111106 -0.026734 -0.029699 -0.082421 0.179556 HSIEH
PIK3CA 0.124403 -0.111106 -0.026734 -0.029699 -0.082421 0.179556 codeomics
PTEN 0.125224 -0.081410 0.075240 -0.038048 -0.064603 0.100576 HSIEH
NFE2L2 0.127931 0.064057 0.012326 -0.003859 0.013883 0.261751 HSIEH
EP300 0.128085 -0.146216 -0.063585 -0.067288 0.047421 0.145411 ExpressForce
STARD13 0.143107 0.111454 0.084183 0.055009 0.090562 0.178444 GNOME
IQSEC3 0.158512 -0.066876 0.292704 0.059387 0.206981 0.106501 DamTheRiver
BCOR 0.161863 0.100278 0.122183 0.040666 0.163492 0.145450 ExpressForce
PDE4DIP 0.186372 -0.049034 0.168757 0.047555 0.264347 0.108051 GNOME
PDE4DIP 0.186372 -0.049034 0.168757 0.047555 0.264347 0.108051 HelloKidney2
RET 0.187033 -0.002475 0.209691 -0.002045 0.105982 0.296697 HelloKidney2
STAG2 0.191113 0.032079 0.041945 0.093482 0.041825 0.225187 HSIEH
AGBL4 0.220241 0.162339 0.071565 0.204783 0.403928 0.291984 trimericOGs
MAPK8 0.225272 -0.030687 0.062145 0.122059 0.139782 0.305888 HIF1AIsNotAnOncogene
FOLH1 0.277281 0.130650 0.326914 0.139867 0.003966 0.318961 HelloKidney2
PCDH11Y 0.321786 0.294592 0.324795 0.402807 0.410289 0.285618 DamTheRiver
GDNF 0.327962 0.063461 0.400451 0.135010 0.443830 0.394280 HelloKidney2
MYCN 0.452179 0.389882 0.400562 0.377811 0.425292 0.391151 ExpressForce
DPP6 0.452463 0.239977 0.530162 0.371247 0.419513 0.471681 studentec
SFRP1 0.499159 0.378897 0.491471 0.353222 0.425376 0.517536 HSIEH
HPSE2 0.567236 0.478326 0.559584 0.434282 0.469360 0.533156 trimericOGs
PTGER3 0.596030 0.390865 0.605511 0.345318 0.606393 0.588081 BioMarkers.ai
DMRT2 0.621588 0.537606 0.656846 0.553880 0.629735 0.661270 BioMarkers.ai
UMOD 0.657959 0.764807 0.649663 0.705116 0.818524 0.625151 BioMarkers.ai
KNG1 0.668831 0.417882 0.626413 0.476370 0.756276 0.685329 BioMarkers.ai
