Blog Post
November 3, 2021

Catalyzing Research V2


2018 p1RCC Hackathon Teams

My name is Bill Paseman. I have stage 3a papillary renal cell carcinoma (RCC) type 1, an incurable, terminal (but fortunately indolent) disease. In 2017, rare kidney cancer researcher Dr. Laurence Albiges noted that overall survival for papillary RCC had not improved in more than a decade (slide 5). So in 2018, I started looking into new ways of doing cancer research.

I was inspired by the work of NIH’s Ben Busby and SV.ai’s Pete Kane, who ran “hackathons.” These hackathons were usually weekend events that attracted young researchers and concentrated on a single topic. Teams typically included biologists, bioinformaticians, and computer scientists. In 2018, Ben, Pete, and I gathered seventeen teams in San Francisco from Stanford, Harvard, UCSF, Rutgers, Clemson (which has been a great supporter), and many other institutions. Each team was given WGS (whole genome sequencing) data from my normal blood and my FFPE (formalin-fixed, paraffin-embedded) tumor tissue. Each team’s goal was to find “Genes of Interest” that they felt were worth investigating.


2018 Hackathon flow and scoring

In 2018, “Genes of Interest” were scored two ways. The first score used “known results”: if a team found a gene on Dr. James Hsieh's curation list of genes already known in the papillary kidney cancer literature, they scored! But not particularly highly, since teams can find these genes just by using Google. This produced genes like NF2, MTOR, and BAP1. The second score used “results overlap”: if two teams working independently recommended the same gene, they scored! The thought here is that if two independent paths of research lead to the same gene, it ought to be investigated. This produced genes like PDE4DIP, BARD1, and FGFR1. In addition to “Genes of Interest,” this hackathon produced a paper written by Clemson's William Poehlman in conjunction with Dr. James Hsieh.
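The two 2018 scores can be sketched in a few lines of Python, assuming each team submits a set of gene symbols. The point values (1 for a known gene, 3 for each other team that independently names the same gene), the team names, and the submissions are hypothetical placeholders, not the actual 2018 rules or data; only the example gene symbols come from the results above.

```python
from collections import Counter

# Hypothetical stand-in for a curation list of genes already known in the literature.
KNOWN_GENES = {"NF2", "MTOR", "BAP1"}

# Hypothetical point values: known genes are easy to find, so they earn less.
KNOWN_POINTS = 1
OVERLAP_POINTS = 3


def score_teams(submissions: dict[str, set[str]]) -> dict[str, int]:
    """Score each team's Genes of Interest by known results and results overlap."""
    # Count how many teams independently named each gene.
    mentions = Counter(gene for genes in submissions.values() for gene in genes)
    scores = {}
    for team, genes in submissions.items():
        known = sum(KNOWN_POINTS for g in genes if g in KNOWN_GENES)
        # Every *other* team that named the same gene adds overlap points.
        overlap = sum(OVERLAP_POINTS * (mentions[g] - 1) for g in genes)
        scores[team] = known + overlap
    return scores


# Hypothetical team submissions.
submissions = {
    "team_a": {"NF2", "PDE4DIP", "BARD1"},
    "team_b": {"PDE4DIP", "FGFR1"},
    "team_c": {"MTOR", "BARD1", "FGFR1"},
}
print(score_teams(submissions))  # → {'team_a': 7, 'team_b': 6, 'team_c': 7}
```

Weighting overlap above known results mirrors the idea in the text: a gene anyone can find with Google is worth less than one that two independent teams converged on.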


2020 Hackathon Flow

In the 2020 hackathon, held at the beginning of the Covid-19 pandemic, turnout was light. Nonetheless, Clemson's Reed Bender did great foundational work by creating a “differential expression list” from RNA-seq data for my tumor and my normal kidney tissue. Differential expression takes the RNA-seq counts from normal and tumor kidney tissue and subtracts the normal count from the tumor count, gene by gene. This difference list is then sorted. High positive values indicate that a gene is overexpressed in the tumor; high negative values indicate that it is underexpressed. This list enabled a therapeutic recommendation (cabozantinib) by the GeneXplain team.
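The subtract-and-sort step can be sketched in Python. The gene names and count values below are invented for illustration, and the counts are assumed to be already normalized so the two samples are comparable. Many differential expression pipelines compare log2 fold changes rather than raw differences; this sketch follows the simpler description above.

```python
def differential_expression(
    tumor: dict[str, float], normal: dict[str, float]
) -> list[tuple[str, float]]:
    """Return (gene, tumor - normal) pairs, most overexpressed first."""
    # Only compare genes measured in both samples.
    shared = tumor.keys() & normal.keys()
    diffs = [(gene, tumor[gene] - normal[gene]) for gene in shared]
    # Large positive differences (overexpressed in tumor) first; most negative last.
    return sorted(diffs, key=lambda pair: pair[1], reverse=True)


# Hypothetical normalized counts, for illustration only.
tumor = {"GENE_A": 850.0, "GENE_B": 40.0, "GENE_C": 300.0, "GENE_D": 5.0}
normal = {"GENE_A": 120.0, "GENE_B": 410.0, "GENE_C": 310.0, "GENE_D": 6.0}

for gene, diff in differential_expression(tumor, normal):
    print(f"{gene}\t{diff:+.1f}")
```

Here GENE_A lands at the top (+730.0, overexpressed) and GENE_B at the bottom (-370.0, underexpressed).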

Cancer Clustering

Using Siblings and Parents as Controls

Quantum Insights, a company I have invested in, used Clemson’s normalized RNA-seq data to position my tumor in the Galaxy of Data from TCGA (The Cancer Genome Atlas). There, they found that my tumor clustered closest to thyroid cancer. This was of great interest to me because, unbeknownst to Quantum Insights, my sibling had been diagnosed with thyroid cancer (and a lung carcinoid) a month before.
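One simple way to ask which cancer type a single tumor “clusters closest” to is a nearest-centroid comparison: average the expression profiles for each cancer type, then pick the centroid with the smallest distance to the patient's profile. This is a hypothetical sketch with made-up three-gene profiles and Euclidean distance. It is not Quantum Insights' method, which the post does not describe.

```python
import math


def centroid(profiles: list[list[float]]) -> list[float]:
    """Average equal-length expression profiles gene by gene."""
    return [sum(values) / len(values) for values in zip(*profiles)]


def closest_cancer_type(sample: list[float], cohorts: dict[str, list[list[float]]]) -> str:
    """Return the cohort label whose centroid is nearest (Euclidean) to the sample."""
    distances = {
        label: math.dist(sample, centroid(profiles))
        for label, profiles in cohorts.items()
    }
    return min(distances, key=distances.get)


# Hypothetical normalized expression for three genes per tumor; not real TCGA data.
cohorts = {
    "kidney_papillary": [[5.0, 1.0, 2.0], [6.0, 1.5, 2.5]],
    "thyroid": [[2.0, 7.0, 3.0], [2.5, 6.5, 3.5]],
    "lung": [[8.0, 2.0, 7.0], [7.5, 2.5, 6.5]],
}
patient_tumor = [2.4, 6.0, 3.1]
print(closest_cancer_type(patient_tumor, cohorts))  # → thyroid
```

Real analyses work with thousands of genes and usually reduce dimensionality first, but the underlying question, “which group is this sample nearest to?”, is the same.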

This raises the possibility of using familial DNA as a “control” in our next hackathon. Derya Karaarsian already uses this approach (here and here). Also, in a recent desmoid tumor hackathon, Vanessa discovered that her father had keloids, which pushed her research in a whole new direction.

 


2020 Scoring of 2018 Results

Reed’s differential expression data was not available in 2018. So, post hoc, I created a third scoring approach that asked: where do the 2018 “Genes of Interest” fall on the differential expression list? Because this data did not exist when the teams made their picks, applying the score post hoc meant that differential expression functioned as a “holdout set.”
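A minimal sketch of this third score, assuming it rewards teams whose genes sit near either extreme of the sorted differential expression list (strongly over- or underexpressed). The post does not give the exact formula; the “extremeness” measure, gene names, and team submissions here are hypothetical.

```python
def extremeness(de_list: list[str]) -> dict[str, float]:
    """Map each gene to 0..1, where 1.0 means the very top or very bottom of the list."""
    n = len(de_list)
    scores = {}
    for rank, gene in enumerate(de_list):
        # 0.0 at the top of the list, 1.0 at the bottom (a single gene sits in the middle).
        position = rank / (n - 1) if n > 1 else 0.5
        # Distance from the middle, scaled so both ends score 1.0.
        scores[gene] = abs(position - 0.5) * 2
    return scores


def holdout_score(genes: set[str], de_list: list[str]) -> float:
    """Average extremeness of a team's genes; genes missing from the list score 0."""
    ext = extremeness(de_list)
    return sum(ext.get(g, 0.0) for g in genes) / len(genes)


# Hypothetical differential expression list, most overexpressed first.
de_list = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"]
teams = {"team_a": {"GENE_A", "GENE_E"}, "team_b": {"GENE_C", "GENE_X"}}

for team, genes in teams.items():
    print(team, holdout_score(genes, de_list))
```

In this toy example team_a scores 1.0 because its genes sit at both ends of the list, while team_b scores 0.0: one gene is in the middle and the other was not measured.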

Of the seventeen 2018 teams, Rutgers' Saed Sayad knocked this third score “out of the park.” In particular, the genes he recommended investigating included:

  • KNG1, which is related to blood coagulation and may explain a DVT (deep vein thrombosis) I had while therapeutic on warfarin.
  • UMOD, which is a focus of the CKD (chronic kidney disease) community.
  • FHL1, which is an indicator of petrochemical exposure; that exposure may have occurred during my earlier jobs at refineries and on drilling rigs.


Therapeutic Options

Dr. Sayad then used his Bioada tools to suggest therapeutic options, including valproic acid and baicalein.

 

 

Patient     SV.ai (2018)   TRIcon (2020)   Clemson (2022)   Stanford (202x)
Bill        x              x               x                x
Gigi                                       x                x
Patient X                                                   x

 

Future Work

Clemson is holding its own hackathon in March 2022 for Gigi, who has hypophosphatasia. One research track will also include a group working on my p1RCC data. We hope to hold a hackathon at Stanford after that. Notice the opportunity here for network effects: the opportunity to bring together researchers of rare diseases with their patients, and to create a social network for social good.

Summary

I’m investigating “Patient Centered” research approaches that incorporate “Game Elements” and “Ensemble Learning.”

“Patient Centered” means that:

  • Patients view themselves as having a “rare disease” that is not served well by cohort analysis. We hope to use sibling and parent genetic data as a “control” in future events.
  • Patients themselves host and maintain control of the event, and are responsible for providing their own data.
  • Controlling their data lets patients build a current, longitudinal record that grows with each subsequent hackathon as their disease develops.

“Game Elements” means that:

  • Hackathon participants are divided up into teams.
  • The Game has “levels” which include diagnosis and therapeutics.
  • Teams' results are “scored,” which helps the patient prioritize future research approaches.

Treating research teams as formal computational objects lets us apply an “Ensemble Learning” technique called “bucket of models.”

  For each model m in the bucket:
      Do c times (where c is some constant):
          Randomly divide the training dataset into two datasets, A and B.
          Train m with A.
          Test m with B.
  Select the model that obtains the highest average score.
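The bucket-of-models procedure can be made runnable with a small Python sketch. The two toy models (a constant-mean baseline and a least-squares line), the negative mean squared error used as the score, and the synthetic data are hypothetical stand-ins; in the hackathon analogy, each research team would play the role of one model.

```python
import random


class MeanModel:
    """Baseline: always predicts the mean of the training targets."""

    def fit(self, xs, ys):
        self.mean = sum(ys) / len(ys)

    def predict(self, x):
        return self.mean


class LineModel:
    """Ordinary least-squares fit of y = a*x + b."""

    def fit(self, xs, ys):
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.a = sxy / sxx
        self.b = my - self.a * mx

    def predict(self, x):
        return self.a * x + self.b


def score(model, xs, ys):
    """Negative mean squared error, so a higher score is better."""
    return -sum((model.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bucket_of_models(bucket, xs, ys, c=10, seed=0):
    """Return the name of the model with the highest average score over c random A/B splits."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    averages = {}
    for name, make_model in bucket.items():
        total = 0.0
        for _ in range(c):
            rng.shuffle(data)
            half = len(data) // 2
            a, b = data[:half], data[half:]   # randomly divide into datasets A and B
            model = make_model()
            model.fit(*zip(*a))               # train m with A
            total += score(model, *zip(*b))   # test m with B
        averages[name] = total / c
    return max(averages, key=averages.get)


# Hypothetical data: a straight line plus small alternating noise.
xs = list(range(20))
ys = [2 * x + 1 + (0.5 if x % 2 else -0.5) for x in xs]
bucket = {"mean": MeanModel, "line": LineModel}
print(bucket_of_models(bucket, xs, ys))  # → line
```

Repeating the random split c times averages out the luck of any single split, which is why the selection uses the average score rather than one test result.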

We are also considering other ensemble techniques and novel ways to run clinical trials.

To learn more about this work, please contact bill@rarekidneycancer.org.

This is a simplified version of an earlier post.

 
