Justin Aronson is the founder of VariantExplorer.org, a data democratization advocate, and a high school junior. Justin believes strongly that the decisions made by today’s stewards of data will profoundly affect the evolution of society as a whole. He urges individuals, organizations, and corporations to take courageous stands in favor of open data, thereby empowering his generation to begin tackling the tremendous challenges it will face. Justin himself has benefited from data altruism: the VariantExplorer.org website is built entirely on genetic variant interpretation data that laboratories around the world have donated to ClinVar, which publishes the data openly. VariantExplorer displays how frequently laboratories disagree in their classifications of the same genetic variant. Justin is now exploring how these data can be repackaged to make it easier to incorporate genetic interpretation data into machine learning projects.
Machine learning, for good or ill, has the potential to fundamentally alter nearly every aspect of my generation’s lives. The question is what can be done to shape the way machine learning enters our lives so that its effects are as positive as possible. For my generation, getting access to the resources needed to experiment with machine learning is a challenging task, because much of the world’s data is locked up in silos that only large organizations can use. However, when data is made openly available, new possibilities emerge.
I will discuss how NIH’s ClinVar repository paved the way for VariantExplorer.org. While paying only hosting fees (and with a great deal of support and advice from many generous people), I was able to build a site that summarizes cross-laboratory conflicts in genetic variant classification. In addition to describing how the site works, I will share the challenges I faced in building it and propose steps that can be taken to improve the usability of data that has already been published. In particular, I will provide an update on my progress in repurposing the ClinVar data file to facilitate machine learning. I greatly appreciate this opportunity to address the HAS19 audience and to share my views on how this audience could empower my generation, as we come of age, to profoundly improve healthcare for all.