How Autism Speaks is changing the future of autism research with open science
Background
Autism refers to a broad range of neurodevelopmental differences characterized by challenges with social skills, repetitive behaviours, speech, and nonverbal communication. Autism affects an estimated 1 in 36 (about 3%) children in the United States. Autism Speaks is a non-profit organization dedicated to creating an inclusive world for all individuals with autism throughout their lifespan. It is the largest autism research organization in the United States. MSSNG (pronounced “missing”) is a groundbreaking collaboration between Autism Speaks, Verily, DNAstack, The Hospital for Sick Children (SickKids), and the research community to create the world’s largest whole genome sequencing database on autism with deep phenotyping.
Need
Autism Speaks needed a software solution to support the processing and private sharing of whole genome sequence and deep phenotype data from over ten thousand people with autism and their relatives in MSSNG. The organization wanted a solution that is cloud-based, GA4GH-compliant, and able to connect additional datasets from collaborating organizations around the globe, with the goal of creating the world’s largest federated network of data for autism research.
Solution
Autism Speaks leveraged Omics AI to harmonize processing and sharing of this data through Neuroscience AI. The solution uses Publisher to connect data, Explorer to share it, and Workbench to process it. Bioinformatics and visualization services were used to author workflows and generate interactive visualizations.
Results
Autism Speaks partnered with DNAstack to help create and share a harmonized collection of whole genome sequences and deep phenotype data collected through MSSNG. The resulting dataset is available under controlled access on Neuroscience AI, the world’s first federated network for autism research. To create this collection, Autism Speaks enlisted bioinformatics services to author an open-source pipeline for data processing. The pipeline runs automatically through Workbench and performs read alignment, quality control, haplotype calling, and joint variant calling. Genomic data and metadata are connected using Publisher and shared on Neuroscience AI using Explorer. This collaboration has enabled hundreds of researchers to access one of the world’s largest genomic datasets of its kind, leading to novel insights about the biology of autism.
12K+
Samples analyzed
3.8M
Core hours to process
100+
Genes identified
138
Papers published
300+
Researchers worldwide