Uploaded on Mar 4, 2024
Biocuration is the process of collecting, organizing, and annotating biological data from various sources to make it accessible and understandable for further analysis by researchers. https://www.elucidata.io/book-a-demo
Biocuration- Off-page
Biocuration: Breaking Barriers in the Use of Biomedical Data
Data is Available, But Not Usable
Life Sciences R&D relies on 2 trillion GB of data generated every year.
Public data and databases are available, not usable
In-house data, often caught in team-level silos, has low interoperability and reusability
Drug Discovery Initiatives Need High Quality Data
"The value is in the data, it is not in the tools. That is the one thing, it’s a bit of a hobby horse for me. One thing I always point to in these discussions around data, don’t underestimate the amount of time and value in doing what is really often difficult and not so rewarding directly work, like cleaning data sets isn’t always fun, but it is often the most valuable thing you can do."
-Dr. Jeffrey Reid, Regeneron's Chief Data Officer
[Diagram: Public Data (scRNA-Seq), In-house Experiments (scRNA-Seq), Metadata Files (csv, txt)]
Getting to This High Quality Data Pool is Not Trivial
80% of Time: Source and prepare high quality data
Determine relevance: 30 mins per dataset
Download Data: 60 mins per dataset
Process Raw Files: 8 hours per dataset
Curate Metadata: 8-16 hours per dataset
QC and Link Data and Metadata files: 1-2 hours per dataset
Ready for Data Analysis
20% of Time: Analysis
A scientist can spend anywhere from days to weeks per month getting their data ready for analysis.
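As a rough check on these figures, the per-dataset step times can simply be added up. The sketch below is illustrative only: the step names are paired with the timings in slide order, which is an assumption, and `prep_hours` is a hypothetical helper, not part of any Elucidata tool.

```python
# Per-dataset preparation times from the slide, as (min_hours, max_hours).
# Pairing of step names with timings follows slide order (an assumption).
PREP_STEPS = {
    "determine_relevance": (0.5, 0.5),
    "download_data": (1.0, 1.0),
    "process_raw_files": (8.0, 8.0),
    "curate_metadata": (8.0, 16.0),
    "qc_and_link": (1.0, 2.0),
}


def prep_hours(n_datasets=1):
    """Return (low, high) total preparation hours for n_datasets."""
    low = sum(lo for lo, _ in PREP_STEPS.values()) * n_datasets
    high = sum(hi for _, hi in PREP_STEPS.values()) * n_datasets
    return low, high


low, high = prep_hours()
print(f"{low}-{high} hours per dataset")  # prints "18.5-27.5 hours per dataset"
```

On these numbers a single dataset takes roughly 18.5 to 27.5 hours to prepare, so a handful of datasets per month already adds up to days or weeks of work.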
At Elucidata, we’re flipping this 80-20 ratio by building technology to harmonize biomedical data and make it ML-Ready.
Elucidata’s Biocuration Platform: Polly
Making Semi-structured Biomedical Data ML-Ready
[Diagram: Data Sources; Polly, with the Polly Harmonization Engine; Harmonized Data Stored on your Atlas]
Biocuration Workflow
Automated Data Acquisition: Data collection is simplified via API or GUI upload to curation infrastructure
Metadata Curation using LLMs: Model assisted curation leads to higher throughput, the ability to scale to hundreds of curators
Manual Curation of Custom Fields: Human in the Loop allows for additional custom curation and extensive QC checks
Stream Harmonized Data: Elucidata’s Data Model is data type agnostic; Data from disparate sources is made interoperable
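To make "data from disparate sources is made interoperable" concrete, here is a minimal sketch of metadata harmonization: source-specific field names are mapped onto one shared schema, and anything unrecognized is set aside for manual curation. This is not Polly's API or Elucidata's data model; the alias table, field names, and `harmonize` function are all hypothetical.

```python
# Hypothetical mapping from source-specific field names to one shared schema.
FIELD_ALIASES = {
    "organism": "organism",
    "species": "organism",
    "tissue": "tissue",
    "tissue_type": "tissue",
    "disease": "disease",
    "condition": "disease",
}


def harmonize(record):
    """Rename known fields to the shared schema and normalize their values.

    Returns (harmonized, unmapped); unmapped fields are kept untouched so a
    human curator can review them.
    """
    harmonized, unmapped = {}, {}
    for key, value in record.items():
        name = FIELD_ALIASES.get(key.strip().lower())
        if name is None:
            unmapped[key] = value
        else:
            harmonized[name] = str(value).strip().lower()
    return harmonized, unmapped


# Two records describing similar samples with different field conventions.
public = {"Species": "Homo sapiens", "Tissue_Type": "Liver", "batch": "B7"}
in_house = {"organism": "homo sapiens ", "condition": "NASH"}

print(harmonize(public))
print(harmonize(in_house))
```

After harmonization both records use the same `organism` key with the same normalized value, so they can be queried together, while the unrecognized `batch` field is left for the human-in-the-loop step.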
Impact: Millions of Datasets Harmonized in the Past 5 Years!
Harmonization with Polly
● 99% accurate, customizable & 10X faster than industry standard
● Multi-Omics, Bioassays supported
● Data delivered on a 360 Degree Platform (Polly), complete with APIs
● Allows public and in-house data integration
● On-going support for evolving data needs (Data Concierge)
Applications Powered
● Patient Stratification
● Biomarker Discovery
● Target Discovery, Validation & Qualification
● Data Management
● Knowledge Graphs
● Training Models
Reach out to us at [email protected] or
Book a Demo with us to learn more.