Role of Data Science During COVID Times
HENRY HARVIN EDUCATION
WHAT IS DATA SCIENCE?
Definition One:
Data science continues to emerge as one of the most promising and in-demand career paths
for experienced professionals. Today, successful data professionals understand that they need to
develop beyond the traditional skills of analyzing large quantities of data, data mining, and
programming. To uncover valuable knowledge for their businesses, data scientists need to master the
entire spectrum of the data science life cycle and maintain a level of adaptability.
Definition Two:
Data science is an interdisciplinary field that uses scientific techniques, processes,
algorithms, and methods to extract value from data. Data scientists combine a variety of skills,
including statistics, computer science, and industry experience, to interpret data collected from the
web, smartphones, customers, sensors, and other sources. Data science reveals
trends and provides insights that businesses can use to make better decisions and
create more innovative products and services. Data is the bedrock of innovation, but its significance
comes from the knowledge data scientists can discover from it and then act upon.
The rapid spread and global impacts of COVID-
19 can make people feel helpless and scared as
the novel coronavirus escalates and forces
them to change many aspects of their everyday
lives.
However, people can feel glimmers of hope in
these uncertain times by understanding more
about how data scientists are working hard to
learn as much about COVID-19 as they can.
DATA SCIENCE CAN GIVE ACCURATE
PICTURES OF CORONAVIRUS
OUTCOMES:
Medical professionals and others must get correct and up-to-date
information about how the coronavirus situation changes day by day.
Several organizations, including Johns Hopkins University, IBM, and
Tableau, have released interactive databases that offer real-time views of
what’s happening with the virus.
Many of these sources pull from data provided by trusted bodies such as
the U.S. Centers for Disease Control and Prevention (CDC) and the World
Health Organization (WHO). They also include direct links to those places
so that people have quick, easy access to reliable information.
Using these databases can inform people of the number of confirmed
cases, fatalities, and recoveries. Then, whether a person is on the front
lines of the coronavirus fight or a concerned citizen trying to stay
informed, they can get all or most of the information they need in one
place.
DATA SCIENTISTS DEVISE A SPEEDIER WAY
TO HANDLE CONTACT TRACING:
Contact tracing is an effective way to slow COVID-19. It involves getting in touch with a
person’s close contacts after that individual tests positive for the virus and telling them to
self-isolate. Contact tracing is time-consuming, although it’s getting easier as more people
take social distancing seriously.
Data scientists and medical experts teamed up at Oxford University to make contact
tracing even more efficient. The experts working on the project asserted that
mathematical models showed them how traditional methods of contact tracing used in
public health are not fast enough to thoroughly slow the spread of COVID-19.
They created a mobile phone-based solution to eliminate the need for people to call the
contacts manually. Instead, those parties get text messages confirming the need for self-
isolation. The researchers clarify that their approach would be most effective if it gets
support from national leaders and is not an effort primarily spearheaded by independent
app developers. No nations are using this method yet. Given the market penetration of
mobile phones and the familiarity people have with receiving texts, however, it’s easy to
see why this approach makes sense.
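The notification step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Oxford team's implementation: the `Contact` record, the 14-day look-back window, the phone numbers, and the message wording are all hypothetical, and no real text-messaging service is called.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Contact:
    """One logged close contact between two phones (hypothetical record)."""
    phone_a: str
    phone_b: str
    day: date


def contacts_to_notify(log, positive_phone, test_day, window_days=14):
    """Return phones that were in close contact with the positive case
    during the look-back window ending on the test day.
    The 14-day default is an assumption made for this example."""
    earliest = test_day - timedelta(days=window_days)
    to_notify = set()
    for c in log:
        if not (earliest <= c.day <= test_day):
            continue  # contact happened outside the look-back window
        if c.phone_a == positive_phone:
            to_notify.add(c.phone_b)
        elif c.phone_b == positive_phone:
            to_notify.add(c.phone_a)
    return sorted(to_notify)


def build_messages(phones):
    """Pair each phone with a self-isolation text (wording is made up)."""
    text = "You were recently near someone who tested positive. Please self-isolate."
    return [(p, text) for p in phones]


log = [
    Contact("555-0001", "555-0002", date(2020, 3, 20)),
    Contact("555-0003", "555-0001", date(2020, 3, 25)),
    Contact("555-0001", "555-0004", date(2020, 2, 1)),   # outside the window
    Contact("555-0005", "555-0006", date(2020, 3, 22)),  # unrelated pair
]
phones = contacts_to_notify(log, "555-0001", date(2020, 3, 28))
messages = build_messages(phones)
```

The point of the sketch is that once contacts are logged digitally, finding and messaging them is a simple lookup rather than a series of manual phone calls.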
EVERYONE CAN PLAY A PART IN HELPING SCIENTISTS FIGHT THE
CORONAVIRUS
Many people with COVID-19 have only mild symptoms or none at all.
Plus, the classic symptoms include a fever and a cough, two issues
not restricted to the coronavirus. These things could make it easier
for people to unknowingly spread the disease. But developers
created an app that uses data-sharing to help medical experts learn
more about the virus.
It’s called the COVID Symptom Tracker and already has at least
200,000 users. People can and should interact with the app even if
they are asymptomatic or do not think their symptoms are
COVID-19-related. The more researchers know about the coronavirus, the
better equipped they are to tackle it.
People interact with the app to do a short daily symptom check-in.
They also give their age and zip code, plus disclose any preexisting
conditions. That information helps scientists determine the groups
that are most affected or in danger. The app does not take user data
for commercial purposes, but it gives it to people who are working to
stop the coronavirus, including some at health organizations.
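To make the idea of finding the most affected groups concrete, here is a small Python sketch that summarizes daily check-ins by age band. The record layout (`age`, `zip`, `symptoms`) and the decade-wide bands are assumptions for illustration; the app's real data format is not described here.

```python
from collections import defaultdict


def age_band(age):
    """Bucket an age into a decade-wide band such as '30-39' (bands are an assumption)."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


def symptomatic_share_by_band(checkins):
    """For each age band, return the fraction of check-ins reporting any symptom."""
    totals = defaultdict(int)
    symptomatic = defaultdict(int)
    for entry in checkins:
        band = age_band(entry["age"])
        totals[band] += 1
        if entry["symptoms"]:
            symptomatic[band] += 1
    return {band: symptomatic[band] / totals[band] for band in totals}


# Made-up check-ins, one dictionary per daily report.
checkins = [
    {"age": 34, "zip": "10001", "symptoms": []},
    {"age": 37, "zip": "10001", "symptoms": ["cough"]},
    {"age": 71, "zip": "94110", "symptoms": ["fever", "cough"]},
    {"age": 78, "zip": "94110", "symptoms": ["fever"]},
]
shares = symptomatic_share_by_band(checkins)
```

The same grouping could be done by zip code or by preexisting condition instead of age.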
DATA SCIENTISTS USE MACHINE LEARNING
TO FIND POSSIBLE CURES FASTER:
Besides the race to restrict the spread of COVID-19, scientists are working as quickly as
possible to uncover effective treatments. Two graduates of the data science program
at Columbia University have turned to machine learning to help. The typical
process of antibody discovery in a lab takes years. This approach, however, takes
only a week to screen for therapeutic antibodies with a high likelihood of success.
The team taking this approach says this method is less costly than traditional
ones, too. Humans are still part of the process because they have to test the
gene sequences identified as most promising by the machine learning
algorithm. However, using this expedited method could be crucial in
efficiently finding interventions that work for coronavirus patients.
DATA SCIENCE CAN HELP
TRACK THE SPREAD:
Data science specialists have also concluded that graph
databases are instrumental in showing them how
COVID-19 spreads. Each graph database describes the
connections between people, places, or
objects. Scientists refer to each of those entities as a
node, and the connections between them are the
“edges.” The results give a visual representation of the
relationship between things, if any.
In the early days of the coronavirus outbreak, Chinese data
scientists built a graph database tool called Epidemic
Spread. It allowed people to type in identifying
information associated with the journeys they took,
such as a flight number or even a car’s license plate.
The database would then tell those users whether
anyone with a confirmed coronavirus case took those
same trips and may have spread it to fellow
passengers.
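The node-and-edge idea can be illustrated with a tiny in-memory graph in Python. This is a sketch of the general technique, not the Epidemic Spread tool itself: the class name, the traveler labels, and the trip identifiers are all made up, and a real system would use a dedicated graph database.

```python
from collections import defaultdict


class TravelGraph:
    """Tiny in-memory graph: person nodes and trip nodes joined by 'rode on' edges."""

    def __init__(self):
        self.riders_by_trip = defaultdict(set)  # trip node -> person nodes
        self.confirmed = set()                  # person nodes with a confirmed case

    def add_edge(self, person, trip):
        """Record that a person took a trip (one edge in the graph)."""
        self.riders_by_trip[trip].add(person)

    def mark_confirmed(self, person):
        """Flag a person node as a confirmed case."""
        self.confirmed.add(person)

    def exposed_trips(self, trips):
        """Return the subset of the given trips that carried a confirmed case."""
        return [t for t in trips if self.riders_by_trip.get(t, set()) & self.confirmed]


graph = TravelGraph()
graph.add_edge("traveler_1", "flight CA123")
graph.add_edge("traveler_2", "flight CA123")
graph.add_edge("traveler_3", "train G456")
graph.mark_confirmed("traveler_2")

# A user types in the trips they took and learns which ones had a confirmed case.
result = graph.exposed_trips(["flight CA123", "train G456", "plate ABC-789"])
```

Here only the flight is flagged, because it is the one trip node connected to a confirmed person node.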
MAKING PROGRESS
AGAINST COVID-19
Knowing as much about the
coronavirus as possible will
save lives. These are only some
of the fascinating ways that
data scientists are using their
skills to help.
OPEN SOURCE DATA SCIENCE TO FIGHT
COVID-19
(CORONAVIRUS):
With the spread of COVID-19 becoming an ever more menacing
force in our lives, the healthcare data science community has
an opportunity to play an important role in the mitigation of
this emerging pandemic. History has shown that the measures
taken in response can drastically change the worst
consequences of such outbreaks. Many cities have imposed social
distancing measures, closing any place where large numbers of
people gather, and further measures can be taken to help
isolate and protect the most vulnerable among the population.
To do so, we must first identify who is at greatest risk, which
motivated my team to create an open-sourced project, the
COVID-19 Vulnerability Index. The COVID-19 Vulnerability Index is an
open-source, AI-based predictive model that identifies
people who are likely to have a heightened
vulnerability to severe complications of COVID-19.
The COVID-19 Index is meant to assist hospitals, federal /
state / local public health agencies, and other healthcare
organizations in their work to identify, plan for, respond to, and
reduce the impact of COVID-19 in their communities. In this
post, we’ll be going over the high-level details of this
open-sourced project.
DETAILED
DESCRIPTION OF
DATA SELECTION:
STEP 1
Making a Labeled Data Set
Data on COVID-19 hospitalizations do not yet exist. While data begins to
emerge, we can look at the affected populations and events that serve as
proxies for the real event. Given that the disease’s worst outcomes are
concentrated on the elderly, we can focus on Medicare billing data. Instead of
predicting COVID-19 hospitalizations, we can predict proxy medical events,
specifically hospitalizations due to respiratory infections. Examples include
pneumonia, influenza, and acute bronchitis. We identify these labels by parsing
medical billing data and searching for specific ICD-10 codes that describe
these types of events. All predictions are made on a specific day. From that
day, we look back in time 15 months for features. We exclude any events
happening within three months of the prediction date, due to the lag in
medical claims data reporting. Any diagnoses within the remaining year become
the features we use in all of our models.
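The windowing and labeling logic described above can be sketched as follows. This is an illustration, not the project's code: the ICD-10 category prefixes are the standard categories for influenza (J09 to J11), pneumonia (J12 to J18), and acute bronchitis (J20), but the project's actual code list may differ, and the inclusive window boundaries, helper names, and sample claims are assumptions.

```python
import calendar
from datetime import date

# Illustrative ICD-10 category prefixes for the proxy events named above.
# Assumption: the project's actual code list may differ.
PROXY_PREFIXES = (
    "J09", "J10", "J11",                              # influenza
    "J12", "J13", "J14", "J15", "J16", "J17", "J18",  # pneumonia
    "J20",                                            # acute bronchitis
)


def months_before(day, months):
    """Return the date `months` calendar months before `day`, clamping the day of month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def is_proxy_event(icd10_code):
    """True if the code falls under one of the respiratory-infection prefixes."""
    return icd10_code.upper().startswith(PROXY_PREFIXES)


def feature_diagnoses(claims, prediction_date):
    """Keep diagnosis codes dated 15 to 3 months before the prediction date.
    Treating both boundaries as inclusive is an assumption."""
    start = months_before(prediction_date, 15)
    end = months_before(prediction_date, 3)
    return sorted({code for day, code in claims if start <= day <= end})


# Made-up claims as (service date, ICD-10 code) pairs.
claims = [
    (date(2019, 1, 15), "E11.9"),   # inside the 12-month feature window
    (date(2019, 11, 2), "I10"),     # inside the window
    (date(2020, 2, 20), "J18.9"),   # excluded: within 3 months of prediction
    (date(2018, 6, 1), "J45.909"),  # excluded: older than 15 months
]
features = feature_diagnoses(claims, date(2020, 3, 31))
```

With a prediction date of 31 March 2020, the feature window in this sketch runs from 31 December 2018 to 31 December 2019, which is the one-year span left after dropping the most recent three months.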
STEP 2
Models
There are a host of model considerations that need to be made with these
kinds of projects. Ultimately, we wanted these models to balance being as
effective as possible while still being accessible to healthcare data scientists as quickly
as possible. One of the reasons for choosing the data that we used is that
Medicare claims data is widely available to healthcare data scientists. If your
organization has access to additional data sources, you may observe performance
increases by incorporating such information. Balancing those considerations led
us to create 3 models based on the ease of adoption and model effectiveness.
The first is a logistic regression model using a small number of features. At Closed
Loop, we use the standard Python data science stack.
The motivation for a very simple model is that it can be ported to environments
like R or SAS without having to read or write a line of Python. At low alert rates,
the model performs close to parity with the more sophisticated versions of the
model. The project's white paper has all of the weights for the limited
feature set, so it can be ported over by hand.
Figure: ROC graph comparing the performance of all three models.
The next two models are both made using
XGBoost. XGBoost consistently gives the best performance for making predictions
on well-structured data, and given the right data transformation, medical billing
data has that structure. The first XGBoost model is featured in our open-sourced
package. A pickled version of the model exists, so you simply need to build a data
transformation pipeline that will get your billing data into the format specified in
the repo.
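To show why a small logistic regression is easy to port by hand, here is a minimal Python sketch that scores a patient from a table of weights. The intercept, feature names, and weight values are hypothetical placeholders chosen for illustration; they are not the published weights.

```python
import math

# Hypothetical weights for illustration only; the real values are published
# with the project and are not reproduced here.
INTERCEPT = -4.0
WEIGHTS = {
    "age_over_75": 0.9,
    "copd": 1.1,
    "heart_failure": 0.8,
    "diabetes": 0.4,
}


def risk_score(features):
    """Logistic regression by hand: sigmoid(intercept + sum of weight * feature).
    Features missing from the input are treated as 0 (absent)."""
    z = INTERCEPT + sum(WEIGHTS[name] * features.get(name, 0) for name in WEIGHTS)
    return 1.0 / (1.0 + math.exp(-z))


low = risk_score({})
high = risk_score({"age_over_75": 1, "copd": 1, "heart_failure": 1})
```

Because the whole model is one weighted sum passed through a sigmoid, the same calculation can be written in R, SAS, or even a spreadsheet once the weights are known.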
If you can build a function that will
parse your data for a specific code, then
you can simply iterate through all of the
codes. That’s the reason we selected a
limited feature set for the open-sourced
model. It’s very effective, while still
requiring only a reasonable level of lift
from the data pipeline standpoint. We’re
also giving healthcare organizations
access to our model within the platform.
This version of the model uses full
diagnosis history, plus a large set of
engineered features. Note that the ROC
curve shows that the open-source
version has approximately the same
performance as the version included in our
platform.
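The "parse for one code, then iterate over all codes" idea might look like the sketch below. The feature code list, the prefix-matching rule, and the function names are assumptions for illustration; the actual input format is the one specified in the project's repo.

```python
# Hypothetical feature codes; the real list is defined by the format in the repo.
FEATURE_CODES = ["E11", "I10", "I50", "J44"]


def has_code(patient_codes, target):
    """Parse one patient's diagnosis codes for a single target code.
    Matching by prefix is an assumption made for this example."""
    return any(code.upper().startswith(target) for code in patient_codes)


def to_feature_row(patient_codes, feature_codes=FEATURE_CODES):
    """Iterate through every feature code and emit one 0/1 indicator per code."""
    return [int(has_code(patient_codes, target)) for target in feature_codes]


# One made-up patient with three diagnosis codes on file.
row = to_feature_row(["E11.9", "J44.1", "Z00.00"])
```

Each patient becomes one fixed-length row of indicators, which is the kind of well-structured input a tree-based model such as XGBoost expects.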
ERROR 404: DATA NOT
FOUND!
Have it in the bag? Data (or its lack thereof) can
be the biggest and most overlooked challenge
when it comes to the adoption of data science.
Many organizations don’t have the necessary data
to perform data science. Legacy practices
(common examples include data
captured through physical forms, unstructured
data, no scalable IT infrastructure in place to
process data, and data stored in remote silos) are
the primary reason that some organizations are
not even aware that the data they have is of no
practical use. Prioritizing data collection and
digitization of data from existing sources is the
frontline solution to this problem. However, it is
also important for companies to explore new data
sources while enhancing data accessibility for all
key stakeholders.
WHAT IS BUSINESS INTELLIGENCE – BI?
Business intelligence (BI) refers to the procedural and technical support that handles, stores, and interprets the data
produced by a company’s activities. BI is a broad term that encompasses data mining, process
analysis, performance benchmarking, and descriptive analytics. BI parses all the data generated by
a business and presents easy-to-digest reports, performance measures, and trends that inform management decisions.

Origins of Business Intelligence (BI)
The need for BI was derived from the concept that managers
with inaccurate or incomplete information will tend, on average, to make worse decisions than if they had better
information. Creators of financial models recognize this as “garbage in, garbage out.” BI attempts
to solve this problem by analyzing current data that is ideally presented on a dashboard of quick metrics designed to
support better decisions. Most companies can benefit from incorporating BI solutions.

The Growing Field
To be useful, BI must seek to improve the accuracy,
timeliness, and volume of data. These requirements mean finding more ways to capture information that is not
already being recorded, checking the information for errors, and structuring the information in a way that makes broad
analysis possible. In practice, however, companies have data that is unstructured or in diverse formats that do not make
for easy collection and analysis. Software firms thus provide business intelligence solutions to optimize the information
gleaned from data. These are enterprise-level software applications intended to unify a company’s data and
analytics. Although software solutions continue to evolve and are becoming increasingly sophisticated, there is still a need
for data scientists to manage the trade-offs between speed and the depth of reporting. Some of the insights emerging
from big data have companies scrambling to capture everything, but data analysts can usually filter out sources to find a
selection of data points that can represent the health of a process or business area as a whole. This can reduce the
necessity to capture and reformat everything for analysis, which saves analytical time and increases the reporting speed.
INDIA: RETHINKING DIGITIZATION AND DATA
SCIENCE IN COVID-19 WORLD:
You might have come across this cliché: COVID-19
has accelerated the shift to digital. With ongoing
lockdowns and a projected recession, businesses are
struggling to keep up with day-to-day operations and
making tough decisions like layoffs, salary cuts, and
Capex rollbacks. While the present seems bleak and
the future looks uncertain, businesses find
themselves amidst an unprecedented crisis with only
one thing certain: The future is digital. While some
industries quickly adapted to remote work and digital
tools, others had to deal with multiple challenges to
maintain business continuity. Business leaders have
been busy with the adaptation of new operating
models, optimizing business processes, measuring ROI
of various spending, and gauging long-term business
impact through data science and data-driven scenario
simulations.
GOVERNMENTS AND HEALTHCARE
PROVIDERS WORLDWIDE HAVE ADOPTED
DATA SCIENCE IN MITIGATING THE IMPACT
OF COVID-19:
This has been possible through the digital tracking of patients to monitor disease spread,
through epidemic forecast models to allocate healthcare resources, through molecular modeling
in drug and vaccine discovery, and more. Access to quality data and data science experts to
apply enhanced techniques is proving to be critical for faster recovery. With no travel and
reduced meeting hours, it could be an apt time for CXOs to rethink their future in this changing
business environment.
DATA AND DATA SCIENCE ARE TWO KEY INGREDIENTS OF
ANY DIGITAL OPERATING MODEL:
While data science might seem like a luxury today
amidst this struggle for survival, it could be a
differentiating factor in deciding the winners of
tomorrow. With a few visionary companies already
ahead of the curve, all organizations must plan
their digital strategy to adapt to the post-COVID
normal before it looks us in the eye. In the same
context, this article discusses the potential
roadblocks in the adoption of data science and
possible ways to sidestep them.
Culture Conundrums
The storming of the Bastille! Yes, this
is where we get to complain about corporate
culture.
Gut-based decision-making, multitudes of Excel
reports, never-ending budget forecasts, and
out-of-the-world sales targets! But things could be better
if all decisions were backed by data and facts so
that everyone could see the underlying rationale
while supporting and contributing to the
decision-making process. Undoubtedly, commitment from
the top leadership is vital, but a top-down mandate
alone can’t ensure the wide use of data science for
decision-making throughout the business. A
bottom-up adoption that embeds data science into the
way the organization thinks, decides, and acts is
necessary for good results.
COMING TO AN END:
The whole world is participating in a
fight against this pandemic. The
healthcare data science community can
have a big impact on combating this
disease. There have been many
excellent efforts to use data
visualization and Monte Carlo simulations to
help combat the spread of this
pandemic. We feel our model addresses
a complementary and important aspect
of health policy, identifying those most
at risk. By combining these
and many other excellent efforts in the
healthcare technology space, we hope to
mitigate the effects of this terrible
disease. If reading this article has given
you ideas for ways in which you’d like to
contribute, we encourage you to get
involved.
GET IN TOUCH
https://www.henryharvin.com/
[email protected]
+91 15266266