Role of Data Science During COVID Times
HENRY HARVIN EDUCATION

WHAT IS DATA SCIENCE?

Definition One: Data science continues to emerge as one of the most promising and in-demand career paths for skilled professionals. Today, successful data professionals understand that they need to advance beyond the traditional skills of analyzing massive quantities of data, data mining, and programming. To uncover valuable knowledge for their businesses, data scientists need to master the entire spectrum of the data science life cycle and maintain a level of adaptability.

Definition Two: Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract value from data. Data scientists combine a variety of skills, including statistics, computer science, and industry experience, to interpret data collected from the web, smartphones, customers, sensors, and other sources. Data science reveals trends and produces insights that businesses can use to make better decisions and create more innovative products and services. Data is the bedrock of innovation, but its significance comes from the knowledge data scientists can discover from it and then act upon.

The rapid spread and global impact of COVID-19 can make people feel helpless and scared as the novel coronavirus escalates and forces them to change many aspects of their everyday lives. However, people can find glimmers of hope in these uncertain times by understanding how data scientists are working hard to learn as much about COVID-19 as they can.

DATA SCIENCE CAN GIVE ACCURATE PICTURES OF CORONAVIRUS OUTCOMES:

Medical professionals and others must get correct and up-to-date information about how the coronavirus situation changes day by day.
Several organizations, including Johns Hopkins University, IBM, and Tableau, have released interactive databases that offer real-time views of what’s happening with the virus. Many of these sources pull from data provided by trusted bodies such as the U.S. Centers for Disease Control and Prevention (CDC) and the World Health Organization (WHO). They also include direct links to those bodies so that people have quick, easy access to reliable information. These databases report the number of confirmed cases, fatalities, and recoveries. Whether a person is on the front lines of the coronavirus fight or a concerned citizen trying to stay informed, they can get all or most of the information they need in one place.

DATA SCIENTISTS DEVISE A SPEEDIER WAY TO HANDLE CONTACT TRACING:

Contact tracing is an effective way to slow COVID-19. It involves getting in touch with a person’s close contacts after that individual tests positive for the virus and telling them to self-isolate. Contact tracing is time-consuming, although it’s getting easier as more people take social distancing seriously.

Data scientists and medical experts teamed up at Oxford University to make contact tracing even more efficient. The experts working on the project asserted that mathematical models showed them that traditional public health methods of contact tracing are not fast enough to slow the spread of COVID-19. They created a mobile phone-based solution that eliminates the need to call contacts manually. Instead, those contacts get text messages confirming the need for self-isolation.

The researchers clarify that their approach would be most effective if it gets support from national leaders and is not an effort primarily spearheaded by independent app developers. No nations are using this method yet.
Given the market penetration of mobile phones and the familiarity people have with receiving texts, however, it’s easy to see why this approach makes sense.

EVERYONE CAN PLAY A PART IN HELPING SCIENTISTS FIGHT THE CORONAVIRUS

Many people with COVID-19 have only mild symptoms or none at all. Plus, the classic symptoms include a fever and a cough, two issues not restricted to the coronavirus. These things could make it easier for people to unknowingly spread the disease. But developers created an app that uses data-sharing to help medical experts learn more about the virus.

It’s called the COVID Symptom Tracker and already has at least 200,000 users. People can and should interact with the app even if they are asymptomatic or do not think their symptoms are COVID-19-related. The more researchers know about the coronavirus, the better equipped they are to tackle it.

People interact with the app to do a short daily symptom check-in. They also give their age and zip code, plus disclose any preexisting conditions. That information helps scientists determine the groups that are most affected or in danger. The app does not take user data for commercial purposes, but it gives it to people who are working to stop the coronavirus, including some at health organizations.

DATA SCIENTISTS USE MACHINE LEARNING TO FIND POSSIBLE CURES FASTER:

Besides the race to restrict the spread of COVID-19, scientists are working as quickly as possible to uncover effective treatments. Two graduates of the data science program at Columbia University have turned to machine learning to help. The typical process of antibody discovery in a lab takes years. This approach, however, takes only a week to screen for therapeutic antibodies with a high likelihood of success.

The team taking this approach says the method is less costly than traditional ones, too. Humans are still part of the process because they have to test the gene sequences identified as most promising by the machine learning algorithm. However, using this expedited method could be crucial in efficiently finding interventions that work for coronavirus patients.

DATA SCIENCE CAN HELP TRACK THE SPREAD:

Data science specialists have also concluded that graph databases are instrumental in showing them how COVID-19 spreads. Each graph database describes the connections between people, places, or things. Scientists refer to each of those entities as a node, and the connections between them are the “edges.” The results give a visual representation of the relationships between things, if any.

In the early days of the coronavirus outbreak, Chinese data scientists built a graph database tool called Epidemic Spread. It allowed people to type in identifying information associated with the journeys they took, such as a flight number or even a car’s license plate. The database would then tell those users whether anyone with a confirmed coronavirus case took those same trips and may have spread it to fellow passengers.

MAKING PROGRESS AGAINST COVID-19

Knowing as much about the coronavirus as possible will save lives. These are only some of the fascinating ways that data scientists are using their skills to help.

OPEN-SOURCE DATA SCIENCE IN THE FIGHT AGAINST COVID-19 (CORONAVIRUS):

With the spread of COVID-19 becoming an ever more assertive force in our lives, the healthcare data science community has an opportunity to play an important role in the mitigation of this emerging pandemic. History has shown that such measures can drastically change the worst consequences of infections like this one. Many cities have imposed social distancing measures, closing any place where large numbers of people gather, and further measures can be taken to help isolate and protect the most vulnerable among the population.
To do so, we must first identify who is at greatest risk, which motivated my team to create an open-source project, the COVID-19 Vulnerability Index. The COVID-19 Vulnerability Index is an open-source, AI-based predictive model that identifies people who are likely to have a heightened vulnerability to severe complications of COVID-19. The Index is meant to assist hospitals, federal, state, and local public health agencies, and other healthcare organizations in their work to identify, plan for, respond to, and reduce the impact of COVID-19 in their communities. In this post, we’ll be going over the high-level details of this open-source project.

DETAILED DESCRIPTION OF DATA SELECTION:

STEP 1: Making a Labeled Data Set

Data on COVID-19 hospitalizations do not yet exist. While data begins to emerge, we can look at the affected populations and at events that serve as proxies for the real event. Given that the disease’s worst outcomes are concentrated in the elderly, we can focus on Medicare billing data. Instead of predicting COVID-19 hospitalizations, we predict proxy medical events, specifically hospitalizations due to respiratory infections. Examples include pneumonia, influenza, and acute bronchitis. We identify these labels by parsing medical billing data and searching for specific ICD-10 codes that describe these types of events.

All predictions are made on a specific day. From that day, we look back in time 15 months for features. We exclude any events happening within three months of the prediction date, due to the lag in medical claims data reporting. The diagnoses in the remaining one-year window become the features we use in all of our models.

STEP 2: Models

A host of model considerations need to be weighed in these kinds of projects. Ultimately, we wanted these models to be as effective as possible while still being accessible to healthcare data scientists as quickly as possible.
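The Step 1 windowing described above can be sketched in code. This is a minimal illustration, not the project’s actual pipeline: the `Claim` record, the function names, the three-month label horizon, and the short list of ICD-10 prefixes (J09 to J18 for influenza and pneumonia, J20 for acute bronchitis) are all assumptions made for the example.

```python
import calendar
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Claim:
    """Hypothetical, simplified billing record."""
    service_date: date
    icd10: str
    inpatient: bool


# Illustrative ICD-10 prefixes only; the project's real code list may differ.
RESPIRATORY_PREFIXES = (
    "J09", "J10", "J11",                                     # influenza
    "J12", "J13", "J14", "J15", "J16", "J17", "J18",         # pneumonia
    "J20",                                                   # acute bronchitis
)


def shift_months(d: date, delta: int) -> date:
    """Move a date by `delta` calendar months (negative = back in time)."""
    total = d.year * 12 + (d.month - 1) + delta
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def feature_window(prediction_date: date) -> tuple[date, date]:
    """Look back 15 months, but drop the most recent 3 months (claims lag)."""
    return shift_months(prediction_date, -15), shift_months(prediction_date, -3)


def build_example(claims, prediction_date, horizon_months=3):
    """Return (feature codes, label) for one patient on one prediction date.

    `horizon_months` is an assumption: the text does not state how far
    forward the proxy hospitalization is searched for.
    """
    start, end = feature_window(prediction_date)
    horizon_end = shift_months(prediction_date, horizon_months)

    # Features: every diagnosis code seen inside the one-year window.
    features = sorted({c.icd10 for c in claims if start <= c.service_date < end})

    # Label: an inpatient respiratory-infection claim after the prediction date.
    label = any(
        c.inpatient
        and c.icd10.startswith(RESPIRATORY_PREFIXES)
        and prediction_date <= c.service_date < horizon_end
        for c in claims
    )
    return features, label
```

For a prediction date of 1 January 2020, the feature window in this sketch runs from 1 October 2018 up to, but not including, 1 October 2019, so a diagnosis billed in November 2019 is ignored.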
One of the reasons for choosing the data that we used is that Medicare claims data is widely available to healthcare data scientists. If your organization has access to additional data sources, you may observe performance increases by incorporating such information. Balancing those considerations led us to create three models based on ease of adoption and model effectiveness.

The first is a logistic regression model using a small number of features. At Closed Loop, we use the standard Python data science stack. The motivation for a very simple model is that it can be ported to environments like R or SAS without having to read or write a line of Python. At low alert rates, the model performs close to parity with the more sophisticated versions of the model. The project’s white paper has all of the weights for the limited feature set, so the model can be ported over by hand.

Figure: ROC graph comparing the performance of all three models.

The next two models are both made using XGBoost. XGBoost consistently gives the best performance for making predictions on well-structured data, and given the right data transformation, medical billing data has that structure.

The first XGBoost model is featured in our open-source package. A pickled version of the model exists, so you simply need to build a data transformation pipeline that will get your billing data into the format specified in the repo. If you can build a function that will parse your data for a specific code, then you can simply iterate through all of the codes. That’s the reason we selected a limited feature set for the open-source model. It’s very effective, while still requiring only a reasonable level of effort from the data pipeline standpoint.

We’re also giving healthcare organizations access to our model within the platform. This version of the model uses full diagnosis history, plus a large set of engineered features.
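The idea of parsing for one code and then iterating over all codes, combined with hand-ported logistic regression weights, can be sketched as follows. Everything named here is hypothetical: the function names, the two feature prefixes, and the weights and intercept are placeholders chosen for illustration, not the values published in the white paper, and the input format expected by the open-source package is defined in its repo and is not reproduced here.

```python
import math


def has_code(patient_codes, prefix):
    """True if any of the patient's diagnosis codes starts with `prefix`."""
    return any(code.startswith(prefix) for code in patient_codes)


def build_features(patient_codes, feature_prefixes):
    """Iterate over every code of interest and emit one 0/1 flag per code."""
    return {prefix: int(has_code(patient_codes, prefix)) for prefix in feature_prefixes}


def logistic_score(features, weights, intercept):
    """Score = sigmoid(intercept + sum of weight * feature value)."""
    z = intercept + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder weights for illustration only (NOT the white paper's values).
EXAMPLE_WEIGHTS = {"J44": 0.5, "E11": 0.25}
EXAMPLE_INTERCEPT = -2.0


def score_patient(patient_codes):
    """Build the flags for the example feature set and return a risk score."""
    features = build_features(patient_codes, EXAMPLE_WEIGHTS.keys())
    return logistic_score(features, EXAMPLE_WEIGHTS, EXAMPLE_INTERCEPT)
```

Because the scoring step is only a weighted sum passed through a sigmoid, the same few lines translate directly to R or SAS, which is the portability argument made for the simple model.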
Note that the ROC curve shows the open-source version performing approximately as well as the version included in our platform.

ERROR 404: DATA NOT FOUND!

Have it in the bag? Data (or the lack thereof) can be the biggest and most overlooked challenge when it comes to the adoption of data science. Many organizations don’t have the data necessary to perform data science. Legacy practices are the primary reason that some organizations are not even aware that the data they have is of no practical use. Common examples include data captured through physical forms, unstructured data, no scalable IT infrastructure in place to process data, and data stored in remote silos. Prioritizing data collection and the digitization of data from existing sources is the frontline solution to this problem. However, it is also important for companies to explore new data sources while enhancing data accessibility for all key stakeholders.

WHAT IS BUSINESS INTELLIGENCE (BI)?

Business intelligence (BI) refers to the procedural and technical infrastructure that collects, stores, and analyzes the data produced by a company’s activities. BI is a broad term that encompasses data mining, process analysis, performance benchmarking, and descriptive analytics. BI parses all the data generated by a business and presents easy-to-digest reports, performance measures, and trends that inform management decisions.

Origins of Business Intelligence (BI)

The need for BI was derived from the concept that managers with inaccurate or incomplete information will tend, on average, to make worse decisions than if they had better information.
Creators of financial models recognize this as “garbage in, garbage out.” BI attempts to solve this problem by analyzing current data that is ideally presented on a dashboard of quick metrics designed to support better decisions. Most companies can benefit from incorporating BI solutions.

The Growing Field

To be useful, BI must seek to increase the accuracy, timeliness, and amount of data. These requirements mean finding more ways to capture information that is not already being recorded, checking the information for errors, and structuring the information in a way that makes broad analysis possible.

In practice, however, companies have data that is unstructured or in diverse formats that do not make for easy collection and analysis. Software firms thus provide business intelligence solutions to optimize the information gleaned from data. These are enterprise-level software applications designed to unify a company’s data and analytics.

Although software solutions continue to evolve and are becoming increasingly sophisticated, there is still a need for data scientists to manage the trade-offs between speed and the depth of reporting. Some of the insights emerging from big data have companies scrambling to capture everything, but data analysts can usually filter out sources to find a selection of data points that can represent the health of a process or business area as a whole. This can reduce the need to capture and reformat everything for analysis, which saves analytical time and increases reporting speed.

You might have come across this cliché: COVID-19 has accelerated the shift to digital.
With ongoing lockdowns and a projected recession, businesses are struggling to keep up with day-to-day operations and are making tough decisions like layoffs, salary cuts, and Capex rollbacks. While the present seems bleak and the future looks uncertain, businesses find themselves amidst an unprecedented crisis with only one thing certain: the future is digital.

INDIA: RETHINKING DIGITIZATION AND DATA SCIENCE IN COVID-19 WORLD:

While some industries quickly adapted to remote work and digital tools, others had to deal with multiple challenges to maintain business continuity. Business leaders have been busy adapting to new operating models, optimizing business processes, measuring the RoI of various spending, and gauging long-term business impact through data science and data-driven scenario simulations.

GOVERNMENTS AND HEALTHCARE PROVIDERS WORLDWIDE HAVE ADOPTED DATA SCIENCE IN MITIGATING THE IMPACT OF COVID-19:

This has been possible through digital tracking of patients to monitor disease spread, epidemic forecast models to allocate healthcare resources, molecular modeling in drug and vaccine discovery, and more. Access to quality data and to data science experts who can apply enhanced techniques is proving to be critical for faster recovery. With no travel and reduced meeting hours, it could be an apt time for CXOs to rethink their future in this changing business environment.

DATA AND DATA SCIENCE ARE TWO KEY INGREDIENTS OF ANY DIGITAL OPERATING MODEL:

While data science might seem like a luxury today amidst this struggle for survival, it could be a differentiating factor in deciding the winners of tomorrow. With a few visionary companies already ahead of the curve, all organizations must plan their digital strategy to adapt to the post-COVID normal before it looks us in the eye. In the same context, this article discusses the potential roadblocks in the adoption of data science and possible ways to sidestep them.
Culture Conundrums

The storming of the Bastille! Yes, this is where we get to complain about corporate culture. Gut-based decision-making, multitudes of Excel reports, never-ending budget forecasts, and out-of-the-world sales targets! But things could be better if all decisions were backed by data and facts, so that everyone could see the underlying rationale while supporting and contributing to the decision-making process. Undoubtedly, commitment from the top leadership is vital. But a top-down mandate alone can’t ensure the wide use of data science for decision-making throughout the business. Bottom-up adoption that embeds data science into the way the organization thinks, decides, and acts is necessary for good results.

COMING TO END:

The whole world is participating in the fight against this pandemic. The healthcare data science community can have a big impact on combating this disease. There have been many excellent efforts to use data visualization and Monte Carlo simulations to help combat the spread of this pandemic. We feel our model addresses a complementary and important aspect of health policy: identifying those most at risk. By combining these and many other excellent efforts in the healthcare technology space, we hope to mitigate the effects of this terrible disease. If reading this article has given you ideas for ways in which you’d like to contribute, we encourage you to get involved.

GET IN TOUCH

https://www.henryharvin.com/
+91 15266266