Data sources in the medical field
The healthcare system produces a large amount of data. Even a short-term hospitalization for an interventional procedure generates a considerable volume of information: operative reports, therapy administration, medical imaging, discharge letters, laboratory/diagnostic tests, daily annotations and other written prescriptions.
Most of this information is stored in hard copy. Despite the current trend toward digitization, the explosion of health-related data, the increasing speed at which they are produced, the variability of their sources, and the effort required to manage them with traditional software and databases make it difficult even to store and retrieve them without losing the broader view, that is, their interrelation and complexity.
Woven through these issues are those of data processing and cleansing, together with data integration. Indeed, healthcare data are often disordered, fragmented, and generated by different operators (clinicians, surgeons, nurses and administrative personnel) in legacy IT systems with incompatible formats.
Currently, producing statistically sound data in healthcare requires intensive digging through a vast array of non-integrated sources: going over a clinical folder with a fine-tooth comb, or looking for patients in internal spreadsheets filled in by physicians or clinical data managers who collect data for their own department or for a specific clinical study, and then integrating them with the information available from the Institute’s IT system.
The lack of integration and the need to cleanse data, turning raw text into structured, column-based databases, are crucial challenges for improving clinical and financial outcomes and for boosting research projects in the healthcare field.
In this scenario, steps forward are being made. Medtronic, one of the largest companies in the biomedical device field, is developing a mobile application that ingests data from the sensors of its diabetes treatment devices (insulin pumps) to better manage patients outside the hospital setting, and IBM has recently introduced Watson Health, a technology that enhances the data-driven approach to healthcare with cognitive features.
Here, we present a novel platform, named Galileo, that ingests unstructured data generated during a patient’s hospitalization (in the setting of an Electrophysiology Department dedicated to arrhythmia management), then combines them and delivers them to users through a single access point.
Facing the lag between data production and data collection/processing
The rationale behind Galileo is to address the lag between data production (when the physician writes a clinical report) and data collection/processing (when the clinical data manager or statistician reads the plain text in the clinical folders, designs suitable databases and performs data entry).
Galileo was designed to process data at the very source, namely the hospital’s discharge letters and laboratory and preoperative exams, all in .txt format (.docx or .pdf may be applicable as well), before any data collection or cleansing. It retrieves data by applying text analytics algorithms and provides a single application that connects all the sources related to a patient’s hospitalization and makes them accessible.
With just one click.
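To make the idea of retrieving structured data from plain-text reports more concrete, the following is a minimal Python sketch. It is not Galileo's implementation, which relies on Text Analytics converters in IBM Watson Explorer; plain regular expressions are used here as a stand-in. The sample letter, the field labels and the names HospitalizationRecord and extract_record are all hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical discharge-letter excerpt; real letters are free text and far messier.
SAMPLE_LETTER = """\
Patient ID: 000123
Diagnosis: ventricular tachycardia
Procedure: catheter ablation
LVEF: 35 %
Therapy at discharge: bisoprolol 2.5 mg, amiodarone 200 mg
"""


@dataclass
class HospitalizationRecord:
    """One structured row obtained from one plain-text letter (assumed schema)."""
    patient_id: Optional[str]
    diagnosis: Optional[str]
    procedure: Optional[str]
    lvef_percent: Optional[int]
    therapy: List[str]


# One pattern per field; the labels are assumptions about the letter layout.
FIELD_PATTERNS = {
    "patient_id": re.compile(r"^Patient ID:\s*(\S+)", re.MULTILINE),
    "diagnosis": re.compile(r"^Diagnosis:\s*(.+)$", re.MULTILINE),
    "procedure": re.compile(r"^Procedure:\s*(.+)$", re.MULTILINE),
    "lvef": re.compile(r"^LVEF:\s*(\d{1,3})\s*%", re.MULTILINE),
    "therapy": re.compile(r"^Therapy at discharge:\s*(.+)$", re.MULTILINE),
}


def _first(pattern: "re.Pattern[str]", text: str) -> Optional[str]:
    """Return the first captured group, stripped, or None when the field is absent."""
    match = pattern.search(text)
    return match.group(1).strip() if match else None


def extract_record(text: str) -> HospitalizationRecord:
    """Turn raw letter text into a structured record; missing fields stay empty."""
    lvef = _first(FIELD_PATTERNS["lvef"], text)
    therapy = _first(FIELD_PATTERNS["therapy"], text)
    return HospitalizationRecord(
        patient_id=_first(FIELD_PATTERNS["patient_id"], text),
        diagnosis=_first(FIELD_PATTERNS["diagnosis"], text),
        procedure=_first(FIELD_PATTERNS["procedure"], text),
        lvef_percent=int(lvef) if lvef is not None else None,
        therapy=[drug.strip() for drug in therapy.split(",")] if therapy else [],
    )


if __name__ == "__main__":
    print(extract_record(SAMPLE_LETTER))
```

Fields that cannot be found are left empty rather than guessed, so that downstream statistics are not polluted by invented values.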
How Galileo has been developed
The application was developed by using IBM Watson Explorer (WEX). In particular, two WEX modules were used:
• WEX Engine: the backbone of the final application. It was used to ingest, process and index plain text (discharge letters, operative reports, preoperative exams and Wikipedia) with custom Text Analytics converters written in AQL (Annotation Query Language), with further refinements by other custom converters in plain XSLT.
• WEX AppBuilder: Galileo’s front end, a search engine delivered as a user-friendly, responsive web application. It allows searches through the ingested medical records, supports filters and refinements on the search results, and connects the pages related to a patient’s hospitalization, such as registry information, procedural data, preoperative exams, medical therapy and information about the disease extracted from Wikipedia.
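The division of labor between the two modules (index the text, then search, filter and link records of the same hospitalization) can be illustrated with a small Python sketch. It does not use or reproduce any Watson Explorer API; the TinyIndex class, the document tuples and the source labels are hypothetical, and a simple in-memory inverted index stands in for the real engine.

```python
import re
from collections import defaultdict

# Hypothetical documents: (doc_id, patient_id, source type, plain text).
DOCUMENTS = [
    ("d1", "000123", "discharge_letter",
     "Ventricular tachycardia treated with catheter ablation"),
    ("d2", "000123", "preoperative_exam",
     "Echocardiogram shows reduced ejection fraction"),
    ("d3", "000456", "discharge_letter",
     "Atrial fibrillation treated with catheter ablation"),
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """In-memory inverted index with a source filter and per-patient linking."""

    def __init__(self):
        self.postings = defaultdict(set)    # token -> ids of documents containing it
        self.meta = {}                      # doc id -> (patient_id, source)
        self.by_patient = defaultdict(set)  # patient_id -> doc ids

    def add(self, doc_id, patient_id, source, text):
        self.meta[doc_id] = (patient_id, source)
        self.by_patient[patient_id].add(doc_id)
        for token in tokenize(text):
            self.postings[token].add(doc_id)

    def search(self, query, source=None):
        """Return ids of documents containing every query token, optionally filtered."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.postings.get(t, set()) for t in tokens))
        if source is not None:
            hits = {d for d in hits if self.meta[d][1] == source}
        return sorted(hits)

    def related(self, doc_id):
        """Return the other documents belonging to the same patient."""
        patient_id, _ = self.meta[doc_id]
        return sorted(self.by_patient[patient_id] - {doc_id})


index = TinyIndex()
for doc in DOCUMENTS:
    index.add(*doc)

if __name__ == "__main__":
    print(index.search("catheter ablation"))
    print(index.search("ejection", source="preoperative_exam"))
    print(index.related("d1"))
```

Keeping the patient identifier alongside each indexed document is what lets a single search hit lead to every other page of the same hospitalization.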
Moreover, a simplified algorithm was implemented in Galileo that allows the user to predict the position of the arrhythmogenic substrate of a ventricular tachycardia in the patient’s heart by filling in an HTML form with a few questions about the electrocardiographic recording of the arrhythmia.
Galileo’s architecture is shown in Figure 1.