The following post describes the specific architectural and software design issues in our EBM demo, framed in the scenario described previously in Evidence Based Medicine Made Dead Easy – Part I. We hope health IT people out there will enjoy this small demo.
Requirements and Architecture
The described system may sit alongside the EHR system. Of course, it will need some integration in the EHR interface so that the doctor does not need to start yet another application. It will further need to understand the data model used in the EHR, to be able to allow the health professional to select the patient info to include in the tests. Finally, it must also provide libraries for numerical analysis, also including parallel, concurrent and distributed capabilities to some extent. A preliminary architecture is shown below.
We have decided to use Python as the programming language in this demo as it is suitable both for prototyping and for deployment in a web server like Django, many connector modules for different databases are available, is relatively easy to use for newcomers, and also includes libraries for numerical calculus and distributed processing. Note that we have not addressed the messaging interface (the dotted box in the architecture).
In the following sections, we will describe the components included in the architecture. We will not, however, address the integration with openEHR’s UI, as it is not the focus of these posts.
First of all, we need a way to obtain the patient data from the EHR. In our case, we chose PatientOS as explained in our previous post. In this aspect, the boon and curse of PatientOS is its flexibility. All forms, records and record items are not specified in code, but rather described in rows of different tables and the relations among them. So, we will have to find out which record item stores the information we want.
In our case, let’s assume we store the weight records as procedures in the patient’s medical history. The procedure we will use is 2001F “WEIGHT RECORDED” in CPT, with a freeform string value of “XX kg”, where XX is the weight in kilograms. It seemed a natural place to put this data; however, doctors might decide to use less structured data records to store the visit information. In this case we could think about using natural language processing techniques using e.g. NLTK.
After some research, we can obtain the query that gets the procedures we want for the user we want:
select p.first_name, p.last_name, f.title, fr.value_string, t.term_name, t.description from refs, forms f, patients p, form_records fr, terms t where p.last_name='Cave' and fr.patient_id=p.patient_id and fr.form_id=f.form_id and refs.ref_id=fr.record_item_ref_id and refs.ref_key='PROCEDUREDESCRIPTION' and t.abbreviation='2001F WEIGHT RECORD' and fr.term_id=t.term_id order by fr.record_dt;
import psycopg2 conn = psycopg2.connect("dbname='patientos_db' user='patientos_user' host='localhost' password='patientos_user'") cur = conn.cursor() cur.execute("""select p.first_name, p.last_name, f.title, fr.value_string, t.term_name, t.description from refs, forms f, patients p, form_records fr, terms t where p.last_name='Cave' and fr.patient_id=p.patient_id and fr.form_id=f.form_id and refs.ref_id=fr.record_item_ref_id and refs.ref_key='PROCEDUREDESCRIPTION' and t.abbreviation='2001F WEIGHT RECORD' and fr.term_id=t.term_id order by fr.record_dt;""") patients = cur.fetchall() conn.commit weights =  for patient in patients: weight = int(patient.split()) weights.append(weight) print "Weight: ",weight print weights
Once we have the data in place, we could use any statistical program to analyze them, such as SAS, SPSS, R, Matlab or, in our case, Scipy. Scipy is a Python package that includes many numerical methods. It even allows us to use optimized libraries like ATLAS, BLAS and LAPACK, or distribute our calculations with MPI (e.g. using mpi4Py).
Let’s suppose we have retrieved the weight of two groups of 30 patients at two points in time, before and after the prescription of a drug, which may alter the patient’s weight. One of the groups will be a control group that will not take it, but otherwise (and ideally) have all other independent variables controlled. We will simulate both groups using the following piece of code, which generates two populations with different means for the difference of pre-and post-drug conditions:
import numpy.random randn = numpy.random.randn floor = numpy.floor ttest = scipy.stats.ttest_ind group1_pre = floor(randn(30,1)*10 + 65) group2_pre = floor(randn(30,1)*10 + 65) print group1_pre.T print group2_pre.T group1_post = floor(randn(30,1)*3 + group1_pre) group2_post = floor(randn(30,1)*3 + group2_pre*0.95) print group1_post.T print group2_post.T group1_diff = group1_post - group1_pre group2_diff = group2_post - group2_pre
Now we will just apply a Student t-test to both populations, and check whether both groups have a different distribution:
(t, p) = ttest(group1_diff, group2_diff) # Remember we have aliased ttest before... print (t, p)
As simple as this. Our obtained p is 0.00386893, so it passes a 1% test.
There are many more statistical tests we can use in Scipy: ANOVA, Kolmogorov-Smirnoff… But, furthermore, there are many Python packages that can provide other data mining methods, like Orange or SciKit-learn. Even if you want to write your own, as we may want our statistician to do, Python is as good a language as, say, Matlab, for a quick implementation (and able to run fast too if you use an optimized numerical library).