Free Medical Data

A lot has been made of "free" software: what it is, how it differs from "open source" software, the merits of copyleft vs. non-copyleft free software, and so on.

One issue that has come to my attention recently is that many medical data sets are proprietary, and this leads to worse patient treatment options.

Here's an example. Let's say you have some sort of cancer, and there are several treatment options available (e.g. radiation therapy, chemotherapy, surgery). There is something called a "nomogram": doctors take a large set of historical (anonymized) cases, each with its pre-treatment data points, the treatment option chosen, and the outcome. Based on these numbers they give you an answer like "option X has an A% chance of curing you", "option Y has a B% chance of curing you", etc.

To make this concrete, let's say you have prostate cancer: you've had your blood PSA levels measured, and the cancer has been confirmed by a prostate biopsy. Based on these results and any other relevant factors (age, weight, etc.), doctors are able to build a nomogram. For your specific numbers, the nomogram estimates the chance that you'll be fully cured of prostate cancer (measured after 5 years) under each treatment option, e.g. if you choose radical prostatectomy instead of radiation therapy.

I'm not sure of the math behind this, but I believe they use some sort of clustering algorithm to find similar patients and calculate a score based on their treatment results and how similar you were to those patients.
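To make that guess concrete, here is a minimal sketch of a similarity-weighted estimate: every historical case votes on each treatment's cure rate, and cases that look more like you count for more. It only illustrates the idea described above and is not how any particular hospital computes its nomograms (published nomograms are often derived from regression models instead). The case data, the `similarity` scaling, and the function names are all made up for illustration.

```python
import math

# Hypothetical historical cases: (age, PSA in ng/mL, treatment, cured after 5 years).
# These numbers are invented for illustration only; they are not real patient data.
CASES = [
    (62, 6.1, "surgery", True),
    (65, 7.4, "surgery", True),
    (71, 12.0, "surgery", False),
    (73, 14.5, "surgery", False),
    (61, 5.8, "radiation", True),
    (70, 11.2, "radiation", True),
    (72, 13.9, "radiation", True),
    (74, 15.0, "radiation", False),
]


def similarity(age_a, psa_a, age_b, psa_b):
    """Weight in (0, 1]: 1.0 for identical patients, smaller as they differ."""
    # Assumed scaling: 10 years of age counts the same as 5 ng/mL of PSA.
    distance = math.hypot((age_a - age_b) / 10.0, (psa_a - psa_b) / 5.0)
    return 1.0 / (1.0 + distance)


def estimated_cure_rates(age, psa, cases=CASES):
    """Similarity-weighted share of cured patients, per treatment option."""
    cured_weight = {}
    total_weight = {}
    for case_age, case_psa, treatment, cured in cases:
        w = similarity(age, psa, case_age, case_psa)
        total_weight[treatment] = total_weight.get(treatment, 0.0) + w
        if cured:
            cured_weight[treatment] = cured_weight.get(treatment, 0.0) + w
    return {t: cured_weight.get(t, 0.0) / total_weight[t] for t in total_weight}


if __name__ == "__main__":
    # A hypothetical 72-year-old patient with a PSA of 13.0 ng/mL.
    for treatment, rate in estimated_cure_rates(72, 13.0).items():
        print(f"{treatment}: {rate:.0%} estimated chance of cure")
```

With only eight invented cases the output means nothing medically. The point is that the estimate is only as good as the number and quality of cases behind it.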

This is really cool, and it lets doctors choose the best treatment option for patients based on the statistics of thousands of previous patients. In many cases there is some treatment option that is usually best, but under various special circumstances an alternative is better; this system lets the doctor really choose the best option. For instance, in my father's case a radical prostatectomy would normally be the treatment option for prostate cancer, but his nomogram showed that radiation therapy had a much better predicted cure rate.

Unfortunately, basically all of these nomogram databases are proprietary. The way it works is that a hospital collects these numbers internally, and may share the data with other hospitals (I'm not sure under what IP terms). Then, as a hospital, you have to choose which nomogram database to use. Typically you'd be paying for such access, and the quality of a nomogram depends on how many data points are behind it.

Fox Chase Cancer Center has a large, free online nomogram database for various cancers. In addition to their own data, which is significant, Fox Chase has a way for other hospitals to submit their own nomogram data, which increases the total information and helps doctors make more accurate predictions. I don't know what the data licensing terms are; presumably you cannot directly download the Fox Chase cancer nomogram data. But at least you can use their online nomogram tools for free.

This issue also recently came up when I broke my scaphoid bone (wrist fracture), and then my sister broke her shoulder (humerus fracture). There are a number of treatment techniques. Fifty years ago, for either injury they would have just put you in a cast or sling, which turns out to have a pretty high long-term cure rate. Now for both injuries there are multiple treatment options: you can wear a cast and not have surgery, and if you do have surgery there are multiple surgery options. For instance, when my sister fractured her shoulder, the two main surgery options were plate and screw fixation vs. inserting an intramedullary rod.

There are a bunch of studies of techniques like this that you can find at the National Center for Biotechnology Information, which is part of the NIH. For instance, here's a study on intramedullary rods vs. plate and screw fixation to fix humerus fractures.

However, there are a number of problems with this:

It can be hard to quantitatively measure what it means to be "cured" after surgery for something like a bone fracture (e.g. how much mobility do you regain, what's the recovery period, etc.), but I think these issues could be worked out. My doctor appeared to be well read on the available literature, but I would have felt a lot more confident about my surgery (and my sister's) if I knew the doctor was making his decision based on a statistical analysis of thousands of similar cases, rather than just on what he "normally does" in a case like mine and what a couple of Elsevier articles with small data sets suggest.

The Department of Health has a lot on its hands, but this is one issue that I think they should focus on seriously. Consider the following class of medical conditions:

In every case I believe the NIH should build an open (anonymized) database of the pre-treatment data, the treatment option chosen, and the efficacy of the treatment. In some cases (say, for very rare conditions) it may not be possible to do this while respecting privacy concerns, but surely we can find common ground where we take common medical problems (many forms of cancer, bone fractures, etc.) and then use these databases to make medical treatment decisions.
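As a rough sketch of what one entry in such a database might hold, here is a hypothetical record with just the three pieces named above (pre-treatment data, treatment chosen, outcome), plus a small helper that tallies raw cure rates per treatment. The field names, the `CaseRecord` type, and the sample values are assumptions for illustration, not an existing NIH schema. Real anonymization would also take far more care than simply leaving out names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseRecord:
    """One anonymized case. Deliberately holds no name, address, or birth date."""
    condition: str        # e.g. "humerus fracture"
    pre_treatment: dict   # measurements taken before treatment
    treatment: str        # the option that was chosen
    cured: bool           # outcome at some agreed follow-up point


def cure_rate_by_treatment(records, condition):
    """Return {treatment: (cure_rate, number_of_cases)} for one condition."""
    counts = defaultdict(lambda: [0, 0])  # treatment -> [cured, total]
    for record in records:
        if record.condition != condition:
            continue
        counts[record.treatment][1] += 1
        if record.cured:
            counts[record.treatment][0] += 1
    return {t: (cured / total, total) for t, (cured, total) in counts.items()}


# Invented sample records, for illustration only.
RECORDS = [
    CaseRecord("humerus fracture", {"age": 34}, "plate and screws", True),
    CaseRecord("humerus fracture", {"age": 58}, "plate and screws", False),
    CaseRecord("humerus fracture", {"age": 41}, "intramedullary rod", True),
    CaseRecord("scaphoid fracture", {"age": 29}, "cast", True),
]

if __name__ == "__main__":
    print(cure_rate_by_treatment(RECORDS, "humerus fracture"))
```

Reporting the number of cases alongside each rate matters, since a rate backed by two cases deserves much less trust than one backed by two thousand.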

Hospitals can be made to submit such data to the NIH as a result of these treatments (in fact, I wouldn't be surprised if they already do). The NIH can enforce this by making its funding to hospitals contingent on this type of data sharing.

I sincerely hope that an effort like this happens in the future. It could save millions of lives and spare people unnecessary pain, and I think it frames the current hot-button debate over "intellectual property" in a good and reasonable way.