Using Implicit Information To Identify Smoking Status In Smoke-Blind Medical Discharge Summaries

Document Type

Article

Publication Date

1-1-2008

Published In

Journal of the American Medical Informatics Association

Abstract

As part of the 2006 i2b2 NLP Shared Task, we explored two methods for determining the smoking status of patients from their hospital discharge summaries, first when explicit smoking terms were present and then when those terms had been removed. We developed a simple keyword-based classifier to determine smoking status from de-identified hospital discharge summaries. We then developed a Naïve Bayes classifier to determine smoking status from the same records after all smoking-related words had been manually removed (the smoke-blind dataset). The performance of the Naïve Bayes classifier was compared with that of three human annotators on a subset of the training dataset (n = 54 records) and against the evaluation dataset (n = 104 records). The rule-based classifier accurately extracted smoking status from hospital discharge summaries that contained explicit smoking words. On the smoke-blind dataset, where explicit smoking cues are not available, two Naïve Bayes systems performed less well than the rule-based classifier but similarly to three expert human annotators.
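The two approaches named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the keyword list, the negation rule, the labels `smoker`, `non-smoker`, and `unknown`, and the four training snippets are all hypothetical and chosen only to keep the example small. The sketch shows a toy keyword rule, a regex stand-in for the manual smoke-blind step, and a multinomial Naïve Bayes classifier with add-one smoothing trained on the masked text.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword and negation patterns; the source does not list its terms.
SMOKING_TERMS = re.compile(r"\b(?:smok\w*|tobacco|cigarette\w*|nicotine)\b", re.IGNORECASE)
NEGATION = re.compile(
    r"\b(?:denies|never|no|non)\b[\s-]*(?:smok\w*|tobacco|cigarette\w*)", re.IGNORECASE
)


def tokenize(text):
    """Lowercase a record and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def keyword_classify(text):
    """Toy rule-based classifier driven only by explicit smoking terms."""
    if not SMOKING_TERMS.search(text):
        return "unknown"
    return "non-smoker" if NEGATION.search(text) else "smoker"


def smoke_blind(text):
    """Mask explicit smoking terms (an automated stand-in for manual removal)."""
    return SMOKING_TERMS.sub(" ", text)


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy records, not drawn from the i2b2 data.
train = [
    ("history of copd and emphysema, heavy tobacco use", "smoker"),
    ("chronic bronchitis, lung cancer, smoking cessation advised", "smoker"),
    ("admitted for knee replacement, no respiratory complaints", "non-smoker"),
    ("routine hernia repair, patient denies smoking", "non-smoker"),
]
model = NaiveBayes().fit([smoke_blind(t) for t, _ in train], [y for _, y in train])
```

After masking, the model can rely only on implicit cues such as co-occurring diagnoses, which is the idea behind the smoke-blind condition. On this toy data, `model.predict("copd with emphysema")` favors the hypothetical `smoker` label, while `keyword_classify` returns `unknown` for the same text because no explicit smoking term is present.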

Comments

This work is a preprint that has been provided to PubMed Central courtesy of Oxford University Press and the American Medical Informatics Association (AMIA).

Recommended Citation

Richard H. Wicentowski and M. R. Sydes. (2008). "Using Implicit Information To Identify Smoking Status In Smoke-Blind Medical Discharge Summaries". Journal of the American Medical Informatics Association. Volume 15, Issue 1. 29-31. DOI: 10.1197/jamia.M2440
https://works.swarthmore.edu/fac-comp-sci/15

Additional Files

fac-comp-sci-15_accessible.docx (26 kB)
Accessible document [Word]

Included in

Computer Sciences Commons

An accessible version of this publication has been made available courtesy of Swarthmore College Libraries. For further accessibility assistance, please contact openaccess@swarthmore.edu with the title or URL of any relevant works.