File size: 6,000 Bytes
b7731cd
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
# Copyright 1999 by Jeffrey Chang.  All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.

"""Code to work with Medline from the NCBI.

Classes:
 - Record           A dictionary holding Medline data.

Functions:
 - read             Reads one Medline record
 - parse            Allows you to iterate over a bunch of Medline records

"""


class Record(dict):
    """A dictionary holding information from a Medline record.

    All data are stored under the mnemonic appearing in the Medline
    file. These mnemonics have the following interpretations:

    ========= ==============================
    Mnemonic  Description
    --------- ------------------------------
    AB        Abstract
    CI        Copyright Information
    AD        Affiliation
    IRAD      Investigator Affiliation
    AID       Article Identifier
    AU        Author
    FAU       Full Author
    CN        Corporate Author
    DCOM      Date Completed
    DA        Date Created
    LR        Date Last Revised
    DEP       Date of Electronic Publication
    DP        Date of Publication
    EDAT      Entrez Date
    GS        Gene Symbol
    GN        General Note
    GR        Grant Number
    IR        Investigator Name
    FIR       Full Investigator Name
    IS        ISSN
    IP        Issue
    TA        Journal Title Abbreviation
    JT        Journal Title
    LA        Language
    LID       Location Identifier
    MID       Manuscript Identifier
    MHDA      MeSH Date
    MH        MeSH Terms
    JID       NLM Unique ID
    RF        Number of References
    OAB       Other Abstract
    OCI       Other Copyright Information
    OID       Other ID
    OT        Other Term
    OTO       Other Term Owner
    OWN       Owner
    PG        Pagination
    PS        Personal Name as Subject
    FPS       Full Personal Name as Subject
    PL        Place of Publication
    PHST      Publication History Status
    PST       Publication Status
    PT        Publication Type
    PUBM      Publishing Model
    PMC       PubMed Central Identifier
    PMID      PubMed Unique Identifier
    RN        Registry Number/EC Number
    NM        Substance Name
    SI        Secondary Source ID
    SO        Source
    SFM       Space Flight Mission
    STAT      Status
    SB        Subset
    TI        Title
    TT        Transliterated Title
    VI        Volume
    CON       Comment on
    CIN       Comment in
    EIN       Erratum in
    EFR       Erratum for
    CRI       Corrected and Republished in
    CRF       Corrected and Republished from
    PRIN      Partial retraction in
    PROF      Partial retraction of
    RPI       Republished in
    RPF       Republished from
    RIN       Retraction in
    ROF       Retraction of
    UIN       Update in
    UOF       Update of
    SPIN      Summary for patients in
    ORI       Original report in
    ========= ==============================

    """


def parse(handle):
    """Read Medline records one by one from the handle.

    The handle is either is a Medline file, a file-like object, or a list
    of lines describing one or more Medline records.

    Typical usage::

        >>> from Bio import Medline
        >>> with open("Medline/pubmed_result2.txt") as handle:
        ...     records = Medline.parse(handle)
        ...     for record in records:
        ...         print(record['TI'])
        ...
        A high level interface to SCOP and ASTRAL ...
        GenomeDiagram: a python package for the visualization of ...
        Open source clustering software.
        PDB file parser and structure class implemented in Python.

    """
    # These keys point to string values
    textkeys = (
        "ID",
        "PMID",
        "SO",
        "RF",
        "NI",
        "JC",
        "TA",
        "IS",
        "CY",
        "TT",
        "CA",
        "IP",
        "VI",
        "DP",
        "YR",
        "PG",
        "LID",
        "DA",
        "LR",
        "OWN",
        "STAT",
        "DCOM",
        "PUBM",
        "DEP",
        "PL",
        "JID",
        "SB",
        "PMC",
        "EDAT",
        "MHDA",
        "PST",
        "AB",
        "EA",
        "TI",
        "JT",
    )
    handle = iter(handle)

    key = ""
    record = Record()
    for line in handle:
        line = line.rstrip()
        if line[:6] == "      ":  # continuation line
            if key in ["MH", "AD"]:
                # Multi-line MESH term, want to append to last entry in list
                record[key][-1] += line[5:]  # including space using line[5:]
            else:
                record[key].append(line[6:])
        elif line:
            key = line[:4].rstrip()
            if key not in record:
                record[key] = []
            record[key].append(line[6:])
        elif record:
            # Join each list of strings into one string.
            for key in record:
                if key in textkeys:
                    record[key] = " ".join(record[key])
            yield record
            record = Record()
    if record:  # catch last one
        for key in record:
            if key in textkeys:
                record[key] = " ".join(record[key])
        yield record


def read(handle):
    """Read a single Medline record from the handle.

    The handle is either is a Medline file, a file-like object, or a list
    of lines describing a Medline record.

    Typical usage:

        >>> from Bio import Medline
        >>> with open("Medline/pubmed_result1.txt") as handle:
        ...     record = Medline.read(handle)
        ...     print(record['TI'])
        ...
        The Bio* toolkits--a brief overview.

    """
    records = parse(handle)
    return next(records)


if __name__ == "__main__":
    from Bio._utils import run_doctest

    run_doctest()