
document

autocorpus.ac_bioc.document

This module defines the BioCDocument class.

BioCDocument objects include a list of BioCPassage objects and provide a method to convert the document to a dictionary representation.

Classes

BioCDocument(id=str(), inputfile=str(), infons=dict(), passages=list(), relations=list(), annotations=list()) dataclass

Bases: DataClassJsonMixin

Represents a BioC document containing passages, annotations, and relations.
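
The signature above lists defaults such as dict() and list(). As a rough sketch of what that shape implies, the following stand-in assumes those defaults are per-instance factories; this page does not confirm how the real class declares them. DocumentSketch is a hypothetical name, the field types are simplified to plain containers, and the standard library's asdict is used only as a stand-in for the dictionary conversion the module description mentions.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DocumentSketch:
    """Hypothetical mirror of the BioCDocument fields, using plain types."""

    id: str = ""
    inputfile: str = ""
    infons: dict = field(default_factory=dict)
    passages: list = field(default_factory=list)
    relations: list = field(default_factory=list)
    annotations: list = field(default_factory=list)


a = DocumentSketch(id="doc-1")
b = DocumentSketch(id="doc-2")
a.passages.append("placeholder passage")

# Each instance gets its own list, so b is unaffected by the append on a.
print(b.passages)

# asdict is a stdlib stand-in, not the mixin's own conversion method.
print(asdict(a))
```

With factory defaults, two documents never share the same passages or infons container, which matters when documents are built up incrementally.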

Functions
from_xml(elem) classmethod

Create a BioCDocument instance from an XML element.

Parameters:

elem (Element, required): An XML element representing the document.

Returns:

BioCDocument: An instance of BioCDocument created from the XML element.

Source code in autocorpus/ac_bioc/document.py
@classmethod
def from_xml(cls, elem: ET.Element) -> BioCDocument:
    """Create a BioCDocument instance from an XML element.

    Args:
        elem (ET.Element): An XML element representing the document.

    Returns:
        BioCDocument: An instance of BioCDocument created from the XML element.
    """
    id_text = elem.findtext("id", default="")

    infons = {
        e.attrib["key"]: e.text for e in elem.findall("infon") if e.text is not None
    }

    passages = [BioCPassage.from_xml(p_elem) for p_elem in elem.findall("passage")]

    return cls(
        id=id_text,
        infons=infons,
        passages=passages,
    )
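
To see what the lookups in the listing above return, here is a minimal sketch that applies the same findtext and findall calls to a made-up element using only the standard library. The sample XML, its id, and its infon values are invented for illustration, and BioCPassage is not imported, so the passage elements are only counted rather than parsed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; the id and infon values are invented.
SAMPLE = """
<document>
  <id>doc-1</id>
  <infon key="license">CC BY</infon>
  <infon key="empty"></infon>
  <passage><offset>0</offset><text>Example passage.</text></passage>
</document>
"""

elem = ET.fromstring(SAMPLE)

# Same lookup as in from_xml: a missing <id> would fall back to "".
id_text = elem.findtext("id", default="")

# Infons without text are skipped because e.text is None for them.
infons = {
    e.attrib["key"]: e.text for e in elem.findall("infon") if e.text is not None
}

# from_xml passes each of these to BioCPassage.from_xml; here they are only counted.
passage_elems = elem.findall("passage")

print(id_text)
print(infons)
print(len(passage_elems))
```

Note that the dictionary comprehension indexes e.attrib["key"] directly, so an infon element without a key attribute would raise KeyError rather than being skipped.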
to_xml()

Convert the BioCDocument instance to an XML element.

Returns:

Element: An XML element representing the document.

Source code in autocorpus/ac_bioc/document.py
def to_xml(self) -> ET.Element:
    """Convert the BioCDocument instance to an XML element.

    Returns:
        ET.Element: An XML element representing the document.
    """
    doc_elem = ET.Element("document")

    id_elem = ET.SubElement(doc_elem, "id")
    id_elem.text = self.id

    for k, v in self.infons.items():
        infon = ET.SubElement(doc_elem, "infon", {"key": k})
        infon.text = v

    for passage in self.passages:
        doc_elem.append(passage.to_xml())

    for rel in self.relations:
        doc_elem.append(rel.to_xml())

    return doc_elem
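
The element returned by to_xml can be serialized with the standard library. The sketch below reproduces the id and infon steps from the listing on a small stand-in class, since the real BioCDocument and its passage and relation types are not imported here. MiniDocument is a hypothetical name, and the sample values are invented.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class MiniDocument:
    """Hypothetical stand-in for BioCDocument with only id and infons."""

    id: str = ""
    infons: dict = field(default_factory=dict)

    def to_xml(self) -> ET.Element:
        # Same construction steps as the listing, minus passages and relations.
        doc_elem = ET.Element("document")
        id_elem = ET.SubElement(doc_elem, "id")
        id_elem.text = self.id
        for k, v in self.infons.items():
            infon = ET.SubElement(doc_elem, "infon", {"key": k})
            infon.text = v
        return doc_elem


doc = MiniDocument(id="doc-1", infons={"license": "CC BY"})

# encoding="unicode" makes tostring return a str instead of bytes.
xml_text = ET.tostring(doc.to_xml(), encoding="unicode")
print(xml_text)
```

One asymmetry is visible in the two listings as shown on this page: to_xml appends relations, while from_xml reads only the id, infons, and passages, so a round trip through these two methods would not restore relations.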