abbreviation

autocorpus.abbreviation

Handles the processing of abbreviations.

Modules used:

- collections: used for counting the most common occurrences
- datetime: datetime stamping
- pathlib: OS-agnostic pathing
- regex: regular expression matching/replacing
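
To illustrate how regular expression matching and counting of the most common occurrences can work together, here is a minimal sketch. It is not the autocorpus implementation: the function name `find_abbreviations`, the pattern, and the "initials must match" heuristic are all assumptions made for this example, and it uses the standard-library `re` module rather than the third-party `regex` package.

```python
import re
from collections import Counter

# Assumed pattern: an abbreviation is 2+ capital letters inside parentheses.
ABBR_IN_PARENS = re.compile(r"\(([A-Z]{2,})\)")


def find_abbreviations(text: str) -> dict[str, str]:
    """Map each abbreviation to its most frequently seen long form.

    Hypothetical helper for illustration only; not part of autocorpus.
    """
    candidates: dict[str, Counter] = {}
    for match in ABBR_IN_PARENS.finditer(text):
        abbr = match.group(1)
        # Take as many preceding words as the abbreviation has letters.
        words = re.findall(r"[A-Za-z]+", text[: match.start()])
        long_words = words[-len(abbr):]
        initials = "".join(word[0].upper() for word in long_words)
        # Keep the candidate only if the word initials spell the abbreviation.
        if initials == abbr:
            long_form = " ".join(long_words).lower()
            candidates.setdefault(abbr, Counter())[long_form] += 1
    # Counter.most_common(1) yields the highest-count long form per abbreviation.
    return {
        abbr: counts.most_common(1)[0][0] for abbr, counts in candidates.items()
    }
```

Calling `find_abbreviations` on a text that defines the same abbreviation several times returns a single long form per abbreviation, chosen by frequency.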

Attributes

Functions

get_abbreviations(main_text, soup, file_path)

Extract abbreviations from the input main text.

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
main_text | dict[str, Any] | Article main text data | required
soup | BeautifulSoup | Article as a BeautifulSoup object | required
file_path | Path | Input file path | required

Returns:

Type | Description
--- | ---
dict[str, Any] | Abbreviations in BioC format.

Source code in autocorpus/abbreviation.py
def get_abbreviations(
    main_text: dict[str, Any], soup: BeautifulSoup, file_path: Path
) -> dict[str, Any]:
    """Extract abbreviations from the input main text.

    Args:
        main_text: Article main text data
        soup: Article as a BeautifulSoup object
        file_path: Input file path

    Returns:
        Abbreviations in BioC format.
    """
    return _biocify_abbreviations(_extract_abbreviations(main_text, soup), file_path)
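
The function above composes two steps: extraction, then formatting with the input file path. The following sketch mirrors only that shape. Both helpers are hypothetical stand-ins, the `soup` argument is omitted, and the output keys (`source`, `date`, `abbreviations`) are illustrative placeholders, not the BioC layout that autocorpus actually produces.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Any


def extract_stub(main_text: dict[str, Any]) -> dict[str, str]:
    """Hypothetical stand-in for the extraction step; returns a fixed mapping."""
    return {"BMI": "body mass index"}


def format_stub(abbreviations: dict[str, str], file_path: Path) -> dict[str, Any]:
    """Hypothetical stand-in for the formatting step; keys are illustrative."""
    return {
        # pathlib gives the file name regardless of the OS path separator.
        "source": file_path.name,
        # Date stamp in YYYYMMDD form (an assumed format).
        "date": datetime.now(timezone.utc).strftime("%Y%m%d"),
        "abbreviations": [
            {"short": short, "long": long_form}
            for short, long_form in sorted(abbreviations.items())
        ],
    }


def get_abbreviations_sketch(
    main_text: dict[str, Any], file_path: Path
) -> dict[str, Any]:
    """Compose extraction and formatting, as the documented function does."""
    return format_stub(extract_stub(main_text), file_path)
```

Keeping extraction and formatting as separate steps means the output format can change without touching the extraction logic.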