autocorpus.pdf
Functionality for processing PDF files.
Attributes
Classes
Functions
extract_pdf_content(file_path)
Extracts content from a PDF file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Path | Path to the PDF file. | required |
Returns:
Type | Description |
---|---|
tuple[BioCCollection, BioCTableCollection] | A tuple of a BioCCollection containing the extracted text and a BioCTableCollection containing the extracted tables. |
Raises:
Type | Description |
---|---|
RuntimeError | If the PDF converter is not initialized. |
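The documented contract (a Path in, a two-element tuple out, and RuntimeError when the PDF converter is not initialized) suggests a caller-side pattern like the one below. This is a minimal sketch, not part of autocorpus: the helper try_extract_pdf is hypothetical, and the extractor is passed in as a parameter so the example runs without autocorpus installed. In real use you would pass autocorpus.pdf.extract_pdf_content as the extractor argument.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Tuple

# Hypothetical alias: the real function returns
# (BioCCollection, BioCTableCollection); Any keeps this sketch
# runnable without the autocorpus/BioC packages installed.
Extractor = Callable[[Path], Tuple[Any, Any]]


def try_extract_pdf(
    extractor: Extractor, file_path: Path
) -> Optional[Tuple[Any, Any]]:
    """Run a PDF extractor, returning None if its converter is not ready.

    try_extract_pdf is a hypothetical helper, not an autocorpus API.
    """
    try:
        # Unpack the (text, tables) pair described in the Returns table.
        text, tables = extractor(file_path)
    except RuntimeError as err:
        # extract_pdf_content documents RuntimeError for an
        # uninitialized PDF converter; report it and skip the file.
        print(f"Skipping {file_path.name}: {err}")
        return None
    return text, tables
```

Catching only RuntimeError keeps other failures (for example, a missing file) visible to the caller rather than silently skipping them.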
Source code in autocorpus/pdf.py