inputs
autocorpus.inputs
Module for processing the structure of the autocorpus input files.
Functions
fill_structure(structure, key, ftype, fpath)
Update the structure dict to contain the correct structure.
Takes the structure dict and, if key is not present, creates a new entry with default
values. It then adds fpath to the entry under the matching ftype.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
structure | | structure dict | required |
key | | base file name | required |
ftype | | file type (main_text, linked_table) | required |
fpath | Path | full path to the file | required |
Returns:
Type | Description |
---|---|
| | The updated structure dictionary |
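
The update described above can be sketched as follows. This is a minimal illustration, not the code in autocorpus/inputs.py: the function name `fill_structure_sketch`, the default entry layout, and the example file names are all assumptions made for the example.

```python
from pathlib import Path
from typing import Any


def fill_structure_sketch(
    structure: dict[str, Any], key: str, ftype: str, fpath: Path
) -> dict[str, Any]:
    """Illustrative stand-in for fill_structure; not the library's code."""
    if key not in structure:
        # Assumed default entry: one main text plus a list of linked tables.
        structure[key] = {"main_text": "", "linked_table": []}
    if ftype == "main_text":
        # A base name has a single main text, so store the path directly.
        structure[key][ftype] = str(fpath)
    else:
        # Several linked tables may share one base name, so collect them.
        structure[key][ftype].append(str(fpath))
    return structure


# Hypothetical file names, used only to show the call pattern.
structure: dict[str, Any] = {}
fill_structure_sketch(structure, "PMC123", "main_text", Path("in/PMC123.html"))
fill_structure_sketch(
    structure, "PMC123", "linked_table", Path("in/PMC123_table_1.html")
)
```

After both calls, the single "PMC123" entry holds the main text path and a one-element list of linked tables.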
Source code in autocorpus/inputs.py
get_file_type(file_path)
Identify the type of files present in the given path.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Path | file path to be checked | required |
Returns:
Type | Description |
---|---|
str | "directory", "main_text" or "linked_table" |
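
One way such a classification could work is sketched below. The page does not state how a linked table is told apart from a main text, so the `_table_<n>` naming rule and the name `get_file_type_sketch` are assumptions for illustration only.

```python
import re
from pathlib import Path

# Assumed convention: a linked table's stem ends in "_table_<number>".
TABLE_PATTERN = re.compile(r"_table_\d+$")


def get_file_type_sketch(file_path: Path) -> str:
    """Illustrative stand-in for get_file_type; not the library's code."""
    if file_path.is_dir():
        return "directory"
    if TABLE_PATTERN.search(file_path.stem):
        return "linked_table"
    # Anything else is treated as a main text in this sketch.
    return "main_text"


# Hypothetical file names; neither needs to exist on disk for this check.
table_kind = get_file_type_sketch(Path("PMC123_table_1.html"))
text_kind = get_file_type_sketch(Path("PMC123.html"))
```

Because `Path.is_dir()` returns False for a path that does not exist, the sketch falls through to the name-based checks for the two hypothetical files.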
Source code in autocorpus/inputs.py
read_file_structure(file_path, target_dir)
Takes in any file structure (flat or nested) and groups files.
Returns a dict of files which are all related and the paths to each related file.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | Path | path to the file or directory | required |
target_dir | Path | path to the target directory | required |
Returns:
Type | Description |
---|---|
dict[str, Any] | Dictionary of files which are all related and the paths to each related file |
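
The grouping idea can be sketched as below. This is a self-contained illustration under stated assumptions, not the library's implementation: the name `group_files_sketch`, the `_table_<n>` naming rule, the entry layout, and the use of target_dir as a per-entry `out_dir` value are all hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import Any

# Assumed convention: a linked table's stem ends in "_table_<number>".
TABLE_SUFFIX = re.compile(r"_table_\d+$")


def group_files_sketch(file_path: Path, target_dir: Path) -> dict[str, Any]:
    """Group each main text with its linked tables by shared base name."""
    # Accept a single file or a directory tree (flat or nested).
    if file_path.is_file():
        files = [file_path]
    else:
        files = sorted(p for p in file_path.rglob("*") if p.is_file())

    structure: dict[str, Any] = {}
    for path in files:
        is_table = TABLE_SUFFIX.search(path.stem) is not None
        # Strip the assumed table suffix so tables share a key with their text.
        key = TABLE_SUFFIX.sub("", path.stem)
        entry = structure.setdefault(
            key,
            {"main_text": "", "linked_table": [], "out_dir": str(target_dir)},
        )
        if is_table:
            entry["linked_table"].append(str(path))
        else:
            entry["main_text"] = str(path)
    return structure


# Build a small nested layout in a temporary directory and group it.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "nested").mkdir()
    (root / "PMC123.html").touch()
    (root / "nested" / "PMC123_table_1.html").touch()
    grouped = group_files_sketch(root, root / "out")
```

Here the main text sits at the top level and its table in a subdirectory, yet both land under the same "PMC123" key because the key is derived from the file name rather than the directory.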
Source code in autocorpus/inputs.py