Minall
To facilitate the project's use as both a Python library and a CLI tool, minall's workflow is managed via an exportable class, Minall.
Creating an instance of Minall performs the following preliminary steps:
- API credentials for the minet clients are parsed. (param: config)
- File paths to the workflow's eventual output, CSV files for the target URLs (links.csv) and their shared content (shared_content.csv), are prepared. This includes the creation of any necessary parent directories. (param: output_dir)
- The SQLite database connection is created. The connection is either in-memory or, if a file path is provided, to an embedded SQLite database file. If that file does not exist yet, providing its path is enough for Minall to create it and store the workflow's results there. (param: database)
- Through the SQLite connection, SQL tables are created for the user-provided data files. A file of target URLs is required; its data is parsed and inserted into the 'links' SQL table. (param: links_file, url_col, shared_content_file)
- The class instance remembers whether to (a) deploy the full minall enrichment workflow or (b) only collect the generalized Buzzsumo metadata. (param: buzzsumo_only)
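The output-path and database steps above can be sketched with the standard library alone. This is a minimal illustration, not minall's own code: the helper names prepare_outputs and connect are hypothetical, and only the file names links.csv and shared_content.csv and the in-memory versus file-backed choice come from the description above.

```python
import sqlite3
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def prepare_outputs(output_dir: Path) -> Tuple[Path, Path]:
    """Hypothetical helper: create the output directory and derive both CSV paths."""
    # parents=True also creates any missing parent directories.
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir / "links.csv", output_dir / "shared_content.csv"


def connect(database: Optional[str]) -> sqlite3.Connection:
    """Hypothetical helper: in-memory connection if no path is given, else file-backed."""
    if database is None:
        return sqlite3.connect(":memory:")
    Path(database).parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(database)


with tempfile.TemporaryDirectory() as tmp:
    links_csv, shared_csv = prepare_outputs(Path(tmp) / "nested" / "output")
    outputs_ready = links_csv.parent.is_dir()

    in_memory = connect(None)
    db_path = Path(tmp) / "minall.db"
    on_disk = connect(str(db_path))

    # Writing a table guarantees the database file exists on disk.
    on_disk.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
    on_disk.commit()
    db_file_created = db_path.exists()

    # Close both connections before the temporary directory is removed.
    in_memory.close()
    on_disk.close()
```

Passing None mirrors the documented behaviour of the database parameter, where no path means the data lives only in memory for the lifetime of the connection.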
minall.main
Minall enrichment workflow.
With the class Minall, this module manages the entire workflow.
The class contains the following methods:
- __init__(database, config, output_dir, links_file, url_col, shared_content_file, buzzsumo_only): Initialize SQLite database and out-file paths.
- collect_and_coalesce(): Collect new data and coalesce with existing data in relevant SQL tables.
- export(): Write enriched SQL tables to CSV out-files.
Minall
Class to store variables and execute steps of enrichment.
Source code in minall/main.py
__init__(database, config, output_dir, links_file, url_col, shared_content_file=None, buzzsumo_only=False)
Initialize SQLite database and out-file paths.
Examples:
>>> # Set file path variables.
>>> OUT_DIR = Path(__file__).parent.parent.joinpath("docs").joinpath("doctest")
>>> LINKS_FILE = OUT_DIR.joinpath('minall_init_example.csv')
>>>
>>> # Create Minall instance.
>>> minall = Minall(database=None, config={}, output_dir=str(OUT_DIR), links_file=str(LINKS_FILE), url_col='target_url')
>>> minall.links_table.table
LinksConstants(table_name='links', primary_key='url')
>>>
>>> # Check that Minall's SQLite database connection has committed 1 change (creating the 'links' table).
>>> minall.connection.total_changes
1
Parameters:
Name | Type | Description | Default |
---|---|---|---|
database | str \| None | Path name to SQLite database. If None, creates database in memory. | required |
config | str \| dict \| None | Credentials for API keys. | required |
output_dir | Path \| str | Path to directory for enriched CSV files. | required |
links_file | Path \| str | Path to in-file for URLs. | required |
url_col | str | Name of URL column in URLs file. | required |
shared_content_file | str \| None | Path name to CSV file of shared content related to URLs. | None |
buzzsumo_only | bool | Whether to only run Buzzsumo enrichment. Defaults to False. | False |
Source code in minall/main.py
collect_and_coalesce()
Collect new data and coalesce with existing data in relevant SQL tables.
This method creates an instance of the class Enrichment (from minall.enrichment.enrichment), providing the target URL table ('links' table, self.links_table), the minet API credentials (self.keys), and the related shared content table (self.shared_content_table), which may go unused depending on the parameters used when Enrichment is called.
Having prepared the Enrichment instance, the method then calls it, providing its self.buzzsumo_only instance attribute as the argument for Enrichment's buzzsumo_only parameter. This boolean parameter determines whether all of the Enrichment class's methods will be deployed or only its Buzzsumo method.
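The "prepare an instance, then call it with a flag" pattern described above can be sketched as follows. EnrichmentSketch and its method names are hypothetical stand-ins, and this is not the Enrichment class from minall.enrichment.enrichment; only the buzzsumo_only flag and the constructor inputs echo the description.

```python
from typing import Dict, List, Optional


class EnrichmentSketch:
    """Hypothetical stand-in for a callable enrichment class."""

    def __init__(
        self,
        links_table: str,
        shared_content_table: str,
        keys: Optional[Dict[str, str]] = None,
    ) -> None:
        self.links_table = links_table
        self.shared_content_table = shared_content_table
        self.keys = keys or {}
        self.steps_run: List[str] = []

    def buzzsumo(self) -> None:
        # Placeholder for the Buzzsumo collection step.
        self.steps_run.append("buzzsumo")

    def other_enrichment(self) -> None:
        # Placeholder for every remaining enrichment step (an assumption).
        self.steps_run.append("other")

    def __call__(self, buzzsumo_only: bool) -> List[str]:
        # The flag decides whether only the Buzzsumo step runs.
        self.buzzsumo()
        if not buzzsumo_only:
            self.other_enrichment()
        return self.steps_run


only_buzzsumo = EnrichmentSketch("links", "shared_content")(buzzsumo_only=True)
full_workflow = EnrichmentSketch("links", "shared_content")(buzzsumo_only=False)
```

Defining __call__ keeps the setup (tables and credentials) separate from the decision about which steps to run, so the shared content table can be passed in even when the flag means it is never used.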
Source code in minall/main.py
export()
Write enriched SQL tables to CSV out-files.
This method exports both of the Minall class instance's SQL tables, self.links_table and self.shared_content_table, to CSV files. The class that manages the SQL tables (minall.tables.base.BaseTable) stores each table's out-file path as an instance variable. The parent directory for both out-files was declared in Minall's __init__() method via the parameter output_dir, from which the out-file paths were subsequently derived.
Returns:
Type | Description |
---|---|
Tuple[Path, Path] | Paths to links and shared content CSV files. |
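Writing an SQL table to a CSV out-file can be sketched with the standard sqlite3 and csv modules. The function export_table and the two-column links table are hypothetical and serve only as an illustration; how BaseTable performs the export is not shown in this page.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_table(connection: sqlite3.Connection, table_name: str, outfile: Path) -> Path:
    """Hypothetical helper: dump every row of a table, with a header row, to CSV."""
    cursor = connection.execute(f'SELECT * FROM "{table_name}"')
    # cursor.description holds one tuple per column; the first item is its name.
    headers = [column[0] for column in cursor.description]
    with open(outfile, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        writer.writerows(cursor)
    return outfile


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE links (url TEXT PRIMARY KEY, title TEXT)")
connection.execute(
    "INSERT INTO links VALUES (?, ?)", ("https://example.com", "Example")
)
connection.commit()

with tempfile.TemporaryDirectory() as tmp:
    outfile = export_table(connection, "links", Path(tmp) / "links.csv")
    # Read the file back to inspect what was written.
    with open(outfile, newline="", encoding="utf-8") as f:
        exported_rows = list(csv.reader(f))

connection.close()
```

Returning the path from the helper matches the documented shape of export(), which hands back the paths of the files it wrote.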
Source code in minall/main.py