How-To Guides
Using minall from the command line
Context
Let's say we have a set of 2 URLs, one is a web page and the other is a YouTube video.
data.csv
| target_url |
|---|
| https://archive.fosdem.org/2020/schedule/event/open_research_web_mining/ |
| https://www.youtube.com/watch?v=BTvfWbAjh1w |
And we've stored the data file in our current working directory under the name data.csv.
.
│
└── data.csv
Set up files
For the purposes of this demonstration, we'll need to create an additional file, config.yml.
.
│
├── config.yml
│
└── data.csv
This YAML file is a configuration file in the same format as that used in minet, and it contains API keys.
config.yml
---
buzzsumo:
token: "XXXX" # Used as --token for `minet bz` commands
crowdtangle:
token: "XXXX" # Used as --token for `minet ct` commands
rate_limit: 50 # Used as --rate-limit for `minet ct` commands
youtube:
key: "XXXX" # Used as --key for `minet yt` commands
Set up minall
Now we'll install minall. Create and activate a virtual Python environment, using version 3.11. Then install the tool with pip.
$ pip install git+https://github.com/medialab/minall.git
Run
Finally, let's run the workflow on our dataset. We'll set the parameters as follows:
--config: Path to the YAML configuration file,./config.yml--links: Path to the dataset of target URLs,./data.csv--url-col: Name of the column in the dataset file with the target URLs.target_url--output-dir: Path to whereminallwill export its results../output/
$ minall --config config.yml --output-dir output/ --links data.csv --url-col target_url
While minall is running, we'll see the following commands proceed:
Querying YouTube videos 1/1 0:00:00
Querying YouTube channels 1/1 0:00:00
Scraping webpage 1/1 0:00:00
Calling Buzzsumo API 2/2 0:00:05
Results
The results will be written to files in the directory whose path was given to the parameter --output-dir.
.
│
├── config.yml
│
├── data.csv
│
└── output/
├── links.csv
└── shared_content.csv