Yeah, it was in the paper. What I'm curious about is how they were able to correlate names with comments/posts without an account. When I joined, names were redacted unless you were logged in. I'm assuming this was a recent change? @AOU
>Poal
Since there is no previous research on Poal or publicly available datasets, we implement a custom crawler to collect Poal data. To this end, we followed the methodology in [34] and implemented a DOM-tree scraper using HTTP requests and Beautiful Soup to visit Poal subverses and collect data. Our online scraper operated between July 1, 2021, and September 7, 2021. Poal shows submissions and comments made on its platform without the need for registration. Therefore, our scraper could go back to the beginning of every subverse and collect data from then on. To guarantee our dataset’s completeness, our scraper followed a list of subverses to collect and went through the entire history of every subverse on that list in a loop; once the list was exhausted, it repeated the process continuously until the last day of collection, September 7, 2021. This way, our scraper revisited submissions it had already collected, looking for new comments, if any.
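For anyone wondering what that loop amounts to, here is a minimal sketch of the revisit logic as the quote describes it, not the authors' actual code. The names `crawl_until`, `FakeClock`, and `make_fake_fetch` are hypothetical, and the HTTP requests + Beautiful Soup step is deliberately replaced by a stub fetcher so the scheduling can be run offline. The point it shows: walking every subverse's full history over and over only adds data when new comments appear, because repeat sightings collapse into a set.

```python
from datetime import date, timedelta
from typing import Callable, Dict, Iterable, List, Set, Tuple

# (submission_id, comment_id): hypothetical shape of one scraped record
Item = Tuple[str, str]


def crawl_until(
    subverses: List[str],
    fetch_history: Callable[[str], Iterable[Item]],
    today: Callable[[], date],
    last_day: date,
) -> Tuple[Dict[str, Set[Item]], int]:
    """Walk the full history of every listed subverse, then start over,
    until the clock passes last_day. Revisiting submissions already seen
    is what picks up comments posted since the previous pass; storing
    records in a set makes repeat sightings harmless."""
    seen: Dict[str, Set[Item]] = {name: set() for name in subverses}
    passes = 0
    while today() <= last_day:
        for name in subverses:
            seen[name].update(fetch_history(name))
        passes += 1
    return seen, passes


class FakeClock:
    """Test double: returns `start` on the first call, then one day later per call."""

    def __init__(self, start: date) -> None:
        self.current = start - timedelta(days=1)

    def __call__(self) -> date:
        self.current += timedelta(days=1)
        return self.current


def make_fake_fetch() -> Callable[[str], List[Item]]:
    """Hypothetical stub standing in for the HTTP + Beautiful Soup step:
    each visit to a subverse reveals one more comment on submission "s1"."""
    visits: Dict[str, int] = {}

    def fetch(name: str) -> List[Item]:
        visits[name] = visits.get(name, 0) + 1
        return [("s1", f"c{i}") for i in range(visits[name])]

    return fetch


if __name__ == "__main__":
    seen, passes = crawl_until(
        ["news"], make_fake_fetch(), FakeClock(date(2021, 9, 5)), date(2021, 9, 7)
    )
    print(passes, sorted(seen["news"]))
```

With a simulated window of September 5 to 7 and one pass per simulated day, the loop makes three passes, and each comment that shows up between passes is caught on the next visit to the same submission.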
Yeah, unless they didn't do that and only made huge leaps of "correlation" between case-insensitive usernames and comments containing their list of "keywords" in the subverses. Bum badda bing: online organizing of Jan 6.
They may also have created an account associated with the web scraper. If so, I'd be curious whether it's still active at all. In theory, it could be identified by cross-referencing account activity against the supposed dates of scraping.
Scrapers can’t collect much data before getting automatically blocked. That’s why their “study” is built on lots of erroneous assumptions about Poal (check my sticky comment).