Project TechLord: Doxxing and Malicious Traffic

chants · January 21, 2024, 9:14pm

An interesting new task idea: detecting doxxing and, more broadly, pages whose only purpose is to drive traffic to their malicious content, e.g. harassment. I was surprised we didn't have a similar task already. Thoughts?

maik_froebe · January 21, 2024, 10:17pm

Hi,

Indeed, this sounds interesting to me.
I am not aware of such a task, but it would definitely be very valuable in the open web search context.

Last year, Mohammed Al-Maamari (cc @mohammed.al-maamari ) and Michael Dinzinger (cc @michaeld ) organized a somewhat related task on web spam classification: https://www.tira.io/task-overview/webpage-classification

Building the dataset might be the main challenge, don't you think?

Best regards,

Maik

chessgod101 · January 27, 2024, 7:25pm

Pardon me for posting this off-topic.
It’s best to ignore this user “chants”.
Seems he has nothing better to do than rake up conflicts wherever he goes. The thread he opened here was not made in good faith.

He starts this thread at our forum to fuel conflict:

For more context please see these posts:
https://archive.is/AA05B
hxxps://archive.is/a44b3

More details on this “chants” guy:
hxxps://exetools.live/?p=35
He should really not be doing this.
I hope to ban this chants guy at the exetools forum soon but unfortunately we need the funds he gets us and so we delay…

chants · January 27, 2024, 9:22pm

Indeed, the website provided by this drama-seeking user actually looks like the perfect type of training data, but we need more examples. I don't know anything about the drama being mentioned; perhaps there is some high-profile interest in this topic, such as from government agencies. If they are the moderator of a large forum, which is doubtful, they can make a post on that forum confirming it; otherwise it appears that someone has brought a malicious actor into our midst.

I have been trying to use contacts at some large social media companies to get properly human-labeled training data, but the response has mostly been that, for privacy and legal reasons, sanitizing the data is too much trouble. It would likely take some large grants to make it worth their while. Services like Twitter/X, Facebook, and even YouTube are without a doubt the premium sources of training data, but obtaining it from them is not practical. Spam tends to be a bit different, though the models would likely transfer well.

chessgod101 · January 28, 2024, 12:43pm

I am the admin of the EXETOOLS forum.

We seriously regret having given that “chants” guy a VIP rank: we were forced to do that, given that he was paying for our server.

It is worth noting that he was arrested multiple times for selling drugs in our forum.
This is his own post in the forum (archived for posterity) after he was arrested by the FBI when caught red-handed attempting to sell drugs to a minor:
(SCROLL to the last post): https://archive.is/gG8q4

More details about him in this investigative journalism blog: https://exetools.live/?p=83

I am just trying to cast some light on this “chants” guy’s background and his intentions.

chants · January 28, 2024, 3:42pm

We might use the Instagram dataset mentioned in “Identification of cyber harassment and intention of target users on social media platforms” by S. Abarna, J.I. Sheeba, S. Jayasrilakshmi, and S. Pradeep Devaneyan:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9364757/

cyberbullying (30%), trolling (25%), sexual harassment (23%), doxing (19%), cyberstalking (16%), online impersonation (15%), revenge porn (10%), cyber defamation (8%), hacking (8%), and message bombing (6%)

Interestingly enough, the cyberharasser we have here is basically checking most of the category boxes. My hypothesis is that serial cyberharassers tend to engage in all manners and methods (except those requiring a physical-world connection), which might make identifying serial cyberharassers possible. Generally, though, unmasking and pursuing them would require legal action, either by law enforcement or through lawsuits.

So the Chelmis and Yao dataset should suffice.