View Full Version : Forensic linguistics identifying users on the internet
djr33
01-09-2013, 11:10 AM
This might interest some of you here--
http://www.scmagazine.com.au/News/328135,linguistics-identifies-anonymous-users.aspx
(Given my background in linguistics, I'm a bit skeptical; I think it's somewhat possible, but it's very far from being practical or accurate-- I doubt they'll be able to turn it into something that could hold up in court, at least for a very long time.)
keyboard
01-09-2013, 12:01 PM
The most amusing bit of that article:
"Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated."
But that's a very interesting article... It's surprising how much they can track you just by the way you speak...
djr33
01-09-2013, 12:35 PM
That part was amusing, but it's also part of why I'm skeptical. What they mean is that they haven't done anything aside from analyzing English. There's no reason they couldn't analyze any other language (including leetspeak). In fact, you'd get a lot more information from looking at things beyond standard English.
molendijk
01-09-2013, 09:26 PM
I think this linguistic hope is unrealistic. It's quite possible to identify groups (professional, social etc.) on the basis of linguistic criteria or peculiarities, but identifying individuals seems impossible to me. Except in certain rare cases. For instance, the rules of Dutch grammar present in the head of the legendary Dutch former soccer player Johan Cruijff are so unique (idiosyncratic) that anyone reading a written text of what he says will recognize him.
Linguists have been too optimistic on other occasions. In the sixties and seventies of the past century, they thought it would soon be possible to build translation machines endowed with an artificial intelligence almost equal to human intelligence. The Google Translation Machine proves how sadly wrong they were. They were wrong from the start because they forgot too easily that more than 30%, 40% or maybe 50% of human linguistic interaction is ruled by what humans know 'about the world', not by linguistic rules. If I say "The next day, John went to the railway station and he bought a paper", then the person to whom I'm speaking will conclude that I mean that John bought the paper at the station, not afterwards. But if I say "The next day, John went to see his mother and he bought a paper", nothing indicates when he bought the paper. He may even have bought the paper before going to his mother (= "The next day, John went to see his mother and he also bought a paper"). (This has nothing to do with 'vagueness' attached to the Simple Past of English. In French, the situation would not be different, despite the so-called 'preciseness' of the passé simple.)
So the interpretations of certain linguistic utterances, especially sequences of sentences, may have less to do with grammar and the like than with knowledge of the world. And our knowledge of the world is as vast as the universe.
djr33
01-09-2013, 09:53 PM
The numbers in this case are "if we have 100 people we can identify 80 of them"... very unclear what that means; and it was with a huge amount of data (1000+ posts per person). What confuses me is what happens when they have 7 billion people to pick from. Do they still get 80% of everyone?
And... on the general point, you're right, I think. Things like translation may be coming in the future, but this one seems pretty far off, since even people can't do this (not on the scale they're implying, anyway), and humans are better than computers at basically everything language-related.
bernie1227
01-10-2013, 08:32 AM
http://www.brenz.net/services/l337Maker.asp
djr33
01-10-2013, 09:16 AM
Hmm... that doesn't even translate "lol".... I think it's designed only to "translate" numbers into letters?
bernie1227
01-10-2013, 09:50 AM
That's because lol isn't 1337 speak, it's an acronym.
1'/\/\ |\|07 $Ur3 j00Z'\/3 QU173 907 7|-|3 9r4$P 0Ph 1337 $P34|< d4|\|13L :p
I think you're just thinking of the regular kind of interwebs slang, for example:
"YOLO SWAG 420 TRAIN SIMULATOR 2002 SMOKE COAL EVREE DAY"
Now, the 'rules of the internet' are often not particularly good rules, but some of them are accurate, for example, rule 0.96:
"P30P13 WH0 U53 1337 5P34K 4R3 N07 1337. "
molendijk
01-10-2013, 11:03 AM
It's not very accurate. Leek is translated to L337, but L337 is not translated to leek, but to elite, which is translated to L337. So the machine cannot even distinguish between leet and elite.
djr33
01-10-2013, 11:12 AM
What I mean is that the tool isn't more than a symbol-converter, as far as I can tell. It wouldn't help those doing this research to increase accuracy more than just writing that themselves-- 3=e, etc.
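To make the "symbol-converter" idea concrete, here is a minimal Python sketch of what such a tool might do. The substitution table and the name `deleet` are made up for illustration; nothing here is taken from the linked tool or from the researchers' work.

```python
# Hypothetical substitution table (3=e, etc.); a real converter may use a different one.
LEET_TO_PLAIN = str.maketrans({
    "0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t", "$": "s",
})

def deleet(text: str) -> str:
    """Replace each leet symbol with a plain letter, one character at a time."""
    return text.translate(LEET_TO_PLAIN).lower()

print(deleet("1337 5P34K"))  # leet speak
print(deleet("lol"))         # lol (no symbols to convert, so nothing changes)
```

A table like this only handles one-character symbols. Multi-character glyphs such as `|\|` for "n" would need an extra replacement pass, and abbreviations like "lol" are out of its reach entirely.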
bernie1227
01-11-2013, 07:32 AM
"It's not very accurate. Leek is translated to L337, but L337 is not translated to leek, but to elite, which is translated to L337. So the machine cannot even distinguish between leet and elite."
You've confused me now, are you talking about leek? Or leet? That's because leet means elite, I assume. As a side note, leet is more often spelled as 1337, in other words all in numbers rather than with an L.
molendijk
01-11-2013, 10:50 AM
Sorry, typo. Where I put leek you should read leet. Anyway, the machine does not distinguish between leet and elite. If the explanation is that the program recognizes elite as a word that is always 'pronounced' leet, then it has some intelligence after all / it is more than just a symbol converter. Am I right?
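One way the behaviour described here could come about is a small word list that is checked before any symbol substitution. The Python sketch below is only a guess at the mechanism, not the tool's actual code; the dictionaries and the names `to_leet` and `from_leet` are hypothetical.

```python
# Hypothetical word list; the real tool's contents are unknown.
WORD_TO_LEET = {"elite": "L337", "leet": "L337"}
# Both words map to the same token, so the reverse table can keep only one of them.
LEET_TO_WORD = {"L337": "elite"}

# Fallback symbol table, used when a token is not in the word list.
SYMBOLS = str.maketrans({"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"})

def to_leet(word: str) -> str:
    """Look the whole word up; leave it unchanged if it is not listed."""
    return WORD_TO_LEET.get(word.lower(), word)

def from_leet(token: str) -> str:
    """Try the word list first, then fall back to per-symbol conversion."""
    key = token.upper()
    if key in LEET_TO_WORD:
        return LEET_TO_WORD[key]
    return token.translate(SYMBOLS).lower()

print(to_leet("leet"), to_leet("elite"))  # L337 L337
print(from_leet("L337"))                  # elite
print(from_leet("n00b"))                  # noob
```

With symbol conversion alone, L337 would come back as "leet"; getting "elite" instead is what suggests a word-level lookup sits on top of the symbol table.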
djr33
01-11-2013, 06:35 PM
You're right-- it probably lists a couple words (however, that is the most obvious word for it to have, so I don't know how much else is in it).
bernie1227
01-12-2013, 12:27 AM
This may help in terms of the translation:
http://paulbui.net/wl/Leet-speak_Translator
It's a Python tutorial for one.