
Thread: Forensic linguistics identifying users on the internet

  1. #1
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    Forensic linguistics identifying users on the internet

    This might interest some of you here--
    http://www.scmagazine.com.au/News/32...ous-users.aspx

    (Given my background in linguistics, I'm a bit skeptical. I think it's somewhat possible, but it's very far from being practical or accurate; I doubt they'll be able to turn it into something that could hold up in court, at least not for a very long time.)
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  2. #2
    Join Date
    Mar 2011
    Location
    N 11° 19' 0.0012 E 142° 15' 0
    Posts
    1,524
    Thanks
    41
    Thanked 89 Times in 88 Posts
    Blog Entries
    3

    The most amusing bit of that article:
    "Leetspeak, an alternative alphabet popular in some forum circles, cannot be translated."
    But that's a very interesting article... It's surprising how much they can track you just by the way you write...

  3. #3
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    That part was amusing, but it's also part of why I'm skeptical. What they mean is that they haven't done anything aside from analyzing English. There's no reason they couldn't analyze any other language (including leetspeak). In fact, you'd get a lot more information from looking at things beyond standard English.
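
    As a rough illustration of why the method needn't be tied to English: a minimal sketch, assuming authors are compared by character trigram counts and cosine similarity. This is not the method from the article, and the sample posts and user names below are made up.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams; works on any script, leetspeak included."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def closest_author(sample: str, known: dict[str, str]) -> str:
    """Return the known author whose trigram profile is most similar to the sample."""
    target = trigram_profile(sample)
    return max(known, key=lambda name: cosine(target, trigram_profile(known[name])))


# Made-up writing samples, purely for illustration (hypothetical users).
known_posts = {
    "user_a": "1 4m 50 1337 lol, u r n07 1337 4t 4ll lol",
    "user_b": "I would argue that the premise is mistaken, and that the data are thin.",
}
print(closest_author("lol u r 50 n07 1337", known_posts))
```

    Nothing in the profile step knows about English words or grammar, so nonstandard spelling just becomes more signal rather than something that "cannot be translated".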

  4. #4
    Join Date
    Sep 2007
    Location
    The Netherlands
    Posts
    1,369
    Thanks
    31
    Thanked 141 Times in 136 Posts
    Blog Entries
    32

    I think this linguistic hope is unrealistic. It's quite possible to identify groups (professional, social etc.) on the basis of linguistic criteria or peculiarities, but identifying individuals seems impossible to me, except in certain rare cases. For instance, the rules of Dutch grammar in the head of the legendary Dutch former soccer player Johan Cruijff are so unique (idiosyncratic) that anyone reading a transcript of what he says will recognize him.
    Linguists have been too optimistic on other occasions. In the sixties and seventies of the past century, they thought it would soon be possible to build translation machines endowed with artificial intelligence that would be almost equal to human intelligence. The Google Translation Machine proves how sadly wrong they were. They were wrong from the start because they forgot too easily that more than 30%, 40% or maybe 50% of human linguistic interaction is ruled by what humans know 'about the world', not by linguistic rules. If I say "The next day, John went to the railway station and he bought a paper", then the person to whom I'm speaking will conclude that I mean that John bought the paper at the station, not afterwards. But if I say "The next day, John went to see his mother and he bought a paper", nothing indicates when he bought the paper. He may even have bought the paper before going to his mother (= "The next day, John went to see his mother and he also bought a paper"). (This has nothing to do with any 'vagueness' attached to the Simple Past of English. In French, the situation would be no different, despite the so-called 'preciseness' of the passé simple.)
    So the interpretation of certain linguistic utterances, especially sequences of sentences, may have less to do with grammar and the like than with knowledge of the world. And our knowledge of the world is as vast as the universe.
    Last edited by molendijk; 01-09-2013 at 08:38 PM.

  5. #5
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    The claim in this case is "if we have 100 people we can identify 80 of them"... it's very unclear what that means, and it was with a huge amount of data (1000+ posts per person). What confuses me is what happens when they have 7 billion people to pick from. Do they still get 80% of everyone?

    And... on the general point, you're right, I think. Some of it may be coming in the future (things like translation), but this one seems pretty far off, since even people can't do this (not on the scale they're implying, anyway), and humans are better than computers at basically everything language-related.
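
    On the question of what happens as the pool of candidates grows, here is a toy simulation, not a model of the actual study. It assumes each author's style is a random point in a two-feature space, each writing sample is that point plus Gaussian noise, and a sample is attributed to the nearest author. The function name, the noise level and the two-feature setup are all assumptions for illustration.

```python
import random


def identification_rate(num_authors: int, noise: float = 0.05,
                        trials: int = 200, seed: int = 0) -> float:
    """Fraction of noisy samples matched back to the right author in a toy 2-feature model."""
    rng = random.Random(seed)
    # Assumption: each author's "style" is a random point in the unit square.
    authors = [(rng.random(), rng.random()) for _ in range(num_authors)]
    correct = 0
    for _ in range(trials):
        true_idx = rng.randrange(num_authors)
        ax, ay = authors[true_idx]
        # A writing sample is the author's style plus Gaussian measurement noise.
        sx, sy = ax + rng.gauss(0, noise), ay + rng.gauss(0, noise)
        # Attribute the sample to whichever author is closest in feature space.
        guess = min(range(num_authors),
                    key=lambda i: (authors[i][0] - sx) ** 2 + (authors[i][1] - sy) ** 2)
        correct += guess == true_idx
    return correct / trials


for pool in (10, 100, 1000):
    print(pool, identification_rate(pool))
```

    In a setup like this the hit rate tends to drop as more authors crowd into the same feature space, which is why a figure measured on 100 candidates says little by itself about a pool of billions.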

  6. #6
    Join Date
    May 2012
    Location
    Hitchhiking the Galaxy
    Posts
    1,013
    Thanks
    47
    Thanked 139 Times in 139 Posts
    Blog Entries
    1

    "Most good programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program." - Linus Torvalds
    Anime Views Forums
    Bernie

  7. #7
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    Hmm... that doesn't even translate "lol".... I think it's designed only to "translate" numbers into letters?

  8. #8
    Join Date
    May 2012
    Location
    Hitchhiking the Galaxy
    Posts
    1,013
    Thanks
    47
    Thanked 139 Times in 139 Posts
    Blog Entries
    1

    That's because lol isn't 1337 speak, it's an abbreviation.

    1'/\/\ |\|07 $Ur3 j00Z'\/3 QU173 907 7|-|3 9r4$P 0Ph 1337 $P34|< d4|\|13L

    I think you're just thinking of the regular kind of interwebs slang, for example:

    "YOLO SWAG 420 TRAIN SIMULATOR 2002 SMOKE COAL EVREE DAY"


    Now, the 'rules of the internet' are often not particularly good rules, but some of them are accurate, for example, rule 0.96:
    "P30P13 WH0 U53 1337 5P34K 4R3 N07 1337. "
    "Most good programmers do programming not because they expect to get paid or get adulation by the public, but because it is fun to program." - Linus Torvalds
    Anime Views Forums
    Bernie

  9. #9
    Join Date
    Sep 2007
    Location
    The Netherlands
    Posts
    1,369
    Thanks
    31
    Thanked 141 Times in 136 Posts
    Blog Entries
    32

    It's not very accurate. Leet is translated to L337, but L337 is translated back not to leet but to elite, which in turn is translated to L337. So the machine cannot even distinguish between leet and elite.

  10. #10
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,162
    Thanks
    263
    Thanked 690 Times in 678 Posts

    What I mean is that the tool isn't more than a symbol converter, as far as I can tell. It wouldn't help the people doing this research increase accuracy any more than just writing the mapping themselves (3=e, etc.).
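
    For a sense of how little such a converter involves, a minimal sketch, assuming a small hand-written digit-to-letter table. The table and the function name are hypothetical, and real leetspeak has many more variants (multi-character symbols like |\| for n, for instance).

```python
# Hypothetical digit-to-letter table; real leetspeak has many more variants.
LEET_TO_LETTER = {"0": "o", "1": "l", "3": "e", "4": "a", "5": "s", "7": "t"}
DECODE_TABLE = str.maketrans(LEET_TO_LETTER)


def decode_leet(text: str) -> str:
    """Replace each mapped digit with its letter; everything else passes through."""
    return text.translate(DECODE_TABLE).lower()


print(decode_leet("P30P13 WH0 U53 1337 5P34K 4R3 N07 1337."))
# -> people who use leet speak are not leet.
print(decode_leet("lol"))  # unchanged: there is nothing to substitute
```

    A plain character substitution like this turns "1337" into "leet"; getting "elite" out of it would need a word-level dictionary on top.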
