Thread: Natural Language Generation SPAM or NOT?

  1. #1
    Join Date
    Jan 2012
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Natural Language Generation SPAM or NOT?

    Hello!
    I was wondering whether automatically generated text reports about stock price movements can be considered SPAM or DUPLICATION. I have seen software packages that generate readable and very helpful reports by transforming table-based numerical information into a coherent textual summary that is easier for humans to interpret. Here is a description of one example system, called Stock Reporter:

    http://web.science.mq.edu.au/~ltgdem...ter/about.html
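
    To make the idea concrete, here is a minimal sketch of template-based generation in Python. It is not how Stock Reporter works (the thread does not describe its internals); the function name, thresholds, wording, and sample tickers are all assumptions chosen for illustration.

```python
def describe_move(ticker, open_price, close_price):
    """Turn one row of price data into a one-sentence summary.

    The percentage thresholds and verbs below are illustrative assumptions.
    """
    pct = (close_price - open_price) / open_price * 100
    if abs(pct) < 0.1:
        verb = "was little changed"
    elif pct > 0:
        verb = "rose" if pct < 2 else "jumped"
    else:
        verb = "slipped" if pct > -2 else "fell sharply"
    return (f"{ticker} {verb}, closing at {close_price:.2f} "
            f"({pct:+.1f}% from an open of {open_price:.2f}).")


# Hypothetical table rows: (ticker, open, close)
rows = [("ABC", 10.00, 10.50), ("XYZ", 20.00, 19.90)]
reports = [describe_move(*row) for row in rows]
for line in reports:
    print(line)
```

    Every report built this way shares the same sentence skeleton and differs only in the ticker, the verb, and the numbers, which is exactly the kind of similarity the question is about.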

    I wonder what you think of such a way of content generation?

    Looking forward to hearing what you think.

  2. #2
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    I'm really not sure what you are asking here.

    Let me define 'spam' first:

    spam: messages (posts, emails, etc) that are not wanted. Usually these are sent in bulk, but in some cases it could be as few as a single message, still considered spam because it is "spamming" the user-- bothering them with messages that are irrelevant, undesired, or something the user has tried to block.
    Note that spam often also relates to advertising, but spam does not necessarily contain ads-- just often the spam with ads is worse than the spam without ads (partly because ads make money for the spammers).

    The definition of spam entirely depends on the context. For example, I'm a linguist. Many of the messages that I receive about languages or linguistics would probably be marked by others as spam because they don't want them. But I do. So "spam" isn't a general thing-- it's a relationship between a user and a message.

    You're posting this in the HTML forum, so maybe you're asking about including this kind of information in your website. Are you asking whether search engines might see your website as spam? Well, first, I don't know if "spam" can be properly applied to websites. Spam is usually something that is directly sent to someone. Websites are not a direct message, but rather a choice. By definition, anything a user chooses to receive is not spam, so by visiting your website they are choosing to receive its contents.


    As for natural language processing/generation, that's irrelevant. If the results are unreadable or confusing, or contain ads or otherwise annoying content, then it is likely that someone will object to that content, especially if they receive it directly without consent (for example, in spam email messages). But if the text looks like something a human would write, there's no reason that would matter.


    As for "duplicate" content, that's something that search engines can be bothered by. But there are many websites with "duplicate" pages that only differ by a little information. For example, websites with weather information have thousands of similar pages, and the only difference is in the numbers (temperature) and maybe some images (clouds, sun, etc).



    So if you have a specific question, we can try to help. But in general, generating content with a computer is not a problem as long as it is helpful. The source is not the problem-- the result could be a problem.
    Daniel - Freelance Web Design | <?php?> | <html>| español | Deutsch | italiano | português | català | un peu de français | some knowledge of several other languages: I can sometimes help translate here on DD | Linguistics Forum

  3. #3
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    29,133
    Thanks
    44
    Thanked 3,231 Times in 3,192 Posts
    Blog Entries
    12

    It's not spam unless you post or email it somewhere. And then, only if it's unwelcome. It is duplication, but that's not a real problem unless it's also copyright infringement.
    - John

  4. #4
    Join Date
    Jan 2012
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Thank you for your answers. So you think it is duplication, and that it is not much of a problem, right? So if I create helpful content (i.e. automatically generated financial reports, which are sometimes a little repetitive stylistically but still of much help to visitors), that will not bother Google, right?

  5. #5
    Join Date
    Mar 2005
    Location
    SE PA USA
    Posts
    29,133
    Thanks
    44
    Thanked 3,231 Times in 3,192 Posts
    Blog Entries
    12

    What do you mean by "bothering Google"? We've just now seen with SOPA what can happen if you really upset Google. That's not something anyone wants to do unless it's unavoidable.

    But if you just mean giving their search algorithms a little trouble, I wouldn't worry about it. If you're providing a valuable service and it's also something you want to do, nothing like that should stand in your way.
    - John

  6. #6
    Join Date
    Jan 2012
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Yes, but I am wondering whether generating my own content automatically can be flagged as DUPLICATION by Google. What if certain parts of my reports look a bit similar to the others while the reports are different from one another overall? What do you think is an acceptable level of REPETITIVENESS in my own articles?

  7. #7
    Join Date
    Mar 2006
    Location
    Illinois, USA
    Posts
    12,164
    Thanks
    265
    Thanked 690 Times in 678 Posts

    You are confusing method and content.

    Google and other search engines don't care or know if it was written by a computer.

    The question is whether your content is duplication or if it is useful. I've already given you a clear example: weather websites have lots of duplicate format information with a few numbers that are different. But they are useful.

    I don't know exactly how google and other search engines check this, but in my opinion, it's all very simple: are all of your pages useful? Are any of them actual duplicates (two URLs, one page)?
    If your content is useful and your pages are different (even if in small ways), then that's fine.
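
    The "two URLs, one page" case is something you can check for yourself before publishing. The sketch below is only an illustration: it says nothing about how search engines detect duplicates, and the function names, sample URLs, and whitespace/case normalization are assumptions.

```python
import hashlib
from collections import defaultdict


def content_key(text):
    """Hash the text after collapsing whitespace and lowercasing it."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_exact_duplicates(pages):
    """Return groups of URLs whose normalized text is identical."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_key(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical site: the first two URLs serve the same report.
pages = {
    "/report?id=1": "ABC rose 2% today.",
    "/reports/1": "ABC  rose 2% today.",
    "/reports/2": "XYZ fell 1% today.",
}
duplicates = find_exact_duplicates(pages)
print(duplicates)
```

    Pages that differ even by one number get different keys, so this only flags actual duplicates, not pages that merely share a format.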

    Think about google search-- every page it generates is very similar. Does that mean it's not useful? It's also generated by a computer. No problem.


    However, one thing you can look into is blocking robots from checking all of your pages. If the pages you're making are like search results, actually google does ask that you don't have that type of dynamic page for certain things, like for google ads, and maybe for searches.
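
    Blocking robots is usually done with a robots.txt file. As a rough sketch, Python's standard urllib.robotparser module can be used to check which URLs a given set of rules would block. The rules, paths, domain, and the "ExampleBot" user agent below are made up for illustration and are not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of search-result style pages only.
robots_txt = """\
User-agent: *
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# Assumed URLs on a placeholder domain.
search_allowed = parser.can_fetch("ExampleBot", "https://example.com/search?q=abc")
report_allowed = parser.can_fetch("ExampleBot", "https://example.com/reports/1")
print(search_allowed, report_allowed)
```

    With rules like these, the generated report pages stay crawlable while the search-style pages are excluded.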

    But overall I don't think it's a problem.

  8. #8
    Join Date
    Dec 2011
    Posts
    49
    Thanks
    8
    Thanked 1 Time in 1 Post

    I feel like FINRA would come after you before Google or anyone else does, lol.

  9. #9
    Join Date
    Jan 2012
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Quote: Originally Posted by djr33
    You are confusing method and content.

    Google and other search engines don't care or know if it was written by a computer.

    The question is whether your content is duplication or if it is useful. I've already given you a clear example: weather websites have lots of duplicate format information with a few numbers that are different. But they are useful.

    I don't know exactly how google and other search engines check this, but in my opinion, it's all very simple: are all of your pages useful? Are any of them actual duplicates (two URLs, one page)?
    If your content is useful and your pages are different (even if in small ways), then that's fine.

    Think about google search-- every page it generates is very similar. Does that mean it's not useful? It's also generated by a computer. No problem.


    However, one thing you can look into is blocking robots from checking all of your pages. If the pages you're making are like search results, actually google does ask that you don't have that type of dynamic page for certain things, like for google ads, and maybe for searches.

    But overall I don't think it's a problem.

    Well, the way I plan to create the automated articles makes them unique. They might seem similar to other articles in certain word chains and phrases, but they are by no means duplicates of one another. Do you think that is OK in Google's eyes?
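
    One way to put a number on "similar in certain word chains" is to compare the sets of overlapping word sequences (shingles) in two articles. This is a generic self-check under assumed settings, not a description of what Google does; the three-word shingle size and the sample sentences are arbitrary choices.

```python
def shingles(text, n=3):
    """Return the set of n-word sequences in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b, n=3):
    """Jaccard similarity of two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0  # two texts too short to shingle count as identical
    return len(sa & sb) / len(sa | sb)


# Hypothetical generated sentences that share a template.
a = "ABC shares rose two percent on heavy volume today"
b = "XYZ shares rose two percent on light volume today"
similarity = jaccard(a, b)
print(f"{similarity:.2f}")
```

    A score near 1.0 means two articles are nearly the same text, while a low score means they share only a few phrases. Where to draw the line is a judgment call.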

    Another question: how do you think Google might look at the sudden appearance of, say, 100 articles formatted this way after a few months with no updates?

  10. #10
    Join Date
    Jan 2012
    Posts
    15
    Thanks
    0
    Thanked 0 Times in 0 Posts

    Quote: Originally Posted by bacondelta
    I feel like FINRA would come after you before Google or anyone else does, lol.

    lol, hope not :)
