Hi,
I want to export data from a webpage into a CSV file. The data will be separated by a pipe character "|".
Is there any way this can be done using php?
Cheers,
Nick.
Yes.
I don't suppose you want to provide any more details?
How is the data input? How is it transferred to the script (GET or POST)? Is there a chance of the data having a pipe symbol in it? Is this a UNIX-, Windows- or old-Mac-style text file?
Hi Twey,
I want to save the content of this site:
http://www.mobiles4everyone.com/_exporths.asp
into a .txt/.csv file, without saving any of the HTML code.
PHP Code:
<?php
$d = "";
$f = fopen("http://www.mobiles4everyone.com/_exporths.asp", "r"); // opening a URL needs allow_url_fopen
while (!feof($f)) $d .= fread($f, 8192); // append to the string $d, not the handle $f
fclose($f);
$s = array();
preg_match('/<body[^>]*>(.*?)<\/body>/s', $d, $s); // capture what's between the body tags
$s = $s[1];
$f = fopen("m4e.csv", "w");
fwrite($f, $s);
fclose($f);
?>
I don't have KRegExpEditor on this laptop, so I haven't checked that regex :)
Thanks, but I got an error on line 3 and a ton of errors on line 4.. ;(
Ah - did it mention opening URLs like files?
The easiest way is to enable allow_url_fopen in your php.ini. The harder way is to implement a basic HTTP client like so:
PHP Code:
function getPage() {
    $errno = 0;
    $errstr = "";
    $data = "";
    $fp = fsockopen("mobiles4everyone.com", 80, $errno, $errstr, 30);
    if (!$fp) die("Could not connect: $errstr ($errno)");
    // HTTP/1.0, so the server closes the connection when it's done (ending
    // the feof() loop) and doesn't send a chunked body
    $query = "GET /_exporths.asp HTTP/1.0\r\n"
           . "Host: mobiles4everyone.com\r\n\r\n";
    fwrite($fp, $query);
    while (!feof($fp)) $data .= fread($fp, 8192);
    fclose($fp);
    // Skip the response headers and the blank line ("\r\n\r\n") after them
    $data = substr($data, strpos($data, "\r\n\r\n") + 4);
    return $data;
}
Replace
Code:
$f = fopen("http://www.mobiles4everyone.com/_exporths.asp");
while(!feof($f)) $f .= fread($f, 1);
fclose($f);
with a call to this function:
Code:
$d = getPage();
If you still get errors, please post them :)
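The header-stripping step in getPage() can be looked at on its own. strpos() returns the offset where the blank line between headers and body *starts*, so the body begins four bytes later. Here's a minimal sketch, assuming a complete, non-chunked response (which an HTTP/1.0 request gets you); split_http_response() is a hypothetical helper name:

```php
<?php
// Split a raw HTTP response into its header block and its body.
// Assumes the body is not chunked (e.g. the request was sent as HTTP/1.0).
function split_http_response($raw)
{
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        // No header/body separator found: treat everything as headers.
        return array($raw, '');
    }
    $headers = substr($raw, 0, $pos);
    // Skip the four bytes of "\r\n\r\n"; the cast covers PHP versions where
    // substr() returns false for an offset equal to the string length.
    $body = (string) substr($raw, $pos + 4);
    return array($headers, $body);
}

// Usage
list($headers, $body) = split_http_response("HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>");
echo $body; // prints <html>...</html>
```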
OK, this is what I've done so far:
view.php
PHP Code:
<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp"); // remote include needs allow_url_include (PHP 5.2+) and runs the fetched text as PHP
$inc = ob_get_contents();
ob_end_clean();
$inc = str_replace("<html>", "", $inc);
$inc = str_replace("<head>", "", $inc);
$inc = str_replace("<title>Untitled Document</title>", "", $inc);
$inc = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">", "", $inc);
$inc = str_replace("</head>", "", $inc);
$inc = str_replace("<body bgcolor=\"#FFFFFF\" text=\"#000000\">", "", $inc);
$inc = str_replace(" ", "", $inc);
$inc = str_replace(" | ", "|", $inc);
$inc = str_replace("<br />", "<br/>", $inc);
$inc = str_replace("</html>", "", $inc);
echo $inc;
?>
savefeed.php
PHP Code:
<?php
// define locations
$file = 'http://www.mysite.com/view.php';
$destination = 'datafeed.txt'; // change this to the relative path of your desired server location
// retrieve & save the feed
$fp = fopen($file, "r");
$fp2 = fopen($destination, "w");
while (!feof($fp)) {
    $buf = fread($fp, 1024);
    fwrite($fp2, $buf);
}
fclose($fp);
fclose($fp2);
?>
And I have uploadtosql.php, which uploads database.txt to the MySQL database.
There's only one problem: the HTML of the page http://www.mobiles4everyone.com/_exporths.asp contains empty lines, which end up as empty lines of data in the .txt file. How could I remove them in view.php?
PHP Code:$inc = str_replace("EMPTY_LINE", "NO_LINE", "$inc");
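There's no literal "empty line" string to pass to str_replace(), since the line endings can be \r\n, \n or \r. One way to do it is to split the text into lines and keep only the ones that aren't blank. A minimal sketch; remove_blank_lines() is a hypothetical helper name, and it rejoins the kept lines with "\n":

```php
<?php
// Drop empty or whitespace-only lines from a block of text.
function remove_blank_lines($text)
{
    // Split on Windows (\r\n), Unix (\n) or old Mac (\r) line endings;
    // \r\n is listed first so it is treated as one break, not two.
    $lines = preg_split('/\r\n|\n|\r/', $text);
    $kept = array();
    foreach ($lines as $line) {
        if (trim($line) !== '') {
            $kept[] = $line;
        }
    }
    return implode("\n", $kept);
}

// Usage, e.g. in view.php just before the final echo:
// $inc = remove_blank_lines($inc);
```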
Instead of all those str_replace() calls, try simply:
Code:$s = array();
preg_match('<body\b[^>]*>(.*?)</body>', $inc, $s);
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
$inc = $s[1];
I get the error:
Warning: preg_match() [function.preg-match]: Unknown modifier ']' in /home/crazy4/public_html/view.php on line 12
when I open datafeed.txt. Here's view.php:
PHP Code:<?php
$mins = $_GET['freemins'];
$txts = $_GET['freetxts'];
$page = $_GET['page'];
$hs = $_GET['handset'];
$hsclean = str_replace(" ", "%20", $hs);
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array();
preg_match('<body\b[^>]*>(.*?)</body>', $inc, $s);
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
$inc = $s[1];
echo ("$inc");?>
Whoops, I think I forgot to escape a slash.
Code:
preg_match('<body\b[^>]*>(.*?)<\/body>', $inc, $s);
/EDIT: Hm, no, that doesn't fix it. I'll play around (I'm not too good with regex :))
I don't know what regex is, but I guess you're trying to select only the data within the body tags?! :cool:
Right. Trying, and still failing. Why won't PHP allow me a character class?!
Mike, help! :(
Have you had any luck with this, Twey?
I can't think of any alternative ways of doing this either!
I'm a lot less knowledgeable than you guys, but might I suggest that you're working too hard?
It's not like you need to do this more than once.
First... cut/paste what you see, not the code, into a text file, or .htm file... whatever.
Then use that. And you've got it so much easier.
From there, just have it go through character by character, splitting the data at a |. This is what you're doing, but should be a lot easier when it's more in your control.
Now... I'm not sure what a .csv file is, so I don't know if I might be suggesting something that would make it harder to export to that, but I hope not.
Hope this helps.
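The character-by-character splitting suggested above is what PHP's explode() does in one call. A minimal sketch, assuming each line of the copied text is one record and that no field contains a pipe itself; split_records() is a hypothetical helper name:

```php
<?php
// Turn pipe-separated lines into an array of records (arrays of fields).
function split_records($text)
{
    $records = array();
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        if (trim($line) === '') {
            continue; // skip blank lines
        }
        // Trim the spaces around each field...
        $fields = array_map('trim', explode('|', $line));
        // ...and drop the empty field a trailing "|" leaves at the end.
        if (end($fields) === '') {
            array_pop($fields);
        }
        $records[] = $fields;
    }
    return $records;
}

// Usage
print_r(split_records("Nokia | 6300 | 500 mins |\nSony | K800i | 300 mins |"));
```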
Quote:
you had any luck with this twey?
I haven't, no. :( It's beginning to bug me.
Quote:
It's not like you need to do this more than once.
It needs to be kept updated. That's the point.
Quote:
I'm not sure what a .csv file is
CSV stands for Comma-Separated Values. It's a way of storing tabular data with the fields separated by commas (or, less strictly, by any other ASCII character; the pipe symbol [|] in this case).
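PHP can write and read this format directly: fputcsv() and fgetcsv() take the delimiter as an argument, so passing '|' gives pipe-separated files. A field that contains the delimiter gets wrapped in double quotes, which also covers the earlier question about pipes appearing in the data. A minimal sketch; write_psv() and read_psv() are hypothetical helper names:

```php
<?php
// Write rows (arrays of strings) as pipe-separated lines.
function write_psv($path, array $rows)
{
    $fh = fopen($path, 'w');
    foreach ($rows as $row) {
        // delimiter '|', enclosure '"', escape '\'
        fputcsv($fh, $row, '|', '"', '\\');
    }
    fclose($fh);
}

// Read them back; fgetcsv() with the same settings undoes the quoting.
function read_psv($path)
{
    $rows = array();
    $fh = fopen($path, 'r');
    while (($row = fgetcsv($fh, 0, '|', '"', '\\')) !== false) {
        $rows[] = $row;
    }
    fclose($fh);
    return $rows;
}

// Usage: a field containing "|" is written as "a|b" (quoted)
// write_psv('datafeed.txt', array(array('Nokia', '6300'), array('a|b', 'c')));
```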
Hmmm... ok. Well.... isn't there an easy way in php to do, basically, cutting and pasting of what you'd see?
And .csv works how? Just out of curiosity... like.. how does it differ from .txt?
Quote:
isn't there an easy way in php to do, basically, cutting and pasting of what you'd see?
Yes. That's what I'm trying to do. :)
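The closest built-in to "copy what you'd see" is strip_tags(), which removes the markup, combined with html_entity_decode(), which turns entities such as &amp; back into characters. One caveat: strip_tags() removes only the tags themselves, so the text inside elements like <title> or <style> still comes through; cutting the page down to the <body> contents first avoids that. A minimal sketch; visible_text() is a hypothetical helper name:

```php
<?php
// Approximate the visible text of an HTML fragment.
function visible_text($html)
{
    // Remove tags, then decode entities (&amp; -> &, &quot; -> ", ...).
    return html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');
}

// Usage
echo visible_text('<b>Nokia</b> | 6300 &amp; more |'); // prints Nokia | 6300 & more |
```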
Quote:
Just out of curiosity... like.. how does it differ from .txt?
It doesn't. You're evidently a Windows user, used to your OS deciding how a file is treated by its extension. File extensions are just a convenient way of seeing at a glance what type of data a file contains. They have no bearing on the content of the file; for example, I could name an executable .txt, or a text file .so, and it wouldn't change the content of the file at all (although Windows would probably refuse to handle it until I changed it back).
The only difference between a CSV-structured text file and a non-structured text file is how it will be handled. This is the same as the difference between a CSV file and an HTML file, or any two filetypes you'd care to name, in fact.
I'm not on linux, but I'm on a mac and a pc right now... (I do video editing on my mac, and internet and such on my pc.. they're right next to each other).
I know how file extensions work, but I wasn't sure if .csv was a different format in the way that data would be stored within it.
Also... yeah... you can rename files to some extent on macs and it'll still read them. Windows is more touchy. You can also do things like renaming the filetype to get around free webhosts that block filetypes. ^_^
So... yeah... go on. Just curious. Got it now, thanks.
Quote:
Originally Posted by Twey
Your problem here is a lack of delimiters. The regular expression parser treats the first less-than character as a delimiter, and looks for a greater-than character to mark the end of the pattern (and the start of the modifiers). The first greater-than is the one inside [^>], so the ']' after it is read as a modifier, hence "Unknown modifier ']'".
I'm used to using slashes to delimit patterns, as that's how ECMAScript regular expression literals are formatted.
PHP Code:
preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s);
By the way, I haven't tested this. I only just happened to glance at the thread.
Mike
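In PHP, a pattern's delimiter can be any non-alphanumeric, non-backslash, non-whitespace character. Picking one that doesn't occur in the pattern, such as '#', means the slash in </body> needs no backslash. A minimal sketch; body_of() is a hypothetical helper name:

```php
<?php
// Return what's between <body ...> and </body>, or null if there's no match.
function body_of($html)
{
    $m = array();
    // '#' opens and closes the pattern, so "</body>" is written as-is.
    // Add the 's' modifier after the closing '#' if the body spans lines.
    if (preg_match('#<body\b[^>]*>(.*?)</body>#', $html, $m)) {
        return $m[1];
    }
    return null;
}

// The same pattern with '/' delimiters needs the slash escaped:
// preg_match('/<body\b[^>]*>(.*?)<\/body>/', $html, $m);
```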
Mike saves the day again. :p
Hey, I can't seem to get it working :confused: :confused: :confused:
I don't get any error messages, just a blank page..
PHP Code:<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array();
preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s);
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
$inc = $s[1];
echo ("$inc");?>
Call preg_match() after the str_replace() calls.
Nope, I still get a blank page!!
PHP Code:<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array();
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s);
$inc = $s[1];
echo ("$inc");?>
Try
Code:
$inc = $s[0];
Nope!! Still getting a blank page..
Quote:
Originally Posted by Twey
That would show the substring that the whole pattern matched, not the parenthesised expression.
The problem here is that, by default, the dot (.) pattern token doesn't match line terminators. As one follows right after the opening body tag, (.*?) can never reach </body>, so the match fails. The 's' modifier lets the dot match line terminators too.
More luck can be had with:
Code:
$data = file_get_contents('http://www.mobiles4everyone.com/_exporths.asp');
$data = str_replace(' | ', '|', $data);
if (preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/s', $data, $content)) {
    echo $content[1];
}
Notice that leading and trailing white space will be excluded from the capturing parentheses, so one of the str_replace function calls can be removed.
Mike
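A small worked example of both points, on a made-up two-row page: without 's' the pattern finds nothing once the content spans several lines, and $m[0] holds the whole match (tags included) while $m[1] holds only the parenthesised group.

```php
<?php
$page = "<body bgcolor=\"#FFFFFF\">\nA | B |\nC | D |\n</body>";
$m = array();

// Without 's': '.' can't cross the "\n" between the two rows, so no match.
$withoutS = preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/', $page, $m);

// With 's': '.' matches line breaks as well, so both rows are captured.
$withS = preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/s', $page, $m);

echo $withoutS, ' ', $withS, "\n"; // prints 0 1
echo $m[1], "\n";                  // prints both rows, without the tags
// $m[0] would print the whole match, starting with "<body".
```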
You're a genius!! lol, works great! Thanks to everyone, Twey, Mike..
Hey!
Does anyone know why this script doesn't work with PHP 5.2.1?
PHP has just been updated on my host, and none of my databases are being updated!!! Grr!!
Anyone got any ideas? Help is much appreciated!
What's the error?
There is no error, I just get a blank page!
If I remove the preg_match part, the script works, but the HTML tags aren't removed..
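A blank page with no error is what you'd see if preg_match() returned false (an internal failure) rather than 0 (no match), because the `if` simply skips the echo in both cases. One possible cause worth checking, not a confirmed diagnosis: PHP 5.2.0 introduced the pcre.backtrack_limit setting, and a lazy `.*?` walking a large page can exceed it. preg_last_error() (also new in 5.2.0) reports which error occurred. A diagnostic sketch; extract_body() is a hypothetical helper name:

```php
<?php
// Return the body text, or null; warn if PCRE itself gave up.
function extract_body($html)
{
    $m = array();
    $result = preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/s', $html, $m);

    if ($result === false) {
        // Internal PCRE error rather than "no match".
        $code = preg_last_error();
        if ($code === PREG_BACKTRACK_LIMIT_ERROR) {
            trigger_error('preg_match(): pcre.backtrack_limit exceeded', E_USER_WARNING);
        } else {
            trigger_error('preg_match() failed with PCRE error code ' . $code, E_USER_WARNING);
        }
        return null;
    }
    if ($result === 0) {
        return null; // no <body>...</body> pair in the page
    }
    return $m[1];
}

// Usage:
// $content = extract_body(file_get_contents('http://www.mobiles4everyone.com/_exporths.asp'));
// if ($content !== null) echo $content;
```

If the warning does mention the backtrack limit, raising it with ini_set('pcre.backtrack_limit', ...) before the call is one workaround; avoiding the regex altogether is another.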
I've still had no luck with this..
I'm trying to include the contents of a page similar to this:
Code:
<html>
<head>
<style>
body {
margin-left: 0px;
margin-top: 0px;
margin-right: 0px;
margin-bottom: 0px;
}
.tditem {
font-size: 10px;
font-family: Arial;
text-decoration: none;
color: #333333;
}
</style>
</head>
<body bgcolor="white" text="#000000" link="#0000FF" vlink="#000099" alink="#FF0000">
<font face="Courier New, Courier, mono" size="2">
CONTENT | MORE CONTENT | EVEN MORE CONTENT |
CONTENT | MORE CONTENT | EVEN MORE CONTENT |
CONTENT | MORE CONTENT | EVEN MORE CONTENT |
CONTENT | MORE CONTENT | EVEN MORE CONTENT |
CONTENT | MORE CONTENT | EVEN MORE CONTENT |
</font>
</body>
</html>
without any of the HTML at the top and bottom. My current script is:
PHP Code:
<?php
$data = file_get_contents('http://www.mywebsite.com/website.html');
if (preg_match('/<font\b[^>]*>\s*(.*?)\s*<\/font>/s', $data, $content)) {
    echo $content[1];
} else {
    echo ("doesnt work!");
}
?>
All I get is a page which says "doesnt work!"
I'm running PHP 5.2.1.
Please help!
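Against the sample HTML shown, that pattern does match, so if the live page prints "doesnt work!" it's worth first checking whether file_get_contents() returned false (it does when the fetch fails, for example with allow_url_fopen disabled). For a page with a single <font> wrapper, the regex can also be dropped in favour of plain string searches. A minimal sketch, assuming one wrapper element; between_tags() is a hypothetical helper name:

```php
<?php
// Return the trimmed text between the first <$tag ...> and the last </$tag>.
function between_tags($html, $tag)
{
    $open = stripos($html, '<' . $tag);          // start of the opening tag
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);          // end of the opening tag
    $end = strripos($html, '</' . $tag . '>');   // last closing tag
    if ($start === false || $end === false || $end < $start) {
        return null;
    }
    return trim(substr($html, $start + 1, $end - $start - 1));
}

// Usage:
// $data = file_get_contents('http://www.mywebsite.com/website.html');
// if ($data === false) die('could not fetch the page');
// echo between_tags($data, 'font');
```

Because stripos() searches for '<font' including the angle bracket, the "font-size" and "font-family" rules inside the <style> block don't trip it up.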