Export Data to CSV (Using PHP)



nikomou
02-01-2006, 03:30 PM
Hi,

I want to export data from a webpage into a CSV file. The data will be separated by a pipe symbol "|".

Is there any way this can be done using php?

Cheers,
Nick.

Twey
02-01-2006, 04:04 PM
Yes.
I don't suppose you want to provide any more details?
How is the data input? How is it transferred to the script (GET, POST?) Is there a chance of the data having a pipe symbol in it? Is this a UNIX-, Windows- or old-Mac-style text file?

nikomou
02-05-2006, 07:15 PM
Hi Twey,

I want to save the content of this site:
http://www.mobiles4everyone.com/_exporths.asp

into a .txt/.csv file, without saving any of the html code

Twey
02-05-2006, 07:47 PM
<?php
$d = "";
$f = fopen("http://www.mobiles4everyone.com/_exporths.asp");
while(!feof($f)) $f .= fread($f, 1);
fclose($f);
$s = array();
preg_match('[^<body]*<body[^>]*([^<\/body>]*)<\/body>.*', $d, $s);
$s = $s[1];
$f = fopen("m4e.csv");
fwrite($f, $s);
fclose($f);
?>

I don't have KRegExpEditor on this laptop, so I haven't checked that regex :)

nikomou
02-06-2006, 01:02 AM
thanks, got an error for line 3 and a ton of errors on line 4.. ;(

Twey
02-06-2006, 04:31 PM
Ah - did it mention opening URLs like files?
The easiest way is to set allow_url_fopen in your config file. The harder way is to implement a basic HTTP client like so:

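function getPage() {
$errno = 0;
$errstr = "";
$data = "";
$fp = fsockopen("mobiles4everyone.com", 80, $errno, $errstr, 30);
if (!$fp) return false; // connection failed; $errstr holds the reason
// HTTP/1.0 plus "Connection: close" makes the server close the socket,
// so feof() becomes true as soon as the whole page has been sent
$query = "GET /_exporths.asp HTTP/1.0\r\n"
. "Host: mobiles4everyone.com\r\n"
. "Connection: close\r\n\r\n";
fwrite($fp, $query);
while(!feof($fp)) $data .= fread($fp, 1024);
fclose($fp);
// skip the response headers and the blank line (4 bytes) that ends them
$data = substr($data, strpos($data, "\r\n\r\n") + 4);
return $data;
}

Replace

$f = fopen("http://www.mobiles4everyone.com/_exporths.asp");
while(!feof($f)) $f .= fread($f, 1);
fclose($f);

with a call to this function:

$d = getPage();

If you still get errors, please post them :)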
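
Before choosing between the two routes, it may help to check whether URL opening is actually enabled on the host. The following is a minimal sketch; the helper name urlFopenEnabled() is made up for illustration. In php.ini the setting itself is written as allow_url_fopen = On.

```php
<?php
// Hypothetical helper: report whether fopen()/file_get_contents() may open URLs.
function urlFopenEnabled() {
    // ini_get() returns the setting as a string such as "1", "0" or "".
    return filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN);
}

echo urlFopenEnabled()
    ? "allow_url_fopen is on: fopen() can read http:// URLs\n"
    : "allow_url_fopen is off: use a socket-based client instead\n";
```

If the setting is off and the host does not let you change it, the socket-based function is the fallback.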

nikomou
02-12-2006, 11:31 AM
Ok, this is what i've done so far:


view.php

<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$inc = str_replace("<html>", "", "$inc");
$inc = str_replace("<head>", "", "$inc");
$inc = str_replace("<title>Untitled Document</title>", "", "$inc");
$inc = str_replace("<meta http-equiv=\"Content-Type\" content=\"text/html; charset=iso-8859-1\">", "", "$inc");
$inc = str_replace("</head>", "", "$inc");
$inc = str_replace("<body bgcolor=\"#FFFFFF\" text=\"#000000\">", "", "$inc");
$inc = str_replace(" ", "", "$inc");
$inc = str_replace(" | ", "|", "$inc");
$inc = str_replace("<br />", "<br/>", "$inc");
$inc = str_replace("</html>", "", "$inc");
$inc = str_replace("</head>", "", "$inc");
echo ("$inc");?>


savefeed.php

<?php
// define locations
$file = 'http://www.mysite.com/view.php';
$destination = 'datafeed.txt'; // change this for relative path to your desired server location

// retrieve & save the feed
$fp = fopen($file,"r");
$fp2 = fopen("$destination", "w");
while (!feof($fp)) {
$buf = fread($fp, 1024);
fwrite($fp2, $buf);
}
fclose($fp);
fclose($fp2);
?>


And i have uploadtosql.php which uploads database.txt to the mysql database.

There's only 1 problem, and that is that there are empty lines in the html of the page http://www.mobiles4everyone.com/_exporths.asp which is causing empty lines of data in the .txt file... How could i remove them in the view.php page?


$inc = str_replace("EMPTY_LINE", "NO_LINE", "$inc");
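
One way to drop the empty lines, sketched under the assumption that a line counts as "empty" when it holds nothing but whitespace. The function name removeEmptyLines() and the sample data are invented for illustration; they are not part of the scripts above.

```php
<?php
// Hypothetical helper: remove lines that are empty or whitespace-only.
function removeEmptyLines($text) {
    // Split on Windows (\r\n), old Mac (\r) and UNIX (\n) line endings.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    $kept = array();
    foreach ($lines as $line) {
        if (trim($line) !== '') {
            $kept[] = $line;
        }
    }
    // Rejoin with a single UNIX newline between the surviving lines.
    return implode("\n", $kept);
}

// Made-up sample input with a blank line and a whitespace-only line.
$sample = "a|b|c\n\n  \r\nd|e|f\n";
echo removeEmptyLines($sample);
```

In view.php this would be called as $inc = removeEmptyLines($inc); just before the echo.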

Twey
02-12-2006, 11:54 AM
Instead of all those str_replace() calls, try simply:

$s = array();
preg_match('<body\b[^>]*>(.*?)</body>', $inc, $s);
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
$inc = $s[1];

nikomou
02-12-2006, 03:56 PM
I Get the error:

Warning: preg_match() [function.preg-match]: Unknown modifier ']' in /home/crazy4/public_html/view.php on line 12

when i open datafeed.txt


<?php
$mins = $_GET['freemins'];
$txts = $_GET['freetxts'];
$page = $_GET['page'];
$hs = $_GET['handset'];
$hsclean = str_replace(" ", "%20", $hs);
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array();
preg_match('<body\b[^>]*>(.*?)</body>', $inc, $s);
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
$inc = $s[1];
echo ("$inc");?>

Twey
02-12-2006, 04:09 PM
Whoops, I think I forgot to escape a slash.

preg_match('<body\b[^>]*>(.*?)<\/body>', $inc, $s);
/EDIT: Hm, no, that doesn't fix it. I'll play around (I'm not too good with regex :))

nikomou
02-13-2006, 08:59 PM
i dont know what regex is, but i guess u are tryin 2 select only the data in within the body tags?! :cool:

Twey
02-13-2006, 09:04 PM
Right. Trying, and still failing. Why won't PHP allow me a character class?!
Mike, help! :(

nikomou
03-04-2006, 10:02 AM
you had any luck with this twey?

I cant think of any alternative ways of doin this either!

djr33
03-04-2006, 12:16 PM
I'm a lot less knowledgeable than you guys, but might I suggest that you're working too hard?
It's not like you need to do this more than once.

First... cut/paste what you see, not the code, into a text file, or .htm file... whatever.
Then use that. And you've got it so much easier.

From there, just have it go through character by character, splitting the data at a |. This is what you're doing, but should be a lot easier when it's more in your control.

Now... I'm not sure what a .csv file is, so I don't know if I might be suggesting something that would make it harder to export to that, but I hope not.

Hope this helps.

Twey
03-04-2006, 02:15 PM
"you had any luck with this twey?"
I haven't, no. :( It's beginning to bug me.

"It's not like you need to do this more than once."
It needs to be kept updated. That's the point.

"I'm not sure what a .csv file is"
CSV stands for Comma-Separated Values. It's a method of storing data, separated by commas (or, less strictly, by some other delimiter character; the pipe symbol [|] in this case).
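
To make the format concrete, here is a small sketch of splitting one pipe-delimited line into fields. The sample line is invented for illustration, and it assumes each line ends with a trailing pipe, as the feed in this thread appears to.

```php
<?php
// Made-up sample line in the same shape as the feed: fields separated by " | ".
$line = "Nokia 6230 | 200 | 500 |";

// Drop the trailing pipe, split on the delimiter, then trim each field.
$fields = array_map('trim', explode('|', rtrim(trim($line), '|')));

print_r($fields);
```

A field that itself contains a pipe would break this simple split, which is why the question about pipes in the data matters.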

djr33
03-05-2006, 12:51 AM
Hmmm... ok. Well.... isn't there an easy way in php to do, basically, cutting and pasting of what you'd see?
And .csv works how? Just out of curiosity... like.. how does it differ from .txt?

Twey
03-05-2006, 10:31 AM
"isn't there an easy way in php to do, basically, cutting and pasting of what you'd see?"
Yes. That's what I'm trying to do. :)

"Just out of curiosity... like.. how does it differ from .txt?"
It doesn't. You're evidently a Windows user -- used to your OS determining how things are treated by their file extension. File extensions are just a convenient method of seeing what type of data the file contains at a glance. They don't have any actual import on the content of the file; for example, I could name an executable .txt, or a text file .so, and it wouldn't actually change the content of the file (although Windows would probably refuse to handle it until I changed it back).
The only difference between a CSV-structured text file and a non-structured text file is how it will be handled. This is the same as the difference between a CSV file and an HTML file, or any two filetypes you'd care to name, in fact.

djr33
03-06-2006, 03:57 AM
I'm not on linux, but I'm on a mac and a pc right now... (I do video editing on my mac, and internet and such on my pc.. they're right next to each other).

I know how file extensions work, but I wasn't sure if .csv was a different format in the way that data would be stored within it.
Also... yeah... you can rename files to some extent on macs and it'll still read them. Windows is more touchy. You can also do things like renaming the filetype to get around free webhosts that block filetypes. ^_^

So... yeah... go on. Just curious. got it now, thanks.

mwinter
03-06-2006, 12:05 PM
preg_match('<body\b[^>]*>(.*?)<\/body>', $inc, $s);

Your problem here is a lack of delimiters. The regular expression parser will be treating the first less-than character as a delimiter, and will be looking for a greater-than character to mark the end of the pattern (and the start of the flags).

I'm used to using slashes to delimit patterns, as that's how ECMAScript regular expression literals are formatted.



preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s);
By the way, I haven't tested this. I only just happened to glance at the thread.

Mike
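
The delimiter point can be sketched as follows. The sample markup is invented; the second call uses # as the delimiter, which is one common alternative and means the slash in the closing tag no longer needs escaping.

```php
<?php
// Made-up sample page fragment.
$html = "<body class=\"x\">one|two</body>";

// Slash delimiters: the slash in </body> has to be escaped.
preg_match('/<body\b[^>]*>(.*?)<\/body>/', $html, $m1);

// Hash delimiters: no escaping needed for the slash.
preg_match('#<body\b[^>]*>(.*?)</body>#', $html, $m2);

echo $m1[1], "\n", $m2[1], "\n";
```

Both calls capture the same text; the choice of delimiter is a matter of convenience.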

Twey
03-06-2006, 03:53 PM
Mike saves the day again. :p

nikomou
03-10-2006, 08:54 AM
hey, i cant seem to get it working :confused: :confused: :confused:
I dont get any error messages, just a blank page..


<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array(); preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s); $inc = str_replace(" | ", "|", str_replace(" ", "", $inc)); $inc = $s[1];
echo ("$inc");?>

Twey
03-10-2006, 05:14 PM
Call preg_match() after the str_replace() calls.

nikomou
03-10-2006, 08:12 PM
nope, still get a blank page!!


<?php
ob_start();
include ("http://www.mobiles4everyone.com/_exporths.asp");
$inc = ob_get_contents();
ob_end_clean();
$s = array();
$inc = str_replace(" | ", "|", str_replace(" ", "", $inc));
preg_match('/<body\b[^>]*>(.*?)<\/body>/', $inc, $s);
$inc = $s[1];
echo ("$inc");?>

Twey
03-10-2006, 08:23 PM
Try
$inc = $s[0];

nikomou
03-10-2006, 08:27 PM
nope!! still gettin a blank page..

mwinter
03-10-2006, 08:56 PM
Try
$inc = $s[0];

That would show the substring that the pattern matched, not the parenthesised expression.

The problem here is that, by default, the dot (.) pattern token doesn't match line terminators. As one exists just after the opening body tag, the match halts pretty quickly. The 's' flag will include these line terminators.

More luck can be had with:



$data = file_get_contents('http://www.mobiles4everyone.com/_exporths.asp');
$data = str_replace(' | ', '|', $data);

if (preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/s', $data, $content)) {
echo $content[1];
}
Notice that leading and trailing white space will be excluded from the capturing parentheses, so one of the str_replace function calls can be removed.

Mike
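
The effect of the 's' flag can be seen in a small sketch. The sample markup is invented and uses plain \n line endings; the two patterns differ only in the flag.

```php
<?php
// Made-up sample: body content spread over several lines.
$html = "<body>\nA | B\nC | D\n</body>";

// Without the s flag the dot cannot cross the newline between the two data lines.
$without = preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/', $html, $m1);

// With the s flag the dot matches newlines too, so the whole body is captured.
$with = preg_match('/<body\b[^>]*>\s*(.*?)\s*<\/body>/s', $html, $m2);

var_dump($without, $with);
echo $m2[1], "\n";
```

The \s* on either side of the parentheses is what keeps the leading and trailing line breaks out of the captured text.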

nikomou
03-10-2006, 11:06 PM
you're a genius!! lol, works great! thanks to everyone, twey, mike..

nikomou
03-04-2007, 11:27 AM
hey!

does anyone know why this script doesnt work with php 5.2.1?
PHP has just been updated on my host, and none of my databases are being updated!!! grr!!

Anyone got any ideas? help is much appreciated!

Twey
03-04-2007, 01:21 PM
What's the error?

nikomou
03-04-2007, 01:29 PM
there is no error..
just get a blank page!
if i remove the preg_match part, the script works, but all the html tags are not removed..

nikomou
03-27-2007, 08:57 PM
i've still had no luck with this..
i'm trying to include the contents of a page similar to this:




<html>
<head>
<style>
body {
margin-left: 0px;
margin-top: 0px;
margin-right: 0px;
margin-bottom: 0px;
}
.tditem {
font-size: 10px;
font-family: Arial;
text-decoration: none;
color: #333333;
}
</style>
</head>
<body bgcolor="white" text="#000000" link="#0000FF" vlink="#000099" alink="#FF0000">
<font face="Courier New, Courier, mono" size="2">
CONENT | MORE CONTENT | EVEN MORE CONTENT |
CONENT | MORE CONTENT | EVEN MORE CONTENT |
CONENT | MORE CONTENT | EVEN MORE CONTENT |
CONENT | MORE CONTENT | EVEN MORE CONTENT |
CONENT | MORE CONTENT | EVEN MORE CONTENT |
</font>
</body>
</html>


without any of the html at the top and bottom.. my current script is:



<?php
$data = file_get_contents('http://www.mywebsite.com/website.html');
if (preg_match('/<font\b[^>]*>\s*(.*?)\s*<\/font>/s', $data, $content)) {
echo $content[1];
}
else
{
echo ("doesnt work!");
}
?>

all i get is a page which says "doesnt work!"
I'm running php 5.2.1

please help!
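
The thread ends without a confirmed answer. Two possible causes, neither verified here: allow_url_fopen may have been switched off during the upgrade, in which case file_get_contents() returns false; or the lazy (.*?) on a large page may run into pcre.backtrack_limit, a setting introduced in PHP 5.2.0. The sketch below shows how preg_last_error() can reveal the second case, and a plain string-search alternative that avoids regex backtracking. The helper name extractBetweenTags() and the sample data are made up for illustration.

```php
<?php
// Hypothetical helper: text between the first <tag ...> and the next </tag>, or false.
function extractBetweenTags($html, $tag) {
    $open = stripos($html, '<' . $tag);
    if ($open === false) return false;
    // Find the end of the opening tag, then step past the ">".
    $start = strpos($html, '>', $open);
    if ($start === false) return false;
    $start++;
    $end = stripos($html, '</' . $tag . '>', $start);
    if ($end === false) return false;
    return trim(substr($html, $start, $end - $start));
}

// Made-up sample page in the shape posted above.
$data = "<html><body><font face=\"Courier New\" size=\"2\">\nA | B |\nC | D |\n</font></body></html>";

// Diagnostic: after a regex call, preg_last_error() tells a PCRE failure from a plain non-match.
preg_match('/<font\b[^>]*>\s*(.*?)\s*<\/font>/s', $data, $m);
if (preg_last_error() !== PREG_NO_ERROR) {
    echo "PCRE error code: ", preg_last_error(), "\n";
}

$content = extractBetweenTags($data, 'font');
echo $content === false ? "no <font> block found" : $content;
```

With the real page, $data would come from file_get_contents(), and it is worth checking that the call did not return false before searching it.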