php to parse word document



sukanya.paul
01-16-2009, 06:11 AM
Hi,
I need to parse Word documents for name, email, and contact information and insert the values into a database using PHP code.
Can someone please help me with this? I don't really know how to go about it.

Thanks in advance,
Suk

Nile
01-16-2009, 10:28 PM
Your PHP would look something like:


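<?php
// Works only on a plain-text file of "Key: value" lines, not a binary Word .doc.
$file = fopen('word.doc', 'rb');
while (($line = fgetcsv($file, 0, ':')) !== false) {
    if (!isset($line[1])) {
        continue; // skip blank lines and lines without a colon
    }
    // Creates $name, $email, $phone, ... from the lowercased keys.
    ${strtolower(trim($line[0]))} = trim($line[1]);
}
fclose($file);
echo $phone;
?>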


Then make a new plain-text file called word.doc:



Name: Bob Cammer
Email: cammer.bob@idk.com
Phone: 1.888.888.8888


You can simply add anything to it:


Name: Bob Cammer
Email: cammer.bob@idk.com
Phone: 1.888.888.8888
Type: Artwork

To output it, you would then add this at the bottom of the PHP script:


echo $type; //output Artwork

Notice that Type is capitalized in the Word doc but lowercase in the PHP, because the script lowercases each key with strtolower().

sukanya.paul
01-19-2009, 07:29 AM
Thanks for the reply, but I didn't really follow the code.
I have a Word doc called txtfile.doc that contains, say, a name and an age; it can have any number of details.
What I want to do is extract the word that comes after, say, Name, and then the numbers that come after Phone.
How?

drluv888
09-01-2011, 12:35 PM
The script above is parsing a CSV file here. You can't use fgetcsv on a real Word .doc file. Of course you can make a CSV file and call it word.doc, but it isn't going to be a properly formatted MS Word .doc file.
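
One way to see the difference is to look at the first bytes of the file. This is a minimal sketch, assuming the file is a legacy binary .doc, which is an OLE2 compound file and starts with the bytes D0 CF 11 E0 A1 B1 1A E1. The function name is_binary_doc is made up for illustration.

```php
<?php
// Hypothetical helper: returns true when $path starts with the OLE2
// compound-file signature used by legacy binary .doc files.
function is_binary_doc(string $path): bool
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false; // missing or unreadable file
    }
    $header = fread($handle, 8);
    fclose($handle);

    // A plain-text "Name: ..." file renamed to .doc will not match this.
    return $header === "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";
}
```

If this returns true, line-based functions such as fgetcsv will mostly see binary data rather than "Key: value" lines.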

Now that MS has made libraries available to the open source world so that things like OpenOffice can work with .doc files, I would have thought that someone would have ported this to a PHP API by now. From all the chatter from people who want to know how to parse a .doc file, I guess it hasn't been written yet. I would advise people to look for .doc to XML converters and to parse the XML in PHP as they need to. These use the open XML standards written by Microsoft.
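
As a rough sketch of the XML route: this assumes the document has been converted to WordprocessingML, for example by saving it as .docx, where the body is the word/document.xml entry inside the ZIP container (readable with ZipArchive::getFromName). Whether a given converter produces exactly this format is an assumption, and the function name wordml_to_text is made up for illustration.

```php
<?php
// Hypothetical helper: pulls the visible text out of a WordprocessingML
// body, returning one line of text per paragraph (<w:p>).
function wordml_to_text(string $xml): string
{
    $dom = new DOMDocument();
    if ($xml === '' || !@$dom->loadXML($xml)) {
        return ''; // empty or not well-formed XML
    }

    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace(
        'w',
        'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
    );

    $lines = [];
    foreach ($xpath->query('//w:p') as $paragraph) {
        $text = '';
        // Word splits a paragraph into runs; the text sits in <w:t> elements.
        foreach ($xpath->query('.//w:t', $paragraph) as $run) {
            $text .= $run->textContent;
        }
        $lines[] = $text;
    }
    return implode("\n", $lines);
}
```

The returned text can then be searched for the Name, Email, and Phone lines in the same way as any plain-text file.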

Thanks

JShor
09-01-2011, 01:08 PM
Guys, it doesn't work like that. You need to use some sort of library to read these files. Use an open source package like Antiword to read .doc files.
http://www.winfield.demon.nl/
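
A minimal sketch of how that could look from PHP, assuming the antiword binary is installed and on the PATH, and that the document holds "Label: value" lines like the ones earlier in this thread. The function names doc_to_text and extract_fields are made up for illustration.

```php
<?php
// Runs antiword on a .doc file and returns its plain-text output,
// or null if the command produced nothing (e.g. antiword is missing).
function doc_to_text(string $path): ?string
{
    $output = shell_exec('antiword ' . escapeshellarg($path));
    return is_string($output) && $output !== '' ? $output : null;
}

// Hypothetical helper: finds "Label: value" lines for the given labels
// (case-insensitive) and returns the values keyed by lowercased label.
function extract_fields(string $text, array $labels = ['Name', 'Email', 'Phone']): array
{
    $fields = [];
    foreach ($labels as $label) {
        $pattern = '/^[ \t]*' . preg_quote($label, '/')
                 . '[ \t]*:[ \t]*(.+?)[ \t\r]*$/mi';
        if (preg_match($pattern, $text, $match)) {
            $value = trim($match[1]);
            if ($value !== '') {
                $fields[strtolower($label)] = $value;
            }
        }
    }
    return $fields;
}
```

Usage would be along the lines of extract_fields(doc_to_text('txtfile.doc') ?? ''). The resulting array could then be inserted into a database, for example through a PDO prepared statement.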