nixCraft Linux Forum

Parse XML file and store data in array in shell scripting

#1, posted by Nishanthhampali on 02-01-2008, 11:16 AM

Hello,
I have an XML file in the following format:

<Users>
<Host>
<hostAddress>180.144.226.47</hostAddress>
<userName>pwdfe</userName>
<password>hjitre</password>
<instanceCount>2</instanceCount>
</Host>
<Host>
<hostAddress>180.144.226.87</hostAddress>
<userName>trrrer</userName>
<password>jhjjhhj</password>
<instanceCount>3</instanceCount>
</Host>
<Host>
<hostAddress>180.455.226.87</hostAddress>
<userName>wewqw</userName>
<password>dfsdfd</password>
<instanceCount>3</instanceCount>
</Host>
</Users>

I have to read this XML file from a shell script and store the values of the tags hostAddress, userName, password, and instanceCount in separate arrays.

Please help me out in solving this.

#2, posted by agn on 02-11-2008, 01:31 PM

Code:
# Tag names are case-sensitive, so they must match the XML exactly.
for tag in hostAddress userName password instanceCount
do
    grep "<$tag>" in.xml | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/'
done
Something like the above might help. I don't use bash, so I don't know how arrays are populated.

#3, posted by manishkochar on 02-13-2008, 08:55 PM

Quote:
Originally posted by agn:
Code:
# Tag names are case-sensitive, so they must match the XML exactly.
for tag in hostAddress userName password instanceCount
do
    grep "<$tag>" in.xml | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/'
done
Something like the above might help. I don't use bash, so I don't know how arrays are populated.

The sed expression was the most complex part; stuffing things into an array is easy.

Code:
#!/bin/bash

for tag in hostAddress userName password instanceCount
do
    # Collect every value of this tag, one per line.
    OUT=$(grep "<$tag>" in.xml | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/')

    # This is what I call the eval_trick: eval builds and runs an assignment
    # such as hostAddress="v1 v2 v3", creating a variable named after the tag.
    # Note that eval also interprets any quotes, $ or backticks in the values.
    eval ${tag}=`echo -ne \""${OUT}"\"`
done

# So let's stuff the obtained results into 4 different arrays.
# The unquoted expansion splits each space-separated list into elements.

H_ARRAY=( ${hostAddress} )
U_ARRAY=( ${userName} )
P_ARRAY=( ${password} )
I_ARRAY=( ${instanceCount} )

# Ok, time to announce success, let's print out each of the arrays

echo "${H_ARRAY[@]}"
echo "${U_ARRAY[@]}"
echo "${P_ARRAY[@]}"
echo "${I_ARRAY[@]}"

# For the benefit of agn -
# We can now refer to each unique element of the array like this -

echo "${H_ARRAY[0]}"

# The above prints the first item in array H_ARRAY
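
A variation on the script above, offered as a sketch rather than a definitive version: bash 4's mapfile builtin can read each tag's values straight into an array, which avoids eval altogether. The function name read_tag and the temporary sample file are made up for illustration, and the approach still assumes one element per line, like the original pipeline.

```shell
#!/bin/bash
# Sketch: fill one array per tag without eval (needs bash 4+ for mapfile).

# Hypothetical sample file standing in for in.xml from the thread.
xml_file=$(mktemp)
cat > "$xml_file" <<'EOF'
<Users>
<Host>
<hostAddress>180.144.226.47</hostAddress>
<userName>pwdfe</userName>
<password>hjitre</password>
<instanceCount>2</instanceCount>
</Host>
<Host>
<hostAddress>180.144.226.87</hostAddress>
<userName>trrrer</userName>
<password>jhjjhhj</password>
<instanceCount>3</instanceCount>
</Host>
</Users>
EOF

# read_tag TAG FILE: print the text of every <TAG>...</TAG> line, one per line.
read_tag() {
    grep "<$1>" "$2" | sed 's/^[[:space:]]*<[^>]*>\([^<]*\)<.*$/\1/'
}

# mapfile -t stores each input line as one array element, newline stripped.
mapfile -t H_ARRAY < <(read_tag hostAddress "$xml_file")
mapfile -t U_ARRAY < <(read_tag userName "$xml_file")
mapfile -t P_ARRAY < <(read_tag password "$xml_file")
mapfile -t I_ARRAY < <(read_tag instanceCount "$xml_file")

# Walk the arrays by index so the fields of one host stay together.
for i in "${!H_ARRAY[@]}"; do
    echo "${H_ARRAY[$i]} ${U_ARRAY[$i]} ${I_ARRAY[$i]}"
done

rm -f "$xml_file"
```

Because each line becomes exactly one element, a value containing spaces stays in one piece, which the word-splitting version would break apart.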
I chanced upon this thread because I am trying to do a similar project.
The specs look rather challenging for my poor knowledge of sed.
So let's see if agn can crack this one too!

I want to create a list of websites that definitely contain pornographic or adult content that is not suitable for kids at school.
I can see that dmoz offers its data in an XML format.
I also noticed that the XML file contains descriptive information about each website.

Now this is what I want to do:
A shell script in which I specify the look_up_string (via PCRE, of course).
Based on the look_up_string, I want to collect the names of the matching websites in a file. I don't want the whole URL; just the hostname is enough.
I will later put these hostnames in my hosts file to ensure effective blocking of these sites.

Could anybody help with this?
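
One possible starting point, offered as a sketch only: the awk program below remembers the URL of the current entry and prints its hostname when the description matches the look-up pattern. The element names (ExternalPage, d:Description), the function name extract_hosts, and the sample data are assumptions about the dump layout and are not taken from this thread. awk patterns are POSIX extended regular expressions rather than PCRE, and the line-oriented matching is fragile in the same way as the sed approach above.

```shell
#!/bin/bash
# Sketch: collect hostnames of entries whose description matches a pattern.
# ASSUMPTION: each entry looks like
#   <ExternalPage about="URL"> ... <d:Description>text</d:Description>
# with the opening tag and the description each on a single line.

# extract_hosts PATTERN FILE: print unique hostnames of matching entries.
# PATTERN is a lowercase extended regex; descriptions are lowercased first.
extract_hosts() {
    awk -v pat="$1" '
        /<ExternalPage / {
            # Remember the URL from the about="..." attribute.
            url = ""
            if (match($0, /about="[^"]*"/))
                url = substr($0, RSTART + 7, RLENGTH - 8)
        }
        /<d:Description>/ && url != "" && tolower($0) ~ pat {
            host = url
            sub("^[a-zA-Z]+://", "", host)   # drop the scheme
            sub("[/:?#].*$", "", host)       # drop port, path, query
            print host
        }
    ' "$2" | sort -u
}

# Hypothetical sample data for a quick check.
sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
cat > "$sample" <<'EOF'
<ExternalPage about="http://www.example.com/some/page.html">
  <d:Title>Example</d:Title>
  <d:Description>Adult content, not for kids.</d:Description>
</ExternalPage>
<ExternalPage about="http://docs.example.org/">
  <d:Title>Docs</d:Title>
  <d:Description>Reference manuals.</d:Description>
</ExternalPage>
EOF

# Turn matching hosts into hosts-file lines; pointing them at 127.0.0.1
# is an assumed convention, not something specified in the thread.
extract_hosts 'adult|porn' "$sample" | sed 's/^/127.0.0.1 /'
```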

#4, posted by agn on 02-13-2008, 09:42 PM

I am not an expert in sed. Sed is a really weird tool, but the power it contains is awesome.

Regexes are not a good tool for parsing HTML/XML data. You should use an XML parser. Right tool for the right job.
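
As one example of the parser route, sketched under assumptions: the xmlstarlet command-line tool (a separate package that is not mentioned in this thread) can select element values by XPath, so the result does not depend on how the file is indented or wrapped. The function name xml_values and the sample file are made up for illustration, and the query only runs if xmlstarlet is installed.

```shell
#!/bin/bash
# Sketch: read hostAddress values with an XML parser (xmlstarlet) instead of sed.

# xml_values FILE CHILD: print the CHILD element of every <Host>, one per line.
xml_values() {
    xmlstarlet sel -t -m '//Host' -v "$2" -n "$1"
}

# Hypothetical sample; note the elements are not one per line here.
in_xml=$(mktemp)
trap 'rm -f "$in_xml"' EXIT
cat > "$in_xml" <<'EOF'
<Users><Host><hostAddress>180.144.226.47</hostAddress>
<userName>pwdfe</userName></Host>
<Host><hostAddress>180.144.226.87</hostAddress><userName>trrrer</userName></Host></Users>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
    # One array element per <Host>, in document order.
    mapfile -t H_ARRAY < <(xml_values "$in_xml" hostAddress)
    printf '%s\n' "${H_ARRAY[@]}"
else
    echo "xmlstarlet is not installed; skipping" >&2
fi
```

The sample deliberately puts several elements on one line, a layout the grep and sed pipeline would not handle.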

#5, posted by manishkochar on 02-14-2008, 09:44 AM

Quote:
Originally posted by agn:
I am not an expert in sed. Sed is a really weird tool, but the power it contains is awesome.

Regexes are not a good tool for parsing HTML/XML data. You should use an XML parser. Right tool for the right job.

Any links you would like to share?
I've tried everything listed on freshmeat / sourceforge for "dmoz" parsing.
Sadly, none of them really work, and all are poorly documented.
Besides, I am out of my depth when it comes to Perl.

#6, posted by agn on 02-14-2008, 11:57 AM

I am a total newbie to XML parsing, but XML::Simple[1] looks easy.

[1] XML::Simple - Easy API to maintain XML (esp config file - search.cpan.org

Last edited by agn; 02-14-2008 at 12:07 PM.

#7, posted by jaiswal on 12-15-2009, 07:37 PM

Quote:
Originally posted by agn:
Code:
# Tag names are case-sensitive, so they must match the XML exactly.
for tag in hostAddress userName password instanceCount
do
    grep "<$tag>" in.xml | tr -d '\t' | sed 's/^<.*>\([^<].*\)<.*>$/\1/'
done
Something like the above might help. I don't use bash, so I don't know how arrays are populated.

Could you please explain the sed part?
Thanks.
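
The thread leaves this question unanswered, so here is one reading of the expression with a small check. `^<.*>` matches the opening tag at the start of the line, `\([^<].*\)` captures the text that follows (it must begin with a character other than `<`), `<.*>$` matches the closing tag at the end of the line, and `\1` replaces the whole line with the captured text. The function name strip_tags and the sample lines are for illustration only.

```shell
#!/bin/bash
# Sketch: apply the thread's sed expression to single lines.

# strip_tags: read lines on stdin, print the text between the outer tags.
strip_tags() {
    sed 's/^<.*>\([^<].*\)<.*>$/\1/'
}

# A line of the form <tag>text</tag> is reduced to its text.
echo '<hostAddress>180.144.226.47</hostAddress>' | strip_tags

# A line holding only a tag has no text to capture, so sed leaves it unchanged.
echo '<Host>' | strip_tags
```

Lines indented with spaces rather than tabs would also pass through unchanged, because `tr -d '\t'` removes only tabs and the pattern requires `<` as the first character.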

Tags
perl xml , shell scripting , shell xml , xml

