ISO8859-1 to UTF-8 script wanted (Shell scripting forum)
It seems I can use iconv for this. When I do
# iconv --from-code=ISO-8859-1 --to-code=UTF-8 ./oldfile.htm > ./newfile.htm
it gives the right result for my webserver's config. I have huge folders of htm and txt files to convert, so I would rather not rename them by hand. I would also like the script to check each file for certain characters, to decide whether that file should be converted at all. I found this script somewhere:
Code:
#!/bin/bash
for i in $* ; do
echo "Converting $i ..."
mv -i $i $i.bak
iconv -f ISO8859-1 -t UTF-8 $i.bak >$i
done
How do I add a character search to it? Only the documents (only *.txt and *.ht*) that contain "é", "à", "è", "ï", "ë" or "ó" should be converted to UTF-8; all others should be left alone. I have done this before, but unfortunately lost the script and couldn't find much about it online. For example, how do I detect whether one of these characters is present? I have a separate backup of all the files, so a script that overwrites them in place would do. Any help or examples very much appreciated.
Last edited by meowing; 06-02-2007 at 06:39 PM.
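One way to do the character check, sketched under assumptions (bash, GNU grep and iconv; the helper name `convert_if_accented` is made up for this example): match the six letters by their single-byte ISO-8859-1 codes instead of typing them, and additionally skip any file that is already valid UTF-8, so that nothing gets converted twice.

```shell
#!/bin/bash
# Hypothetical helper: convert a file from ISO-8859-1 to UTF-8 only if it
# contains é à è ï ë ó as Latin-1 bytes (E9 E0 E8 EF EB F3) and is not
# already valid UTF-8. A .bak copy of each converted file is kept.
latin1_accents=$'[\xe9\xe0\xe8\xef\xeb\xf3]'

convert_if_accented() {
    local f
    for f in "$@"; do
        # Only *.txt and *.ht* (htm, html, ...) regular files are considered.
        case "$f" in
            *.txt|*.ht*) ;;
            *) continue ;;
        esac
        [ -f "$f" ] || continue

        # LC_ALL=C makes grep compare raw bytes, whatever the user's locale.
        # iconv UTF-8 -> UTF-8 fails on invalid UTF-8 (i.e. Latin-1 text); this
        # guards against false hits such as a UTF-8 byte-order mark (EF BB BF).
        if LC_ALL=C grep -q "$latin1_accents" "$f" &&
           ! iconv -f UTF-8 -t UTF-8 "$f" > /dev/null 2>&1; then
            echo "Converting $f ..."
            mv -- "$f" "$f.bak"
            iconv -f ISO-8859-1 -t UTF-8 "$f.bak" > "$f"
        else
            echo "Leaving $f alone"
        fi
    done
}

# Usage: convert_if_accented *.txt *.htm *.html
```

The byte check alone is only a heuristic (some UTF-8 characters also start with those byte values), which is why the UTF-8 validity test is combined with it.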
Corrected code:
Code:
#!/bin/bash
# "$@" is every file named on the command line, each kept as one word
for i in "$@"
do
echo "Converting $i ..."
mv -i "$i" "$i.bak"
iconv -f ISO8859-1 -t UTF-8 "$i.bak" > "$i"
done
Code:
./script *.html
Code:
if egrep -q "é|à|è|ï|ë|ó" "$i"; then
    # found: do something, or call the script above for "$i"
    echo "$i contains accented characters"
fi
Last edited by jerry; 06-03-2007 at 12:00 AM. Reason: code typo fixed
Also, I think egrep does not see the correct characters in the files. I should probably use the character codes instead of the literal characters.
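That suspicion holds whenever the terminal (or the script file) is UTF-8: a literal é typed into the pattern is the two bytes C3 A9, while an ISO-8859-1 file stores it as the single byte E9, so the pattern never matches. A small demonstration, assuming bash, a UTF-8 terminal and script file, and GNU grep and iconv:

```shell
#!/bin/bash
# The same letter é has different bytes in the two encodings.
printf 'é' | od -An -tx1                                   # c3 a9 (UTF-8)
printf 'é' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1    # e9 (ISO-8859-1)

# A sample ISO-8859-1 file containing "café" (\351 is octal for byte E9).
sample=$(mktemp)
printf 'caf\351\n' > "$sample"

# Literal pattern: looks for C3 A9, which the file does not contain.
if LC_ALL=C grep -q 'é' "$sample"; then echo "literal: match"; else echo "literal: no match"; fi
# Character code: $'\xe9' is bash ANSI-C quoting for the single byte E9.
if LC_ALL=C grep -q $'\xe9' "$sample"; then echo "byte code: match"; else echo "byte code: no match"; fi
rm -f "$sample"
```

The other letters follow the same rule: à, è, ï, ë and ó are E0, E8, EF, EB and F3 in ISO-8859-1, so a bracket expression such as `$'[\xe9\xe0\xe8\xef\xeb\xf3]'` covers all six.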
Your script, for some reason, did not work.
This one does:
Code:
#!/bin/bash
FROM=iso-8859-1
TO=UTF-8
ICONV="iconv -f $FROM -t $TO"
# Convert
find /some/folder/ -type f -name "*" | while read fn; do
cp ${fn} ${fn}.bak
$ICONV < ${fn}.bak > ${fn}
rm ${fn}.bak
done
Just save it as something.sh, make it executable (chmod +x something.sh) and run it from the command line; that will do the job. You can change the FROM and TO values to your liking.
I have files with spaces in their names, for example: à blanc.txt
How do I modify this script to handle this case? It fails with:
cp: target `blanc.txt.bak' is not a directory
testiconv: line 8: CB2/TermsDictionaryGlosary/ZZAcented/à blanc.txt.bak: No such file or directory
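The cp error comes from the unquoted ${fn}: the shell splits "à blanc.txt" into separate words, so cp receives more than two arguments and treats the last one as a target directory. A space-safe sketch of the same find loop, assuming bash (for `read -d ''`) and a find that supports `-print0` (GNU and BSD do), limited to *.txt and *.ht* files as asked earlier in the thread; `convert_tree` is a made-up name:

```shell
#!/bin/bash
# Sketch: the find loop from the thread, made safe for names with spaces.
FROM=ISO-8859-1
TO=UTF-8

convert_tree() {
    local dir=$1 fn
    # -print0 ends each name with a NUL byte, which cannot occur in a file
    # name, so spaces (and even newlines) in names arrive intact.
    # *.bak is excluded so backups created during the run are never picked up.
    find "$dir" -type f \( -name '*.txt' -o -name '*.ht*' \) ! -name '*.bak' -print0 |
    while IFS= read -r -d '' fn; do
        # IFS= and -r keep leading blanks and backslashes in the name;
        # quoting "$fn" stops the shell from splitting it into several words.
        cp -- "$fn" "$fn.bak"
        # The backup is removed only if iconv succeeded.
        iconv -f "$FROM" -t "$TO" < "$fn.bak" > "$fn" && rm -- "$fn.bak"
    done
}

# Usage: convert_tree /some/folder/
```

Like the original loop, this converts every matching file unconditionally; files that are already UTF-8 would be converted a second time.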