Author Topic: how to change the charset format for html export  (Read 2002 times)

Darksun974

  • Newbie
  • *
  • Posts: 5
    • View Profile
how to change the charset format for html export
« on: January 19, 2009, 01:28:09 AM »
Hello!
I just want to know how to set the charset to UTF-8 for the HTML export.
In the export template the charset value is <charset>, and in the exported HTML page the value is ISO-8859-1.
Can someone tell me where to modify the default value for the charset?
Thank you, and sorry for my bad English (I'm French) ;D

Corey Cooper

  • Administrator
  • Hero Member
  • *****
  • Posts: 6216
    • View Profile
Re: how to change the charset format for html export
« Reply #1 on: January 19, 2009, 10:38:41 AM »
The charset is taken from the data being exported.  In other words, the program checks whatever is being "exported" to see what the encoding should be.  It defaults to ISO-8859-1 if it doesn't detect the need to encode in UTF-8, UTF-16, or UTF-32.

If you want to override this, edit the export template.  Where it says:

content="text/html; charset=<charset>"

change to

content="text/html; charset=UTF-8"


Darksun974

« Reply #2 on: January 20, 2009, 12:40:24 AM »
OK, I will try this! Thank you.

Darksun974

« Reply #3 on: January 20, 2009, 08:31:41 PM »
Hi Corey!
I'm sorry, but your tip doesn't work  :(
I changed the <charset> value to UTF-8 in the statsexport template. After the TD export, Firefox and IE do detect UTF-8, but the characters on the page are still encoded in ISO. When I switch the browser's encoding option to ISO-8859, the characters display correctly.
As another test, I converted the exported page to UTF-8 with Sisulzer's Kaboom Software, and after that the characters display correctly.
Can you run some tests to tell me whether this problem is specific to my computer or general?
Thanks

Corey Cooper

« Reply #4 on: January 21, 2009, 10:26:21 AM »
Sorry, I guess I wasn't clear in my last post.  You can override what the "charset" is set to in the exported file by changing it in the template, as I described.  But as I mentioned, the data will still be exported in the necessary encoding.  In other words, if there is no data that NEEDS to be encoded in Unicode, it will NOT be encoded in Unicode.

Furthermore, the file code doesn't have the ability to specify an encoding.  It can only specify ASCII or Unicode.  If the data can be written using ASCII, it does so.  If it cannot (if the data has characters that cannot be encoded in ASCII), it switches to Unicode.  However, it cannot specify whether to use UTF-8, UTF-16, or UTF-32.  The system code makes that decision, and if I am not mistaken, I believe it defaults to UTF-16.  Right now, there's no way to force the TD to save files in Unicode (but I'll put something down to look into this).

So, the change to the template you made (and I described) is not necessary.  The exported file will either be encoded in ASCII or UTF-16, based on what is being exported.  And the charset tag will be set appropriately.

The only way I can think of to force it to be exported in Unicode is to make sure something that is being exported requires Unicode.  In other words, change a player name or the tournament description or other piece of data so that there is a Unicode character in it.
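
If you want to check which of these cases a given exported file fell into, something like the following PHP sketch may help. It is not the TD's code: guessExportEncoding() is a hypothetical helper that only looks at the byte-order mark and the byte values, and it assumes that a file with no mark and with bytes above 0x7F is in the single-byte default mentioned earlier in this thread.

```php
<?php
// Hypothetical helper, not part of the TD: guess how an exported file is
// encoded from its byte-order mark (BOM) and its byte values.
function guessExportEncoding(string $bytes): string
{
    // Check the 4-byte UTF-32 marks first: the UTF-32 LE mark starts with
    // the same two bytes as the UTF-16 LE mark.
    if (strncmp($bytes, "\xFF\xFE\x00\x00", 4) === 0) return 'UTF-32LE';
    if (strncmp($bytes, "\x00\x00\xFE\xFF", 4) === 0) return 'UTF-32BE';
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) return 'UTF-8';
    if (strncmp($bytes, "\xFF\xFE", 2) === 0) return 'UTF-16LE';
    if (strncmp($bytes, "\xFE\xFF", 2) === 0) return 'UTF-16BE';

    // No BOM: if no byte is above 0x7F, the data is plain ASCII.
    if (!preg_match('/[\x80-\xFF]/', $bytes)) return 'ASCII';

    // Assumption: anything else is the single-byte default (ISO-8859-1).
    return 'ISO-8859-1';
}

// Small in-memory samples standing in for exported file contents.
echo guessExportEncoding("Final table"), "\n";
echo guessExportEncoding("Caf\xE9"), "\n";
echo guessExportEncoding("\xFF\xFE" . "C\x00"), "\n";
```

For a real file you would pass in the result of file_get_contents() on the exported page. A guess like this cannot tell ISO-8859-1 apart from other single-byte encodings, so treat the last branch as a default rather than a detection.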

Darksun974

« Reply #5 on: January 21, 2009, 10:25:15 PM »
OK Corey,
I reached the same conclusions with my other tests. I need to export the HTML page in Unicode because I publish it (using the PHP include function) on a website powered by WordPress, and WordPress is encoded in UTF-8.
I will try what you describe in your post and tell you what happens.
Thank you again for your support.
See you later

Corey Cooper

« Reply #6 on: January 22, 2009, 09:57:50 AM »
One more thing to note: I recently did some work on the encoding portion of the TD, and what I discovered is that the Windows code that the TD uses to read and write files claims to support UTF-8 but does not (at least not correctly).  Therefore, from the TD perspective, it just won't work with UTF-8 files.

If you need your exported files to be UTF-8, your best bet is probably to export data normally from the TD, and then use another tool to convert the file to UTF-8.  You can even just open the file with Notepad and then "Save As...".  On the Save dialog, you can select the encoding.

It's an additional step, but will get the files into the format you want.
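
If the page ends up on a PHP site anyway, the conversion step could also be scripted. The sketch below is only one way to do it, under a few assumptions: convertExportToUtf8() is a hypothetical helper name, PHP's iconv extension is available, and you know which encoding the exported file is really in (for example 'ISO-8859-1' or 'UTF-16').

```php
<?php
// Hypothetical helper: re-encode exported HTML as UTF-8 and update the
// charset declared in the page. $sourceEncoding must name the encoding
// the exported data is really in.
function convertExportToUtf8(string $html, string $sourceEncoding): string
{
    // iconv() returns false when the input is not valid in $sourceEncoding.
    $utf8 = iconv($sourceEncoding, 'UTF-8', $html);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding");
    }
    // Rewrite the first declared charset so it matches the new bytes.
    return preg_replace('/charset=[A-Za-z0-9_-]+/', 'charset=UTF-8', $utf8, 1) ?? $utf8;
}

// In-memory sample standing in for an exported page in ISO-8859-1.
$page = "<meta content=\"text/html; charset=ISO-8859-1\">Caf\xE9";
echo convertExportToUtf8($page, 'ISO-8859-1'), "\n";
```

For real files, the same function can be wrapped in file_get_contents() and file_put_contents() calls with whatever input and output names you use. Passing the wrong source encoding will produce garbled characters rather than an error in most cases, so it is worth checking the exported file first.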

Darksun974

Re: how to change the charset format for html export (RESOLVED)
« Reply #7 on: January 23, 2009, 02:13:31 AM »
Hi Corey, and thank you for your reply!

I found a solution to encode the HTML page before including it in the website.
I use a PHP script. I'm sharing the code in case it helps someone else:

Code: [Select]
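<?php
// Read the page exported by the TD (ISO-8859-1).
$content = file_get_contents('stats.html');

// Name of the converted file, opened for binary writing.
$fichier = 'stats2.html';
$binary = fopen($fichier, 'wb');

// Declare UTF-8 in the page, then convert the bytes to match.
$content = str_replace('charset=ISO-8859-1', 'charset=UTF-8', $content);

// utf8_encode() converts ISO-8859-1 text to UTF-8.
if (fwrite($binary, utf8_encode($content))) print "$fichier";
else print "ERROR - " . $fichier . "<br>";

fclose($binary);
?>

<?php
// Include the converted page in the site.
include('stats2.html');
?>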


With this script I keep the automatic update for the different exported pages, and there is no need to use external software to encode in UTF-8.

Thank you again Corey for your support and good luck.
« Last Edit: January 26, 2009, 02:44:05 AM by Darksun974 »