I am not getting Kannada text when I run a Perl script on a file


I have the following code, which extracts text from HTML files and writes it to a text file. The HTML contains Kannada text (UTF-8). When the program runs, the text in the output file is not in the proper format; it is unreadable.

    use utf8;
    use HTML::FormatText;
    $string = HTML::FormatText->format_file(
        'a.html',
        leftmargin => 0, rightmargin => 50
    );
    open mm,">t1.txt";
    print mm "$string";

So please help me. How do I handle file formats while processing a file?

If I understand you correctly, you want the output file to be UTF-8 encoded so that the characters of the Kannada language are encoded correctly in the output. Your code is probably trying (and failing) to encode them as ISO-8859-1 instead.

If so, you can make sure your files are opened with a UTF-8 encoding filter.

    use HTML::FormatText;

    # Decode the input as UTF-8 while reading it.
    open my $htmlfh, '<:encoding(UTF-8)', 'a.html' or die "cannot open a.html: $!";
    my $content = do { local $/; <$htmlfh> };  # read all content from the file
    close $htmlfh;

    my $string = HTML::FormatText->format_string(
        $content,
        leftmargin => 0, rightmargin => 50
    );

    # Encode the output as UTF-8 while writing it.
    open my $mm, '>:encoding(UTF-8)', 't1.txt' or die "cannot open t1.txt: $!";
    print $mm $string;
    close $mm or die "cannot close t1.txt: $!";

For further reading, I recommend checking out these docs:

A few other notes:

  • The use utf8 line tells Perl that your script/library itself may contain UTF-8 text. It does not change how you read or write files.
  • Avoid using the two-argument form of open() as in your example. It may allow a malicious user to compromise your system in some cases. (Though the usage in your example happens to be safe.)
  • When opening a file, you need to add or die afterwards, or failures to read or write the file will be silently ignored.
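
The notes above can be illustrated with a small, self-contained sketch. It is not part of the original answer: the temporary file, the variable names, and the sample word are only assumptions for the demonstration. The source is pure ASCII (the Kannada word is written with \x{...} escapes), so use utf8 is not needed at all; the encoding layers passed to the three-argument open() are what determine the bytes on disk.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# The word "Kannada" in Kannada script: 5 code points, written as escapes
# so the script itself stays ASCII and needs no "use utf8".
my $text = "\x{0C95}\x{0CA8}\x{0CCD}\x{0CA8}\x{0CA1}";

# Hypothetical scratch file, removed automatically when the program exits.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close $tmp_fh;

# Three-argument open with an explicit encoding layer, checked with "or die".
open my $out, '>:encoding(UTF-8)', $path or die "cannot open $path: $!";
print {$out} $text;
close $out or die "cannot close $path: $!";

# Read the file back as raw bytes to see what was actually written.
open my $raw, '<:raw', $path or die "cannot open $path: $!";
my $bytes = do { local $/; <$raw> };
close $raw;

# Read it again through the decoding layer to get characters back.
open my $in, '<:encoding(UTF-8)', $path or die "cannot open $path: $!";
my $decoded = do { local $/; <$in> };
close $in;

printf "characters: %d, bytes on disk: %d\n", length($decoded), length($bytes);
# prints "characters: 5, bytes on disk: 15"
```

Each Kannada code point takes three bytes in UTF-8, which is why the byte count is three times the character count. Without the input layer, Perl would hand back 15 separate byte values instead of 5 characters.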

Update 3/12: I changed this to read the file in as UTF-8 and then send it to HTML::FormatText. If your a.html file is saved with a BOM character at the start, it may have done the right thing anyway, but this should make it always assume UTF-8 for the incoming file.

