Converting Japanese to Unicode/HTML?
Thread poster: Paul Cohen
Paul Cohen  Identity Verified
Greenland
Local time: 07:56
English to German
Jun 22, 2010

I have a question concerning placing Japanese texts on websites.

Recently a longstanding client asked my wife to add Japanese texts to an existing website (which she built herself using an HTML editor). The client said he would have the translations done by a qualified translator and then forwarded to her.

Well, now she has received a Japanese text in a Word file, but she can't figure out how to convert it into HTML code and put it on the website. She has found a website listing the Unicode values for Hiragana and Katakana characters, but those seem to cover only a small proportion of the characters in the text.

Does anyone out there have any experience in this area?

Does an application exist that can convert Japanese characters into Unicode/HTML?

Any comments or ideas would be greatly appreciated.

Thanks in advance for your help!

Paul


 
Katalin Horváth McClure  Identity Verified
United States
Local time: 04:56
Member (2002)
Hungarian to English
What is the problem exactly? Jun 22, 2010

If she received a Word file, the Japanese text itself is most likely in Unicode.
I mean - does it display on her computer correctly?
As to how to put it into HTML - well, you just have to use Unicode for the encoding, and specify the language as Japanese. html lang="ja"
Use meta-tags for Unicode: content="text/html; charset=utf-8"

Or, if you want/need to use Shift-JIS encoding, then charset=Shift_JIS

So, there is no need to replace every character with a code number, if that's what you are thinking. That's not the way to go.
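To illustrate, here is a minimal Python sketch of the approach described above: the Japanese characters go into the page as-is, and the file is written as UTF-8 so that it matches the charset declaration. The function name `write_japanese_page` and the page template are hypothetical, and the sketch assumes the translated text has already been copied out of Word as a plain string.

```python
import html
from pathlib import Path

# Hypothetical page skeleton: declare the language as Japanese and the
# encoding as UTF-8, then insert the characters themselves (no numeric codes).
PAGE_TEMPLATE = """<!DOCTYPE html>
<html lang="ja">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>{title}</title>
</head>
<body>
<p>{body}</p>
</body>
</html>
"""


def write_japanese_page(path: Path, title: str, body: str) -> None:
    """Hypothetical helper: write a page whose file encoding matches its charset declaration."""
    # html.escape only touches &, <, > and quotes; Japanese characters pass through unchanged.
    page = PAGE_TEMPLATE.format(title=html.escape(title), body=html.escape(body))
    # Encode explicitly so no platform-default encoding sneaks in.
    path.write_bytes(page.encode("utf-8"))
```

For example, `write_japanese_page(Path("index_ja.html"), "テスト", text)` should produce a page that a browser displays as Japanese without any character being replaced by a code number.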

Maybe I am not clear about the question, but perhaps it would help to take a look at the source code of any Japanese webpage.
http://www.nikon.co.jp/
http://www.toyota.co.jp/
http://www.nissan.co.jp/

Katalin


 
Madeleine MacRae Klintebo  Identity Verified
United Kingdom
Local time: 09:56
English to Swedish
From an amateur with an interest in character issues and web design Jun 22, 2010

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:



If not, the best thing would probably be to do so. The next alternative is to use a converter; there are many free ones on the net. A converter turns the actual characters into HTML numeric character references (NCRs) or entity codes. Just remember that if you're using a free online tool, you have to be careful with confidential information. Also bear in mind:

"It's generally better, however, to use the characters themselves rather than their Unicode NCRs in cases where a Web page has a lot of Chinese text, because Chinese characters take up less file space than their NCRs."

Reference: http://pinyin.info/tools/converter/chars2uninumbers.html
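A rough Python sketch of what such a converter does, and of the file-size point in the quote. It assumes decimal NCRs (the `&#26085;` style) are wanted; `to_ncrs` and `size_comparison` are hypothetical names, and characters such as `&` and `<` are left untouched, so this is not a full HTML escaper.

```python
def to_ncrs(text: str) -> str:
    """Hypothetical helper: replace every non-ASCII character with a decimal NCR (&#NNNN;)."""
    # Python's built-in 'xmlcharrefreplace' error handler performs exactly this substitution.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


def size_comparison(text: str) -> tuple[int, int]:
    """Hypothetical helper: return (bytes as raw UTF-8, bytes as ASCII NCRs) for the same text."""
    return len(text.encode("utf-8")), len(to_ncrs(text))


if __name__ == "__main__":
    sample = "日本語"
    print(to_ncrs(sample))          # &#26085;&#26412;&#35486;
    print(size_comparison(sample))  # (9, 24)
```

With three kanji, the raw UTF-8 text takes 9 bytes while the NCR version takes 24, which is the overhead the quote refers to.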

If none of this helps, your site developer might want to read this:

http://www.joelonsoftware.com/articles/Unicode.html

BTW, it's usually safer to paste the text into Notepad or a similar plain-text editor before adding it to a CMS, to strip out Word's (unnecessary) formatting.

Edited to add: the missing "bit" above (I forgot to add spaces around the tags) is the same meta tag Katalin mentioned.



[Edited at 2010-06-22 21:54 GMT]


 
RieM  Identity Verified
United States
Local time: 04:56
Japanese to English
good ol' native2ascii Jun 22, 2010

I still use it. It's part of the Java SDK.

Of course, there are also text editors that support this kind of conversion. Either way, the file has to be in plain-text format first.
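For comparison, a small Python sketch that mimics the kind of output native2ascii produces: ASCII is left alone and everything else becomes a Java-style `\uXXXX` escape, with characters above U+FFFF written as UTF-16 surrogate pairs. This is an illustrative re-implementation, not the JDK tool itself, and both function names are hypothetical. Note that these escapes are meant for Java source and properties files; HTML uses `&#...;` references instead.

```python
import re

# Matches a literal backslash-u followed by four hex digits.
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def to_java_escapes(text: str) -> str:
    """Hypothetical helper: escape non-ASCII characters as \\uXXXX, like native2ascii-style output."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            # Supplementary characters become a high/low UTF-16 surrogate pair, as in Java.
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)
            low = 0xDC00 + (cp & 0x3FF)
            out.append(f"\\u{high:04x}\\u{low:04x}")
    return "".join(out)


def from_java_escapes(text: str) -> str:
    """Hypothetical helper: reverse the escaping, recombining surrogate pairs."""
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)
    # Round-tripping through UTF-16 with 'surrogatepass' joins lone surrogate pairs into one character.
    return decoded.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


if __name__ == "__main__":
    print(to_java_escapes("日本語"))  # \u65e5\u672c\u8a9e
```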

I will be happy to take a look and convert it however you like. Just send me the file via my profile page.

Rie


 
Paul Cohen  Identity Verified
Greenland
Local time: 07:56
English to German
TOPIC STARTER
Excellent advice Jun 23, 2010

Thanks, Katalin, Madeleine and Rie.

Excellent advice! We'll look into it and let you know how things turn out.

Thanks again,

Paul (& Monika)


 
esperantisto  Identity Verified
Local time: 12:56
Member (2006)
Russian to English
SITE LOCALIZER
Some remarks Jun 23, 2010

Katalin Horvath McClure wrote:

If she received a Word file, the Japanese text itself is most likely in Unicode.


Theoretically, it could be in the so-called Far Eastern Word 6.0/95 format. But if it is opened in Word 97 through 2010, it is converted to Unicode on the fly.

As to how to put it into HTML - well, you just have to use Unicode for the encoding, and specify the language as Japanese. html lang="ja"
Use meta-tags for Unicode: content="text/html; charset=utf-8"


…and make sure you save your HTML file in Unicode UTF-8 (with or without BOM, that’s immaterial). I would suggest using a text/HTML editor with explicit encoding control, such as jEdit.
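A minimal Python sketch for checking which variant an editor produced and reading the file correctly either way, relying on Python's built-in `utf-8-sig` codec; the helper names are hypothetical.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path: Path) -> bool:
    """Hypothetical helper: True if the file starts with the UTF-8 byte order mark (EF BB BF)."""
    with path.open("rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def save_utf8(path: Path, text: str, with_bom: bool = False) -> None:
    """Hypothetical helper: 'utf-8-sig' prepends a BOM when encoding, plain 'utf-8' does not."""
    path.write_bytes(text.encode("utf-8-sig" if with_bom else "utf-8"))


def read_utf8(path: Path) -> str:
    """Hypothetical helper: 'utf-8-sig' drops a leading BOM if present, so both variants read the same."""
    return path.read_bytes().decode("utf-8-sig")
```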


 
Paul Cohen  Identity Verified
Greenland
Local time: 07:56
English to German
TOPIC STARTER
Converting Japanese characters into Unicode Sep 30, 2010

Sorry that it took so long to get back to all of you.

This is the solution that we found:

http://www.cse.iitb.ac.in/~pratik/downloads/ConvertCharactersToUnicode.html

Just paste in the characters and you get the Unicode codes. It works!

It also appears to work for Hindi, Sanskrit, Malayalam and Chinese characters.

Best regards,

Paul


 
FarkasAndras  Identity Verified
Local time: 10:56
Hungarian to English
Tags Sep 30, 2010

Not that it matters much at this point, but if anyone wants to post tags in the forum, remember to use character entities, not actual angle brackets. I.e. write &lt; instead of < and &gt; instead of >, because otherwise the forum engine misinterprets your tags as, well, tags.
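The same escaping can be done with Python's standard `html.escape`; the wrapper name `escape_for_forum` is hypothetical. Passing `quote=False` leaves the double quotes inside the tag alone, since only `&`, `<` and `>` need converting for this purpose.

```python
import html


def escape_for_forum(markup: str) -> str:
    """Hypothetical helper: turn &, < and > into entities so a forum shows the tag instead of parsing it."""
    return html.escape(markup, quote=False)


if __name__ == "__main__":
    tag = '<meta http-equiv="content-type" content="text/html;charset=utf-8" />'
    print(escape_for_forum(tag))
    # &lt;meta http-equiv="content-type" content="text/html;charset=utf-8" /&gt;
```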

This post shows how things go wrong:

Madeleine MacRae Klintebo wrote:

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:





This is how it looks, as intended, if you use &lt; and &gt;:
Madeleine MacRae Klintebo wrote:

Might be a silly question, but has the web designer/developer remembered to add support for Unicode in the header? Like:

<meta http-equiv="content-type" content="text/html;charset=utf-8" />



 
Soonthon LUPKITARO(Ph.D.)  Identity Verified
Thailand
Local time: 16:56
Member (2004)
Thai to English
MS Word text format Oct 1, 2010

Since the texts in question are in MS Word, one of the easiest ways to convert them is to save the file as plain text and select a Unicode encoding option (e.g. Unicode (UTF-8) or Unicode (UTF-7)). The characters then display correctly in an HTML file that declares the Unicode charset in its header.
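A minimal Python sketch of the follow-up step, re-saving such a text export as UTF-8 for the web. It assumes the file was saved with Word's plain "Unicode" option, which is typically UTF-16 with a byte order mark; a file saved as "Unicode (UTF-8)" is already UTF-8 and needs no conversion. The function name is hypothetical, and the same call with `source_encoding="shift_jis"` would handle a legacy Shift-JIS file.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "utf-16") -> None:
    """Hypothetical helper: re-encode a text file from source_encoding to UTF-8.

    Assumption: the default 'utf-16' suits a Word "Unicode" export; Python's
    'utf-16' codec reads the byte order from the BOM and strips it.
    """
    # Work on bytes so line endings (e.g. Word's \r\n) are preserved exactly.
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))
```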

Soonthon Lupkitaro


 

