Free TM in 24 languages! The DGT TM
Thread poster: Philippe Locquet
Philippe Locquet  Identity Verified
Portugal
Local time: 03:59
English to French
+ ...
Jun 9, 2017

Hello to all,

I'm creating this thread to tell everyone who doesn't know yet that the EU has made a large translation memory available to the public: the DGT TM. It covers 24 languages and, depending on the languages, merging all the entries released in 2017 gives you over 400 000 TUs per language pair on average. If you also use earlier releases, it will be even bigger.

Getting it takes three steps, which I describe in a video I've posted on YouTube: https://youtu.be/wVeU9NKEYjM
In the video description I've put direct download links to TMs already extracted from the DGT TM for a few language pairs.

This TM was suggested by Milan Condak, many thanks 😉, ( http://www.proz.com/profile/37344 ) in a thread I created about two other free assets, the VLTM TM and the IATE glossary, which can be connected in Wordfast Pro 5: http://www.proz.com/forum/wordfast_support/314785-new_working_feature_in_wordfast_pro_5_connect_to_free_remote_tms_and_remote_glossaries.html#2651147

Feel free to discuss it further here!
My bests to all! 😊


 
Tatiana Grehan  Identity Verified
United States
Local time: 23:59
English to Russian
+ ...
Thanks for the information! Jun 9, 2017

Does it include a TM for the Russian language?

 
Milan Condak  Identity Verified
Local time: 04:59
English to Czech
No Jun 9, 2017

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


The EU has 28 member states but only 24 official languages, and Russian is not one of them. After Brexit, English will remain only for part of Cyprus, Gibraltar, ...

(I suggest replacing English with Czech: it is a Slavonic (en_US: Slavic) language in Latin-2, and its encoding is similar to that of the Baltic languages. The three main EU languages are now EN, DE, FR: Germanic, Germanic, Romance. This makes no sense.)

Milan


 
Milan Condak  Identity Verified
Local time: 04:59
English to Czech
Old presentations Jun 9, 2017

Fi2 n Co wrote:

This TM was suggested by Milan Condak, many thanks 😉, ( http://www.proz.com/profile/37344 )



Here are links to some of my presentations on the DGT TM, in Czech:

1. http://www.condak.cz/nove/2016-05/29/cs/00.html

Searching the DGT database

Wordfast Classic and Wordfast Server, 29.05.2016
--
2. http://www.condak.net/tmx/tmx-dgt-pctrans/cs/00.html

The DGT TM in PC Translator 2012, 23.10.2011
--
3. http://www.condak.cz/archiv-cz/2009-01/01-25/cs/00.html

PC Translator 2009 + HU, 25.01.2009
--
Before the DGT TM, the JRC-Acquis database was available.

Resources for non-EU languages can be found in, e.g., OPUS, the open parallel corpus:

http://opus.lingfil.uu.se/index.php

One section contains the DGT files: http://opus.lingfil.uu.se/DGT.php

==
But all of them need some work: downloading and extracting the TMX is only the beginning of the story.

Milan

[Edited at 2017-06-09 19:17 GMT]


 
Philippe Locquet  Identity Verified
Portugal
Local time: 03:59
English to French
+ ...
TOPIC STARTER
List of languages Jun 9, 2017

Tatiana Grehan wrote:

Does it include a TM for the Russian language?


Hi,

There's a list with statistics and all released languages on the DGT TM page; you can see it here: http://optima.jrc.it/Resources/DGT-TM_Statistics.pdf

My bests


 
Nuno Oliveira  Identity Verified
Portugal
English to Portuguese
TMXtract.jar not executing solution Jun 12, 2017

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.
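
If double-clicking the file still does nothing, another option is to start it through the Java launcher directly, which does not depend on the Windows .jar file association that Jarfix repairs. Below is a minimal sketch in Python, assuming that Java is installed and on the PATH and that TMXtract.jar sits in the current folder. The helper names `jar_command` and `launch_jar` are made up for this illustration; only the standard `java -jar <file>` launcher syntax is relied on.

```python
import shutil
import subprocess


def jar_command(jar_path, java="java"):
    """Build the command line that starts a .jar through the Java launcher."""
    return [java, "-jar", jar_path]


def launch_jar(jar_path):
    """Start the .jar with the first 'java' found on PATH and return its exit code."""
    java = shutil.which("java")
    if java is None:
        # No launcher found: Java is probably missing or not on PATH.
        raise FileNotFoundError("No 'java' executable found on PATH; is Java installed?")
    # Error messages from Java appear in the console instead of being lost.
    return subprocess.run(jar_command(jar_path, java), check=False).returncode


# Example call (assumes TMXtract.jar is in the current folder):
# launch_jar("TMXtract.jar")
```

Starting it this way also shows any Java error message in the console, which helps when reporting a problem.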


 
Philippe Locquet  Identity Verified
Portugal
Local time: 03:59
English to French
+ ...
TOPIC STARTER
Thanks Jun 13, 2017

Nuno Oliveira wrote:

First of all, let me thank you for sharing this with us! This is great stuff.

If someone can't get the TMXtract.jar file to execute, my advice is to just download and run Jarfix. It will fix it.


Thanks Nuno. Yes, I hope it will help many people, since free TMs like the VLTM sometimes have few entries in some language pairs.

Thank you for this tip. I haven't tried it, because the problem is usually solved by reinstalling Java, and I haven't seen the issue in Windows 10.
It will be interesting to get feedback from others to see whether this gave them a permanent fix. It looks promising!

My bests



[Edited at 2017-06-13 11:33 GMT]


 
Rolf Keller
Germany
Local time: 04:59
English to German
Java :-( Jun 14, 2017

Fi2 n Co wrote:

I haven't tried it because usually the problem is solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 PC the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again.

On my old Vista PC it works, though.


 
Rolf Keller
Germany
Local time: 04:59
English to German
Getting the most from the EU's glossaries Jun 14, 2017

You don't need any CAT software to benefit from the EU's glossaries. All these (and other) .tbx and .tmx files can be searched with Omni-Lookup. At the same time you can search several other glossaries on the web or offline (e.g. Acolada or Excel), with one click for your preferred portfolio of resources. See www.omni-lookup.de.

 
Philippe Locquet  Identity Verified
Portugal
Local time: 03:59
English to French
+ ...
TOPIC STARTER
Thanks for this experience Jun 14, 2017

Rolf Keller wrote:

Fi2 n Co wrote:

I haven't tried it because usually the problem is solved by reinstalling Java, and I haven't seen the issue in Windows 10.


On my Windows 10 PC the .jar file doesn't work. Moreover, I have to restart the PC after trying the .jar, because the CPU load goes up to 50 percent and never goes down again.

On my old Vista PC it works, though.


Hi Rolf,

Sorry it didn't work on yours (W10). Did you try Jarfix to solve this then?
Did you get the 1607 or 1703 update? When you get a major update, you could uninstall Java before restarting for the update to install, and then install Java again. This may help with some issues.

My bests

Win 10 versions here: https://en.wikipedia.org/wiki/Windows_10_version_history

[Edited at 2017-06-14 14:03 GMT]


 
Susan He
Netherlands
Local time: 04:59
Dutch to English
+ ...
Error with importing tmx file into MemoQ Jun 29, 2017

So I followed the steps and completed the extraction to a TMX file for the language pair I wanted. I imported it into memoQ via the Resource Console's TM tab. It took a long time (I didn't time it). Then it apparently finished (the window closed on finish; I should have unchecked that box, since I didn't see what happened), but when I looked at the TM info it showed zero entries -__- The TMX files are supposed to be aligned, right? Or do I still need to do that? Also, are there better programs to use just for termbases and translation memories? (Free or paid license)
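
One way to tell whether the problem lies in the extraction or in the import is to count the translation units in the TMX file before importing it. Below is a minimal sketch in Python using only the standard library. It assumes the file follows the usual TMX layout (`<tu>` units containing `<tuv>` variants with a `lang` or `xml:lang` attribute); the function name `count_tus` and the two-segment sample are made up for illustration and are not taken from the DGT TM.

```python
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def count_tus(source):
    """Return (number of <tu> elements, sorted list of language codes seen).

    `source` may be a file path or a binary file object.
    """
    total = 0
    langs = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "tuv":
            code = elem.get(XML_LANG) or elem.get("lang")
            if code:
                langs.add(code)
        elif elem.tag == "tu":
            total += 1
            elem.clear()  # free the segments of a unit once it is counted
    return total, sorted(langs)


# Made-up sample; for a real file, pass its path instead.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header/><body>
<tu><tuv lang="EN-GB"><seg>Hello</seg></tuv><tuv lang="FR-FR"><seg>Bonjour</seg></tuv></tu>
</body></tmx>"""

print(count_tus(io.BytesIO(SAMPLE)))
```

If the count is zero, the extraction produced an empty file. If it is not, it is worth comparing the language codes found in the file with the ones chosen for the TM in the CAT tool.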


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 03:59
Member (2009)
Dutch to English
+ ...
Yes there are. Here are some of the best: Jun 30, 2017

Niann-Tsyr wrote:

So I followed the steps and completed the extraction to a TMX file for the language pair I wanted. I imported it into memoQ via the Resource Console's TM tab. It took a long time (I didn't time it). Then it apparently finished (the window closed on finish; I should have unchecked that box, since I didn't see what happened), but when I looked at the TM info it showed zero entries -__- The TMX files are supposed to be aligned, right? Or do I still need to do that? Also, are there better programs to use just for termbases and translation memories? (Free or paid license)


http://www.farkastranslations.com/tmlookup.php (amazing TM/terminology concordancer)
https://www.xbench.net/ (great all-round terminology tool)
https://github.com/heartsome/tmxeditor8 Heartsome (TMX editor)
http://prdownloads.sourceforge.net/okapi/Olifant-R00022.zip?download (Olifant TMX editor)

Michael

[Edited at 2017-06-30 09:01 GMT]


 
FarkasAndras  Identity Verified
Local time: 04:59
English to Hungarian
+ ...
Xbench Jun 30, 2017

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.
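
To illustrate the alternative to loading everything into RAM, here is a minimal sketch of a streaming search in Python: the file is read one translation unit at a time and each unit is discarded once it has been checked, so memory use should stay roughly constant however large the file is. This is only an illustration of the general technique, not a description of how Xbench or TMLookup work internally; the function name `search_tmx` and the sample data are made up.

```python
import io
import xml.etree.ElementTree as ET


def search_tmx(source, term):
    """Yield the segment texts of every <tu> that contains `term` (case-insensitive).

    `source` may be a file path or a binary file object.
    """
    needle = term.lower()
    body = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == "body":
                body = elem  # remember the container so it can be emptied later
        elif elem.tag == "tu":
            segs = ["".join(seg.itertext()) for seg in elem.iter("seg")]
            if any(needle in s.lower() for s in segs):
                yield segs
            if body is not None:
                body.clear()  # drop units already processed


# Made-up sample; for a real file, pass its path instead.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4"><header/><body>
<tu><tuv lang="EN-GB"><seg>The house is red.</seg></tuv><tuv lang="FR-FR"><seg>La maison est rouge.</seg></tuv></tu>
<tu><tuv lang="EN-GB"><seg>Good morning.</seg></tuv><tuv lang="FR-FR"><seg>Bonjour.</seg></tuv></tu>
</body></tmx>"""

for hit in search_tmx(io.BytesIO(SAMPLE), "house"):
    print(hit)
```

The trade-off is speed: every search rereads the whole file, which is why dedicated tools usually build an index or a database first.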

[Edited at 2017-06-30 09:51 GMT]


 
Michael Beijer  Identity Verified
United Kingdom
Local time: 03:59
Member (2009)
Dutch to English
+ ...
yes, that RAM is a problem Jun 30, 2017

FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


I only use Xbench for glossaries, because of the RAM-loading limitation you mentioned. I use TMLookup for TMs.


 
Noe Tessmann  Identity Verified
Local time: 04:59
English to German
+ ...
Updates for your EU TM alignments? Jun 30, 2017

Hi Andras,

Another question, concerning your EU TM alignments: are there any updates? What happened to the project? I still use your highly valuable TMs.

Kind regards

Noe



FarkasAndras wrote:

How does xbench do with absurdly large datasets, such as tens of millions of segments?
The last time I checked it, it just loaded everything into RAM, which is fast and convenient for small TMs but it obviously doesn't work if your TM file is bigger than your RAM, and it generally starts to get unworkable long before that.

[Edited at 2017-06-30 09:51 GMT]


 