An Online Service for SUbtitling by MAchine Translation
SUMAT CIP-ICT-PSP
An Online Service for SUbtitling by MAchine Translation
Annual Public Report 2011
Editor(s), Contributor(s), Reviewer(s): Volha Petukhova, Arantza del Pozo; Mirjam Sepesy Maucec, Lindsay Bywood; Consortium
Status-Version: Final
Date: 15th November 2011
Table of Contents
1. Introduction
2. Summary of activities
3. Dissemination
4. Future work
Further Information
Appendix

Project Title: SUMAT. Contract No. CIP-ICT-PSP
1. Introduction

Current European policies aim to make audiovisual and multimedia content widely available across languages to promote cultural and linguistic diversity in Europe, and to make content accessible to people with visual and hearing disabilities through the use of sign language, subtitling, audio description and easily understandable menu navigation. In such a framework, subtitling plays an important role: it is the preferred method of translating multimedia content in most European countries and for most genres. For these reasons, the demand for subtitling by the European audiovisual industry has increased significantly in recent years.

However, subtitling and subtitle translation face three important problems that are preventing the expansion of the market and therefore hindering new business opportunities: cost, time and quality. There is a clear need to increase the productivity of current subtitle translation procedures, reducing costs and turnaround times while enhancing the quality of the translation results. Subtitling and audiovisual translation have also been recognized as areas that could greatly benefit from the introduction of Statistical Machine Translation (SMT) followed by post-editing.

The SUMAT project aims to increase the efficiency and productivity of the European subtitle industry, while enhancing the quality of its results, through the effective introduction of SMT technologies in subtitle translation processes.
SUMAT will develop an online subtitle translation service addressing 9 different European languages combined into 14 different language pairs, with the aim of semi-automating on a large scale the subtitle translation processes usually performed by both freelance translators and subtitling companies, in order to optimize their efficiency and productivity and thereby help them meet market demands. The language pairs are: English-Dutch; English-French; English-German; English-Portuguese; English-Spanish; English-Swedish and Serbian-Slovenian. The translation service will work in both directions, so these seven pairs account for the 14 translation directions. It is worth noting that in addition to languages with high impact (English, Spanish, French, German) and those with a lower impact but a large subtitling market (Dutch, Swedish, Portuguese), SUMAT also addresses two less-resourced languages, namely Serbian and Slovenian.

Currently, there are no effective tools or services that can provide automatic subtitle machine translation. The main limitation is the lack of sufficient high-quality parallel subtitle corpora, which are required to train the SMT models. Professionally produced high-quality subtitle data is the property of subtitling companies or their clients. Moreover, data is used and stored in various subtitle formats, some of which are proprietary. All this makes access to high-quality data for research and development purposes rather problematic. These issues were addressed in SUMAT at the very first project stage, together with the hardware and software infrastructure of the pilot service and its functionalities.
The rest of this document describes the progress of the SUMAT project so far in more detail, together with the corresponding results and future plans.

2. Summary of activities

The SUMAT kick-off took place in April at Vicomtech's facilities in San Sebastian. For the first seven months of the project, the technical work of the consortium has mainly involved Work Packages 2 and 3.

Within WP2, "Definition and specification of required corpora, SMT infrastructure and online service functionalities", we have defined and specified the subtitle corpora, software tools and hardware infrastructure required to develop the SUMAT pilot service, whose functionalities have also been refined.

Corpora

A key task within SUMAT is to obtain high-quality subtitle data from the professional subtitle translation companies of the consortium. From previous experiments reported in the literature, we know that around parallel subtitles are needed to obtain good results from SMT systems, with best results obtained by even more subtitles (around 1 million). Because the quality of the SMT results will depend to a great extent on the number of subtitles available, the subtitling companies inspected their archives in more detail and estimated more precisely the amounts of subtitles they could deliver.

Two types of subtitle data will be collected: parallel and monolingual. Parallel subtitles form the basis on which the SMT systems will be trained. Monolingual subtitles will be used to build larger target language models for SMT, an approach that has been shown to be beneficial in most instances, and in particular for language pairs which only have smaller training sets.

Tables 1 and 2 show the re-estimated amounts of subtitles available. The subtitling companies have reported that they have larger amounts of subtitle corpora available than initially estimated, which should have a positive impact on the performance of the subtitle SMT systems to be developed.
PARALLEL CORPORA | Total amount of available parallel subtitles: Initially estimated | Re-estimated
English - German
English - French
English - Spanish
English - Dutch
English - Swedish
English - Portuguese
Serbian - Slovenian

Table 1. Re-estimated amount of available parallel subtitles
MONOLINGUAL CORPORA | Total amount of subtitles: Initially estimated | Re-estimated
English
German
French
Dutch
Swedish
Portuguese

Table 2: Re-estimated amount of available monolingual subtitles

Software and hardware infrastructure

Professional data of high quality, however, cannot be directly used as SMT training material. An aligned parallel corpus needs to be compiled first. This requires a range of software tools to deal with the diversity of formats and encodings and to pre-process the raw data. In addition, software tools are required to linguistically annotate and translate subtitles for each SUMAT language and language pair. In WP2, we have defined and specified the set of software components needed for subtitle format conversion, pre-processing, translation and linguistic annotation.

With the help of the subtitling companies, we have compiled a list of the subtitle formats most widely employed within the subtitling industry (see Table 3). These include both proprietary and non-proprietary formats. Since the SUMAT pilot service will only support the non-proprietary ones, converters between the non-proprietary formats and plain text need to be developed. It is worth noting that SUMAT plans to support the EBU TT format, a new XML standard whose definition is currently being finalized by the W3C group and the European Broadcasting Union.

PROPRIETARY: .o32, .s32, .x32, .890, .pac, .ezt
NON-PROPRIETARY: EBU STL, TXT, SRT, XML, EBU TT

Table 3: Subtitle formats most widely employed by the members of the consortium

The pre-processing tools to be developed within the project will include solutions for language identification, document alignment, normalization and tokenization, sentence splitting, sentence alignment and subtitle alignment.
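To make three of the pre-processing steps listed above more concrete (normalization, tokenization and sentence splitting), the following is a minimal Python sketch. It is not the SUMAT tooling: the function names, the tag-stripping rule and the punctuation-based sentence boundary rule are all illustrative assumptions, and real subtitle data would need language-specific handling (abbreviations, ellipses, dialogue dashes).

```python
import re
import unicodedata


def normalize(text):
    """Unicode-normalize, drop formatting tags and collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    # Assumption: formatting tags look like <i>...</i> or <b>...</b>.
    text = re.sub(r"</?[a-zA-Z][^>]*>", "", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Naive, language-independent split into word and punctuation tokens."""
    return re.findall(r"\w+(?:['’-]\w+)*|[^\w\s]", text)


def split_sentences(subtitles):
    """Join consecutive subtitles and cut after sentence-final punctuation.

    A sentence often spans several subtitles, so the text is joined first.
    """
    joined = " ".join(normalize(s) for s in subtitles)
    parts = re.split(r"(?<=[.!?])\s+", joined)
    return [p for p in parts if p]


if __name__ == "__main__":
    cues = ["<i>I thought you</i>", "had left. Why", "are you here?"]
    for sentence in split_sentences(cues):
        print(tokenize(sentence))
```

Joining subtitles before splitting reflects the fact that subtitle boundaries follow timing and screen space rather than sentence structure, which is why sentence splitting appears as a separate step from subtitle alignment.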
A number of stable, mature and freely available tools will be employed to develop the SMT systems. These include alignment tools such as Giza++, language modeling tools such as SRILM and IRSTLM, and decoders such as Moses. Existing linguistic annotation tools for Part-of-Speech tagging, lemmatization and compound splitting, and syntactic parsing will be used and adapted for the different SUMAT languages. In addition, we will adapt or develop our own tools for those languages for which no suitable tools are available. Table 4 summarizes the linguistic annotation tools which will be involved in the project.

TOOLS English German French Spanish POS tagging TreeTagger HunPos Lemmatization and compound splitting RASP system LT-TTT2 tools Stanford CoreNLP Textshuttle in-house tools (to be adapted/developed) TreeTagger Morfette FreeLing TreeTagger Syntactic parsing MALT parser Berkeley parser MALT parser Pro3Gres dependency parser MALT parser FreeLing Dutch Alpino parser Frog tools Alpino parser; Frog Swedish TreeTagger HunPos Textshuttle in-house tools (to be adapted/developed) MALT parser Portuguese FreeLing FreeLing FreeLing Serbian University of Belgrade tools MARIBOR in-house tools (to be adapted/developed) Slovenian SPREAD tools JOS tools

Table 4: Overview of the linguistic annotation tools to be used in SUMAT

Within WP2, we have also defined the hardware infrastructure of the SUMAT pilot service shown in Figure 1, which will be distributed among the technical partners of the consortium.

Figure 1. Hardware infrastructure
There will be a server dedicated to subtitle data storage. The Web Application Server will host the user interface application that allows users to interact with the system. It will also be responsible for orchestrating the interactions between the various modules of the pilot service: format conversion, workflow management, subtitle repository, post-editing, etc. The technical partners will host servers dedicated to processing the translation requests in their assigned language pairs.

Online service functionalities

The functionalities of the online subtitle service were identified from the feedback provided by the subtitling companies acting as end-users. Two different use cases of the pilot service are foreseen: demo and professional. The SUMAT Demo will target the general public and aim to demonstrate the potential of the SUMAT technology and approach. The SUMAT Professional Translation Tool will constitute a professional product for machine translation of subtitles and target the subtitling industry.

The functionalities of both services regarding user registration, file uploading, source and target language specification, supported subtitle formats and formatting tags, workflow, resulting translated files, post-editing, feedback and user interface have been examined and elaborated upon. As a result, the detailed initial mock-ups shown in Figures 2 and 3 have been designed; they will be refined later in the project by the subtitling partners acting as end-users.

Figure 2: Mock-up of the SUMAT Professional Translation Tool: (left) Home page; (right) Post-editing page
Figure 3: Mock-up of the SUMAT Demo

Within WP3, "Corpus collection and alignment", the subtitling companies have been delivering subtitle data while the technical partners of the consortium have developed the format converters and the tools required for its pre-processing. Subtitle corpora collection and alignment are underway and expected to be completed by the end of January.

Corpus collection

An FTP server with a clear and simple folder and subfolder structure has been set up and is running as the central point for subtitle data collection. Each partner has their own login details and rights. The files are uploaded to the FTP server, converted into plain text, pre-processed and stored back on the server. The delivery of corpora by the subtitling companies has been following a prearranged schedule and will be completed by the end of the year.

Format converters

Converters have been developed for the majority of the non-proprietary subtitle formats to be supported by the SUMAT pilot service. EBU STL, TXT and SRT subtitle files can already be converted to and from plain text through the conversion utility, which has been set up as a web service. The converter for the EBU TT standard is still in progress, awaiting the final definition of the standard for its complete implementation.
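As an illustration of what converting one of these formats to and from plain text involves, here is a minimal Python sketch for SRT only. It is not the SUMAT conversion utility: the `Cue` class and the function names are hypothetical, and it assumes well-formed SRT input (an index line, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time-code line, then one or more text lines, with blank lines between cues).

```python
import re
from dataclasses import dataclass

# SRT time-code line, e.g. "00:00:01,000 --> 00:00:03,500"
TIMECODE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def _fmt(ms):
    h, rest = divmod(ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def parse_srt(content):
    """Parse SRT content into cues; blocks without a time-code are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMECODE.search(line)
            if match:
                g = match.groups()
                text = " ".join(part.strip() for part in lines[i + 1:])
                cues.append(Cue(_to_ms(*g[:4]), _to_ms(*g[4:]), text))
                break
    return cues


def to_plain_text(cues):
    """One subtitle per line, time-codes dropped."""
    return "\n".join(cue.text for cue in cues)


def to_srt(cues):
    """Serialize cues back to SRT, renumbering from 1."""
    blocks = [
        f"{i}\n{_fmt(c.start_ms)} --> {_fmt(c.end_ms)}\n{c.text}"
        for i, c in enumerate(cues, 1)
    ]
    return "\n\n".join(blocks) + "\n"
```

Keeping the time-codes in a separate structure rather than discarding them matters for the round trip: translated text can only be written back to a subtitle file if each line can be re-attached to its original timing.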
Corpus alignment

The technical partners have defined and set up a pipeline to pre-process and align the corpora being delivered by the subtitling companies. This has involved integrating existing tools for language identification and developing scripts for subtitle file alignment, normalization, tokenization, sentence splitting, sentence alignment and subtitle alignment.

The subtitle file alignment approach compares the time-codes that specify each subtitle's start and end time frames and measures their correspondence between two files. In order to cope with time-code differences, offsets, untranslated subtitles and timeline shifts, the implemented algorithm also matches shifted documents based on dynamic programming. A similar approach is being used for subtitle alignment. For sentence alignment, two different approaches are being explored: a text-independent approach based on time-code information, and a text-dependent approach based on bilingual dictionaries automatically generated through sentence-length alignment.

Each technical partner has started pre-processing and aligning the parallel and monolingual corpora harvested by the subtitling companies according to their assigned languages and language pairs. Preliminary informal evaluations of subtitle file, sentence and subtitle alignments are showing good results on the already pre-processed data for some languages and language pairs. More precise alignment evaluations of the collected subtitle corpora will be carried out and documented in WP8, "Evaluation of modules". Results are also planned to be reported at the LREC 2012 conference.

3. Dissemination

During the development of the first version of the project dissemination plan, an early exploration of potential dissemination opportunities was performed. We have identified an extensive list of both industrial and scientific events taking place during the lifetime of the project in which SUMAT plans to participate.
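Returning briefly to the time-code-based alignment described under "Corpus alignment" in Section 2, the general idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the algorithm implemented in the project: the scoring function (temporal intersection over union), the `min_overlap` threshold and the single constant `offset_ms` are all hypothetical choices. Subtitles are represented as `(start_ms, end_ms)` tuples.

```python
def overlap_ratio(a, b):
    """Intersection over union of two (start, end) intervals."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    if inter <= 0:
        return 0.0
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union


def align(src, tgt, min_overlap=0.5, offset_ms=0):
    """Align two time-coded subtitle lists by dynamic programming.

    Returns (src_index, tgt_index) pairs. Subtitles on either side may be
    skipped, which covers untranslated or merged subtitles. offset_ms is
    added to every target time-code to model a constant timeline shift.
    """
    n, m = len(src), len(tgt)
    # score[i][j]: best total overlap when aligning src[:i] with tgt[:j]
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best, move = -1.0, None
            if i > 0 and score[i - 1][j] > best:
                best, move = score[i - 1][j], "skip_src"
            if j > 0 and score[i][j - 1] > best:
                best, move = score[i][j - 1], "skip_tgt"
            if i > 0 and j > 0:
                shifted = (tgt[j - 1][0] + offset_ms, tgt[j - 1][1] + offset_ms)
                r = overlap_ratio(src[i - 1], shifted)
                if r >= min_overlap and score[i - 1][j - 1] + r > best:
                    best, move = score[i - 1][j - 1] + r, "match"
            score[i][j], back[i][j] = best, move
    # Trace the best path back from the bottom-right corner.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

In this sketch the offset has to be supplied by the caller; estimating it automatically, or handling shifts that change partway through a file, would need an additional search step that is not shown here.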
So far, the project partners have attended the following events:

Event | Place | Date | Partners participated in event | Purpose
META FORUM 2011 | Budapest | June 27-28, 2011 | VIC, DDS, MARIBOR, invision | Project presentation
4th Media for All | London | June 28 - July 1, 2011 | VSI, DDS | Project presentation
MIPCOM 2011 | Cannes | October 3-6, 2011 | DDS | Project presentation
AVT 2011 | Krakow | October 14-15, 2011 | TEXTSHUTTLE | Project presentation

Table 5. Dissemination activities performed

In addition, Internet dissemination activities have been ongoing since the start of the project. We have set up the project website and wiki and compiled the project factsheet, logo and templates. SUMAT is also active on social networks such as LinkedIn and Twitter. A Google AdWords campaign has also been arranged. For online promotion, the SUMAT website has been linked from the partners' websites.

Complementary dissemination material for effective project presentation, such as leaflets, posters, banners, USB sticks (including a flash presentation of the project), pens and t-shirts, has also been designed and its production is underway (see Appendix).

Regarding liaison activities with other EU projects, SUMAT has signed a collaboration agreement with META-NET and participated in the META-Exhibition of the META-FORUM 2011 event held in June in Budapest. We have also established cooperation with the CESAR project, which is working on the compilation and development of language resources for the Serbian language that we are planning to use in SUMAT.

An abstract has been submitted to LREC 2012, where we plan to publish statistics, error analysis and alignment evaluation results of the final subtitle corpora compiled in WP3.

4. Future work

The SUMAT future work will involve:
- finalizing the collection and alignment of the subtitle corpora;
- training baseline SMT systems;
- enriching the baselines with linguistic annotations to achieve optimal results;
- developing the online pilot service;
- and finally, evaluating the SUMAT approach with the subtitling companies acting as end-users.

These tasks will follow the more specific time plan shown in the following diagram:

1. Compilation of final parallel corpora: January
2. Online service specification development infrastructure: March
3. Baseline MT systems for 2 language pairs: April
4. Baseline SMT systems for 4 language pairs: May
5. Baseline SMT systems for all language pairs: June
6. Evaluation of baseline SMT systems: June 2012
7. Online service version 1: June
8. POS annotated subtitle for training taggers: August
9. POS taggers for subtitles: August
10. Adapted dependency parsers for subtitles: October
11. Annotated treebanks for dependency parsing: October
12. Evaluation of the impact of dependency parsing on SMT: October
13. Final SMT systems for EN-NL, EN-DE: November
14. Compound splitter integrated in SMT systems: December
15. Adapted NER for subtitles: December
16. Factored models: January
17. Final SMT EN-FR, EN-ES: February
18. Evaluation of linguistic annotation by trained software: May
19. Evaluation improved SMT systems: May
20. SMT systems for EN-PT, EN-DV, EN-SV, SB-SL: May
21. Online service version 2: May
22. Test-cases and overall system and service evaluation: March
23. Exploitation plan: March

Further Information

For further information on the project and its progress, please visit the SUMAT web site.
Appendix:
Leaflets
Generic poster and banner
Poster template for scientific dissemination
estatistik.core: COLLECTING RAW DATA FROM ERP SYSTEMS
WP. 2 ENGLISH ONLY UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing (Bonn, Germany, 25-27 September
Integration of a Multilingual Keyword Extractor in a Document Management System
Integration of a Multilingual Keyword Extractor in a Document Management System Andrea Agili *, Marco Fabbri *, Alessandro Panunzi +, Manuel Zini * * DrWolf s.r.l., + Dipartimento di Italianistica - Università
Proposal for Website Design and Development Services: Digital Library Federation
Proposal for Website Design and Development Services: Digital Library Federation Overview The Digital Library Federation (DLF) is an association of libraries and institutions whose mission is to develop
CREATIVE EXPRESS. Digital Upload MODULE 2A. Version 3 November 2011. Copyright 2010 Hewlett-Packard Development Company, L.P.
CREATIVE EXPRESS MODULE 2A Digital Upload Version 3 November 2011 1 MODULE OBJECTIVES Understand the process for uploading general content for HP Users and Agency Partners Consider three types of upload
Translating the Penn Treebank with an Interactive-Predictive MT System
IJCLA VOL. 2, NO. 1 2, JAN-DEC 2011, PP. 225 237 RECEIVED 31/10/10 ACCEPTED 26/11/10 FINAL 11/02/11 Translating the Penn Treebank with an Interactive-Predictive MT System MARTHA ALICIA ROCHA 1 AND JOAN
Deliverable D 6.1 Website
Biogas2PEM-FC Biogas Reforming and Valorisation through PEM Fuel Cells FP7-SME-2012, Grant Agreement No. 314940 Deliverable D 6.1 Website Deliverable details Deliverable version: v1.0 Date of delivery:
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts
MIRACLE at VideoCLEF 2008: Classification of Multilingual Speech Transcripts Julio Villena-Román 1,3, Sara Lana-Serrano 2,3 1 Universidad Carlos III de Madrid 2 Universidad Politécnica de Madrid 3 DAEDALUS
Website Redesign and Content Management System Implementation -- Request for Proposals
Website Redesign and Content Management System Implementation -- Request for Proposals Deadline: Friday, November 14, 2008 at 5 p.m. EST The Commission for Environmental Cooperation (CEC) is seeking qualified
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns
GrammAds: Keyword and Ad Creative Generator for Online Advertising Campaigns Stamatina Thomaidou 1,2, Konstantinos Leymonis 1,2, Michalis Vazirgiannis 1,2,3 Presented by: Fragkiskos Malliaros 2 1 : Athens
Annotation and Evaluation of Swedish Multiword Named Entities
Annotation and Evaluation of Swedish Multiword Named Entities DIMITRIOS KOKKINAKIS Department of Swedish, the Swedish Language Bank University of Gothenburg Sweden [email protected] Introduction
KantanMT.com. www.kantanmt.com. The world s #1 MT Platform. No Hardware. No Software. No Hassle MT.
KantanMT.com No Hardware. No Software. No Hassle MT. The world s #1 MT Platform Communicate globally, easily! Create customized language solutions in the cloud. www.kantanmt.com What is KantanMT.com? KantanMT
Content Management System for internal communication. Deliverable D1.2
Content Management System for internal communication Deliverable D1.2 28 April 2015 Author(s) Iliyana kuzmova, Pavel Stoev, Banjamin Burkhard, Margarita Grudova, Teodor Georgiev, Lyubomir Penev ESMERALDA
