Know-how in human language processing. Machine translation between European languages

Summary of the technology

A Spanish university has expertise in developing software for machine translation between related languages, with an emphasis on Romance languages. The know-how also applies to European Slavic and Scandinavian language pairs. The main features are a good balance between speed and translation quality, the option of online use, and availability as open source software. Companies are sought to apply this know-how in specific natural language processing projects.

Universidad de Alicante

TECHNICAL DESCRIPTION

A group from a Spanish university has expertise in developing human language processing software, in particular for machine translation between related languages, with special emphasis on the Romance languages of Europe. The tools may also be used to develop reading-aloud aids for these languages.
The group is strongly interested in preserving local culture and therefore pays special attention to languages with minority use in Europe.

The group has the know-how and capacity to develop automatic translation engines with the following characteristics:

- Applicable to automatic translation between pairs of related languages, for example Romance languages (Spanish, French, Italian, Catalan, etc.), Slavic languages (Polish, Czech, Slovak; or Bulgarian, Macedonian) or Scandinavian languages (Danish, Swedish, Norwegian, Icelandic).
- Modular: the engine consists of independent modules, each performing one task, most of which are based on finite-state techniques.
- Very fast translation thanks to the use of finite-state techniques: around 10,000 words per second on a regular desktop personal computer.
- High translation quality, measured as the percentage of automatically translated text that does not have to be corrected because it is a reasonably correct translation of the original. Quality levels range between 85% and 95% of correctly translated text.
- Easy integration of the developed engines into Internet applications whose main purpose is not machine translation.
- Translated surfing: all the links are automatically modified to point at the translations of the appropriate pages.
- Support for texts in RTF (Microsoft's Rich Text Format).
- Open source software: at present, the group is reimplementing all the modules to release the software as open source (license types: GPL, Creative Commons).
- Use of Unicode: at present, the group is converting all character encodings to Unicode to avoid incompatibilities when translating between languages from different European regions.
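
The "translated surfing" feature in the list above can be pictured as a link-rewriting step applied to each translated page. The following Python fragment is a minimal sketch under stated assumptions, not the group's implementation: the endpoint `https://translator.example/translate`, its `dir` and `url` query parameters, and the function name `rewrite_links` are hypothetical, and only double-quoted `href` attributes are handled.

```python
import re
from html import escape
from urllib.parse import urlencode, urljoin

# Hypothetical translator endpoint; not taken from the source document.
TRANSLATOR = "https://translator.example/translate"

# Matches double-quoted href attributes only (a simplification).
HREF = re.compile(r'href="([^"]*)"')


def rewrite_links(page_html: str, page_url: str, direction: str) -> str:
    """Point every href at the translated version of its target page."""

    def repl(match: re.Match) -> str:
        target = match.group(1)
        # Leave in-page anchors and non-HTTP schemes untouched.
        if target.startswith(("#", "mailto:", "javascript:")):
            return match.group(0)
        # Resolve relative links against the URL of the page being translated.
        absolute = urljoin(page_url, target)
        query = urlencode({"dir": direction, "url": absolute})
        # Escape "&" so the rewritten attribute stays valid HTML.
        return 'href="' + escape(f"{TRANSLATOR}?{query}") + '"'

    return HREF.sub(repl, page_html)
```

Calling `rewrite_links(page, "http://www.example.org/index.html", "es-ca")` would make each followed link request its target through the (assumed) translator, so the reader stays in the translated view while browsing.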


DESIGN PROCESS

The design is very simple: it is based on modules that carry out the translation in stages or phases. Seven basic phases can be distinguished:
a. Separation of text from format information in the document.
b. Morphological analysis of words and phrases in the text.
c. Selection of a single morphological analysis for words with several readings (based on context, using statistical methods).
d. Treatment of specific, simple multi-word syntactic structures that require special handling (gender and number agreement, preposition changes, word reordering within the sentence) to produce the corresponding structure in the target language, together with bilingual dictionary lookup.
e. Morphological generation or inflection of words in the target language.
f. Orthographic transformation: apostrophes, contractions and hyphens.
g. Reinsertion of format information to get a translated document with a format as similar as possible to the original.
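
The seven phases above can be sketched as a chain of small functions. This is a toy illustration only, assuming a Spanish to Catalan direction: the dictionaries, tag names and function names are invented for the example, the disambiguation step simply takes the first reading instead of using statistics, and whitespace inside text blocks is normalised.

```python
import re

# Toy Spanish -> Catalan data; every entry here is illustrative only.
ANALYSES = {  # surface form -> possible (lemma, tags) readings
    "el": [("el", "det.m.sg")],
    "los": [("el", "det.m.pl")],
    "gatos": [("gato", "n.m.pl")],
    "amigo": [("amigo", "n.m.sg")],
}
BILINGUAL = {"el": "el", "gato": "gat", "amigo": "amic"}  # lemma -> lemma
GENERATION = {  # (target lemma, tags) -> target surface form
    ("el", "det.m.sg"): "el",
    ("el", "det.m.pl"): "els",
    ("gat", "n.m.pl"): "gats",
    ("amic", "n.m.sg"): "amic",
}


def deformat(document):
    """(a) Split the document into format blocks (tags) and text blocks."""
    return [part for part in re.split(r"(<[^>]+>)", document) if part]


def analyse(word):
    """(b) Return every reading of a word; unknown words pass through."""
    return ANALYSES.get(word.lower(), [(word, "unknown")])


def disambiguate(readings):
    """(c) Pick one reading; a real tagger would use context statistics."""
    return readings[0]


def transfer(units):
    """(d) Bilingual lookup; structural rules would also be applied here."""
    return [(BILINGUAL.get(lemma, lemma), tags) for lemma, tags in units]


def generate(unit):
    """(e) Inflect the target lemma according to its tags."""
    lemma, _tags = unit
    return GENERATION.get(unit, lemma)


def postgenerate(text):
    """(f) Orthographic fix-ups; here only 'el' + vowel -> l'."""
    return re.sub(r"\bel ([aeiou])", r"l'\1", text)


def translate_text(text):
    """Run phases (b) to (f) on one block of plain text."""
    units = [disambiguate(analyse(word)) for word in text.split()]
    return postgenerate(" ".join(generate(unit) for unit in transfer(units)))


def translate(document):
    """(g) Translate text blocks and reinsert format blocks unchanged."""
    return "".join(
        part if part.startswith("<") else translate_text(part)
        for part in deformat(document)
    )
```

With these toy tables, `translate("<b>los gatos</b>")` gives `<b>els gats</b>` and `translate("el amigo")` gives `l'amic`. Because each phase only consumes the output of the previous one, any single module can be replaced without touching the others.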

This mode of operation is reasonably powerful yet simple. It allows each stage to be programmed very efficiently and, as a result, yields a translation speed of tens of thousands of words per second.
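
One way to see why stages built on finite-state techniques are cheap is a dictionary compiled into a letter trie, which is a simple deterministic finite-state automaton. The sketch below is an assumption-laden illustration rather than the group's code: the entries, the tag notation and the names `build_trie` and `longest_match` are invented, and word-boundary checks are omitted.

```python
FINAL = ""  # sentinel key marking an accepting state; cannot clash with a letter


def build_trie(entries):
    """Compile {surface form: analysis} into nested dicts, one state per prefix."""
    root = {}
    for form, analysis in entries.items():
        state = root
        for ch in form:
            state = state.setdefault(ch, {})
        state[FINAL] = analysis
    return root


def longest_match(trie, text, start):
    """Follow transitions from `start`; return (end, analysis) of the longest entry."""
    state, best = trie, None
    for pos in range(start, len(text)):
        state = state.get(text[pos])
        if state is None:  # no transition for this character: stop
            break
        if FINAL in state:  # accepting state: remember the longest match so far
            best = (pos + 1, state[FINAL])
    return best


# Illustrative entries, including one multi-word unit.
TRIE = build_trie({
    "a": "a<pr>",
    "a pesar de": "a pesar de<pr>",
    "pesar": "pesar<vblex><inf>",
})
```

Here `longest_match(TRIE, "a pesar de todo", 0)` returns `(10, "a pesar de<pr>")`, preferring the multi-word unit over the single word `a`. Each lookup takes one dictionary step per input character, regardless of how many entries the lexicon holds, which is the property that keeps this kind of stage fast.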

DEVELOPED APPLICATIONS

Some commercial applications which use this know-how have been developed. For example:
· InterNOSTRUM translator Spanish-Catalan (http://www.internostrum.com)
· Universia translator Spanish-Portuguese (http://traductor.universia.net)
· Reading-aloud assistant for Valencian (http://sao.dlsi.ua.es/)

TECHNOLOGY ADVANTAGES AND INNOVATIVE ASPECTS

INNOVATIVE ASPECTS

Although there are many developers of machine translation tools, few of them work with minority languages.

MAIN ADVANTAGES

- A good compromise between speed and translation quality.

- High speed due to the use of finite-state models.

- Being open source software allows better adaptation to the particular needs of each application, as well as easier debugging.

- The developed engines can be deployed as web applications.

CURRENT STATE OF DEVELOPMENT

The know-how is available to be transferred and used to develop new applications. Several commercial applications have been developed which already use this know-how.

COLLABORATION SOUGHT

- Partner sought: industries, universities and technological centres.

- Sector: human language processing, translation software, web applications.

- The research group is interested in applying its know-how in specific projects.

Intellectual property status

  • Granted Patent
  • Patent application number :

Related Keywords

  • Human Language Technologies
  • Machine Learning and Artificial Intelligence
  • Program development tools/languages
  • natural language processing
  • linguistics
  • translation
  • Computer Science, Language and Communication
  • neural machine translation (nmt)
