Translations as semantic mirrors: from parallel corpus to wordnet

Helge Dyvik


In: Advances in Corpus Linguistics


Type: Chapter

Page Range: 309–326

DOI: https://doi.org/10.1163/9789004333710_019


Abstract

The paper reports on the project ‘From Parallel Corpus to Wordnet’ at the University of Bergen (2001–2004), which explores a method for deriving wordnet relations such as synonymy and hyponymy from data extracted from parallel corpora. The method assumes that semantically closely related words ought to have strongly overlapping sets of translations, and that words with wide meanings ought to have a larger number of translations than words with narrow meanings. Furthermore, if a word a is a hyponym of a word b (as ‘tasty’ is of ‘good’, for example), then the possible translations of a ought to be a subset of the possible translations of b.
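The three assumptions above can be illustrated with a minimal sketch. The translation sets below are invented toy data, not taken from any corpus, and the function names are hypothetical. The abstract speaks only of "strongly overlapping" sets; using the Jaccard coefficient as the overlap measure is an assumption made here for illustration.

```python
# Toy, invented English -> Norwegian translation sets (not corpus data).
translations = {
    "good": {"god", "bra", "fin", "deilig", "velsmakende"},
    "tasty": {"deilig", "velsmakende"},
    "nice": {"fin", "bra", "hyggelig"},
}


def overlap(a, b):
    """Jaccard overlap of two words' translation sets (assumed measure)."""
    ta, tb = translations[a], translations[b]
    return len(ta & tb) / len(ta | tb)


def breadth(word):
    """Number of translations, used as a rough proxy for width of meaning."""
    return len(translations[word])


def is_hyponym_candidate(a, b):
    """True if a's translations form a proper subset of b's translations."""
    return translations[a] < translations[b]


print(overlap("tasty", "good"))               # shared / combined translations
print(breadth("good") > breadth("tasty"))     # wider meaning, more translations
print(is_hyponym_candidate("tasty", "good"))  # subset test for hyponymy
print(is_hyponym_candidate("nice", "good"))   # "hyggelig" breaks the subset
```

In this toy data 'tasty' passes the subset test against 'good', while 'nice' does not, because one of its translations falls outside the set for 'good'.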

Based on assumptions like these, a set of definitions is formulated, defining semantic concepts such as ‘synonymy’, ‘hyponymy’, ‘ambiguity’ and ‘semantic field’ in translational terms. The definitions are implemented in a computer program which takes words with their sets of translations from the corpus as input and performs the following calculations: (1) On the basis of the input, different senses of each word are identified. (2) The senses are grouped in semantic fields based on overlapping sets of translations, such overlap being assumed to indicate semantic relatedness. (3) On the basis of the structure of a semantic field, a set of features is assigned to each individual sense in it, coding its relations to other senses in the field. (4) Based on intersections and inclusions among these feature sets, a semilattice is calculated with the senses as nodes. According to our hypothesis, hyponymy/hyperonymy, near-synonymy and other semantic relations among the senses now appear through dominance and other relations among the nodes in the semilattice. Thus, the semilattice is supposed to contain some of the semantic information we want to represent in wordnets. (5) In accordance with this assumption, thesaurus-like entries for words are generated from the information in the semilattice.
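Steps (2) and (4) can be sketched roughly as follows. This is not the project's program: the sense labels, translation sets and feature names are invented, the feature sets are simply given as input because the abstract does not say how step (3) assigns them, and the direction of dominance (a sense with a proper subset of another sense's features counts as the more general one) is an assumption made for this illustration.

```python
# Invented senses with toy translation sets (input to step 2).
sense_translations = {
    "good1": {"god", "bra"},
    "nice1": {"bra", "fin"},
    "tasty1": {"deilig", "god"},
    "bank1": {"bank"},
}

# Hypothetical feature sets, standing in for the output of step 3.
features = {
    "good1": {"f1"},
    "nice1": {"f1", "f2"},
    "tasty1": {"f1", "f3"},
}


def semantic_fields(senses):
    """Step 2 sketch: group senses whose translation sets overlap,
    either directly or through a chain of overlapping senses."""
    fields = []  # list of (sense names, union of their translations)
    for name, trans in senses.items():
        names, union = {name}, set(trans)
        untouched = []
        for other_names, other_union in fields:
            if other_union & union:
                names |= other_names
                union |= other_union
            else:
                untouched.append((other_names, other_union))
        fields = untouched + [(names, union)]
    return [names for names, _ in fields]


def dominance_pairs(feats):
    """Step 4 sketch: (upper, lower) pairs where upper's feature set is a
    proper subset of lower's (assumed direction of dominance)."""
    return sorted(
        (u, l) for u in feats for l in feats if feats[u] < feats[l]
    )


print(semantic_fields(sense_translations))
print(dominance_pairs(features))
print(features["nice1"] & features["tasty1"])  # features the two senses share
```

With this toy input the three adjective senses end up in one field and 'bank1' in a field of its own, and 'good1' dominates both 'nice1' and 'tasty1', which share only the feature inherited from it.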

In the project these assumptions are tested against data from the English-Norwegian Parallel Corpus (ENPC; Johansson 1997).


Advances in Corpus Linguistics

Papers from the 23rd International Conference on English Language Research on Computerized Corpora (ICAME 23) Göteborg 22-26 May 2002

Series: Language and Computers, Volume: 49

ISBN: 9789004333710

Publisher: Brill

Print Publication Date: 01 Jan 2004

Subjects
- Languages and Linguistics
  - Applied Linguistics
  - Indo-European Languages

