READ–IT: Assessing Readability of Italian Texts with a View to Text Simplification

阿新 • Published: 2018-11-14

https://aclanthology.info/pdf/W/W11/W11-2308.pdf

2 Background
Before 2000 ----
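Traditional readability metrics are limited to surface features of the text. For example, the Flesch-Kincaid measure (still the most widely used) is a linear function of the average number of syllables per word and the average number of words per sentence. The former serves as a proxy for lexical complexity and the latter as a proxy for syntactic complexity. For Italian there are two readability formulas: the Flesch-Vacca formula, an adaptation of Flesch-Kincaid from English to Italian, and the GulpEase index, which assesses readability from the average number of characters per word and the average number of words per sentence.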
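To illustrate how purely surface-based these formulas are, here is a minimal sketch that computes the GulpEase index in its commonly cited form, 89 + (300 × sentences - 10 × letters) / words, and the Flesch-Kincaid grade level from raw counts. The regex-based word and sentence splitting is a simplifying assumption (no handling of abbreviations, numbers, etc.), and the function names are hypothetical. The Flesch-Kincaid function takes a syllable count as input because syllabification is language-specific and is not sketched here.

```python
import re

# Assumed, simplified tokenization: a "word" is a run of Unicode letters
# (so accented Italian letters count), and a sentence ends at ".", "!" or "?".
WORD_RE = re.compile(r"[^\W\d_]+")
SENTENCE_END_RE = re.compile(r"[.!?]+")


def gulpease(text: str) -> float:
    """GulpEase index: 89 + (300 * sentences - 10 * letters) / words.

    Higher values indicate easier text.
    """
    words = WORD_RE.findall(text)
    if not words:
        raise ValueError("text contains no words")
    letters = sum(len(w) for w in words)
    # Count only segments that contain at least one word, so trailing
    # punctuation does not create an empty "sentence".
    sentences = sum(1 for seg in SENTENCE_END_RE.split(text) if WORD_RE.search(seg))
    return 89 + (300 * sentences - 10 * letters) / len(words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid grade level: a linear function of words per sentence
    (syntactic proxy) and syllables per word (lexical proxy)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


print(gulpease("Una frase molto semplice."))
print(gulpease("Un'argomentazione incomprensibilmente complicata."))
print(flesch_kincaid_grade(words=100, sentences=5, syllables=150))
```

Both sample Italian sentences have four words, so the GulpEase difference between them comes entirely from word length, which is exactly the kind of shortcut criticized below.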
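Traditional readability metrics are quick and easy to compute, but they have several shortcomings:
a. When sentence length is used to measure syntactic complexity, longer sentences are assumed to be syntactically more complex, but this is not always the case.
b. Counting syllables per word rests on the assumption that more frequent words tend to have fewer syllables; but, as in the previous case, word length does not directly reflect difficulty.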
2001-2009 ---
The unreliability of these metrics has been demonstrated by a number of experiments in recent years.
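A first step toward assessing the word difficulty of a given text was taken by vocabulary-based formulas such as the Dale-Chall formula, which combines average sentence length with word frequency. The word-frequency component computes the percentage of words that are not on a list of 3,000 "easy" words, by matching the formula's own word list against the words of the material being assessed, in order to determine the appropriate reading level. Although vocabulary-based methods improved readability assessment, probably thanks to the availability of frequency dictionaries and reference corpora, they still do not adequately account for sentence structure.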
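A Dale-Chall style computation can be sketched as follows, using the widely cited coefficients (0.1579 × percentage of difficult words + 0.0496 × average sentence length, plus 3.6365 when more than 5% of the words are difficult). The tiny `EASY_WORDS` set is a hypothetical placeholder for the real list of about 3,000 familiar words, and the English-only tokenization is a simplifying assumption.

```python
import re

# Naive English tokenization (an assumption, not part of the formula).
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")
SENTENCE_END_RE = re.compile(r"[.!?]+")

# Hypothetical placeholder: the real Dale-Chall list has about 3,000 familiar words.
EASY_WORDS = frozenset({"the", "a", "on", "cat", "dog", "sat", "ran", "mat"})


def dale_chall_score(text: str, easy_words: frozenset[str] = EASY_WORDS) -> float:
    """Raw Dale-Chall score: 0.1579 * PDW + 0.0496 * ASL (+3.6365 if PDW > 5)."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    if not words:
        raise ValueError("text contains no words")
    sentences = sum(1 for seg in SENTENCE_END_RE.split(text) if WORD_RE.search(seg))
    # PDW: percentage of words NOT on the easy-word list ("difficult" words)
    pdw = 100 * sum(w not in easy_words for w in words) / len(words)
    # ASL: average sentence length in words
    asl = len(words) / sentences
    score = 0.1579 * pdw + 0.0496 * asl
    if pdw > 5:
        score += 3.6365  # adjustment applied when difficult words exceed 5%
    return score


print(dale_chall_score("The cat sat on a mat."))
print(dale_chall_score("The cat devoured a mat."))
```

The only structural information the score sees is sentence length in words, which is why such formulas remain blind to how a sentence is actually built.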

Later work used lexical, syntactic, semantic and discourse features of various kinds, and also took the type of reader into account.

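Shift: Si and Callan (2001) generalized the static vocabulary-based approach by using a unigram language model together with sentence length to capture content information in scientific web pages, and Collins-Thompson and Callan (2004) applied a similar language model (the Smoothed Unigram model) to predict the reading difficulty of short passages and web documents. Both approaches can be seen as generalizations of the vocabulary-based method, aimed at capturing finer-grained and more flexible information about vocabulary usage.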
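The language-model idea can be sketched as one unigram model per reading level, with a text assigned to the level whose model gives it the highest log-likelihood. This is a simplified illustration, not the cited authors' implementation: add-one (Laplace) smoothing stands in for their smoothing method, the sentence-length component of Si and Callan is omitted, and the class name and toy training data are hypothetical.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


class UnigramReadabilityModel:
    """One add-one-smoothed unigram language model per reading level.

    Laplace smoothing is an assumption made for simplicity, not the
    smoothing used in the cited work.
    """

    def __init__(self, corpora: dict[str, list[str]]) -> None:
        # Word counts and total token count for each level's training texts
        self.counts = {
            level: Counter(tok for text in texts for tok in tokenize(text))
            for level, texts in corpora.items()
        }
        self.totals = {level: sum(c.values()) for level, c in self.counts.items()}
        # Shared vocabulary plus one slot reserved for unseen words
        self.vocab_size = len(set().union(*self.counts.values())) + 1

    def log_likelihood(self, text: str, level: str) -> float:
        counts, total = self.counts[level], self.totals[level]
        # Counter returns 0 for unseen words, so +1 keeps every probability > 0
        return sum(
            math.log((counts[tok] + 1) / (total + self.vocab_size))
            for tok in tokenize(text)
        )

    def predict(self, text: str) -> str:
        return max(self.counts, key=lambda level: self.log_likelihood(text, level))


# Hypothetical toy training data, for illustration only.
model = UnigramReadabilityModel({
    "easy": ["the cat sat on the mat", "the dog ran"],
    "hard": ["photosynthesis converts radiant energy",
             "thermodynamic equilibrium constrains entropy"],
})
print(model.predict("the cat ran"))  # easy
print(model.predict("radiant energy equilibrium"))  # hard
```

Unlike a fixed easy-word list, every word contributes a graded probability under each level's model, which is the "finer-grained" use of vocabulary information described above.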
