[Python] Converting Chinese Characters to Pinyin with Python (updated 2018.12)

阿新 • Published: 2018-12-16

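While browsing blogs, I came across third-party packages that convert Chinese characters to pinyin in Python. When I tried them, however, I found that some parameters had changed since those posts were written, so I am recording two approaches here.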

xpinyin

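Some blog posts say that to get pinyin with tone marks you need to pass the argument 'show_tone_marks=True'. When I actually tried it, that parameter no longer existed; it has been replaced by tone_marks. The other parameters and their usage are easy to follow from the docstring, which is clearly written.
Here is the source: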

class Pinyin(object):

    """translate chinese hanzi to pinyin by python, inspired by flyerhzm’s
    `chinese\_pinyin`_ gem

    usage
    -----
    ::

        >>> from xpinyin import Pinyin
        >>> p = Pinyin()
        >>> # default splitter is `-`
        >>> p.get_pinyin(u"上海")
        'shang-hai'
        >>> # show tone marks
        >>> p.get_pinyin(u"上海", tone_marks='marks')
        'shàng-hǎi'
        >>> p.get_pinyin(u"上海", tone_marks='numbers')
        'shang4-hai3'
        >>> # remove splitter
        >>> p.get_pinyin(u"上海", '')
        'shanghai'
        >>> # set splitter as whitespace
        >>> p.get_pinyin(u"上海", ' ')
        'shang hai'
        >>> p.get_initial(u"上")
        'S'
        >>> p.get_initials(u"上海")
        'S-H'
        >>> p.get_initials(u"上海", u'')
        'SH'
        >>> p.get_initials(u"上海", u' ')
        'S H'

    Please input Chinese characters encoded as UTF-8.
    .. _chinese\_pinyin: https://github.com/flyerhzm/chinese_pinyin
    """

Install: pip install xpinyin
Code:

from xpinyin import Pinyin


# Instantiate the pinyin converter
p = Pinyin()
# Convert to pinyin, once with tone marks and once with tone numbers
ret = p.get_pinyin(u"漢語拼音轉換", tone_marks='marks')
ret1 = p.get_pinyin(u"漢語拼音轉換", tone_marks='numbers')
print(ret+'\n'+ret1)
# Output:
# hàn-yǔ-pīn-yīn-zhuǎn-huàn
# han4-yu3-pin1-yin1-zhuan3-huan4
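
Because older posts use show_tone_marks while the version examined here uses tone_marks, a script that has to run against either release could check which keyword the installed get_pinyin accepts. The following is only a sketch under that assumption: the helper name get_pinyin_with_marks is made up for illustration and is not part of xpinyin, and it takes the method as an argument so that it does not depend on any particular xpinyin version.

```python
import inspect


def get_pinyin_with_marks(get_pinyin, text, splitter="-"):
    """Call an xpinyin-style get_pinyin with whichever tone keyword it accepts.

    Hypothetical helper, not part of xpinyin.
    """
    params = inspect.signature(get_pinyin).parameters
    if "tone_marks" in params:
        # Newer keyword, as shown in the docstring quoted above.
        return get_pinyin(text, splitter, tone_marks="marks")
    if "show_tone_marks" in params:
        # Older keyword mentioned in earlier blog posts.
        return get_pinyin(text, splitter, show_tone_marks=True)
    # Neither keyword is available: fall back to pinyin without tones.
    return get_pinyin(text, splitter)
```

With xpinyin installed, it would be called as get_pinyin_with_marks(Pinyin().get_pinyin, u"上海").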

pypinyin

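Compared with xpinyin, pypinyin is more powerful: it supports several output styles, heteronyms (characters with more than one reading), and phrase-based segmentation.
Install: pip install pypinyin
Usage: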

import pypinyin


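# Without tones (style=pypinyin.NORMAL)
def pinyin(word):
    s = ''
    for i in pypinyin.pinyin(word, style=pypinyin.NORMAL):
        s += ''.join(i)
    return s


# With tone marks (the default style)
def yinjie(word):
    s = ''
    # heteronym=True enables multiple readings per character; each item is
    # the list of readings for one character, concatenated here as-is
    for i in pypinyin.pinyin(word, heteronym=True):
        s = s + ''.join(i) + " "
    return s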


if __name__ == "__main__":
    print(pinyin("忠厚傳家久"))
    print(yinjie("詩書繼世長"))
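
With heteronym=True each character contributes a list of readings, so joining them with an empty string makes the readings run together. A small sketch of one way to keep them apart is shown here. The function name join_readings and the "/" separator are illustrative choices, and the sample data is copied from the pypinyin docstring output for pinyin('中心', heteronym=True), so no library call is made.

```python
def join_readings(result, char_sep=" ", reading_sep="/"):
    """Render a pinyin()-style nested list as a single string.

    Illustrative helper: readings of one character are joined with
    reading_sep, and characters are joined with char_sep.
    """
    return char_sep.join(reading_sep.join(readings) for readings in result)


# Sample data copied from the pypinyin docstring, not computed here.
sample = [["zhōng", "zhòng"], ["xīn"]]
print(join_readings(sample))            # zhōng/zhòng xīn
print(join_readings(sample, "-", "|"))  # zhōng|zhòng-xīn
```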

Source code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

from __future__ import unicode_literals

from copy import deepcopy
from itertools import chain

from pypinyin.compat import text_type, callable_check
from pypinyin.constants import (
    PHRASES_DICT, PINYIN_DICT,
    RE_HANS, Style
)
from pypinyin.contrib import mmseg
from pypinyin.utils import simple_seg, _replace_tone2_style_dict_to_default
from pypinyin.style import auto_discover, convert as convert_style

auto_discover()


def seg(hans):
    hans = simple_seg(hans)
    ret = []
    for x in hans:
        if not RE_HANS.match(x):   # no pinyin: skip the second segmentation pass
            ret.append(x)
        elif PHRASES_DICT:
            ret.extend(list(mmseg.seg.cut(x)))
        else:   # phrase dictionary disabled: do not segment
            ret.append(x)
    return ret


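def load_single_dict(pinyin_dict, style='default'):
    """Load a user-defined single-character pinyin dictionary.

    :param pinyin_dict: single-character pinyin dictionary, e.g. ``{0x963F: u"ā,ē"}``
    :param style: pinyin style used by the values of pinyin_dict; 'default' and 'tone2' are supported
    :type pinyin_dict: dict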
    """
    if style == 'tone2':
        for k, v in pinyin_dict.items():
            v = _replace_tone2_style_dict_to_default(v)
            PINYIN_DICT[k] = v
    else:
        PINYIN_DICT.update(pinyin_dict)

    mmseg.retrain(mmseg.seg)


def load_phrases_dict(phrases_dict, style='default'):
    """Load a user-defined phrase pinyin dictionary.

    :param phrases_dict: phrase pinyin dictionary, e.g. ``{u"阿爸": [[u"ā"], [u"bà"]]}``
    :param style: pinyin style used by the values of phrases_dict; 'default' and 'tone2' are supported
    :type phrases_dict: dict
    """
    if style == 'tone2':
        for k, value in phrases_dict.items():
            v = [
                list(map(_replace_tone2_style_dict_to_default, pys))
                for pys in value
            ]
            PHRASES_DICT[k] = v
    else:
        PHRASES_DICT.update(phrases_dict)

    mmseg.retrain(mmseg.seg)


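def to_fixed(pinyin, style, strict=True):
    """Format a tone-marked pinyin according to the given style.

    :param pinyin: a single pinyin syllable
    :param style: pinyin style
    :param strict: whether to strictly follow the Scheme for the Chinese
                   Phonetic Alphabet when handling initials and finals
    :return: the pinyin string formatted in the requested style
    :rtype: unicode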
    """
    return convert_style(pinyin, style=style, strict=strict, default=pinyin)


def _handle_nopinyin_char(chars, errors='default'):
    """Handle characters that have no pinyin."""
    if callable_check(errors):
        return errors(chars)

    if errors == 'default':
        return chars
    elif errors == 'ignore':
        return None
    elif errors == 'replace':
        if len(chars) > 1:
            return ''.join(text_type('%x' % ord(x)) for x in chars)
        else:
            return text_type('%x' % ord(chars))


def handle_nopinyin(chars, errors='default', heteronym=True):
    py = _handle_nopinyin_char(chars, errors=errors)
    if not py:
        return []
    if isinstance(py, list):
        # contains heteronym information
        if isinstance(py[0], list):
            if heteronym:
                return py
            # [[a, b], [c, d]]
            # [[a], [c]]
            return [[x[0]] for x in py]

        return [[i] for i in py]
    else:
        return [[py]]


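def single_pinyin(han, style, heteronym, errors='default', strict=True):
    """Convert a single character to pinyin.

    :param han: a single Chinese character
    :param errors: how to handle characters without pinyin; see
                   :py:func:`~pypinyin.pinyin` for details
    :param strict: whether to strictly follow the Scheme for the Chinese
                   Phonetic Alphabet when handling initials and finals
    :return: a list of pinyin; a heteronym yields several entries
    :rtype: list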
    """
    num = ord(han)
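    # handle characters that have no pinyin
    if num not in PINYIN_DICT:
        return handle_nopinyin(han, errors=errors, heteronym=heteronym)

    pys = PINYIN_DICT[num].split(',')  # list of readings for this character
    if not heteronym:
        return [[to_fixed(pys[0], style, strict=strict)]]

    # output every reading of a heteronym
    # Temporarily store readings already seen, so that converting to a style
    # without tone marks does not produce duplicates.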
    # TODO: change to use set
    # TODO: add test for cache
    py_cached = {}
    pinyins = []
    for i in pys:
        py = to_fixed(i, style, strict=strict)
        if py in py_cached:
            continue
        py_cached[py] = py
        pinyins.append(py)
    return [pinyins]


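def phrase_pinyin(phrase, style, heteronym, errors='default', strict=True):
    """Convert a phrase to pinyin.

    :param phrase: the phrase
    :param errors: how to handle characters without pinyin
    :param strict: whether to strictly follow the Scheme for the Chinese
                   Phonetic Alphabet when handling initials and finals
    :return: list of pinyin
    :rtype: list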
    """
    py = []
    if phrase in PHRASES_DICT:
        py = deepcopy(PHRASES_DICT[phrase])
        for idx, item in enumerate(py):
            py[idx] = [to_fixed(item[0], style=style, strict=strict)]
    else:
        for i in phrase:
            single = single_pinyin(i, style=style, heteronym=heteronym,
                                   errors=errors, strict=strict)
            if single:
                py.extend(single)
    return py


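def _pinyin(words, style, heteronym, errors, strict=True):
    """
    :param words: a string that has already been segmented; it contains either
                  only Chinese characters or only non-Chinese characters,
                  never a mix of the two.
    """
    pys = []
    # first pass: separate out characters that have no pinyin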
    if RE_HANS.match(words):
        pys = phrase_pinyin(words, style=style, heteronym=heteronym,
                            errors=errors, strict=strict)
        return pys

    py = handle_nopinyin(words, errors=errors, heteronym=heteronym)
    if py:
        pys.extend(py)
    return pys


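def pinyin(hans, style=Style.TONE, heteronym=False,
           errors='default', strict=True):
    """Convert Chinese characters to pinyin.

    :param hans: a string of Chinese characters (``'你好嗎'``) or a list (``['你好', '嗎']``).
                 You may segment the string with a word-segmentation module of
                 your choice and simply pass in the resulting list of strings.
    :type hans: unicode string or list of strings
    :param style: pinyin style; the default is :py:attr:`~pypinyin.Style.TONE`.
                  See :class:`~pypinyin.Style` for the other styles.
    :param errors: how to handle characters without pinyin. See :ref:`handle_no_pinyin`

                   * ``'default'``: keep the original character
                   * ``'ignore'``: skip the character
                   * ``'replace'``: replace it with its Unicode code point string
                     without the ``\\u`` prefix (``'\\u90aa'`` => ``'90aa'``)
                   * callable object: a callback function or any other callable.

    :param heteronym: whether to enable heteronyms (multiple readings)
    :param strict: whether to strictly follow the Scheme for the Chinese Phonetic Alphabet when handling initials and finals; see :ref:`strict`
    :return: list of pinyin
    :rtype: list

    :raise AssertionError: raised when the input string is not unicode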

    Usage::

      >>> from pypinyin import pinyin, Style
      >>> import pypinyin
      >>> pinyin('中心')
      [['zhōng'], ['xīn']]
      >>> pinyin('中心', heteronym=True)  # enable heteronym mode
      [['zhōng', 'zhòng'], ['xīn']]
      >>> pinyin('中心', style=Style.FIRST_LETTER)  # set the pinyin style
      [['z'], ['x']]
      >>> pinyin('中心', style=Style.TONE2)
      [['zho1ng'], ['xi1n']]
      >>> pinyin('中心', style=Style.CYRILLIC)
      [['чжун1'], ['синь1']]
    """
    # segment the input string into words
    if isinstance(hans, text_type):
        han_list = seg(hans)
    else:
        han_list = chain(*(seg(x) for x in hans))
    pys = []
    for words in han_list:
        pys.extend(_pinyin(words, style, heteronym, errors, strict=strict))
    return pys


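def slug(hans, style=Style.NORMAL, heteronym=False, separator='-',
         errors='default', strict=True):
    """Generate a slug string.

    :param hans: Chinese characters
    :type hans: unicode or list
    :param style: pinyin style; the default is :py:attr:`~pypinyin.Style.NORMAL`.
                  See :class:`~pypinyin.Style` for the other styles.
    :param heteronym: whether to enable heteronyms (multiple readings)
    :param separator: separator/connector placed between two pinyin
    :param errors: how to handle characters without pinyin; see
                   :py:func:`~pypinyin.pinyin` for details
    :param strict: whether to strictly follow the Scheme for the Chinese Phonetic Alphabet when handling initials and finals; see :ref:`strict`
    :return: the slug string.

    :raise AssertionError: raised when the input string is not unicode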

    ::

      >>> import pypinyin
      >>> from pypinyin import Style
      >>> pypinyin.slug('中國人')
      'zhong-guo-ren'
      >>> pypinyin.slug('中國人', separator=' ')
      'zhong guo ren'
      >>> pypinyin.slug('中國人', style=Style.FIRST_LETTER)
      'z-g-r'
      >>> pypinyin.slug('中國人', style=Style.CYRILLIC)
      'чжун1-го2-жэнь2'
    """
    return separator.join(chain(*pinyin(hans, style=style, heteronym=heteronym,
                                        errors=errors, strict=strict)
                                ))


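def lazy_pinyin(hans, style=Style.NORMAL, errors='default', strict=True):
    """Return a list of pinyin without heteronyms.

    Unlike :py:func:`~pypinyin.pinyin`, each returned item is a string
    rather than a list, and each character has exactly one reading.

    :param hans: Chinese characters
    :type hans: unicode or list
    :param style: pinyin style; the default is :py:attr:`~pypinyin.Style.NORMAL`.
                  See :class:`~pypinyin.Style` for the other styles.
    :param errors: how to handle characters without pinyin; see
                   :py:func:`~pypinyin.pinyin` for details
    :param strict: whether to strictly follow the Scheme for the Chinese Phonetic Alphabet when handling initials and finals; see :ref:`strict`
    :return: list of pinyin (e.g. ``['zhong', 'guo', 'ren']``)
    :rtype: list

    :raise AssertionError: raised when the input string is not unicode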

    Usage::

      >>> from pypinyin import lazy_pinyin, Style
      >>> import pypinyin
      >>> lazy_pinyin('中心')
      ['zhong', 'xin']
      >>> lazy_pinyin('中心', style=Style.TONE)
      ['zhōng', 'xīn']
      >>> lazy_pinyin('中心', style=Style.FIRST_LETTER)
      ['z', 'x']
      >>> lazy_pinyin('中心', style=Style.TONE2)
      ['zho1ng', 'xi1n']
      >>> lazy_pinyin('中心', style=Style.CYRILLIC)
      ['чжун1', 'синь1']
    """
    return list(chain(*pinyin(hans, style=style, heteronym=False,
                              errors=errors, strict=strict)))
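
The errors parameter described in the pinyin() docstring decides what happens to characters that have no pinyin. The sketch below re-implements those four documented modes with the standard library only, to make the behaviour easier to see. It is a simplified illustration rather than the library's code: the name handle_missing is made up, and raising ValueError for an unknown mode is a choice made here (the listing above simply falls through).

```python
def handle_missing(chars, errors="default"):
    """Mimic the documented `errors` modes for characters without pinyin.

    Hypothetical stand-alone helper; it does not call pypinyin.
    """
    if callable(errors):
        # A callable receives the raw characters and decides the result.
        return errors(chars)
    if errors == "default":
        return chars            # keep the original characters
    if errors == "ignore":
        return None             # drop the characters
    if errors == "replace":
        # Hex code point of each character, without the \u prefix.
        return "".join("%x" % ord(ch) for ch in chars)
    # Assumption of this sketch: reject unknown modes explicitly.
    raise ValueError("unknown errors mode: %r" % (errors,))


print(handle_missing("\u90aa", "replace"))             # 90aa
print(handle_missing("abc", "replace"))                # 616263
print(handle_missing("!?", lambda s: "*" * len(s)))    # **
```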
