class wordcloud.WordCloud(font_path=None, width=400, height=200, margin=2, ranks_only=None,
                          prefer_horizontal=0.9, mask=None, scale=1, color_func=None,
                          max_words=200, min_font_size=4, stopwords=None, random_state=None,
                          background_color='black', max_font_size=None, font_step=1, mode='RGB',
                          relative_scaling=0.5, regexp=None, collocations=True, colormap=None,
                          normalize_plurals=True)


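font_path : string //Path to the font file. Give the path and file extension of whichever font should be used, e.g. font_path = '黑體.ttf' (a Heiti font file)

width : int (default=400) //Width of the output canvas, 400 pixels by default

height : int (default=200) //Height of the output canvas, 200 pixels by default

prefer_horizontal : float (default=0.90) //How often words are laid out horizontally; with the default of 0.9, vertical layout is used about 0.1 of the time

mask : nd-array or None (default=None) //If not None, gives a binary mask describing where words may be drawn: the width and height settings are ignored and the shape of mask is used instead. Pure white (#FFFFFF) areas are masked out and stay empty; all other areas are used to draw the word cloud. If mask is None, the whole width x height canvas is used. Example: bg_pic = imread('some_image.png'). The background of the mask image must be pure white (#FFFFFF), and the shape to be filled must be any color other than white. An easy way to make such an image is to copy the desired shape onto a pure white canvas in Photoshop and save it.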
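
scale : float (default=1) //Scaling factor between computation and drawing: the canvas is enlarged by this factor, e.g. 1.5 makes both the width and the height 1.5 times the original

min_font_size : int (default=4) //Smallest font size to use

font_step : int (default=1) //Step size between font sizes; a step larger than 1 speeds up computation but may give a noticeably worse fit

max_words : number (default=200) //Maximum number of words to display

stopwords : set of strings or None //Words to exclude; if None, the built-in STOPWORDS set is used

background_color : color value (default='black') //Background color, e.g. background_color='white' gives a white background

max_font_size : int or None (default=None) //Largest font size to use; if None, the height of the image is used

mode : string (default='RGB') //When mode is 'RGBA' and background_color is None, the background is transparent

relative_scaling : float (default=.5) //How strongly word frequency affects font size: 0 considers only word rank, while 1 draws a word that is twice as frequent at twice the size

color_func : callable, default=None //Function that generates a color for each word; if None, self.color_func is used

regexp : string or None (optional) //Regular expression used to split the input text into words

collocations : bool, default=True //Whether to include collocations of two words (bigrams)

colormap : string or matplotlib colormap, default='viridis' //Matplotlib colormap from which each word gets a random color; ignored if color_func is given

fit_words(frequencies) //Generate the word cloud from word frequencies
generate(text) //Generate the word cloud from text
generate_from_frequencies(frequencies[, ...]) //Generate the word cloud from word frequencies
generate_from_text(text) //Generate the word cloud from text
process_text(text) //Split a long text into words and remove stopwords. This is designed for English; Chinese text must first be segmented with another library, after which the frequencies can be passed to fit_words(frequencies) above
recolor([random_state, color_func, colormap]) //Recolor the existing output; this is much faster than regenerating the whole word cloud
to_array() //Convert to a numpy array
to_file(filename) //Export to an image file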
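To illustrate the mask rules, a shape mask can also be built directly with NumPy instead of being drawn in Photoshop. In this sketch (the name `circle_mask` and the 10-pixel border are arbitrary choices), pixels outside a centered circle are set to 255, pure white, so they are masked out, and pixels inside are 0, leaving them free for words. The resulting array could then be passed as `mask=` to WordCloud.

```python
import numpy as np

def circle_mask(size=400, radius=None):
    """Return a size x size uint8 mask: 0 inside a centered circle, 255 (white) outside."""
    if radius is None:
        radius = size // 2 - 10  # leave a small white border around the circle
    y, x = np.ogrid[:size, :size]
    center = (size - 1) / 2
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255  # white = masked out, no words are drawn here
    return mask

mask = circle_mask(300)
print(mask.shape, mask[150, 150], mask[0, 0])  # (300, 300) 0 255
```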



How the Word Cloud Generator Works

The layout algorithm for positioning words without overlap is available on GitHub under an open source license as d3-cloud. Note that this is only the layout algorithm; any code for converting text into words and rendering the final output requires additional development.

As word placement can be quite slow for more than a few hundred words, the layout algorithm can be run asynchronously, with a configurable time step size. This makes it possible to animate words as they are placed without stuttering. It is recommended to always use a time step even without animations as it prevents the browser’s event loop from blocking while placing the words.
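The time-step idea is not tied to JavaScript. Below is a minimal Python sketch of it, not d3-cloud's API: a generator places words until its time budget (`timestep`, in seconds) runs out, then yields the batch so that a caller, such as an event loop or an animation, can redraw before resuming. The `place_word` callback is hypothetical and stands in for the real placement step.

```python
import time

def layout_in_steps(words, place_word, timestep=0.01):
    """Place words in time-sliced batches, yielding each batch of results.

    place_word is a hypothetical callback that positions a single word.
    """
    remaining = list(words)
    i = 0
    while i < len(remaining):
        deadline = time.monotonic() + timestep
        batch = []
        # Always place at least one word per batch so progress is guaranteed.
        while i < len(remaining):
            batch.append(place_word(remaining[i]))
            i += 1
            if time.monotonic() >= deadline:
                break
        # Hand control back to the caller between batches.
        yield batch

results = []
for batch in layout_in_steps(["alpha", "beta", "gamma"], str.upper, timestep=0.001):
    results.extend(batch)  # a UI would redraw the partial cloud here
print(results)  # ['ALPHA', 'BETA', 'GAMMA']
```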

The layout algorithm itself is incredibly simple. For each word, starting with the most “important”:

1. Attempt to place the word at some starting point: usually near the middle, or somewhere on a central horizontal line.
2. If the word intersects with any previously placed words, move it one step along an increasing spiral. Repeat until no intersections are found.

The hard part is making it perform efficiently! According to Jonathan Feinberg, Wordle uses a combination of hierarchical bounding boxes and quadtrees to achieve reasonable speeds.
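The two placement steps can be sketched in Python as follows. This is a simplified illustration, not d3-cloud's code: each word is reduced to an axis-aligned box given as a hypothetical `(width, height)` pair, candidate positions follow an Archimedean spiral outward from the center, and collisions are checked naively against every placed box, with no quadtree. The spiral step size and the `max_steps` cut-off are arbitrary assumptions.

```python
import math

def overlaps(a, b):
    """Axis-aligned rectangle intersection test; rects are (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def spiral_layout(sizes, canvas_w, canvas_h, step=0.1, max_steps=20000):
    """Place (w, h) boxes, most important first, moving each along a spiral until it fits."""
    placed = []
    cx, cy = canvas_w / 2, canvas_h / 2
    for w, h in sizes:
        for i in range(max_steps):
            t = i * step
            # Archimedean spiral: the radius grows linearly with the angle t.
            x = cx + t * math.cos(t) - w / 2
            y = cy + t * math.sin(t) - h / 2
            rect = (x, y, w, h)
            inside = 0 <= x and x + w <= canvas_w and 0 <= y and y + h <= canvas_h
            if inside and not any(overlaps(rect, p) for p in placed):
                placed.append(rect)
                break
        # A word that never fits within max_steps is simply dropped.
    return placed

print(spiral_layout([(60, 20), (40, 20)], 300, 200)[0])  # (120.0, 90.0, 60, 20)
```

The naive `any(...)` scan is exactly the cost that bounding-box hierarchies, quadtrees, or sprite masks are meant to cut down.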

Glyphs in JavaScript

There isn’t a way to retrieve precise glyph shapes via the DOM, except perhaps for SVG fonts. Instead, we draw each word to a hidden canvas element, and retrieve the pixel data.

Retrieving the pixel data separately for each word is expensive, so we draw as many words as possible and then retrieve their pixels in a batch operation.

Sprites and Masks

My initial implementation performed collision detection using sprite masks. Once a word is placed, it doesn't move, so we can copy it to the appropriate position in a larger sprite representing the whole placement area.

The advantage of this is that collision detection only involves comparing a candidate sprite with the relevant area of this larger sprite, rather than comparing with each previous word separately.
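The sprite-mask idea can be sketched with NumPy boolean arrays, as a simplification that uses one array element per pixel and no bit packing. Placed words are ORed into a board covering the whole placement area, and a candidate is tested with a single AND against the matching slice of the board, however many words have already been placed. The names `collides` and `place` are hypothetical.

```python
import numpy as np

def collides(board, sprite, x, y):
    """True if any set pixel of sprite overlaps a set board pixel at offset (x, y)."""
    h, w = sprite.shape
    if x < 0 or y < 0:
        return True  # sprite would stick out past the top or left edge
    region = board[y:y + h, x:x + w]
    if region.shape != sprite.shape:
        return True  # sprite would stick out past the bottom or right edge
    return bool(np.any(region & sprite))

def place(board, sprite, x, y):
    """OR the sprite into the board; placed words never move afterwards."""
    h, w = sprite.shape
    board[y:y + h, x:x + w] |= sprite

board = np.zeros((10, 20), dtype=bool)
word = np.ones((2, 5), dtype=bool)
place(board, word, 0, 0)
print(collides(board, word, 3, 1), collides(board, word, 5, 0))  # True False
```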

Somewhat surprisingly, a simple low-level hack made a tremendous difference: when constructing the sprite I compressed blocks of 32 1-bit pixels into 32-bit integers, thus reducing the number of checks (and the memory used) by a factor of 32.
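The packing trick can be illustrated in Python. This is a sketch under stated assumptions rather than d3-cloud's code: each row of 1-bit pixels is packed into 32-bit integers with the leftmost pixel in the most significant bit, a sprite row is shifted to an arbitrary x offset across word boundaries, and overlap is tested with one bitwise AND per 32 pixels. Python integers are unbounded, so the 32-bit width is enforced with an explicit mask; `pack_row`, `shift_row` and `rows_collide` are hypothetical names.

```python
WORD_BITS = 32
WORD_MASK = (1 << WORD_BITS) - 1

def pack_row(bits):
    """Pack a sequence of 0/1 pixels into 32-bit ints, leftmost pixel in the MSB."""
    packed = []
    for start in range(0, len(bits), WORD_BITS):
        chunk = bits[start:start + WORD_BITS]
        value = 0
        for bit in chunk:
            value = (value << 1) | (1 if bit else 0)
        # Left-align a short final chunk so pixel columns line up across rows.
        packed.append(value << (WORD_BITS - len(chunk)))
    return packed

def shift_row(packed, x, n_words):
    """Move a packed row right by x pixels into a row of n_words 32-bit ints.

    Assumes n_words >= len(packed); pixels pushed past the right edge are dropped.
    """
    joined = 0
    for value in packed:
        joined = (joined << WORD_BITS) | value
    # Pad to the full row width, then shift right by x pixels.
    joined <<= (n_words - len(packed)) * WORD_BITS
    joined >>= x
    return [(joined >> (WORD_BITS * (n_words - 1 - i))) & WORD_MASK
            for i in range(n_words)]

def rows_collide(board_row, sprite_row):
    """One AND per 32 pixels instead of one comparison per pixel."""
    return any(a & b for a, b in zip(board_row, sprite_row))
```

Placing a word would OR its shifted rows into the board rows in the same word-by-word fashion.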

In fact, this turned out to beat my hierarchical bounding box with quadtree implementation on everything I tried it on (even very large areas and font sizes). I think this is primarily because the sprite version only needs to perform a single collision test per candidate area, whereas the bounding box version has to compare with every other previously placed word that overlaps slightly with the candidate area.

Another possibility would be to merge a word’s tree with a single large tree once it is placed. I think this operation would be fairly expensive, though, compared with the analogous sprite-mask operation, which essentially ORs a whole block.


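# -*- coding: utf-8 -*-

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud, ImageColorGenerator

# Read the source text. Chinese text must be segmented into space-separated
# words first, and WordCloud then needs font_path pointing to a CJK font.
with open('test.txt', 'r', encoding='utf-8') as f:
    text = f.read()

# scipy.misc.imread has been removed from SciPy; load the mask with Pillow instead.
bg_pic = np.array(Image.open('3.png').convert('RGB'))

# Pure white areas of the mask stay empty; scale=1.5 enlarges the output image.
wordcloud = WordCloud(mask=bg_pic, background_color='white', scale=1.5).generate(text)

# Color each word with the colors of the mask image underneath it.
image_colors = ImageColorGenerator(bg_pic)
wordcloud.recolor(color_func=image_colors)

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()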
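For Chinese text, the process_text entry notes that segmentation has to be done by another library before frequencies are handed to WordCloud. A minimal sketch of the counting step, assuming the tokens already come from a segmenter (for example jieba's `lcut`, an assumption about your setup); the helper name `word_frequencies` is hypothetical. The resulting dict could be passed to `generate_from_frequencies` or `fit_words`, together with a `font_path` that points to a CJK font.

```python
from collections import Counter

def word_frequencies(tokens, stopwords=frozenset(), min_length=1):
    """Count pre-segmented tokens, skipping blanks, stopwords and tokens shorter than min_length."""
    return dict(Counter(
        token for token in (t.strip() for t in tokens)
        if token and len(token) >= min_length and token not in stopwords
    ))

freqs = word_frequencies(["詞雲", "的", "詞雲", "演算法", " "], stopwords={"的"})
print(freqs)  # {'詞雲': 2, '演算法': 1}
```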







