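Apache Superset Secondary Development

Basic Concepts

 Superset is an open-source data exploration platform from Airbnb, designed to be visual, intuitive, and interactive (formerly named Panoramix and Caravel; it has since entered the Apache Incubator).

Core Components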

Flask

 One of the best-known Python web frameworks, noted for being lightweight and highly extensible.

  • Jinja2
    Template engine

  • Werkzeug
    WSGI toolkit

Gunicorn

 Gunicorn is an open-source Python WSGI HTTP server. It is a pre-fork model server ported from Ruby's Unicorn project.

WSGI

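 WSGI (Python **W**eb **S**erver **G**ateway **I**nterface) is the interface between Python applications or frameworks and web servers. It is a specification (PEP 3333) rather than a single implementation: any application that follows it can run on any compliant server, and any compliant server can host any such application.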
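
To make the protocol concrete, here is a minimal sketch of a WSGI application. The name `simple_app` and the response text are illustrative assumptions; only the `(environ, start_response)` calling convention comes from the WSGI specification.

```python
def simple_app(environ, start_response):
    """A WSGI application: a callable taking the request environ and a callback."""
    body = ("Hello from %s" % environ.get("PATH_INFO", "/")).encode("utf-8")
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    # The server supplies start_response; the app reports status and headers through it
    start_response("200 OK", headers)
    # The return value is an iterable of byte strings
    return [body]
```

Because the callable has no dependency on a particular server, the same function could be served by `wsgiref.simple_server.make_server("127.0.0.1", 8000, simple_app)` from the standard library, or by Gunicorn with `gunicorn module_name:simple_app` (where `module_name` is whatever file holds the function).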

Pre-Fork

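 Each process handles one request at a time. The model is based on select, so at most 1024 processes are created at once.
 Processes are created in advance: pre-fork spawns child processes ahead of time and uses them to handle requests. Each request is served by one child process, and the processes are independent of one another.
 To some extent this speeds up response time, because no process has to be created when a request arrives.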
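
The pre-fork idea can be sketched with the standard library alone. This is a toy illustration, not how Gunicorn is implemented: the function names (`start_prefork_server`, `stop_workers`, `ask`) are hypothetical, it is Unix-only because it relies on `os.fork`, and it has no worker supervision or graceful shutdown.

```python
import os
import signal
import socket


def start_prefork_server(num_workers=3):
    """Bind one listening socket, then fork workers that all accept() on it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    worker_pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child: created before any request arrives, serves one request at a time
            try:
                while True:
                    conn, _addr = listener.accept()
                    conn.recv(1024)
                    conn.sendall(("handled by pid %d" % os.getpid()).encode())
                    conn.close()
            finally:
                os._exit(0)
        worker_pids.append(pid)  # parent: remember the worker
    listener.close()  # the parent itself does not serve requests
    return port, worker_pids


def stop_workers(worker_pids):
    """Terminate the workers and reap them so no zombies are left behind."""
    for pid in worker_pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)


def ask(port):
    """Send one request and return the reply of whichever worker accepted it."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall(b"ping")
        return conn.recv(1024).decode()
```

The kernel hands each incoming connection to whichever worker is currently blocked in `accept()`, which is why the workers need no coordination with one another.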

Django

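 Django is an open-source web application framework written in Python. It follows the MVC software design pattern and makes it simple to build complex, database-driven websites.
 Django emphasizes reusable and "pluggable" components, agile development, and the DRY principle (Don't Repeat Yourself).

 Core components
* An object-relational mapper that mediates between the data model (defined as Python classes) and the relational database
* A URL dispatcher based on regular expressions
* A view system for handling requests
* A template system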
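
As an illustration of the regular-expression URL dispatcher listed above, the following standalone sketch mimics the idea in plain Python. It is not Django's code: `route`, `dispatch`, and the two views are hypothetical names chosen for this example.

```python
import re

# (compiled pattern, view) pairs, checked in order; the first match wins
URL_PATTERNS = []


def route(pattern):
    """Register a view function for a regular-expression URL pattern."""
    def decorator(view):
        URL_PATTERNS.append((re.compile(pattern), view))
        return view
    return decorator


@route(r"^/articles/(?P<year>\d{4})/$")
def articles_by_year(request, year):
    return "articles from %s" % year


@route(r"^/$")
def index(request):
    return "home"


def dispatch(request, path):
    """Find the first pattern matching `path` and call its view."""
    for regex, view in URL_PATTERNS:
        match = regex.match(path)
        if match:
            # Named groups in the pattern become keyword arguments of the view
            return view(request, **match.groupdict())
    return "404 not found"
```

Calling `dispatch(None, "/articles/2016/")` routes to `articles_by_year` with `year="2016"`, while an unknown path falls through to the 404 string.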

PyDruid

 A Python connector for Druid
 Exposes a simple API to create, execute, and analyze Druid queries

Pandas

 Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive

SciPy

 SciPy is a Python module built on NumPy that bundles many mathematical algorithms and convenience functions

Scikit-learn

 Machine Learning in Python

D3.js

 D3.js is a JavaScript library for manipulating documents based on data

Installation

Base Environment

OS

$ uname -a
Linux 2.6.32-431.el6.x86_64 #1 SMP Fri Nov 22 03:15:09 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

$ cat /proc/version
Linux version 2.6.32-431.el6.x86_64 ([email protected]) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-4) (GCC) ) #1 SMP Fri Nov 22 03:15:09 UTC 2013

# For Fedora and RHEL-derivatives
# [Doc]: Other System https://superset.apache.org/installation.html#os-dependencies
$ sudo yum upgrade python-setuptools -y
$ sudo yum install gcc libffi-devel python-devel python-pip python-wheel openssl-devel libsasl2-devel openldap-devel -y

Machines

# External network (http://192.168.1.10:9097/)
superset01                     192.168.1.10           Superset
druid01                        192.168.1.11           Druid
druid02                        192.168.1.12           MySQL

# Cluster configuration
Cluster                         druid cluster
Coordinator Host                192.168.1.11
Coordinator Port                8081
Coordinator Endpoint            druid/coordinator/v1/metadata
Broker Host                     192.168.1.13
Broker Port                     8082
Broker Endpoint                 druid/v2
Cache Timeout                   86400               # 1day: result_backend


# Production (http://192.168.2.10:9097)
druid-prd01                     192.168.2.10         Superset
druid-prd02                     192.168.2.11         Druid

# Cluster configuration
Cluster                         druid cluster
Coordinator Host                192.168.2.11
Coordinator Port                8081
Coordinator Endpoint            druid/coordinator/v1/metadata
Broker Host                     192.168.2.13
Broker Port                     8082
Broker Endpoint                 druid/v2
Cache Timeout                   86400                 # 1day: result_backend
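
The host, port, and endpoint fields above combine into the URLs that get queried. The sketch below shows that composition under the assumption that the cluster is reached over plain HTTP; `endpoint_url` and `PRODUCTION_CLUSTER` are hypothetical names, and the values are copied from the production table.

```python
def endpoint_url(host, port, endpoint, scheme="http"):
    """Join host, port, and endpoint path into a full URL."""
    return "%s://%s:%d/%s" % (scheme, host, port, endpoint.strip("/"))


# Values copied from the production cluster table above
PRODUCTION_CLUSTER = {
    "coordinator": ("192.168.2.11", 8081, "druid/coordinator/v1/metadata"),
    "broker": ("192.168.2.13", 8082, "druid/v2"),
}

urls = {name: endpoint_url(*parts) for name, parts in PRODUCTION_CLUSTER.items()}
print(urls["coordinator"])
print(urls["broker"])
```

Printing the URLs before saving the cluster form is a quick way to spot a wrong host or a stray slash in an endpoint.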

Python Environment

Python

$ python --version
  Python 2.7.8

[Note]: Superset is tested using Python 2.7 and Python 3.4+. Python 3 is the recommended version, Python 2.6 won't be supported.

## Upgrade Python (stable: Python 2.7.12 | 3.4.5, latest: Python 3.5.2 [2016/12/15])
https://www.python.org/downloads/

# Download the matching Python version from the Python FTP server
$ wget http://python.org/ftp/python/2.7.12/Python-2.7.12.tgz

# Build and install
$ tar -zxvf Python-2.7.12.tgz
$ cd /root/software/Python-2.7.12
$ ./configure --prefix=/usr/local/python27
$ make
$ make install

$ ls /usr/local/python27/ -al

  drwxr-xr-x.  6 root root 4096 Dec 15 14:22 .
  drwxr-xr-x. 13 root root 4096 Dec 15 14:20 ..
  drwxr-xr-x.  2 root root 4096 Dec 15 14:22 bin
  drwxr-xr-x.  3 root root 4096 Dec 15 14:21 include
  drwxr-xr-x.  4 root root 4096 Dec 15 14:22 lib
  drwxr-xr-x.  3 root root 4096 Dec 15 14:22 share


# Replace the original python
$ which python
  /usr/local/bin/python
# mv /usr/bin/python /usr/bin/python_old
$ mv /usr/local/bin/python /usr/local/bin/python_old
$ ln -s /usr/local/python27/bin/python /usr/local/bin/
$ python --version
  Python 2.7.12

# Point yum back at the old Python 2.6
$ vim /usr/bin/yum

  # Change the first line to python2.6
  #!/usr/bin/python2.6

$ yum --version | sed '2,$d'
  3.2.29

Pip

$ pip --version
  pip 9.0.1 from /usr/local/lib/python2.7/site-packages (python 2.7)

# upgrade setup tools and pip
$ pip install --upgrade setuptools pip

## Installing pip in an offline environment
# Download setuptools-32.0.0.tar.gz from https://pypi.python.org/pypi/setuptools#code-of-conduct
$ tar zxvf setuptools-32.0.0.tar.gz
$ cd setuptools-32.0.0
$ python setup.py install

# Download pip-9.0.1.tar.gz from https://pypi.python.org/pypi/pip
$ wget --no-check-certificate https://pypi.python.org/packages/11/b6/abcb525026a4be042b486df43905d6893fb04f05aac21c32c638e939e447/pip-9.0.1.tar.gz#md5=35f01da33009719497f01a4ba69d63c9
$ tar zxvf pip-9.0.1.tar.gz
$ cd pip-9.0.1
$ python setup.py install
  Installed /usr/local/python27/lib/python2.7/site-packages/pip-9.0.1-py2.7.egg
  Processing dependencies for pip==9.0.1
  Finished processing dependencies for pip==9.0.1

$ pip --version
  pip 9.0.1 from /root/software/pip-9.0.1 (python 2.7)

Virtualenv

$ pip install virtualenv

# virtualenv is shipped in Python 3 as pyvenv
$ virtualenv venv
$ source venv/bin/activate

## Installing virtualenv in an offline environment
# Download virtualenv-15.1.0.tar.gz from https://pypi.python.org/pypi/virtualenv#downloads
$ tar zxvf virtualenv-15.1.0.tar.gz
$ cd virtualenv-15.1.0
$ python setup.py install

$ virtualenv --version
  15.1.0

Superset

Superset Initialization

$ pip install superset

## Installing superset in an offline environment
# Download superset-0.15.0.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.0.tar.gz
$ cd superset-0.15.0
$ python setup.py install

# Create an admin user
$ fabmanager create-admin --app superset

  Username [admin]:        # login name
  User first name [admin]: # first name
  User last name [user]:   # lastname
  Email [[email protected]]:   # email, must unique
  Password: 
  Repeat for confirmation: 
  Error: the two entered values do not match
  Password:             #superset
  Repeat for confirmation: #superset
  // ...
  Recognized Database Authentications.
  2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
  Admin User superset db upgrade created.

# Initialize the database
$ superset db upgrade

  // ...
  INFO  [alembic.runtime.migration] Context impl SQLiteImpl.
  INFO  [alembic.runtime.migration] Will assume transactional DDL.


# Load some data to play with
$ superset load_examples

  Loading examples into <SQLA engine=u'sqlite:////root/.superset/superset.db'>
  Creating default CSS templates
  Loading energy related dataset
  Creating table [wb_health_population] reference
  2016-12-14 17:58:09,568:INFO:root:Creating database reference
  2016-12-14 17:58:09,575:INFO:root:sqlite:////root/.superset/superset.db
  Loading [World Bank's Health Nutrition and Population Stats]
  Creating table [wb_health_population] reference
  2016-12-14 17:58:30,840:INFO:root:Creating database reference
  2016-12-14 17:58:30,846:INFO:root:sqlite:////root/.superset/superset.db


# Create default roles and permissions
$ superset init

  Loading examples into <SQLA engine=u'sqlite:////root/.superset/superset.db'>
  Creating default CSS templates
  Loading energy related dataset
  Creating table [wb_health_population] reference
  2016-12-14 17:58:09,568:INFO:root:Creating database reference
  2016-12-14 17:58:09,575:INFO:root:sqlite:////root/.superset/superset.db
  Loading [World Bank's Health Nutrition and Population Stats]
  Creating table [wb_health_population] reference
  2016-12-14 17:58:30,840:INFO:root:Creating database reference
  2016-12-14 17:58:30,846:INFO:root:sqlite:////root/.superset/superset.db
  Creating slices
  Creating a World's Health Bank dashboard
  Loading [Birth names]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [birth_names] reference
  2016-12-14 17:58:52,276:INFO:root:Creating database reference
  2016-12-14 17:58:52,280:INFO:root:sqlite:////root/.superset/superset.db
  Creating some slices
  Creating a dashboard
  Loading [Random time series data]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [random_time_series] reference
  2016-12-14 17:58:53,953:INFO:root:Creating database reference
  2016-12-14 17:58:53,957:INFO:root:sqlite:////root/.superset/superset.db
  Creating a slice
  Loading [Random long/lat data]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table reference
  2016-12-14 17:59:09,732:INFO:root:Creating database reference
  2016-12-14 17:59:09,736:INFO:root:sqlite:////root/.superset/superset.db
  Creating a slice
  Loading [Multiformat time series]
  Done loading table!
  --------------------------------------------------------------------------------
  Creating table [multiformat_time_series] reference
  2016-12-14 17:59:10,421:INFO:root:Creating database reference
  2016-12-14 17:59:10,426:INFO:root:sqlite:////root/.superset/superset.db
  Creating some slices
  Loading [Misc Charts] dashboard
  Creating the dashboard


# Start the web server on port 8088
$ superset runserver -p 8088

# To start a development web server, use the -d switch
# superset runserver -d

# Refresh Druid Datasource (after config it)
$ superset refresh_druid

Virtualenv Workspace

# superset01 192.168.1.10
$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages --always-copy superset
$ source superset/bin/activate

# See the `Installing superset requires downloading dependencies` part of the `Pitfalls` section below
# pip install --download package -r requirements.txt
$ pip install -r /root/requirements.txt

$ superset runserver -a 0.0.0.0 -p 8088

# rsync is recommended instead; see the `Deployment` section
$ cd /root
$ tar zcvf virtualenv.tar.gz virtualenv/
$ scp virtualenv.tar.gz [email protected]192.168.1.13:/root/

# 192.168.1.13
$ cd /root/virtualenv/superset
$ source bin/activate
## [Extension]
# virtualenvwrapper is an extension to virtualenv that makes it easy to create, delete, copy, and switch between virtual environments
$ pip install virtualenvwrapper
$ mkdir ~/workspaces
$ vim ~/.bashrc
  # Add
  export WORKON_HOME=~/virtualenv
  source /usr/local/bin/virtualenvwrapper.sh

$ mkvirtualenv --python=/usr/bin/python superset
  Running virtualenv with interpreter /usr/bin/python
  New python executable in /root/virtualenv/superset/bin/python
  Installing setuptools, pip, wheel...done.
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/predeactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postdeactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/preactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/postactivate
  virtualenvwrapper.user_scripts creating /root/virtualenv/superset/bin/get_env_details
(superset) [[email protected] virtualenv]# 
(superset) [[email protected] virtualenv]# deactivate

$ workon superset
(superset) [[email protected] virtualenv]# lsvirtualenv -b
superset

Deployment

Copying

# Using rsync instead of scp ensures that symlinks are copied as well
$ rsync -avuz -e ssh /home/superset/superset-0.15.4/ [email protected]:/home/yuzhouwan/superset-0.15.4

  //...
  sent 142935894 bytes  received 180102 bytes  3920986.19 bytes/sec
  total size is 359739823  speedup is 2.51

# Verify the file count in the superset directory on both the local and the target machine
$ find | wc -l
  10113
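
Where `find` is unavailable, or when the check needs to run from a script, the same count can be computed in Python. This is a sketch under the assumption that no file names contain newlines (which would inflate the `wc -l` result); `count_entries` is a hypothetical helper name.

```python
import os


def count_entries(root="."):
    """Count what `find <root> | wc -l` counts: the root itself plus every
    file and directory below it (symlinks are listed, not followed)."""
    total = 1  # the root directory itself
    for _dirpath, dirnames, filenames in os.walk(root):
        total += len(dirnames) + len(filenames)
    return total
```

Running `count_entries("/home/superset/superset-0.15.4")` on both machines and comparing the two numbers gives the same signal as the shell check.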

# Repeat the steps above to rsync from the jump host to the production machine
$ rsync -avuz -e ssh /home/yuzhouwan/superset-0.15.4/ [email protected]192.168.2.10:/home/superset/superset-0.15.4

# The Python that the virtualenv was built against
$ rsync -avuz -e ssh /root/software [email protected]:/home/yuzhouwan
$ rsync -avuz -e ssh /home/yuzhouwan/software [email protected]:/root

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /       # necessary!!
$ python -V
  Python 2.7.12

Shared Libraries

# The symlinks were rsynced over, but the directory they point to does not contain Python's shared libraries on the target machine
$ file /root/superset/lib/python2.7/lib-dynload

  /root/superset/lib/python2.7/lib-dynload: broken symbolic link to `/usr/local/python27/lib/python2.7/lib-dynload`

# The global Python must match the one used when the virtualenv was created in the networked environment
$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /

$ ls /usr/local/python27/lib/python2.7/lib-dynload -sail
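
A copied virtualenv can contain more than one dangling link, so it may help to scan the whole tree rather than checking `lib-dynload` alone. The sketch below does that with the standard library; `find_broken_symlinks` is a hypothetical helper, not part of virtualenv.

```python
import os


def find_broken_symlinks(root):
    """Return (link, target) pairs for symlinks under `root` whose target is missing."""
    broken = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # islink() inspects the link itself; exists() follows it to the target
            if os.path.islink(path) and not os.path.exists(path):
                broken.append((path, os.readlink(path)))
    return sorted(broken)
```

An empty result for `find_broken_symlinks("/root/superset")` suggests the global Python on the target machine was installed under the same prefix as on the source machine.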

User Permissions

# Create a user
$ adduser superset
$ cd /home/superset
# If the directory name carries a version number, create a symlink
$ chown -R superset:superset superset-0.15.4
$ ln -s superset-0.15.4 superset

$ chown -h superset:superset superset
$ su - superset

Metadata Storage

# Change the database
$ vim ./lib/python2.7/site-packages/superset/config.py

  # SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
  SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://user:[email protected]:3306/superset1?charset=utf8'

$ mysql -hmysql01 -P3306 -uuser -ppassword
> use superset1;
> show tables;
  +-------------------------+
  | Tables_in_superset1     |
  +-------------------------+
  | ab_permission           |
  | ...                     |
  | url                     |
  +-------------------------+
  28 rows in set (0.00 sec)

# mysqldump -hmysql01 -P3306 -uuser -ppassword superset1 > superset1.sql
$ mysqldump -hmysql01 -P3306 -uuser -ppassword --single-transaction superset1 > superset1.sql
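
If the database password contains characters such as `@` or `/`, writing it verbatim into `SQLALCHEMY_DATABASE_URI` produces a URI that parses incorrectly. A minimal sketch of building the URI with escaped credentials follows; `mysql_uri` is a hypothetical helper, the credentials are made up, and the import is the Python 3 location (on Python 2.7 the function lives at `urllib.quote_plus`).

```python
from urllib.parse import quote_plus


def mysql_uri(user, password, host, port, database, charset="utf8"):
    """Build a mysql+pymysql SQLAlchemy URI, escaping the credentials."""
    return "mysql+pymysql://%s:%s@%s:%d/%s?charset=%s" % (
        quote_plus(user), quote_plus(password), host, port, database, charset)


# Hypothetical credentials, for illustration only
SQLALCHEMY_DATABASE_URI = mysql_uri("user", "p@ss/word", "mysql01", 3306, "superset1")
print(SQLALCHEMY_DATABASE_URI)
```

The resulting string can be pasted into `config.py` in place of the hand-written URI.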

Startup

$ cd /home/superset/superset-0.15.4
$ source bin/activate
$ mkdir logs
$ nohup superset runserver -a 0.0.0.0 -p 9097 -w 4 > logs/superset.log 2>&1 &

Running Locally

Dependencies

Windows

Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat)
Description

 error: Microsoft Visual C++ 9.0 is required (Unable to find vcvarsall.bat). Get it from http://aka.ms/vcpython27

Solution
# download vcredist_x64.exe from http://www.microsoft.com/en-us/download/details.aspx?id=2092
$ pip install wheel setuptools
# Download and install VCForPython27.msi

'openssl/opensslv.h': No such file or directory
Solution
# download openssl-0.9.8h-1-setup.exe from http://gnuwin32.sourceforge.net/packages/openssl.htm
References

Cannot open include file: 'stdint.h': No such file or directory
Solution
# Microsoft Visual C++ 2015 Redistributable Update 3
# download vc_redist.x64.exe from https://www.microsoft.com/zh-CN/download/details.aspx?id=53840
$ vim D:\apps\Python27\Lib\distutils\msvc9compiler.py

  def get_build_version():
    return 9.0
  def find_vcvarsall(version):
    return r'C:\Users\yuzhouwan\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\vcvarsall.bat'

$ cd superset-0.15.4
$ python setup.py install

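# The VCForPython27.msi provided by Microsoft uses VC2008 by default, whereas stdint.h is only supported from VC2012 onward
# VCForPython27.msi has not been maintained since 2014, so I decided to try Ubuntu or remote debugging instead ...
References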

Ubuntu

Install VMware

Python

Make sure that you use the correct version of 'pip'
Description
  Try to run this command from the system terminal. Make sure that you use the correct version of 'pip' installed for your Python interpreter located at 'D:\apps\Python27\python.exe'
Solution
# Install pip: download the installer from https://bootstrap.pypa.io/get-pip.py
$ python get-pip.py

$ pip --version
  pip 8.1.1 from d:\apps\python27\lib\site-packages (python 2.7)
References

'Connection to pypi.python.org timed out. (connect timeout=15)'
Description
$ pip install --upgrade pip
  'Connection to pypi.python.org timed out. (connect timeout=15)'
Solution
# Set a proxy
$ export https_proxy="http://10.10.10.10:8080"
$ pip install --upgrade pip
$ pip --version
  pip 9.0.1 from d:\apps\python27\lib\site-packages (python 2.7)
References

setup.py failed with error code 1
Description
Command "d:\apps\python27\python.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\yuzhouwan\\appdata\\local\\temp\\pip-build-zzbhrq\\sasl\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\yuzhouwan\appdata\local\temp\pip-erwavd-record\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\yuzhouwan\appdata\local\temp\pip-build-zzbhrq\sasl\
Solution
$ pip install --upgrade setuptools pip
$ pip install superset

# Download superset-0.15.4.tar.gz from https://pypi.python.org/pypi/superset
$ tar zxvf superset-0.15.4.tar.gz
$ cd superset-0.15.4
$ python setup.py install
References

Setting Up the Development Environment

Dependencies

$ cd /root/software
$ tar zxvf Python-2.7.12.tgz
$ cd Python-2.7.12

$ ./configure --prefix=/usr/local/python27 --enable-shared CFLAGS=-fPIC
$ make && make install
$ /sbin/ldconfig -v | grep /
$ python -V
  Python 2.7.12

$ mv /usr/local/bin/python /usr/local/bin/python_bak
$ ln -s /usr/local/python27/bin/python /usr/local/bin/python

Virtual Environment

$ cd /root
$ virtualenv -p /usr/local/bin/python --system-site-packages env
$ cd env
$ mkdir code

Code

# windows
$ cd E:\Github\super\env
$ git init
$ git remote add origin https://github.com/asdf2014/superset.git
$ git pull origin master

# SFTP
# Upload to /root/env/code

Installation

$ cd /root/env/code
$ source /root/env/bin/activate

$ cd /root/env/code/superset/static
$ mv assets assets_bak
$ ln -s ../assets assets

$ cd /root/env/code
$ python setup.py develop

  Finished processing dependencies for superset==0.15.4

$ pip freeze | grep superset
  superset==0.15.4

# Create an admin user
$ fabmanager create-admin --app superset

  Username [admin]:        # login name
  User first name [admin]: # first name
  User last name [user]:   # lastname
  Email [[email protected]]:   # email, must unique
  Password: 
  Repeat for confirmation: 
  Error: the two entered values do not match
  Password:             #superset
  Repeat for confirmation: #superset
  // ...
  Recognized Database Authentications.
  2016-12-14 17:53:40,945:INFO:flask_appbuilder.security.sqla.manager:Added user superset db upgrade
  Admin User superset db upgrade created.

$ superset db upgrade
$ superset init
$ superset load_examples

Npm

# [Mac OS]
$ sudo yum group install "Development Tools" --setopt=group_package_types=mandatory,default,optional --skip-broken -y
$ sudo yum install curl git m4 ruby texinfo bzip2-devel curl-devel expat-devel ncurses-devel zlib-devel -y

# ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install)"    # Do not run this as root!
$ wget https://raw.githubusercontent.com/Homebrew/linuxbrew/go/install --no-check-certificate
$ mv install install.rb
$ vim install.rb

 # abort "Don't run this as root!" if Process.uid == 0

$ mkdir -p /root/.linuxbrew/bin
$ export PATH="/root/.linuxbrew/bin:$PATH"
$ ruby install.rb

$ vim ~/.bashrc

 export PATH="$HOME/.linuxbrew/bin:$PATH"
 export MANPATH="$HOME/.linuxbrew/share/man:$MANPATH"
 export INFOPATH="$HOME/.linuxbrew/share/info:$INFOPATH"


# [CentOS]
$ yum install npm
$ cd /root/env/code/superset/assets    # package.json
$ npm install

# If fetching https://github.com/jquery/jquery.git times out, pin the GitHub hosts
$ vim /etc/hosts

 192.30.253.112 github.com
 151.101.100.133 assets-cdn.github.com
 192.30.253.117 api.github.com
 192.30.253.121 codeload.github.com

Testing

$ cd /root/env/code
$ chmod 777 *sh
$ cd /root/env/code/superset/bin
$ chmod 777 superset

$ cd /root/env/code
$ bash run_tests.sh

Remote Development in an IDE

Remote Debug

 See the Remote Debug section of my other blog post: 《Python

References

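Secondary Development

Others Category

Problem

Description

 When aggregating at the HBase Region level, the group-by yields a large number of Regions. Rendering them in DistributionPieViz is sluggish and looks cluttered.

Solution
Adding row_limit excludes the data outside the top N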
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

  fieldsets = ({
    'label': None,
    'fields': (
      'metrics', 'groupby',
      'limit',
      'pie_label_type',
      ('donut', 'show_legend'),
      'labels_outside',
      'row_limit',
    )
  },)
others_category aggregates the data outside the top N
$ cd /root/superset-0.15.4
$ vim ./lib/python2.7/site-packages/superset/viz.py

  fieldsets = ({
    'label': None,
    'fields': (
      'metrics', 'groupby',
      'limit',
      'pie_label_type',
      ('donut', 'show_legend'),
      'labels_outside',
      'row_limit',
      'others_category',
    )
  },)

$ vim ./lib/python2.7/site-packages/superset/forms.py

  'others_category': (BetterBooleanField, {
    "label": _("Others category"),
    "default": True,
    "description": _("Aggregate data outside of topN into a single category")
  }),


# models.py
# The Others category is not placed last; the result gets sorted again afterwards
# The "others_category": "y" attribute is not passed down

self.status = None
self.error_message = None
self.others_category = form_data.get("others_category")

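import pandas as pd

# Assumes df is already sorted so that the top N rows come first,
# and that every non-metric column is a group-by (label) column
top_n = 10
if 0 < top_n < len(df):
    df_head = df.head(top_n)
    df_tail = df.iloc[top_n:]
    # Label columns get 'Others'; each metric column gets the sum of the tail
    others = {col: 'Others' for col in df.columns if col not in metrics}
    for metric in metrics:
        others[metric] = df_tail[metric].sum()
    df_other = pd.DataFrame([others], columns=df.columns)
    df = pd.concat([df_head, df_other], ignore_index=True)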
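
The intended behaviour of the Others category can be checked in isolation, without pandas or Superset. The following sketch works on plain `(group, value)` pairs; `fold_others` and the sample data are hypothetical and only illustrate the aggregation rule, not the Superset code path.

```python
def fold_others(rows, top_n, label="Others"):
    """rows: list of (group, value) pairs. Keep the top_n largest values and
    sum the rest into a single trailing (label, total) pair."""
    ranked = sorted(rows, key=lambda row: row[1], reverse=True)
    if top_n <= 0 or len(ranked) <= top_n:
        return ranked
    head, tail = ranked[:top_n], ranked[top_n:]
    # The Others row is appended after sorting, so it always comes last
    return head + [(label, sum(value for _group, value in tail))]


# Made-up per-Region request counts
regions = [("r1", 5), ("r2", 50), ("r3", 20), ("r4", 1), ("r5", 4)]
print(fold_others(regions, top_n=2))
```

Appending the Others row after the sort is the key design point: sorting again afterwards would move it into the middle of the chart whenever its total exceeds one of the top N values.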

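Y-Axis Data Anomaly

Description

 The Y axis should start at 0, but it starts at a negative value, -997m

Solution

To be optimized later

MySQL Timezone Issues

Query

Description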
$ lib/python2.7/site-packages/superset/config.py

 from dateutil import tz

 # Druid query timezone
 # tz.tzutc() : Using utc timezone
 # tz.tzlocal() : Using local timezone
 # other tz can be overridden by providing a local_config
 DRUID_IS_ACTIVE = True
 DRUID_TZ = tz.tzlocal()        # +08:00

 # DRUID_TZ = tz.gettz('Asia/Shanghai')
Solution

Display

Description
  dttm.tz_convert(dttm.tzinfo._filename.split('zoneinfo/')[1]) - pytz.timezone(dttm.tzinfo._filename.split('zoneinfo/')[1]).localize(EPOCH)
Solution
References

Upgrading Superset

# Upgrade directly with pip install
$ pip freeze | grep superset
  superset==0.13.2

$ pip install superset==-1
  versions: 0.12.0, 0.13.0, 0.13.1, 0.13.2, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.3, 0.15.4

$ pip install superset==0.15.4

# The previous configuration was gone after the upgrade, so some config adjustments are needed
$ vim ./lib/python2.7/site-packages/superset/config.py

  # SQLALCHEMY_DATABASE_URI = 'sqlite:///' + os.path.join(DATA_DIR, 'superset.db')
  SQLALCHEMY_DATABASE_URI = 'mysql+pymysql://root:[email protected]:3306/superset?charset=utf8'

$ vim /root/superset-0.15.4/bin/activate

  # VIRTUAL_ENV="/root/superset"
  VIRTUAL_ENV="/root/superset-0.15.4"

# then could just run "superset runserver -a 0.0.0.0 -p 9097"

Unknown column 'datasources.filter_select_enabled' in 'field list'

Description
InternalError: (pymysql.err.InternalError) (1054, u"Unknown column 'datasources.filter_select_enabled' in 'field list'") [SQL: u'SELECT datasources.created_on AS datasources_created_on, datasources.changed_on AS datasources_changed_on, datasources.id AS datasources_id, datasources.datasource_name AS datasources_datasource_name, datasources.is_featured AS datasources_is_featured, datasources.is_hidden AS datasources_is_hidden, datasources.filter_select_enabled AS datasources_filter_select_enabled, datasources.description AS datasources_description, datasources.default_endpoint AS datasources_default_endpoint, datasources.user_id AS datasources_user_id, datasources.cluster_name AS datasources_cluster_name, datasources.offset AS datasources_offset, datasources.cache_timeout AS datasources_cache_timeout, datasources.params AS datasources_params, datasources.perm AS datasources_perm, datasources.changed_by_fk AS datasources_changed_by_fk, datasources.created_by_fk AS datasources_created_by_fk \nFROM datasources \nWHERE datasources.datasource_name = %(datasource_name_1)s \n LIMIT %(param_1)s'] [parameters: {u'param_1': 1, u'datasource_name_1': u'bi-dfp-oms-detail'}]
Solution
$ superset db upgrade
$ superset refresh_druid

Issues with Druid timezones

Description

 The tzutc and tzlocal methods in tz used to work for me...
 But they stopped working after I upgraded Superset from v0.13.2 to v0.15.4, even when I tried DRUID_TZ = tz.gettz('Asia/Shanghai') :-(

Solution
$ cd /root/superset-0.15.4
$ ./bin/python -m pip freeze | grep superset

  superset==0.13.2

$ ./bin/python -m pip uninstall superset
$ ./bin/python -m pip install superset==0.15.4
$ ./bin/python -m pip freeze | grep superset

  superset==0.15.4

$ ./bin/python ./bin/easy_install lib/pycharm-debug.egg
# config remote python

$ ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097
# nohup ./bin/python ./bin/superset runserver -a 0.0.0.0 -p 9097 > logs/superset.log 2>&1 &

$ ./bin/python ./bin/superset db upgrade
$ ./bin/python ./bin/superset refresh_druid

pydevd Cannot Remote Debug

Description

 After upgrading from 0.13.2 to 0.15.4, debug mode starts two processes (which prevents pydevd from remote debugging)

$ ps -ef | grep superset | grep -v grep

  root     22567  1632 19 12:05 pts/0    00:00:03 ./bin/python ./bin/superset runserver -d -p 9097
  root     22578 22567 24 12:05 pts/0    00:00:03 /root/superset-0.15.4/bin/python ./bin/superset runserver -d -p 9097
Solution
Starting directly with cli.py (did not work)
$ vim ./lib/python2.7/site-packages/superset/config.py

  # append
  manager.run()

$ ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -a 0.0.0.0  -p 9097

$ ps -ef | grep superset | grep -v grep

  root     25238  1632 35 13:07 pts/0    00:00:03 ./bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
  root     25247 25238 55 13:07 pts/0    00:00:03 /root/superset-0.15.4/bin/python ./lib/python2.7/site-packages/superset/cli.py runserver -d -p 9097
Attempt to resolve the "WARNING:werkzeug: * Debugger is active!" issue
$ vim lib/python2.7/site-packages/werkzeug/serving.py

  class ThreadedWSGIServer(ThreadingMixIn, BaseWSGIServer):

    """A WSGI server that does threading."""
    multithread = True

$ vim lib/python2.7/site-packages/flask/app.py

  options.setdefault('use_reloader', self.debug)

$ superset/__init__.py
References

Switching from SQLite3 to MySQL

Trying SQLite's built-in dump command

# superset01                192.168.1.10        Superset
$ cd /root/.superset
$ ll -sail

  1285 43256 -rw-r--r--   1 root root 44288000 Jan 22 14:06 superset.db

$ sqlite3 superset.db
sqlite> .databases
  seq  name             file                                                      
  ---  ---------------  ----------------------------------------------------------
  0    main             /root/.superset/superset.db

sqlite> .tables
  ab_permission            columns                  multiformat_time_series
  ab_permission_view       css_templates            query                  
  ab_permission_view_role  dashboard_slices         random_time_series     
  ab_register_user         dashboard_user           slice_user             
  ab_role                  dashboards               slices                 
  ab_user                  datasources              sql_metrics            
  ab_user_role             dbs                      table_columns          
  ab_view_menu             energy_usage             tables                 
  access_request           favstar                  url                    
  alembic_version          logs                     wb_health_population   
  birth_names              long_lat               
  clusters                 metrics                

# not suit for mysql
# sqlite> .output superset.sql
# sqlite> .dump

$ vim dump_for_mysql.py

  # https://github.com/EricHigdon/sqlite3tomysql

$ sqlite3 superset.db .dump | python dump_for_mysql.py > superset.sql

$ ls -sail

  1285 43256 -rw-r--r--   1 root root 44288000 Jan 22 14:06 superset.db
  18631 76968 -rw-r--r--   1 root root 78812197 Jan 22 14:35 superset.sql

$ vim superset.sql

  id INTEGER NOT NULL, 
  # Replace with an auto-increment (primary key) column
  id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT, 
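
Editing a dump of this size by hand in vim is error-prone, so the substitution can also be scripted. The sketch below applies the same replacement with a regular expression; `make_id_auto_increment` is a hypothetical helper, and it assumes each `id` column sits on its own line exactly as shown above.

```python
import re

# Matches a column definition such as "id INTEGER NOT NULL," at the start of a line.
# Columns like "user_id INTEGER NOT NULL," do not match, because only
# whitespace may precede "id".
ID_COLUMN = re.compile(r"^(\s*)id INTEGER NOT NULL,", re.MULTILINE)


def make_id_auto_increment(sql_text):
    """Rewrite plain `id` columns into auto-increment primary keys."""
    return ID_COLUMN.sub(r"\1id INTEGER PRIMARY KEY NOT NULL AUTO_INCREMENT,", sql_text)
```

Reading `superset.sql`, passing its text through `make_id_auto_increment`, and writing the result back reproduces the manual edit for every table at once.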

$ scp superset.sql [email protected]192.168.1.12:/home/mysql

Implementing sqlite3tomysql.py Myself

# druid02    192.168.1.12    MySQL
$ ps -ef | grep mysql | grep -v druid | grep -v grep

  mysql    11435  8530  0 14:13 pts/4    00:00:00 /bin/sh /home/mysql/bin/mysqld_safe --defaults-file=/home/mysql/my.cnf
  mysql    12192 11435  0 14:13 pts/4    00:00:00 /home/mysql/bin/mysqld --defaults-file=/home/mysql/my.cnf --basedir=/home/mysql --datadir=/home/mysql/data --plugin-dir=/home/mysql/lib/mysql/plugin --log-error=/home/mysql/data/druid02.err --open-files-limit=8192 --pid-file=/home/mysql/data/druid02.pid --socket=/home/mysql/data/mysql.sock --port=3306
  mysql    12223  8530  0 14:13 pts/4    00:00:00 mysql -uroot -p -S /home/mysql/data/mysql.sock


$ su - mysql
$ mysql -uroot -p -S /home/mysql/data/mysql.sock
mysql> show databases;
mysql> create database superset;
mysql> show databases;
mysql> use superset;

# Run sqlite3tomysql.py, then import the schema and data files it produces
  mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock  --default-character-set=utf8 < superset.sql.schema.sql
  mysql -uroot -p superset2 -S /home/mysql/data/mysql.sock  --default-character-set=utf8 < superset.sql.data.sql

# To work around foreign-key dependencies between tables, run `source .superset.sql.schema.sql` inside the mysql command line and repeat the batch import several times
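
The author's sqlite3tomysql.py is not shown, so the following is only a sketch of the schema/data split it appears to perform, written against the standard `sqlite3` module. `split_dump` is a hypothetical name, the in-memory table stands in for superset.db, and the statements would still need MySQL-specific rewrites (quoting, auto-increment) before import.

```python
import sqlite3


def split_dump(connection):
    """Split a SQLite dump into schema statements and data statements."""
    schema, data = [], []
    for statement in connection.iterdump():
        if statement.startswith("INSERT INTO"):
            data.append(statement)
        elif statement.startswith("CREATE"):
            schema.append(statement)
        # BEGIN TRANSACTION, COMMIT, and other SQLite-only lines are dropped
    return schema, data


# Tiny in-memory database standing in for superset.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbs (id INTEGER NOT NULL, database_name VARCHAR(250))")
conn.execute("INSERT INTO dbs VALUES (1, 'main')")
conn.commit()

schema_sql, data_sql = split_dump(conn)
print("\n".join(schema_sql))
print("\n".join(data_sql))
```

Importing all schema statements before any data statements is what allows the two files to be loaded separately, as in the two `mysql ... <` commands above.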

Metadata Storage
