Tableau Study Notes: Join

The data that you analyze in Tableau is often made up of a collection of tables that are related by specific fields (that is, columns). Joining is a method for combining the related data on those common fields. The result of combining data using a join is a virtual table that is typically extended horizontally by adding columns of data.

Note: When joining tables, the fields that you join on must have the same data type. If you change the data type after you join the tables, the join will break.
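As a loose analogy for why the data types must agree, the following Python sketch (not Tableau code) compares an integer key with a text key. The two tiny tables and the `inner_join` helper are hypothetical and exist only for illustration; the sketch says nothing about how Tableau itself detects or reports a type mismatch.

```python
# Hypothetical rows: the ID is an integer in one table and text in the other.
authors = [{"ID": 20034, "Last Name": "Davis"}]
books = [{"ID": "20034", "Book Title": "The Magic Shoe Lace"}]


def inner_join(left, right, key):
    """Pair every left row with every right row that has an equal key value."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


# 20034 (int) never equals "20034" (str), so nothing matches.
mismatched = inner_join(authors, books, "ID")

# Casting one side so both keys share a type restores the match.
books_cast = [{**b, "ID": int(b["ID"])} for b in books]
matched = inner_join(authors, books_cast, "ID")

print(len(mismatched), len(matched))
```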

For example, suppose you are analyzing data for a publisher. The publisher might have two tables. The first table contains ID numbers, first name, last name, and publisher type. The second table contains ID numbers, price, royalty, and title of published books. The related field between the two tables might be ID.

Table 1

ID     First Name  Last Name  Publisher Type
20034  Adam        Davis      Independent
20165  Ashley      Garcia     Big
20233  Susan       Nguyen     Small/medium

Table 2

Book Title           Price  Royalty  ID
Weather in the Alps  19.99  5,000    20165
My Physics           8.99   3,500    20800
The Magic Shoe Lace  15.99  7,000    20034

To analyze these two tables together, you can join the tables on ID to answer questions like, "How much was paid in royalties for authors from a given publisher?" By combining tables using a join, you can view and use related data from different tables in your analysis.

ID     First Name  Last Name  Publisher Type  Book Title           Price  Royalty
20034  Adam        Davis      Independent     The Magic Shoe Lace  15.99  7,000
20165  Ashley      Garcia     Big             Weather in the Alps  19.99  5,000
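The publisher example can be sketched outside Tableau in a few lines of plain Python. This is only an illustration of what joining on ID does to the two tables above; the variable names are made up for the sketch, and Tableau performs the join in the data source rather than in code like this.

```python
from collections import defaultdict

authors = [
    {"ID": 20034, "First Name": "Adam", "Last Name": "Davis", "Publisher Type": "Independent"},
    {"ID": 20165, "First Name": "Ashley", "Last Name": "Garcia", "Publisher Type": "Big"},
    {"ID": 20233, "First Name": "Susan", "Last Name": "Nguyen", "Publisher Type": "Small/medium"},
]
books = [
    {"Book Title": "Weather in the Alps", "Price": 19.99, "Royalty": 5000, "ID": 20165},
    {"Book Title": "My Physics", "Price": 8.99, "Royalty": 3500, "ID": 20800},
    {"Book Title": "The Magic Shoe Lace", "Price": 15.99, "Royalty": 7000, "ID": 20034},
]

# Index the second table by the join key.
books_by_id = defaultdict(list)
for book in books:
    books_by_id[book["ID"]].append(book)

# Keep only author rows that have at least one matching book (an inner join).
joined = [
    {**author, **book}
    for author in authors
    for book in books_by_id.get(author["ID"], [])
]

# Answer the question from the text: royalties paid per publisher type.
royalties_by_type = defaultdict(int)
for row in joined:
    royalties_by_type[row["Publisher Type"]] += row["Royalty"]

print(dict(royalties_by_type))  # {'Independent': 7000, 'Big': 5000}
```

Susan Nguyen (no book) and My Physics (no matching author ID) drop out, which matches the two-row combined table above.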

Overview of join types

In general, there are four types of joins that you can use to combine your data in Tableau: inner, left, right, and full outer. The tables you can join and the different join types you can use depend on the database or file you connect to. You can tell which join types your data supports by checking the join dialog after you've connected to your data and have at least two tables on the canvas.

Join Type: Result Description

Inner: When you use an inner join to combine tables, the result is a table that contains values that have matches in both tables.

Left: When you use a left join to combine tables, the result is a table that contains all values from the left table and corresponding matches from the right table. When a value in the left table doesn't have a corresponding match in the right table, you see a null value in the data grid.

Right: When you use a right join to combine tables, the result is a table that contains all values from the right table and corresponding matches from the left table. When a value in the right table doesn't have a corresponding match in the left table, you see a null value in the data grid.

Full outer: When you use a full outer join to combine tables, the result is a table that contains all values from both tables. When a value from either table doesn't have a match with the other table, you see a null value in the data grid.

Union: Though union is not a type of join, union is another method for combining two or more tables by appending rows of data from one table to another. Ideally, the tables that you union have the same number of fields, and those fields have matching names and data types. For more information about union, see Union Your Data.
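To make the four join types concrete, here is a minimal Python sketch that applies each of them to trimmed-down versions of the publisher tables. The `join` helper is hypothetical and written only for this illustration; `None` plays the role of the null value shown in the data grid.

```python
def join(left, right, key, how="inner"):
    """Join two lists of dict rows on `key`; how is inner, left, right, or full."""
    left_cols = [c for c in left[0] if c != key] if left else []
    right_cols = [c for c in right[0] if c != key] if right else []
    rows, matched = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if rrow[key] == lrow[key]]
        matched.update(hits)
        rows.extend({**lrow, **right[i]} for i in hits)
        if not hits and how in ("left", "full"):
            # Unmatched left row: pad the right-hand columns with None (null).
            rows.append({**lrow, **dict.fromkeys(right_cols)})
    if how in ("right", "full"):
        for i, rrow in enumerate(right):
            if i not in matched:
                # Unmatched right row: pad the left-hand columns with None.
                rows.append({**dict.fromkeys(left_cols), **rrow})
    return rows


authors = [
    {"ID": 20034, "Last Name": "Davis"},
    {"ID": 20165, "Last Name": "Garcia"},
    {"ID": 20233, "Last Name": "Nguyen"},
]
books = [
    {"ID": 20165, "Book Title": "Weather in the Alps"},
    {"ID": 20800, "Book Title": "My Physics"},
    {"ID": 20034, "Book Title": "The Magic Shoe Lace"},
]

counts = {how: len(join(authors, books, "ID", how))
          for how in ("inner", "left", "right", "full")}
print(counts)  # {'inner': 2, 'left': 3, 'right': 3, 'full': 4}
```

The full outer result has four rows: the two matches, Nguyen with a null book title, and My Physics with a null last name.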

Combine tables from the same database

If the tables you need to analyze are from the same database, or workbook (for Excel), or directory (for text), then use the following procedure to combine tables. Combining tables that are from the same database requires only a single connection in the data source. Typically, joining tables from the same database yields better performance. This is because querying data that is stored on the same database takes less time and leverages the native capabilities of the database to perform the join.

Note: Depending on the level of detail of the tables you want to combine, you might consider data blending instead. For more information, see Blend Your Data.

To join tables

  1. In Tableau Desktop: on the start page, under Connect, click a connector to connect to your data. This step creates the first connection in the Tableau data source.

    In web authoring: Select New Workbook and connect to your data. This step creates the first connection in the Tableau data source.

  2. Select the file, database, or schema, and then double-click or drag a table to the canvas.

    Note: If you're authoring on the web or signed in to Tableau Server (from Tableau Desktop) while you are setting up the data source, you have access to recommended tables to help make combining your data easier. For more information, see Use Certified and Recommended Data Sources and Tables.

  3. Double-click or drag another table to the canvas, and then click the join relationship to add join clauses and select your join type.

  4. Add one or more join clauses by selecting a field from one of the available tables used in the data source, a join operator, and a field from the added table. Inspect the join clause to make sure it reflects how you want to connect the tables.

    For example, in a data source that has a table of order information and another for returns information, you could use an inner join to combine the two tables based on the Order ID field that exists in both tables.

    Note: You can delete an unwanted join clause by clicking the "x" that displays when you hover over the right side of the join clause.

  5. When you are finished, close the Join dialog.

After you've created a join, review the data grid to make sure that the join produces the results that you expect. For more information, see Review join results in the data grid. To troubleshoot your join, see Troubleshoot joins.

Continue to prepare your data source for analysis. You can rename and reset fields, create calculations, clean your data with Data Interpreter, change the data types of fields, and so on.

About null values in join keys

In general, joins are performed at the database level. If the fields used to join tables contain null values, most databases return data without the rows that contain the null values. However, if you've set up your single-connection data source to use an Excel, text, or Salesforce connection, Tableau provides an additional option to allow you to join fields that contain null values with other fields that contain null values.

To join on null values

  • After you've set up your data source, on the data source page, select Data > Join null values to null values.

For example, suppose you have two tables of data that you want to join: Orders_June and Orders_July.

Orders_June

ID  Location
1   New York
2   null
3   Miami

Orders_July

ID  Location
1   New York
2   null
3   Miami

If you join on both the ID and Location fields, most databases return the following table of data:

Join (of Orders_June and Orders_July)

ID  Location  ID (Orders_July)  Location (Orders_July)
1   New York  1                 New York
3   Miami     3                 Miami

If you are using a single Excel, text, or Salesforce connection in your data source, select Data > Join null values to null values to return the following table:

Join (of Orders_June and Orders_July)

ID  Location  ID (Orders_July)  Location (Orders_July)
1   New York  1                 New York
2   null      2                 null
3   Miami     3                 Miami

Note: This option is available for single-connection data sources that use text, Excel, and Salesforce connections. If you add a second connection to a data source that uses this option, the join reverts back to the default behavior of excluding rows with null values.
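The difference between the two results can be sketched in Python, with `None` standing in for a null value. The `join_on` helper and its `nulls_match` flag are hypothetical names for this illustration; they mimic the effect of the Join null values to null values option and are not part of any Tableau API.

```python
def join_on(left, right, keys, nulls_match=False):
    """Inner join on several keys; None stands in for a null value."""
    def equal(a, b):
        if a is None or b is None:
            # Default behavior: a null never equals anything, not even another null.
            return nulls_match and a is None and b is None
        return a == b

    return [
        (l, r)
        for l in left
        for r in right
        if all(equal(l[k], r[k]) for k in keys)
    ]


orders_june = [
    {"ID": 1, "Location": "New York"},
    {"ID": 2, "Location": None},
    {"ID": 3, "Location": "Miami"},
]
orders_july = [
    {"ID": 1, "Location": "New York"},
    {"ID": 2, "Location": None},
    {"ID": 3, "Location": "Miami"},
]

default = join_on(orders_june, orders_july, ["ID", "Location"])
with_nulls = join_on(orders_june, orders_july, ["ID", "Location"], nulls_match=True)

print([l["ID"] for l, _ in default])     # [1, 3]
print([l["ID"] for l, _ in with_nulls])  # [1, 2, 3]
```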

Combine tables from different databases

Beginning with Tableau version 10.0, if the tables you need to analyze are stored in different databases, or workbooks (for Excel), or directories (for text), use the following procedure to combine tables using a cross-database join.

Cross-database joins require that you first set up a multi-connection data source—that is, you create a new connection to each database before you join tables. When you connect to multiple databases, a data source becomes a multi-connection data source. Multi-connection data sources can be advantageous when you need to analyze data for an organization that uses different internal systems or when you need to work with data that is managed separately by both internal and external groups.

Note: In many cases, using a cross-database join is the primary method for combining your data. However, there are some cases that you might need to combine your data using data blending instead. For more information, see Blend Your Data.

After you've combined tables using a cross-database join, Tableau colors the tables in the canvas and the columns in the data grid to show you which connection the data comes from.

To join tables from different databases

  1. In Tableau Desktop: On the Start page, under Connect, click a connector to connect to your data. This step creates the first connection in the Tableau data source.

    In web authoring: Select New Workbook and connect to your data. This step creates the first connection in the Tableau data source.

  2. Select the file, database, or schema, and then double-click or drag a table to the canvas.

  3. In the left pane, under Connections, click the Add button (+ in web authoring) to add a new connection to the Tableau data source. A new connection is required if you have related data stored in another database.

    Note: If the connector you want is not available from the Connect list, cross-database joins are not supported for the combination of sources that you want to join. This includes connections to cube data (e.g., Microsoft Analysis Services), most extract-only data (e.g., Google Analytics and OData), and Tableau Server data sources. Instead of joining tables, consider using data blending. For more information, see Blend Your Data.

  4. Add one or more join clauses by selecting a field from one of the available tables used in the data source, a join operator, and a field from the added table. Inspect the join clause to make sure it reflects how you want to connect the tables.

    For example, in a data source that has a table of order information and another table of returns information, you could join the two tables based on the Order ID field that exists in both tables. Select the type of join.

    Note: You can delete an unwanted join clause by clicking the "x" that displays when you hover over the right side of the join clause.

  5. When you are finished, close the Join dialog box.

    Tables and columns are colored to show you which connection the data comes from.

After you've created a cross-database join, continue to prepare your multi-connection data source for analysis. You can rename and reset fields, create calculations, clean your data with Data Interpreter, change the data types of fields, and so on.

To troubleshoot your join, see Troubleshoot joins.

About working with multi-connection data sources

Working with multi-connection data sources is just like working with any other data source, with a few caveats, discussed in this section.

Union data from within a connection

To union data, you must use text tables or Excel tables from the same connection. That is, you cannot union tables from different databases. In Tableau Desktop, you can union tables across different Excel workbooks and files in different folders. For more information, see Union tables using wildcard search (Tableau Desktop).

If you need to union data from different databases, use Tableau Prep.

Collation

Collation refers to the rules of a database that determine how string values should be compared and sorted. In most cases, the collation is handled by the database. However, when you work with cross-database joins, you might join columns that have different collations.

For example, suppose your cross-database join used a join key comprised of a case-sensitive column from SQL Server and a case-insensitive column from Oracle. In cases like this, Tableau maps certain collations to others to minimize interpreting values incorrectly.

The following rules are used in cross-database joins:

  • If a column uses collation standards of the International Components for Unicode (ICU), Tableau uses the collation of the other column.

  • If all columns use collation standards of the ICU, Tableau uses the collation of the column of the left table.

  • If no columns use collation standards of the ICU, Tableau uses a binary collation. A binary collation means the locale of the database and data type of the columns determine how string values should be compared and sorted.

Note: Collation of Japanese characters, that is, Kana-sensitivity, depends on the database that you are connected to.
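The three rules above can be restated as a small decision function. This is only a reading of the rules as written, not Tableau's implementation; the `pick_collation` name, the tuple layout, and the collation labels are all hypothetical.

```python
def pick_collation(left, right):
    """Choose a collation for a cross-database join key.

    `left` and `right` are (collation_name, uses_icu) tuples describing the
    join column from the left and right table respectively.
    """
    left_name, left_icu = left
    right_name, right_icu = right
    if left_icu and right_icu:
        return left_name   # all columns use ICU: the left table's column wins
    if left_icu:
        return right_name  # one ICU column: use the other column's collation
    if right_icu:
        return left_name
    return "binary"        # no ICU columns: fall back to a binary collation


# Hypothetical collation labels, used only to exercise each branch.
print(pick_collation(("icu_a", True), ("icu_b", True)))
print(pick_collation(("icu_a", True), ("db_case_sensitive", False)))
print(pick_collation(("db_x", False), ("db_y", False)))
```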

Calculations and multi-connection data sources

Only a subset of calculations can be used in a multi-connection data source.

In Tableau Desktop: You can use a specific calculation if it is both:

  • Supported by all the connections in the multi-connection data source

  • Supported by Tableau extracts.

In web authoring (Tableau Online and Tableau Server): You can use a specific calculation if it is supported by all the connections in the multi-connection data source.

Stored procedures

Stored procedures are not available for multi-connection data sources.

Pivot data from within a connection

To pivot data, you must use text columns or Excel columns from the same connection. That is, you cannot include columns from different databases in a pivot.

Make extract files the first connection (Tableau Desktop only)

When connecting to extract files in a multi-connection data source, make sure that the connection to the extract (.tde or .hyper) file is the first connection. This preserves any customizations that might be a part of the extract, including changes to default properties, calculated fields, groups, aliases, etc.

Note: If you need to connect to multiple extract files in your multi-connection data source, only the customizations in the extract in the first connection are preserved.

Extracts of multi-connection data sources that contain connections to file-based data (Tableau Desktop only)

If you're publishing an extract of a multi-connection data source that contains a connection to file-based data such as Excel, selecting the Include external files option puts a copy of the file-based data on the server as part of the data source. In this case, a copy of your file-based data can be downloaded and its contents accessed by other users. If there is sensitive information in the file-based data that you have intentionally excluded from your extract, do not select Include external files when you publish the data source.

For more information about publishing data sources, see Publish a Data Source.

About queries and cross-database joins

For each connection, Tableau sends independent queries to the databases in the join. The results are stored in a temporary table, in the format of an extract file.

For example, suppose you create connections to two tables, dbo.listings and reviews$. These tables are stored in two different databases, SQL Server and Excel. Tableau queries the database in each connection independently. The database performs the query and applies customizations such as filters and calculations, and Tableau stores the results for each connection in a temporary table. In this example, FQ_Temp_1 is the temporary table for the connection to SQL Server and FQ_Temp_2 is the temporary table for the connection to Excel.

When you perform a cross-database join, the temporary tables are joined together by Tableau Desktop. These temporary tables are necessary for Tableau to perform cross-database joins.

After the tables have been joined, a "topn" filter is applied to limit the number of values shown in the data grid to the first 1,000 rows. This filter is applied to help maintain responsiveness of the data grid and the overall performance of the Data Source page.
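The flow described in this section (independent queries, temporary tables, a local join, then a row cap) can be mimicked with a short Python sketch. Everything here is a stand-in: the column names `listing_id`, `city`, and `score`, the row counts, and the helper functions are assumptions made for illustration, and real temporary tables are stored in extract format rather than as Python lists.

```python
from itertools import islice

TOPN = 1000  # row cap for the data grid, as described above


def query_connection(rows, row_filter=None):
    """Stand-in for one independent query; each source applies its own filter."""
    return [r for r in rows if row_filter is None or row_filter(r)]


# Hypothetical contents for dbo.listings (SQL Server) and reviews$ (Excel).
listings = [{"listing_id": i, "city": "Seattle"} for i in range(1500)]
reviews = [{"listing_id": i, "score": i % 5 + 1} for i in range(1500)]

# One temporary table per connection, filled independently of the other.
fq_temp_1 = query_connection(listings)
fq_temp_2 = query_connection(reviews, lambda r: r["score"] >= 1)


def local_join(left, right, key):
    """Join the two temporary tables lazily, outside of either database."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[key], []).append(rrow)
    for lrow in left:
        for rrow in index.get(lrow[key], []):
            yield {**lrow, **rrow}


# Only the first TOPN joined rows are materialized for display.
grid = list(islice(local_join(fq_temp_1, fq_temp_2, "listing_id"), TOPN))
print(len(grid))  # 1000
```

Because the join is written as a generator, the cap stops the work after 1,000 rows instead of joining all 1,500 and discarding the rest.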

Review join results in the data grid

After you have created a join on the canvas, review the data grid to make sure the join produces the results that you expect. If the data grid displays data that you don't expect, you might need to modify the join.

Results in the data grid

  • No data: If no data displays in the data grid, you might need to change the join type or a join field used in the join condition. If you suspect a mismatch between fields in the join, use a calculation instead. For more information, see Use calculations to resolve mismatches between fields in a join.

  • Duplicate data: If you see duplicate data, there are a few things you can do. Consider changing the aggregation of the measure that you use in your analysis, use a calculation, or use data blending instead. For more information about data blending, see Blend Your Data.

  • Missing data: If some data is missing from the data grid, you might need to change the join type or a join field used in the join condition. Again, if you suspect a mismatch between fields in the join, use a calculation instead. For more information, see Use calculations to resolve mismatches between fields in a join.

  • Many null values: If you see many null values that you do not expect, you might need to change the join type from the full outer type to the inner type.

  • All null values for one table: If all values for one table are null, there are no matches between the tables that you are joining. If this is not expected, consider changing the join type.
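The duplicate-data case is the easiest of these to reproduce. In the Python sketch below, the order and return rows are invented for illustration: one order matches two return rows, so its sales value appears twice after the join and a plain sum overstates the total. Taking one value per order is one way to picture what changing the aggregation achieves.

```python
# Hypothetical data: order A1 has two returned items, order A2 has none.
orders = [{"Order ID": "A1", "Sales": 100}, {"Order ID": "A2", "Sales": 50}]
returns = [
    {"Order ID": "A1", "Item": "Chair"},
    {"Order ID": "A1", "Item": "Desk"},
]

# Inner join on Order ID: the A1 order row is repeated once per matching return.
joined = [
    {**o, **r}
    for o in orders
    for r in returns
    if o["Order ID"] == r["Order ID"]
]

# Summing the measure over the joined rows counts A1's sales twice.
sum_sales = sum(row["Sales"] for row in joined)

# Keeping one sales value per order removes the duplication.
per_order = {row["Order ID"]: row["Sales"] for row in joined}
deduplicated = sum(per_order.values())

print(len(joined), sum_sales, deduplicated)  # 2 200 100
```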

Use calculations to resolve mismatches between fields in a join

When the values in the fields used in a join condition don't match, the data grid can show little or no data. A mismatch between fields can occur for several reasons, but it is often caused by differences in the format of the string values or date values in the fields. In many cases, you can resolve mismatches between the fields in your join by using a calculation.

Most functions are available for you to use in a calculation to create and replace a field in the join condition, with the exception of aggregate functions and table calculation functions.

Note: Join calculations are not supported for QuickBooks Online, Marketo, Oracle Eloqua, Anaplan, ServiceNow ITSM, and web data connectors.

String mismatch

A common mismatch scenario when working with string data occurs when one field on one side of the join condition is equivalent to two or more fields on the other side of the join condition. In this case, you can use a calculation to combine the two fields so that the combined value matches the format of the other field in the join condition.

For example, suppose you want to join two tables that contain the following data:

Patron

FIRST NAME  LAST NAME  BRANCH  MEMBER SINCE  UNITS BORROWED  FEES   SUGGESTED LIMIT
Alan        Wang       North   2000          1               0      15
Andrew      Smith      North   2000          36              3.50   15
Ashley      Garcia     South   2000          243             11.30  15
Fred        Suzuki     North   2000          52              .90    15

Contact

NAME          MEMBER NUMBER  EMERGENCY CONTACT  RELATIONSHIP  EMERGENCY NUMBER
Adam Davis    555-0324       Ellen Davis        Partner       555-0884
Alan Wang     555-0356       Jean Wilson        Mother        555-0327
Fred Suzuki   555-0188       Jim Suzuki         Brother       555-3188
Henry Wilson  555-0100       Laura Rodriquez    Partner       555-0103
Michelle Kim  555-0199       Steven Kim         Partner       555-0125

The common fields between the two tables appear to be name. However, in the Patron table the first and last names are in separate columns and in the Contact table the first and last names are in the same column. To join the tables on names, you can use a calculation in the left side of the join condition to merge the first name and last name columns together.

The result is a calculated field on the left side of the join condition that is accessible only from the join dialog. This calculation converts the field in the Patron table into a format that now matches the format of the field in the Contact table on the right side of the join condition.

Using the calculation in the join produces the following combined table: 

FIRST NAME  LAST NAME  BRANCH  MEMBER SINCE  UNITS BORROWED  FEES  SUGGESTED LIMIT  NAME         PHONE NUMBER
Alan        Wang       North   2000          1               0     15               Alan Wang    555-0356
Fred        Suzuki     North   2000          52              .90   15               Fred Suzuki  555-0188
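The idea behind the join calculation can be sketched in Python: build a full name from the two Patron columns and use it as the key against the Contact table. The `full_name` helper is a hypothetical stand-in for the calculation entered in the join dialog, and only the columns needed for the match are included.

```python
patron = [
    {"FIRST NAME": "Alan", "LAST NAME": "Wang"},
    {"FIRST NAME": "Andrew", "LAST NAME": "Smith"},
    {"FIRST NAME": "Ashley", "LAST NAME": "Garcia"},
    {"FIRST NAME": "Fred", "LAST NAME": "Suzuki"},
]
contact = [
    {"NAME": "Adam Davis", "MEMBER NUMBER": "555-0324"},
    {"NAME": "Alan Wang", "MEMBER NUMBER": "555-0356"},
    {"NAME": "Fred Suzuki", "MEMBER NUMBER": "555-0188"},
    {"NAME": "Henry Wilson", "MEMBER NUMBER": "555-0100"},
    {"NAME": "Michelle Kim", "MEMBER NUMBER": "555-0199"},
]


def full_name(row):
    """Merge first and last name so the value matches the Contact NAME format."""
    return row["FIRST NAME"] + " " + row["LAST NAME"]


contact_by_name = {c["NAME"]: c for c in contact}

# Inner join: keep only patrons whose computed full name exists in Contact.
joined = [
    {**p, **contact_by_name[full_name(p)]}
    for p in patron
    if full_name(p) in contact_by_name
]

print([row["NAME"] for row in joined])  # ['Alan Wang', 'Fred Suzuki']
```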

Date mismatch

A common mismatch scenario when working with date data occurs when the date values in one field of the join condition are captured at a different level of detail than the other field in the join condition. In this case you can use a calculation in the join condition to change the format of the field on one side of the join condition so that its format matches the other field in the join condition.

For example, suppose you have the following two tables of data:

Projector rental

DATE        RESERVATION TYPE  REQUESTER ID
1/1/2000    Individual        233445589
1/28/2002   K-12              365948999
1/29/2002   Non-profit        233448888
12/5/2002   K-12              365948999
5/5/2003    Non-profit        334015476
3/12/2004   Non-profit        334015476
3/15/2006   City              211896980
7/8/2007    K-12              334015476
1/4/2008    Individual        560495523
3/8/2009    Non-profit        233445566
2/14/2014   Non-profit        233445566
12/21/2015  Non-profit        233445566
2/10/2016   Non-profit        233445566

Patron

ID         FIRST NAME  LAST NAME  BRANCH  MEMBER SINCE  UNITS BORROWED  FEES   SUGGESTED LIMIT
454613981  Adam        Davis      West    2012          25              0      10
232502870  Alan        Wang       North   2000          1               0      15
298000916  Amanda      Smith      North   2001          54              6.4    15
233448978  Andrew      Smith      North   2000          36              3.50   15
233445566  Ashley      Garcia     South   2000          243             11.30  15
900005122  Brian       Frank      East    2011          12              .10    10
921491769  Elizabeth   Johnson    West    2010          19              .5     10
233445589  Fred        Suzuki     North   2000          52              .90    15
344556677  Henry       Wilson     South   2005          3               .2     15
939502870  Jane        Johnson    West    2017          0               0      10

To find out more information about new patron behavior, joining the Patron table to the Projector Rental table might provide some insight about which library services motivate new memberships. The common fields between the two tables appear to be "Date" and "Member since." However, the date values in each field are captured at different levels of detail. To join these tables on their respective date fields, use a combination of DATE functions in a calculation on each side of the join condition to make the level of detail in each field match.

Using the calculation in the join produces the following combined table:

DATE      RESERVATION TYPE  REQUESTER ID  ID         FIRST NAME  LAST NAME  BRANCH  MEMBER SINCE  UNITS BORROWED  FEES   SUGGESTED LIMIT
1/1/2000  Individual        233445589     232502870  Alan        Wang       North   2000          1               0.00   15
1/1/2000  Individual        233445589     233445589  Fred        Suzuki     North   2000          52              0.90   15
1/1/2000  Individual        233445589     233445566  Ashley      Garcia     South   2000          243             11.30  15
1/1/2000  Individual        233445589     233448978  Andrew      Smith      North   2000          36              3.50   15

To determine whether a patron rented the projector in the same year they started their membership, add one more clause to the join based on ID.

The result of the additional join condition shows that only one patron might have started a membership in order to rent a projector.

DATE      RESERVATION TYPE  REQUESTER ID  ID         FIRST NAME  LAST NAME  BRANCH  JOINED  UNITS BORROWED  FEES  SUGGESTED LIMIT
1/1/2000  Individual        233445589     233445589  Fred        Suzuki     North   2000
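Both steps of the date example can be sketched in Python using a subset of the rows above. The sketch assumes that MEMBER SINCE is stored as a plain year number and that the rental DATE is a full date, so the year is extracted from the date before comparing; the actual calculations used in the join dialog are not reproduced here.

```python
from datetime import date

# A subset of the Projector rental rows.
rentals = [
    {"DATE": date(2000, 1, 1), "REQUESTER ID": 233445589},
    {"DATE": date(2002, 1, 28), "REQUESTER ID": 365948999},
    {"DATE": date(2009, 3, 8), "REQUESTER ID": 233445566},
]
# A subset of the Patron rows; MEMBER SINCE is assumed to be a year number.
patrons = [
    {"ID": 232502870, "LAST NAME": "Wang", "MEMBER SINCE": 2000},
    {"ID": 233448978, "LAST NAME": "Smith", "MEMBER SINCE": 2000},
    {"ID": 233445566, "LAST NAME": "Garcia", "MEMBER SINCE": 2000},
    {"ID": 233445589, "LAST NAME": "Suzuki", "MEMBER SINCE": 2000},
    {"ID": 344556677, "LAST NAME": "Wilson", "MEMBER SINCE": 2005},
]

# First clause: bring both sides to year level before comparing.
by_year = [
    (r, p)
    for r in rentals
    for p in patrons
    if r["DATE"].year == p["MEMBER SINCE"]
]

# Second clause: the requester must also be the same patron.
by_year_and_id = [(r, p) for r, p in by_year if r["REQUESTER ID"] == p["ID"]]

print(len(by_year), [p["LAST NAME"] for _, p in by_year_and_id])  # 4 ['Suzuki']
```

The year-only clause pairs the single rental from 2000 with every patron who joined in 2000, which is why the first combined table has four rows; adding the ID clause narrows it to Fred Suzuki.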
