
Implementing GroupBy and Per-Group TopN in Java


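Introduction

Before Java 8 introduced lambdas and the Stream API, implementing something like SQL's GROUP BY aggregation in Java code was fairly difficult. Java's support for functional programming used to be weak, whereas Scala takes functional programming to the extreme: a group-by aggregation in Scala is a matter of a few lines: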

val birds = List("Golden Eagle", "Gyrfalcon", "American Robin", "Mountain BlueBird", "Mountain-Hawk Eagle")
val groupByFirstLetter = birds.groupBy(_.charAt(0))

Output:

Map(M -> List(Mountain BlueBird, Mountain-Hawk Eagle), G -> List(Golden Eagle, Gyrfalcon), 
 A -> List(American Robin))
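
For comparison, the same grouping can be written with the Java 8 Stream API. The following is a minimal sketch rather than part of the original source: the class name BirdGrouping and the method groupByFirstLetter are made up for illustration, and the code assumes every name is non-empty.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class BirdGrouping {
    // Groups names by their first character, mirroring the Scala groupBy call.
    // Assumption: no name is empty, otherwise charAt(0) throws.
    static Map<Character, List<String>> groupByFirstLetter(List<String> names) {
        return names.stream()
                .collect(Collectors.groupingBy(name -> name.charAt(0)));
    }

    public static void main(String[] args) {
        List<String> birds = Arrays.asList("Golden Eagle", "Gyrfalcon", "American Robin",
                "Mountain BlueBird", "Mountain-Hawk Eagle");
        Map<Character, List<String>> grouped = groupByFirstLetter(birds);
        System.out.println(grouped);
    }
}
```

Collectors.groupingBy returns a Map whose iteration order is not specified, so the groups may print in a different order than in the Scala output.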

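Java also has third-party libraries that help, such as Guava's Function and Functional Java. On the whole, though, in-memory GroupBy, OrderBy, Limit and similar TopN operations on Java collections remain cumbersome. This article implements a simple grouping facility that supports custom keys and custom aggregate functions. With a few small classes it can group first, then sort within each group, and finally take the TopN of each group, which is fairly hard to do even in SQL.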

The source code can be downloaded here.

Implementation

Suppose we have the following Person class:

package me.lin;

class Person {
    private String name;
    private int age;
    private double salary;

    public Person(String name, int age, double salary) {
        super();
        this.name = name;
        this.age = age;
        this.salary = salary;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getAge() {
        return age;
    }

    public void setAge(int age) {
        this.age = age;
    }

    public double getSalary() {
        return salary;
    }

    public void setSalary(double salary) {
        this.salary = salary;
    }

    // Composite key used later when grouping by a method's return value.
    public String getNameAndAge() {
        return this.getName() + "-" + this.getAge();
    }

    @Override
    public String toString() {
        return "Person [name=" + name + ", age=" + age + ", salary=" + salary
                + "]";
    }
}

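Given a List of Person objects, we want to group by age and compute per-group results such as the count, the first element, and the entries with the highest salary. The implementation is as follows.

Aggregation

Define an aggregation interface that is applied to the elements of each group, analogous to count(*) and sum() in MySQL: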

package me.lin;

import java.util.List;

/**
 * An aggregation operation.
 *
 * Created by Brandon on 2016/7/21.
 */
public interface Aggregator<T> {
    /**
     * Aggregates one group.
     *
     * @param key    the key that identifies the group
     * @param values the elements that belong to the group
     * @return the aggregated value for the group
     */
    Object aggregate(Object key, List<T> values);
}
We implement a few aggregators here; more complex ones can be defined in the same way.

CountAggregator:
package me.lin;

import java.util.List;

/**
 * Counts the elements of a group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class CountAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        return values.size();
    }
}
FirstAggregator:
package me.lin;

import java.util.List;

/**
 * Takes the first element of a group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class FirstAggregator<T> implements Aggregator<T> {
    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values.size() >= 1) {
            return values.get(0);
        } else {
            return null;
        }
    }
}
TopNAggregator:
package me.lin;

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/**
 * Takes the TopN elements of each group.
 *
 * Created by Brandon on 2016/7/21.
 */
public class TopNAggregator<T> implements Aggregator<T> {
    private Comparator<T> comparator;
    private int limit;

    public TopNAggregator(Comparator<T> comparator, int limit) {
        this.limit = limit;
        this.comparator = comparator;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        if (values == null || values.size() == 0) {
            return null;
        }
        // Sort a copy so the caller's list is left untouched.
        ArrayList<T> copy = new ArrayList<>(values);
        Collections.sort(copy, comparator);
        // Return at most `limit` elements from the front of the sorted copy.
        int size = values.size();
        int toIndex = Math.min(limit, size);
        return copy.subList(0, toIndex);
    }
}
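
Beyond the three aggregators above, a custom one follows the same pattern. The sketch below shows a sum aggregator in the spirit of SQL's sum(). It is not part of the original source: SumAggregator, its valueOf parameter, and SumAggregatorDemo are hypothetical names, and the Aggregator interface is repeated only so that the example compiles on its own.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

// Same shape as the Aggregator interface defined earlier, repeated for self-containment.
interface Aggregator<T> {
    Object aggregate(Object key, List<T> values);
}

// Hypothetical aggregator: sums a numeric property over all elements of a group.
class SumAggregator<T> implements Aggregator<T> {
    private final ToDoubleFunction<T> valueOf;

    SumAggregator(ToDoubleFunction<T> valueOf) {
        this.valueOf = valueOf;
    }

    @Override
    public Object aggregate(Object key, List<T> values) {
        double sum = 0;
        for (T value : values) {
            sum += valueOf.applyAsDouble(value);
        }
        return sum; // boxed to Double; an empty group yields 0.0
    }
}

class SumAggregatorDemo {
    public static void main(String[] args) {
        Aggregator<Double> sum = new SumAggregator<>(Double::doubleValue);
        Object total = sum.aggregate("salaries", Arrays.asList(5000.0, 15000.0));
        System.out.println(total);
    }
}
```

With the Person class from this article, such an aggregator could be constructed as new SumAggregator<Person>(Person::getSalary) and registered in the aggregator map alongside the others.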

Grouping implementation

Next comes the grouping itself. For simplicity it is implemented as a utility class:

package me.lin;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/**
 * Utility class for grouping a Collection.
 */
public class GroupUtils {

    /**
     * Groups elements by a property and aggregates each group.
     *
     * @param listToDeal    the data to group, comparable to the source table in SQL
     * @param clazz         the element type of the data to group
     * @param groupBy       the name of the property to group by
     * @param aggregatorMap the aggregators; each key is an aggregator name and becomes
     *                      the key of that aggregator's value in the result
     * @param <T>           the element type
     * @return a map from group key to the aggregated values of that group
     * @throws NoSuchFieldException     if clazz declares no field named groupBy
     * @throws SecurityException        if reflective access is denied
     * @throws IllegalArgumentException if an element is not an instance of clazz
     * @throws IllegalAccessException   if the field cannot be read
     */
    public static <T> Map<Object, Map<String, Object>> groupByProperty(
            Collection<T> listToDeal, Class<T> clazz, String groupBy,
            Map<String, Aggregator<T>> aggregatorMap) throws NoSuchFieldException,
            SecurityException, IllegalArgumentException, IllegalAccessException {
        // Look the field up once instead of once per element.
        Field field = clazz.getDeclaredField(groupBy);
        field.setAccessible(true);
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        for (T ele : listToDeal) {
            Object key = field.get(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    /**
     * Same as groupByProperty, but the group key is the return value of the
     * no-argument method named groupByMethodName.
     */
    public static <T> Map<Object, Map<String, Object>> groupByMethod(
            Collection<T> listToDeal, Class<T> clazz, String groupByMethodName,
            Map<String, Aggregator<T>> aggregatorMap) throws NoSuchMethodException,
            SecurityException, IllegalAccessException, IllegalArgumentException,
            InvocationTargetException {
        // Look the method up once instead of once per element.
        Method keyMethod = clazz.getDeclaredMethod(groupByMethodName);
        keyMethod.setAccessible(true);
        Map<Object, Collection<T>> groupResult = new HashMap<Object, Collection<T>>();
        for (T ele : listToDeal) {
            Object key = keyMethod.invoke(ele);
            if (!groupResult.containsKey(key)) {
                groupResult.put(key, new ArrayList<T>());
            }
            groupResult.get(key).add(ele);
        }
        return invokeAggregators(groupResult, aggregatorMap);
    }

    // Runs every aggregator on every group and collects the results per group key.
    private static <T> Map<Object, Map<String, Object>> invokeAggregators(
            Map<Object, Collection<T>> groupResult, Map<String, Aggregator<T>> aggregatorMap) {
        Map<Object, Map<String, Object>> aggResults = new HashMap<>();
        for (Object key : groupResult.keySet()) {
            Collection<T> group = groupResult.get(key);
            Map<String, Object> aggValues = doInvokeAggregators(key, group, aggregatorMap);
            if (aggValues != null && aggValues.size() > 0) {
                aggResults.put(key, aggValues);
            }
        }
        return aggResults;
    }

    private static <T> Map<String, Object> doInvokeAggregators(
            Object key, Collection<T> group, Map<String, Aggregator<T>> aggregatorMap) {
        Map<String, Object> aggResults = new HashMap<String, Object>();
        if (group != null && group.size() > 0) {
            // Call every aggregate function for the current key.
            for (String aggKey : aggregatorMap.keySet()) {
                Aggregator<T> aggregator = aggregatorMap.get(aggKey);
                // Each aggregator receives its own read-only copy of the group.
                Object aggResult = aggregator.aggregate(key,
                        Collections.unmodifiableList(new ArrayList<T>(group)));
                aggResults.put(aggKey, aggResult);
            }
        }
        return aggResults;
    }
}

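In the code above, the grouping key can be either a property of the element or the return value of one of its methods. By writing your own key methods and aggregate functions, you can build quite powerful grouping logic.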
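
As an alternative to looking up the key by name through reflection, the key can also be supplied as a function. The sketch below covers only the grouping step and is not part of the original source: FunctionGroupUtils, groupBy, and keyOf are hypothetical names, and it assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class FunctionGroupUtils {
    // Hypothetical variant of the grouping step: the key comes from a function
    // instead of a reflective field or method lookup.
    static <T, K> Map<K, List<T>> groupBy(Iterable<T> items,
                                          Function<? super T, ? extends K> keyOf) {
        Map<K, List<T>> groups = new HashMap<>();
        for (T item : items) {
            K key = keyOf.apply(item);
            // Create the group's list on first use, then append the element.
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Brandon", "Braney", "Jack");
        // Group the names by their length.
        Map<Integer, List<String>> byLength = groupBy(names, String::length);
        System.out.println(byLength);
    }
}
```

The trade-off is that a key function is checked by the compiler and needs no setAccessible call, while the reflective version lets the key be chosen from a plain string at run time.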

Tests

Grouping by property

First, test grouping by a property:

package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByPropertyTest {
    public static void main(String[] args) throws NoSuchFieldException,
            SecurityException, IllegalArgumentException, IllegalAccessException {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Braney", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());

        // Orders by salary, highest first.
        Comparator<Person> comparator = new Comparator<Person>() {
            public int compare(final Person o1, final Person o2) {
                double diff = o1.getSalary() - o2.getSalary();
                if (diff == 0) {
                    return 0;
                }
                return diff > 0 ? -1 : 1;
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByProperty(persons, Person.class, "age", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                System.out.println(" aggkey->" + results.get(aggKey));
            }
        }
    }
}

Output:

Key:10
 aggkey->3
 aggkey->Person [name=Jack, age=10, salary=5000.0]
 aggkey->[Person [name=Tony, age=10, salary=1400000.0], Person [name=Robin, age=10, salary=500000.0]]
Key:15
 aggkey->2
 aggkey->Person [name=Brandon, age=15, salary=5000.0]
 aggkey->[Person [name=Braney, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]
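
For readers on Java 8 or later, the per-group TopN shown in this output can also be expressed with the Stream API. The sketch below is one possible formulation, not the article's implementation: StreamTopN, topNPerGroup, and the Employee stand-in class are hypothetical names, and n is assumed to be non-negative.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamTopN {
    // Groups items by key, sorts each group with `order`, and keeps the first n.
    // Assumption: n >= 0 (Stream.limit rejects negative values).
    static <T, K> Map<K, List<T>> topNPerGroup(
            List<T> items, Function<T, K> keyOf, Comparator<T> order, int n) {
        return items.stream().collect(Collectors.groupingBy(
                keyOf,
                Collectors.collectingAndThen(
                        Collectors.toList(),
                        group -> group.stream().sorted(order).limit(n)
                                .collect(Collectors.toList()))));
    }

    // Minimal stand-in for the article's Person class.
    static final class Employee {
        final String name;
        final int age;
        final double salary;

        Employee(String name, int age, double salary) {
            this.name = name;
            this.age = age;
            this.salary = salary;
        }

        @Override
        public String toString() {
            return name + "(" + salary + ")";
        }
    }

    public static void main(String[] args) {
        List<Employee> staff = Arrays.asList(
                new Employee("Brandon", 15, 5000),
                new Employee("Braney", 15, 15000),
                new Employee("Jack", 10, 5000),
                new Employee("Robin", 10, 500000),
                new Employee("Tony", 10, 1400000));
        // Top two salaries per age group, highest first.
        Map<Integer, List<Employee>> top2 = topNPerGroup(staff, e -> e.age,
                Comparator.comparingDouble((Employee e) -> e.salary).reversed(), 2);
        System.out.println(top2);
    }
}
```

This version sorts each whole group before truncating it, which matches what TopNAggregator does; for very large groups a bounded priority queue would avoid the full sort.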

Grouping by method return value

Now test grouping by the return value of a method:

package me.lin;

import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class GroupByMethodTest {
    public static void main(String[] args) throws Exception {
        List<Person> persons = new ArrayList<>();
        persons.add(new Person("Brandon", 15, 5000));
        persons.add(new Person("Brandon", 15, 15000));
        persons.add(new Person("Jack", 10, 5000));
        persons.add(new Person("Robin", 10, 500000));
        persons.add(new Person("Tony", 10, 1400000));

        Map<String, Aggregator<Person>> aggregatorMap = new HashMap<>();
        aggregatorMap.put("count", new CountAggregator<Person>());
        aggregatorMap.put("first", new FirstAggregator<Person>());

        // Orders by salary, highest first.
        Comparator<Person> comparator = new Comparator<Person>() {
            public int compare(final Person o1, final Person o2) {
                double diff = o1.getSalary() - o2.getSalary();
                if (diff == 0) {
                    return 0;
                }
                return diff > 0 ? -1 : 1;
            }
        };
        aggregatorMap.put("top2", new TopNAggregator<Person>(comparator, 2));

        Map<Object, Map<String, Object>> aggResults =
                GroupUtils.groupByMethod(persons, Person.class, "getNameAndAge", aggregatorMap);
        for (Object key : aggResults.keySet()) {
            System.out.println("Key:" + key);
            Map<String, Object> results = aggResults.get(key);
            for (String aggKey : results.keySet()) {
                System.out.println(" " + aggKey + "->" + results.get(aggKey));
            }
        }
    }
}

Test output:

Key:Robin-10
 count->1
 first->Person [name=Robin, age=10, salary=500000.0]
 top2->[Person [name=Robin, age=10, salary=500000.0]]
Key:Jack-10
 count->1
 first->Person [name=Jack, age=10, salary=5000.0]
 top2->[Person [name=Jack, age=10, salary=5000.0]]
Key:Tony-10
 count->1
 first->Person [name=Tony, age=10, salary=1400000.0]
 top2->[Person [name=Tony, age=10, salary=1400000.0]]
Key:Brandon-15
 count->2
 first->Person [name=Brandon, age=15, salary=5000.0]
 top2->[Person [name=Brandon, age=15, salary=15000.0], Person [name=Brandon, age=15, salary=5000.0]]

That is a simple implementation of GroupBy. If you spot any problems, please point them out.


Discussion is welcome.
