The 3rd International Workshop
on Innovative Algorithms for Big Data
30 November 2017

The International Workshop on Innovative Algorithms for Big Data is intended to provide an international forum for researchers working in the areas of algorithms, data structure, and modeling for Big Data and applications using them. All researchers developing core technology, including (but not limited to) sublinear-time algorithms, sublinear-size data structure, and sublinear-size modeling for Big Data are welcome to attend the workshop.

Program

For the program of speakers from DS Lab, please visit here.
Time Details
9:30-9:55 Reception
9:55-10:00 Opening
10:00-10:45 (45min)
Wing-Kin Sung (National University of Singapore)
Issues on Aligning Short Reads Generated by Next Generation Sequencing

Due to the advance in next generation sequencing, our genome can be sequenced using approximately US$1000. The cost is going to be reduced in the near future. It gives hope to have a cheap solution for personalize medicine.

However, it is not an easy task to reconstruct a correct individual genome from such a big sequencing dataset. The main issue is the repeat regions in human genome, which make it difficult to position the sequencing reads on the human reference genome. In this talk, we will discuss issues and partial solutions to solve such a problem.

10:45-11:15 (30min)
Keisuke Goto (FUJITSU LABORATORIES LTD.)
In-Place Initializable Arrays
Initializing all elements of a given array to a specified value is basic operation that frequently appears in numerous algorithms and programs. Initializable arrays are abstract arrays that support initialization as well as reading and writing of any element of the array in constant worst case time. On the word RAM model, we propose an in-place algorithm which implements an initializable array of length \(N\) with only 1 extra bit, where \(w\) is the word size, and each element of the array can store an individual value of \(\ell \in \mathrm{O}(w)\) bits. Our algorithm significantly improves upon the previous best algorithm [Navarro, CSUR 2014] using \(N+\ell+\mathrm{o}(N)\) extra bits.
11:15-11:40 (25min)
Kazuki Ishiyama (The University of Tokyo)
Practical Space-efficient Data Structures for High-dimensional Orthogonal Range Searching
We consider the orthogonal range search problem. In this problem, we are given a point set \(P\) in \(d\)-dimensional space and an orthogonal query region \(Q\), and return some information on \(P \cap Q\). We focus on the counting query: count the number of points of \(P\) contained in \(Q\), and the reporting query: report all points of \(P\) in \(Q\). For \(2\)-dimensional case, Bose et al. proposed a space-efficient data structure supporting the counting query in \(\mathrm{O}(\lg n / \lg \lg n)\) time and the reporting query in \(\mathrm{O}(k \lg n / \lg \lg n)\) time, where \(n\) is the number of points of \(P\) and \(k\) is the number of points which are contained in \(Q\). For high-dimensional cases, the KDW-tree [Okajima, Maruyama, ALENEX 2015] and the data structure of [Ishiyama, Sadakane, DCC 2017] have been proposed. However, they are not efficient for very large \(d\), the number of dimensions. In our research, we propose practical space-efficient data structures for the problem. They run fast when \(d'\), the number of dimensions used in queries, is smaller than the data dimension \(d\). This kind of queries are typical in database queries.
11:40-13:00 Lunch
13:00-13:45 (45min)
Ilan Newman (The University of Haifa)
Testing for Forbidden Order Patterns in an Array

We study testing of sequence properties that are defined by forbidden order patterns.

An Order pattern of length \(k\) is defined by a permutation \(\pi = (\pi(1), \ldots ,\pi(k))\) of \(k\) elements. A sequence \(f: \{1,\ldots,n\} \rightarrow R\) of length \(n\) contains a pattern \(\pi\) if there is a subsequence of \(f\) in which the order of the elements in it is identical to the order defined by \(\pi\). Namely, there are indices \(i_1 < i_2 < \dots < i_k\), such that \(f(i_x) > f(i_y)\) whenever \(\pi(x) > \pi(y)\). If \(f\) does not contain \(\pi\), we say that \(f\) is \(\pi\)-free. For example, for \(\pi = (2,1)\), the property of being \(\pi\)-free is equivalent to being non-decreasing, i.e. monotone.

For a fixed order pattern \(\pi\), of constant length \(k\), and input sequence \(f\) stored in an array, we consider the property testing problem of distinguishing the case that \(f\) is \(\pi\)-free from the case that \(f\) differs in more than \(\epsilon n\) places from any \(\pi\)-free sequence. We show the following results:

  1. Being \(\pi\)-free can be tested in slightly sublinear number of queries for every fixed length \(\pi\).
  2. There is a clear dichotomy between the monotone patterns and the non-monotone ones: For monotone patterns of length \(k\), i.e., \((k,k-1,\dots, 1)\) and \((1,2,\dots,k)\), we design non-adaptive, one-sided error \(\epsilon\)-tests of \(\mathrm{poly}(\log n)\) query complexity.

    For non-monotone patterns of size k we show that any non-adaptive one-sided error test requires at least \(\Omega(\sqrt{n})\) queries. This general lower bound can be further strengthened for specific non-monotone \(k\)-length patterns to \(\Omega(n^{1-2/(k+1)})\).

  3. We show that adaptivity can make a big difference in testing non-monotone patterns. We develop an adaptive algorithm that for any length \(3\) pattern \(\pi\) tests \(\pi\)-freeness by making \(\mathrm{ploy}(\log n)\) queries.

For all algorithms presented here, the running times are linear in their query complexity.

This is a joint work with Deepak Rajendraprasad, Christian Sohler and Yuri Rabinovich.

13:45-14:15 (30min)
Takeaki Uno (National Institute of Informatics)
Data Polishing for Data Abstraction
Data mining, particularly pattern mining and cluster mining, aims to find interesting and characteristic local structures. In other words,they are "abstracting" properties of the data. A difficulty of the abstraction is on the modeling of abstracted structures; it is hard to describe "what is abstracted" and "how to abstract". In simple models, it is hard to avoid huge number of solutions that are quite similar to each other. In this talk, we propose a new approach called "data polishing" that takes an approach totally different from existing approaches. The idea is to modify the data by feasible hypothesis, so that ambiguities will be distinguished. The result is fine, and even for graph visualization we can see many clusters clearly, in SNS networks.
14:15-14:40 (25min)
Munehiko Sasajima (Kwansei Gakuin University)
Towards Development of Practically Efficient Property Testing - Experimental Study on Practical Effectiveness of a Graph Partitioning Algorithm
In a constant-time property testing algorithm for big graphs, an appropriate partitioning method for dividing the target graphs is required as its subroutine. The authors implemented a graph partitioning algorithm based on an existing property testing algorithm which assumes bounded-degree and minor-free graphs as its input, and computational experiments were carried out for the graphs representing real networks such as power-grids, social network services (SNS), and so on. It is observed that our algorithm works well for planar graphs and those "close to" planar graphs. For the scale-free graphs which contain vertices of high degree, although our algorithm can divide target graph into intended size of subgraphs, issues remain on controlling the number of edges bridging them.
14:40-14:55 Break
14:55-15:40 (45min)
Moritz M├╝hlenthaler (TU Dortmund University)
Solution Graphs of Combinatorial Problems
We consider solution graphs of combinatorial problems with respect to certain natural adjacency relations. These graphs are typically too large to be stored explicitly. We will give a brief overview of what is known about the complexity of the (st-)connectivity problem on such graphs. Furthermore, we will showcase some recent results on the st-connectivity problem in a particular class of solution graphs, namely graphs on common independent sets of matroids.
15:40-16:05 (25min)
Takehiro Ito (Tohoku University)
Reachability on the Solution Space of Graph Colorings
It has been studied intensively from the algorithmic viewpoint to analyze the solution space of (vertex-)colorings on a graph. In this talk, we consider the problem of asking the reachability (st-connectivity) on the solution space, that is, we wish to transform a given coloring into another given one by recoloring only a single vertex at a time, while maintaining always proper coloring.
16:05-16:10 Closing
17:00-19:00 Banquet

Venue

Tomy Hall, 8F Hospital Building A [map]
There are 5 elevators in the building. Please DO NOT use the first 2 elevators that you encounter after entering the building, which are for hospital patients. Please use the last 2 elevators that are located farthest from the entrance.
The Institute of Medical Science, The University of Tokyo [map]
4-6-1 Shirokanedai, Minato-ku, Tokyo, Japan

Registration for banquet

Deadline: 15 November 2017
If you wish to attend the banquet, please register from here.
Registration fee is 5,000 JPY. The venue is Medical Science Museum (Building No. 21).
No pre-registration is required for workshop

Organizers

Naoki Katoh
Kwansei Gakuin University
Tetsuo Shibuya
The University of Tokyo
Kazuyuki Tanaka
Tohoku University
Kazuhisa Makino
Kyoto University
Shin-ichi Tanigawa
The University of Tokyo
Yuya Seki
Tohoku University
Atsuki Nagao
Seikei University
Yushi Uno
Osaka Prefecture University

Sponsor

JST CREST: Foundations of Innovative Algorithms for Big Data

Contact information

iabd2017@smapip.is.tohoku.ac.jp