Language Integrated Query for SciDB in Scala

Michael Bayne
University of Washington
[email protected]

1. Introduction

SciDB is a new database management system aimed at better meeting the needs of scientists and scientific computation. In addition to providing an architecture and data model well suited to scientific computing, SciDB has the opportunity to leverage research in programming language design, aimed at better integration between database systems and programming languages, to provide a programmatic interface that is safe, powerful and easy to use. Recent advances in programming language and type system design have enabled the creation of so-called “language integrated query” database programming interfaces, the most well-known of which is Microsoft’s LINQ system [7]. These interfaces leverage the infrastructure of the host language, including the type checker and integrated development environment, to support the task of programmatic database access. Such mechanisms enable database access code to be written concisely, in a type-safe manner, directly in the syntax of the host language, and using operations that act equivalently on data in memory and data stored in the database. This paper investigates the applicability of a language integrated query approach to SciDB’s unique requirements, describes such a SciDB programming interface implemented in the Scala language, evaluates the described approach, and investigates its applicability to other languages.

2. Background

2.1 SciDB

SciDB [11] is a database system currently being designed and developed by a consortium of computer scientists from the database research community to address the unique needs of scientific data management. Relational databases turn out to be a poor fit for scientific computing, causing scientists either to develop their own custom data management software, or to struggle to achieve their goals using relational databases or no data management system at all. The result is duplicated effort and additional cost to scientific research. As scientific data collection capabilities increase, these undesirable costs are compounded.

SciDB’s design differs from traditional relational databases in a number of key ways. It uses a multi-dimensional array rather than a relation as its basic data model. It is optimized for mostly read-only data, where modifications are infrequent and do not replace original data but are rather layered on in an extra array dimension. It also targets very large datasets by assuming a distributed architecture as the standard configuration and by providing facilities for performing distributed computations directly on the data rather than moving prohibitively large quantities of data over the network. Finally, it provides robust support for user-defined functions to accommodate the wide variety of computations desired by scientists across many fields.

2.2 Scala

Scala [9] combines object-oriented and functional programming in a statically typed language with type inference. Its syntax is highly uniform: programs are trees of definitions which can be arbitrarily nested. It contains concepts common to object-oriented languages like classes and methods, and extends those with more fundamental abstractions, including singleton objects, traits for mixin inheritance, and type members that generalize type parameters. Functions are first-class values: they can be written as literals, passed as arguments or returned from methods. Methods abstract over both values and types. The type system is advanced and includes path-dependent types, intersection types, variance indication for type parameters, and higher-kinded types [8]. Scala is compiled to Java byte-codes, runs on the Java Virtual Machine, and provides easy interoperation with Java code.
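To make a few of these features concrete, the following small example (ours, not part of the SciDB client) shows a trait used for mixin inheritance, a singleton object, type inference, and a function literal passed to a higher-order method:

  // a trait providing reusable behavior via mixin inheritance
  trait Timestamped {
    def now :Long = System.currentTimeMillis
  }

  // a singleton object mixing in the trait; member types are inferred
  object Sensor extends Timestamped {
    val readings = List(0.5f, 1.5f, 2.5f)    // inferred type: List[Float]

    // a function literal passed to the higher-order method map
    def scaled (factor :Float) :List[Float] = readings map (r => r * factor)
  }

  // Sensor.scaled(2) == List(1.0f, 3.0f, 5.0f); Sensor.now is the current time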

3. Goals

Language integrated query describes a general approach to database programming interfaces that has its origins in embedded SQL and is currently best embodied by Microsoft’s LINQ framework. Its main advantages are its compile-time type safety, concise readable syntax, and integration with host language abstractions for processing sequences known as comprehensions [3]. Most language integrated query implementations have relied on special support in the underlying language compiler. Microsoft’s LINQ generalizes this support into expression trees [12], which allow reification of source language expressions into abstract syntax trees that can then be manipulated by the program as first-class constructs. Unfortunately, such functionality is not present in most programming languages, thus one goal of this research is to determine whether similar results can be obtained using more widely available language features. Another goal of our investigation is to determine whether comprehensions-style data access is as appropriate for SciDB’s array data model as it is for the traditional relational model.

object Samples extends ArraySchema {
  val name = "samples2009"                 // optional array name

  // cell schema
  val flux     = float32("flux")
  val mask     = int32("mask")
  val variance = float32("variance")

  // array dimensions
  val x    = dim("x", start=0, length=100, stride=10)
  val y    = dim("y", start=0, length=100, stride=10)
  val time = dim("time", length=4)
}

Figure 1. Array schema declaration.

4. Approach

We now describe the implementation approach and details of our language integrated Scala client for SciDB. In the following sections we describe the syntactic representation chosen for each aspect of the library and the underlying language features that accommodate such an approach. Discussion of the limitations of the approach and the degree to which it meets our goals is left to Section 5.

4.1 Array Schema

An array schema is declared as a singleton object instance (denoted by Scala’s object keyword). The schema declaration is a subtype of the ArraySchema class, which both defines the methods used in the schema declaration (e.g. float32(), dim(), etc.) and provides the interface by which array metadata (e.g. its name) is communicated to the library. An example array schema definition is shown in Figure 1.

The declarations of array cell fields carry the underlying type of the data. Due to type inference, these types are not directly visible to the programmer. A fully typed cell field declaration has the form:

  val flux :FieldExpr[Float] = float32("flux")

The FieldExpr[T] classes are predicate constructors in that they define methods for creating query predicates. This is described in more detail in Section 4.4. The type parameter on the field expression, in the above example bound to Float, defines the type of the values held by the field in a cell. It is also used by the API to preserve type correctness when constructing query predicates. The instance returned by the field declaration methods is a subtype of FieldExpr[T] which contains specialized code for extracting raw data of the field’s type. In the prototype implementation, this is used to convert string formatted data into the correct primitive value.

In a production-quality implementation, these specialized classes would contain code to cast bulk raw data obtained from the database to values of the correct type, possibly performing endian conversion or other transformations.

Dimension declarations return instances of the class Dimension, which is also a predicate constructor. Dimension omits the Expr suffix from its name because it appears elsewhere in the user-visible API, unlike FieldExpr, which is never referenced directly. Dimensions are assumed to have type Int, though they too could be parameterized to support non-integer enhanced dimensions as described in [11]. In the prototype implementation, the values start, length and stride provided for a dimension declaration exist purely for documentation. Those values could easily be used to perform runtime checks in the client prior to making requests to the database, but a more powerful use of that information is considered in Section 7 for languages that support dependent typing.

A final note on schema declaration is that the names chosen for use in language constructs need not correspond to the names defined in the SciDB catalog. This avoids name clashes that might occur if a cell field has the same name as one of the array dimensions, or if a field or dimension has the name name, which is used to convey the catalog name of the array. One solution to such a name collision would be to prefix the names in the following manner:

  val fTime = float32("time")
  val dTime = dim("time", length=4)
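As an illustration of the runtime checks mentioned above, the following sketch shows how the declared bounds of a dimension could be validated in the client before a request is sent to the database. It is hypothetical: neither DimBounds nor checkIndex is part of the prototype library.

  // hypothetical record of a dimension's declared bounds
  case class DimBounds (name :String, start :Int, length :Int) {
    def contains (index :Int) :Boolean = index >= start && index < start + length
  }

  // hypothetical client-side check, run before a query is issued
  def checkIndex (d :DimBounds, index :Int) {
    if (!d.contains(index))
      throw new IndexOutOfBoundsException(
        d.name + ": " + index + " is outside [" + d.start + ", " + (d.start + d.length) + ")")
  }

  // checkIndex(DimBounds("x", 0, 100), 120) fails without contacting SciDB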

4.2 Array Access

SciDB array cells are modeled in the client library as instances of the class DBCell. Arrays are modeled as instances of the class DBArray. As is evident from the listing in Figure 2, these type names are not visible to the clients of the library, but we point out their names so that we can conveniently describe their function.

Random access: Scala semantically models array dereference as function application. Arrays are treated as functions mapping an index value to the contents of the array at that index. This is defined via the Function1 interface, which provides an apply method roughly as follows:

  trait Function1[T1,R] {
    def apply (v1 :T1) :R
  }

Scala then defines the C-style concise syntax for function application as syntactic sugar for a call to the apply method. Thus array element access in Scala takes the following form:

  val data = Array(1, 2, 3, 4, 5)
  val elem = data(2)  // has value 3

which is translated to:

  val elem = data.apply(2)
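The same sugar applies to any object that defines an apply method, which is what allows a database-backed array to mimic a built-in one. A minimal standalone illustration (ours, not part of the client library):

  // any instance with an apply method can be "called" like a function
  class Doubler extends (Int => Int) {
    def apply (v :Int) :Int = v * 2
  }

  val double = new Doubler
  val x = double(21)   // desugars to double.apply(21); x == 42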

 1  val conn = SciDB.connect( ... )
 2  val samples = Samples(conn)
 3
 4  val f = samples(25)(Samples.flux)             // random access
 5
 6  for (c <- samples) {                          // iterative access
 7    Console.printf("%d %d %d %f %f\n",
 8                   c(Samples.x), c(Samples.y),
 9                   c(Samples.time), c(Samples.flux),
10                   c(Samples.variance))
11  }
12
13  for ((f, v) <- samples get (Samples.flux,
14                              Samples.variance)) {
15    Console.printf("%f %f\n", f, v)
16  }
17
18  val fluxen = samples toArray (Samples.flux)   // float[]
19  val masks = samples toArray (Samples.mask)    // int[]

Figure 2. Array access.

It is useful to note that the actual implementation of arrays uses the underlying primitives supplied by the Java Virtual Machine and the associated bytecode instructions to efficiently perform array access. This is simply an optimized implementation of the equivalent semantic model provided at the language level.

This choice of array access semantics allows DBArray to present the same interface as built-in Scala arrays. By implementing the Function1 interface specialized to return DBCell instances, the DBArray class can provide random access to SciDB arrays as shown in Figure 2 line 4. That line also demonstrates that the Scala function application syntax is used to provide concise access to field and dimension values. This is accomplished via two overloaded versions of the apply method defined on DBCell:

  class DBCell {
    def apply[T] (field :FieldExpr[T]) :T = ...
    def apply (dim :Dimension) :Int = ...
  }

This interface illustrates the mechanism by which the type associated with the field declaration is transferred to an expression where a value of that field is obtained. The version of apply that accepts a FieldExpr captures the parameterized type of the field expression (represented by the type variable T) and uses it to define the type of the returned value. The same technique would be used with the Dimension application were it to be extended to support non-integer types.
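The way the field’s type parameter flows through apply can be seen in the following self-contained sketch, which uses simplified stand-ins of our own (Field and Cell) for the library’s FieldExpr and DBCell:

  // simplified stand-in for FieldExpr[T]: a named field carrying its value type
  class Field[T] (val name :String)

  // simplified stand-in for DBCell: apply's result type is taken from the field
  class Cell (values :Map[String, Any]) {
    def apply[T] (field :Field[T]) :T = values(field.name).asInstanceOf[T]
  }

  val flux = new Field[Float]("flux")
  val cell = new Cell(Map("flux" -> 0.5f))
  val f :Float = cell(flux)   // type-checks; cell(flux) has static type Float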

Sequential access: Scala uses comprehension expressions as its fundamental mechanism for transforming or iterating over sequences of values. Comprehensions are fully described by Wadler in [13]. Informally, they provide a notation for expressing a list of sequence generators and predicates on their terms which are combined to generate a single flattened sequence of terms matching the predicates. To participate in a comprehension expression in Scala, an object must implement the operations filter, map, and flatMap. DBArray mixes in the Scala sequence trait Seq[DBCell], which defines the necessary operations. This enables the concise iterative access shown in Figure 2 lines 6-11.

In order to enable the variable binding syntax shown in Figure 2 lines 13-16, the mechanism by which projection is implemented differs from that used for the other array-algebra operators described in Section 4.3. Instead of simply returning a DBArray configured with the specified projection (conceptually a Seq[DBCell]), a Seq[TupleN[T1..TN]] is returned, where the elements of the tuple are bound to the elements projected from the array cell. (The prototype client library emulates projection queries, as those are not yet implemented in the SciDB prototype used for this research.) Scala’s pattern matching support allows those values to be extracted and bound to variables in the comprehension expression, resulting in a concise syntax.

Bulk access: Lines 18 and 19 of Figure 2 demonstrate the bulk access syntax. Bulk access to individual fields is provided to accommodate efficient transfer of retrieved data to numerical processing libraries, which will likely expect data in the form of contiguous primitive arrays. The toArray method is structured to allow it to efficiently return raw array data provided by a low-level bulk data access client described in Section 7.2. It is possible that numerical processing libraries may use custom mechanisms to stream large quantities of data rather than represent them as contiguous in-memory arrays. This approach accommodates such alternatives.
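Reusing the Field and Cell stand-ins from the earlier sketch, the projection and bulk-access shapes described above could look roughly like this; the signatures are hypothetical and do not reproduce the actual client API:

  // two-field projection: pairs values so a for comprehension can bind (f, v)
  def get[A,B] (cells :Seq[Cell], fa :Field[A], fb :Field[B]) :Seq[(A,B)] =
    cells map (c => (c(fa), c(fb)))

  // bulk access: a production client would copy raw buffers rather than box
  // each value as this naive version does
  def toArray (cells :Seq[Cell], field :Field[Float]) :Array[Float] =
    cells.map(c => c(field)).toArray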

4.3 Array-Algebra Operators

SciDB defines an array algebra with a large number of operators [5]. These operators are essentially functions whose output is an array and whose input is one or more arrays and, in some cases, auxiliary arguments specifying operator behavior. An example of an array-algebra operator is the lookup operation: lookup(T,A). Its first argument is a template array T which contains as cell values indices into its second argument, the target array A. If A has dimensions i, j and k, then T would have fields i, j and k. The result array has the shape of the template array T and contents equal to the contents of A identified by each cell of T.

We have chosen an object-oriented approach for modeling array-algebra operators: they are defined as methods on the DBArray class. The definition for the lookup operation is as follows:

  class DBArray {
    def lookup (template :DBArray) :DBArray
  }

By this approach, the target array is the receiver of the method call and thus does not appear in the operator declaration. This is shown in the following example usage:

  val indices = Indices(conn)
  val samples = Samples(conn)
  for ((f, v) <- samples lookup (indices)
                         get (Samples.flux, Samples.variance)) {
    // ...
  }


 1  for ((f, v) <- samples subsample (Samples.x > 50 &&
 2                                    Samples.y > 50)
 3                         get (Samples.flux, Samples.variance)) {
 4    Console.printf("%f %f\n", f, v)
 5  }
 6
 7  val xtremes = samples filter (Samples.variance > 50f)
 8
 9  for ((f, v) <- samples aggregate (List(Samples.x, Samples.y),
10                                    avg(Samples.time))
11                         get (Samples.flux,
12                              Samples.variance)) {
13    Console.printf("Avg: %f %f\n", f, v)
14  }

Figure 3. Query predicates.

Array operator application results in the construction of an array-algebra tree which remains unevaluated until values are requested, either by using the resulting array in a sequence comprehension or by accessing cell data via direct or bulk access. When database access is required, the array-algebra tree is converted into SciDB’s XML query representation and sent as a query to the database.
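A rough sketch of this deferred evaluation is shown below; the node classes and the toQueryXML method are our invention for illustration and do not reflect SciDB’s actual XML query format:

  // hypothetical operator-tree nodes; nothing is evaluated at construction time
  sealed trait ArrayExpr {
    def toQueryXML :String
  }

  case class ScanNode (arrayName :String) extends ArrayExpr {
    def toQueryXML = "<scan array=\"" + arrayName + "\"/>"
  }

  case class LookupNode (template :ArrayExpr, target :ArrayExpr) extends ArrayExpr {
    def toQueryXML =
      "<lookup>" + template.toQueryXML + target.toQueryXML + "</lookup>"
  }

  // samples lookup (indices) would build
  //   LookupNode(ScanNode("indices"), ScanNode("samples2009"));
  // the XML is generated only when cell data is actually requested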

4.4 Query Predicates

Some array-algebra operators require predicate expressions over cell field values or dimension values. Our goal is to provide concise, natural expression of these predicates in a manner that can be type-checked at compile time. Figure 3 lines 1-5 demonstrate the subsample operator, which takes a predicate expression over an array’s dimensions as its argument. Its output is an array with the same valence (number of dimensions) but with a smaller range in one or more of its dimensions, as defined by the predicate.

The code provided to the subsample operator in the example constructs an expression tree which is supplied to the operator and later converted into a string representation of the expression for inclusion in the XML query sent to the database server. The mechanism by which this expression tree is constructed and type-checked is explained by stepping through the example expression. Samples.x is a Dimension as described in Section 4.1. Dimension, like the FieldExpr class, mixes in the ValueExpr trait which defines a set of predicate constructor methods:

  trait ValueExpr[T] {
    def >  (value :ValueExpr[T]) :BoolExpr
    def >= (value :ValueExpr[T]) :BoolExpr
    def <  (value :ValueExpr[T]) :BoolExpr
    def <= (value :ValueExpr[T]) :BoolExpr
    def +  (value :ValueExpr[T]) :ValueExpr[T]
    def -  (value :ValueExpr[T]) :ValueExpr[T]
    def *  (value :ValueExpr[T]) :ValueExpr[T]
    def /  (value :ValueExpr[T]) :ValueExpr[T]
  }

These methods construct an expression tree node which captures the receiver (the left-hand side of the expression) and the argument (the right-hand side of the expression). The tree node may be of two types: another ValueExpr in the case of arithmetic operations, or a BoolExpr in the case of boolean operations. In the example, the ValueExpr.> method is called on Samples.x with the integer literal 50. 50 is not a ValueExpr and thus is not a legal argument, but Scala supports so-called implicit functions for converting objects of one type to another. In this case, we have defined implicit functions for promoting integer literals to instances of IntExpr:

  implicit def intToExpr (value :Int) = IntExpr(value)

IntExpr mixes in ValueExpr[Int] and is thus a legal argument to the ValueExpr.> method. The result of the ValueExpr.> method is an expression tree node that captures its terms and mixes in the BoolExpr trait, which offers methods like the following:

  trait BoolExpr {
    def && (expr :BoolExpr) :BoolExpr
    def || (expr :BoolExpr) :BoolExpr
  }

An identical process operates on the other clause, Samples.y > 50, resulting in a BoolExpr which is then supplied as an argument to the BoolExpr.&& method of the left sub-expression. That captures both sub-expressions in a tree of expression nodes equivalent to the following:

  AndExpr(
    GreaterExpr(Samples.x, IntExpr(50)),
    GreaterExpr(Samples.y, IntExpr(50))
  )

The AndExpr class also mixes in the BoolExpr trait, thus the final type of the expression tree is BoolExpr, which is the type required by the subsample array operator:

  class DBArray {
    def subsample (pred :BoolExpr) :DBArray
  }

This ensures that only complete boolean expressions are supplied to the operator. An attempt to supply, for example, Samples.x + 50 would result in a compilation error, as that expression has type ValueExpr. Also note that Scala assigns precedence to methods with operator-like names based on the first character of the method name [8]. This results in the expected grouping when parsing method calls written in this infix notation, though parentheses can be used as expected to force an alternative association.
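The expression nodes themselves can be modeled as simple case classes. The following self-contained sketch is ours: it reduces the library’s traits to two constructor methods and uses DimExpr as a stand-in for the Dimension expressions used in Figure 3.

  // sketch only: concrete constructor methods in the traits build case-class nodes
  trait BoolExpr {
    def && (other :BoolExpr) :BoolExpr = AndExpr(this, other)
  }

  trait ValueExpr[T] {
    def > (other :ValueExpr[T]) :BoolExpr = GreaterExpr(this, other)
  }

  case class IntExpr (value :Int) extends ValueExpr[Int]
  case class DimExpr (name :String) extends ValueExpr[Int]   // stand-in for a Dimension
  case class GreaterExpr[T] (left :ValueExpr[T], right :ValueExpr[T]) extends BoolExpr
  case class AndExpr (left :BoolExpr, right :BoolExpr) extends BoolExpr

  implicit def intToExpr (value :Int) :ValueExpr[Int] = IntExpr(value)

  // with these definitions, DimExpr("x") > 50 && DimExpr("y") > 50 builds the
  // AndExpr(GreaterExpr(...), GreaterExpr(...)) tree shown above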

Line 7 of Figure 3 demonstrates a similar predicate expression over array cell fields rather than array dimensions. The filter operator returns an array of the same shape as its argument which contains 1 in every cell that matches the supplied predicate and 0 in cells that do not.

The same mechanism described above acts to convert the expression Samples.variance > 50f into an expression tree. This example also serves to demonstrate the type checking mechanism used for predicate expressions. Samples.variance is defined as float32("variance"), which has type FieldExpr[Float]. The predicate construction methods provided by ValueExpr require arguments with a matching parameterized type. Thus FieldExpr[Float] may only be compared to another term parameterized by Float. In the example expression, 50f is a floating point literal which is promoted to type FloatExpr via an implicit function like the one described above for values of type Int. FloatExpr is a subtype of ValueExpr[Float] and is thereby legal in the example expression. An attempt to compare a FieldExpr[Float] with a non-Float value would result in a compilation error.

Though possible with our approach, the prototype implementation has strictly avoided the implicit numeric conversions commonly found in programming languages (e.g. promotion of an integer value to a floating point value). The choice to eliminate implicit numeric conversions reflects accumulating wisdom in the programming languages community that the problems caused by such conversions justify the minor inconvenience of requiring an explicit cast where conversions are knowingly desired. In the realm of scientific computing, where the subtle introduction of error by such conversions can meaningfully skew results, they are even more problematic.

The final block of Figure 3, lines 9-14, demonstrates the application of the aggregate array-algebra operator and, more interestingly, the use of a built-in function in the aggregation expression. The aggregate operator requires a list of grouping dimensions and an aggregation function to run on the non-grouped dimensions. The built-in function avg is shown here, which takes as input a Dimension and returns an expression tree representing the appropriate SciDB function application.

Such functions can be provided for dimensions as well as cell fields and can be typed such that they enforce limitations on their arguments. For example, the following definition of sum:

  def sum[T : Numeric] (field :FieldExpr[T]) :FuncExpr

uses Scala’s type parameter view bounds mechanism to enforce that it is only applicable to cell field expressions with numeric type (e.g. Int, Float, etc.). Were an expression to supply a FieldExpr[String] or FieldExpr[Bool] as the argument to this sum function, it would be rejected by the compiler. In this way, the typing requirements for SciDB built-in functions as well as user defined functions can be expressed and enforced in the client library.

4.5 Choice of singleton as cell model

The choice to model an array cell as a singleton rather than a class was intentional. By modeling array fields as constant members of a singleton object, we enable their natural use in predicate construction and other operations in the library. Were array schemas defined as classes, the names and types of the fields would not be available as first-class values and would either have to be duplicated in separate metadata members, or described by name using strings, giving up the ability to statically check the types of expressions on fields.

This design choice requires that the programmer obtain cell values using function application syntax (i.e. cell(Samples.x)) rather than field access syntax (i.e. cell.x), but this verbosity is ameliorated by the variable binding form of the sequence expression. Additionally, a common use case in relational queries is to perform a projection, join or aggregation, in which case a class modeling a specific relation is unsuitable because it contains either more fields than are desired (in the case of a projection) or is incapable of representing the fields of joined tables or aggregation results.

Microsoft’s LINQ relies on a concise syntax for creating anonymous classes and binding the results to named fields, as shown below:

  var customerCountries =
      from c in customers
      where c.Name.StartsWith("M")
      orderby c.Country
      group c by c.Country into g
      select new { Country = g.Key,
                   Count = g.Count() };

We opt instead to take advantage of Scala’s pattern matching support for constructing and deconstructing tuples, which enables the variable binding syntax shown on line 13 of Figure 2.

5. Evaluation

We now consider in turn the benefits of the language integrated query approach and the degree to which our implementation achieved those benefits. Further, we note whether the language features required to achieve our goals are generally available in other languages.

Type-safety: The use of parameterized types in cell field declarations and the propagation of those types during predicate expression construction catches a large class of type errors. Additionally, parameterized types are a widely available language feature, existing in Java, C++ and many other statically typed languages. The use of Scala’s type parameter view bounds functionality to limit the arguments of function expressions to correct types could be emulated in languages with static method overloading at the expense of some code duplication in the library definition.

Unfortunately, some type errors remain possible. The investigated implementation does not guarantee that the cell fields and dimensions supplied as arguments to an array-algebra operator are defined in the correct array. This can be guaranteed via a runtime check, but compile-time enforcement would be preferable. Object-ownership types [1] provide an interesting basis for such enforcement, but no widely used language supports such a type system and thus they would be infeasible as a broadly applicable solution.

Concise syntax: By leveraging Scala’s support for flexible method names, infix notation for method invocation, and implicit functions, we are able to provide query predicate expressions that look syntactically natural. Instead of evaluating the expression in question, they construct an expression tree which is supplied to the library for use in generating the database query. Limitations of the inference rules for application of implicit functions place one restriction on the declaration of predicate expressions: the left-hand side of such an expression must be a dimension or cell field. Thus Samples.x > 7 will parse correctly, whereas the logically equivalent 7 < Samples.x will fail to compile. In practice, we find the former expression more natural and more likely to be written by library users, but such lack of orthogonality is unfortunate. Many scripting languages, including Python, also support operator-style method names and infix notation for method invocation and are thus directly amenable to this approach. A similar effect could probably be achieved in languages that support operator overloading, like C++.

Uniform data access: Scala’s modeling of array access as function application and translation of that application to a call to the apply method allows us to achieve random access to SciDB arrays that is equivalent to random access to in-memory arrays. Further, Scala’s support for sequence comprehensions allows the same idioms used to iterate over in-memory sequences to be applied to SciDB arrays. The generalized filtering, sorting and grouping mechanisms provided by sequence comprehensions turned out not to be applicable to SciDB’s arrays. Rather, the richer set of array-algebra operations were more cleanly modeled as methods on the DBArray class. Because the result of all SciDB operators is itself an array, the chaining of array-algebra operations is already very naturally expressed as a sequence of method calls. Other languages, including Python, provide sequence comprehension support and would be amenable to an approach similar to ours. In the absence of sequence comprehensions, a simple iterator mechanism could be used to achieve less powerful but similarly natural integration.

6. Related Work

The idea of using higher-order types to create a combinator-based embedded query language originates with HaskellDB [4]. This work was generalized to provide uniform access to both relational and XML data in Cω [6], and was commercialized by Microsoft as the LINQ framework in the .NET platform. The HaskellDB work is conceptually closer to our effort in that it focuses on achieving language integrated query solely via the host language’s type system. However, Haskell’s lack of flexible function name support prevents the use of syntax equivalent to host language boolean and arithmetic expressions and substantially reduces the usability of the technique.

ScalaQL [10] uses techniques similar to ours to create a language integrated query interface for traditional relational databases. They omit technical details and any discussion of schema definition, but their combinator-based construction of predicate expressions leverages the same Scala features to achieve syntactic conciseness and type-safety.

AraRat [2] provides type-safe query construction in C++ using a combination of templates, operator overloading and pre-processor macros. As C++ lacks sequence comprehensions and a standard low-level database access library, AraRat focuses solely on correct query construction, ignoring the retrieval and processing of the data. It simply produces a well-formed string containing a SQL query and leaves database access and result processing to the user.

7. Future Work

7.1 Additional Type-Checking

Object-ownership types may offer a basis for representing the type of the arrays used in array-algebra operations in the type of the arguments to those operations, thereby ensuring that the cell fields and dimensions supplied to those operations are in fact members of the involved arrays. Such work may also require the use of intersection types, as a sequence of operations like join followed by project would require that a combination of the fields and dimensions of the arrays involved in the join be allowed as legal arguments to the projection.

Another active area of type system research explores dependent types, one common application of which is to encode the length of an array in its type. This allows certain aspects of array bounds checking to be enforced at compilation time. Whether such an approach generalizes to multidimensional arrays with non-uniform dimensions and has application in the scientific data processing domain requires further investigation.

7.2 Low-level client library

The goal of providing a type-safe, concise, powerful programmatic interface to the SciDB database necessarily requires that some effort be expended on a language-by-language basis. This paper has explored the degree to which non-exotic language mechanisms can be used to achieve these goals.

We have not considered the goals of high-performance, efficient access to data sets at the scale expected for SciDB. We believe that these goals need not be addressed on a language-by-language basis. It is likely that an effective division of labor can be achieved with a single implementation of a high-performance, bulk-access programmatic interface that defers type-safe query construction and cell-by-cell data access to a language-specific interface like the one explored in this paper.

One question to be resolved is where the ideal boundary between these layers lies and how expected usage patterns should influence the choice of that boundary. For example, our prototype implementation offers both “row-store”-style access to cell contents, where the fields and dimensions of an individual cell are made available one cell at a time, and “column-store”-style access, where the entire contents of a single field can be delivered to the caller as a single contiguous in-memory array. Our expectation is that the latter will be critical for high performance, but the SciDB physical storage design, its network protocol design and the expected use cases all likely need to be considered to arrive at an optimal design.

8. Conclusions

We have identified the individual language features needed to meet the goals of a type-safe, concise programmatic interface to SciDB. We offered a design and prototype implementation that achieves these goals and which can be adapted to other languages with only minor adjustments in cases where the host language lacks necessary mechanisms. We conclude that most of the benefits of advanced database interface techniques like LINQ are applicable to SciDB’s unique requirements and can be achieved today in a wide variety of languages. Doing so will make SciDB easier for its scientific customers to use, and help to advance the pace of discovery as increasingly data-intensive research is undertaken in the sciences.

References

[1] David Gerard Clarke. Object ownership and containment. PhD thesis, University of New South Wales, Australia, 2003.

[2] Joseph (Yossi) Gil and Keren Lenz. Simple and safe SQL queries with C++ templates. In GPCE '07: Proceedings of the 6th International Conference on Generative Programming and Component Engineering, pages 13–24, New York, NY, USA, 2007. ACM.

[3] Simon Peyton Jones and Philip Wadler. Comprehensive comprehensions. In Proceedings of the ACM SIGPLAN Workshop on Haskell, pages 61–72, 2007.

[4] Daan Leijen and Erik Meijer. Domain specific embedded compilers. In Proceedings of the 2nd Conference on Domain-Specific Languages, pages 109–122, New York, NY, USA, 1999. ACM.

[5] David Maier, Stan Zdonik, and Mike Stonebraker. SciDB model and operators. Draft, 2009.

[6] Erik Meijer and Wolfram Schulte. Unifying tables, objects and documents. In Proceedings of the Workshop on Declarative Programming in the Context of Object-Oriented Languages, 2003.

[7] Erik Meijer, Brian Beckman, and Gavin Bierman. LINQ: reconciling object, relations and XML in the .NET framework. In SIGMOD '06: Proceedings of the 2006 ACM SIGMOD International Conference on Management of Data, pages 706–706, New York, NY, USA, 2006. ACM.

[8] M. Odersky, P. Altherr, V. Cremet, B. Emir, and S. Micheloud. The Scala language specification. Programming Methods Laboratory, EPFL, 2009.

[9] Martin Odersky, Lex Spoon, and Bill Venners. Programming in Scala: A Comprehensive Step-by-step Guide. Artima Inc., USA, 2008.

[10] Daniel Spiewak and Tian Zhao. ScalaQL: Language-integrated database queries for Scala. In Software Language Engineering: 2nd International Conference, SLE 2009, 2009.

[11] M. Stonebraker, J. Becla, D. DeWitt, K. Lim, D. Maier, O. Ratzesberger, S. Zdonik, P. Cudre-Mauroux, H. Kimura, and K.-T. Lim. Requirements for science data bases and SciDB. In CIDR, pages 173–184, 2009.

[12] Mads Torgersen. Querying in C#: how language integrated query (LINQ) works. In OOPSLA '07: Companion to the 22nd ACM SIGPLAN Conference on Object-Oriented Programming Systems and Applications, pages 852–853, New York, NY, USA, 2007. ACM.

[13] P. Wadler. Comprehending monads. In Proceedings of the 1990 ACM Conference on LISP and Functional Programming, pages 61–78, 1990.
