Queries validated by running on test data and manually checking results

Yüklə 491 b.

tarix	19.11.2017
ölçüsü	491 b.
	#11315

Queries validated by running on test data and manually checking results

Queries validated by running on test data and manually checking results

Test data typically generated manually and/or in query independent way
May not catch all errors

The XData system (Shah et al. ICDE 2011) provides a principled approach to automatically generating test data

Based on “killing” query “mutations”

Mutation: single change to a given query

Mutation: single change to a given query

Mutations model common programming errors, e.g.

Join vs. left/right/full outerjoin
Selection/join condition: e.g.
Wrong aggregate: e.g.
select vs. select distinct
And many more ..

Note: Mutant may be the intended query

Traditional use of mutation testing is to check coverage of dataset

Traditional use of mutation testing is to check coverage of dataset

Generate mutants of the original query Q by modifying Q in a controlled manner
A dataset kills a mutant Q’, if Q and Q’ give different results on the dataset
A suite of datasets is considered complete if each non-equivalent mutant of the given query is killed by at least one of the datasets

E.g. To kill below mutation, need a department without any high-salary instructors

Goal of XData system: generate datasets for testing SQL queries

Goal of XData system: generate datasets for testing SQL queries

Testing queries: Each test dataset + query result is shown to human tester for verification.

Requirement: number of datasets should be small, and each dataset should also be small in size.

Testing SQL assignments: Run correct query and student query on each of the datasets, flag if results differ
Note that we do not actually generate and execute mutants

XData [ICDE 2011] addresses data generation

Considers a subclass of SQL queries
Number of datasets is linear in size of the query
Completeness results for a subclass of mutations

Mutations between joins and outerjoin across all join orders
Mutations between {=,<>,<,<=,>,>=}
Mutations between {min, max}, and distinct/non-distinct versions of sum, count, avg (with no constraints on aggregate values)

Extends XData [ICDE 2011] to handle many commonly used SQL features

Extends XData [ICDE 2011] to handle many commonly used SQL features

NULL values
Constrained aggregation
String constraints,
Subqueries
Set operations
Date and time
Views
Insert/delete/update queries

Studies effectiveness of techniques for grading real SQL assignments

Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05]

Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05]

Describe a number of SQL query mutations
But do not address test data generation

Olston et al (SIGMOD09], Qex [LPAR10]

Generate data for the original query
But do not generate data to kill mutations

de la Riva et al [AST10]

Address data generation using constraints, with the Alloy solver
Consider only join/selection queries
Do not consider alternative join orders

Create template datasets with constraints

Create template datasets with constraints

Each attribute of each tuple is a variable
Add constraints on variables to model

Primary/foreign key constraints

E.g. ASSERT FORALL i EXISTS j (O_CRSE[i].dept_name= O_DEPT[j].dept_name);

Query constraints
Constraints to kill a specific mutation

This part is different for each template dataset generated
E.g. ASSERT NOT EXISTS (i: INT) (O_DEPT[1].dept_name=O_CRSE[i].dept_name) ;

Generate data for each template dataset

Using constraint solver, currently CVC3

Datatype declarations for each attribute

Datatype declarations for each attribute

DATATYPE Course_id= BIO101 | BIO301 | BIO399 | CS101 | END; Credits : TYPE = SUBTYPE (LAMBDA (x: INT) : x > 1 AND x < 5);

Array of tuples per relation

CRSE_TupleType: TYPE = [course_id, dept_name, credits];
O_CRSE: ARRAY INT OF CRSE_TupleType;
Each attribute of each tuple is a variable
whose value will be assigned by the solver
E.g. each of O_CRSE[1].course_id, O_CRSE[1].dept_name, O_CRSE[2].course_id, …, is a variable

Constraints for primary and foreign keys

Constraints for primary and foreign keys

f.k. from crse.dept_name to dept.dept_name

ASSERT FORALL i EXISTS j (O_CRSE[i].dept_name= O_DEPT[j].dept_name);

p.k. on COURSE_ID

ASSERT FORALL i FORALL j (O_CRSE[i].course_id = O_CRSE[j].course_id) => “all other attrs equal”

Constraints from join conditions

E.g. to ensure first tuples of course and department match

ASSERT (O_CRSE[1].dept_name= O_DEPT[1].dept_name) ;

To kill join to right-outer-join mutation, generate another dataset with above condition modified to:

ASSERT NOT EXISTS (i: INT) (O_CRSE[i].dept_name=O_DEPT[1].dept_name ) ;

Constraints from other query selection conditions

E.g. O_CRSE[1].credits < 5;
To kill mutations between {=, <, <=, ..}, generate datasets with above modified to = 5, and > 5 respectively.

Extensions to handle several SQL constructs/features

Extensions to handle several SQL constructs/features

CVC3 does not understand NULL values

CVC3 does not understand NULL values
Define values which act as placeholder NULL values, for e.g.,

DATATYPE COURSE_ID = BIO101 | BIO301 | BIO399 | CS101 | CS190 | NULL_COURSE_ID_1 | NULL_COURSE_ID_2

Add CVC3 constraints which distinguish between NULL and non-NULL values
CVC3 constraint to enforce foreign keys modified to allow null values
Generate constraints assigning null values to some variables, to kill mutations

Different placeholder NULL values assigned to different variables

Can now handle

Mutation of count(attr) to count (*), and
Mutations of IS NULL to NOT IS NULL

We consider following string constraints

We consider following string constraints

s1 relop const/s2, where relop in {=, <, <=, >, >=, <>} and s op const where op in {~=, LIKE, ILIKE, NOT LIKE, NOT ILIKE}
strlen(s) relop const
Constraints on upper(s) and lower(s)

CVC3 does not support string data type
We built our own string constraint solver

Other string solvers (Hampi, Kaluza, Rex) didn’t offer features we needed

For a query containing string and non-string constraints, we

First solve string constraints independently
Then solve non-string constraints using CVC3
We do not allow constraints between string and non-string variables

Collect all string constraints; reduce the number of constraints by propagating constants

Collect all string constraints; reduce the number of constraints by propagating constants
Construct a graph

With string variables as vertices
With edges derived from constraints of the form Vi < Vj

Traverse vertices Vi in increasing order of < constraint

For each such vertex Vi, find the lexicographically smallest string that satisfies all constraints for Vi

Done by

translating each constraint (LIKE const, < const, ..) on Vi into an automaton
intersecting automatons for the different constraints
and solving to get lexicographically smallest solution

For killing mutations between string comparisons {=, <>, <, <=, >, >=}

For killing mutations between string comparisons {=, <>, <, <=, >, >=}

Datasets generated for S1 = S2, S1 > S2, S1 < S2

For killing mutations between {LIKE, ILIKE, NOT LIKE, NOT ILIKE}

Dataset 1 satisfying S1 LIKE pattern
Dataset 2 satisfying S1 ILIKE pattern but not S1 LIKE pattern
Dataset 3 not satisfying both LIKE and ILIKE conditions

Example:

Example:

select r.A, sum(s.C) from r, s where r.B = s.B and s.C < 5 group by r.A having sum(s.C) > 20

Major Issue: CVC3 does not support relation types.
To specify aggregation constraints like SUM(s.C) > 20 in CVC3, we have to pre-decide the number of tuples in the input to the aggregation operation.
Before generating CVC3 constraints we need to estimate

Number of tuples required to satisfy the aggregation
Translate this number to appropriate number of tuples for each base relation
E.g. for above query

At least 5 tuples in join result, for a given r.A value
If r.A is a primary key, need 5 s tuples matching a given r tuple
Else need 5 r tuples, each with a matching s tuple

For each aggregated attribute A, we collect

For each aggregated attribute A, we collect

All aggregation constraints, domain constraints and non-aggregate constraints on A like A < 15
Invariant constraints on aggregation operators like

AVG * COUNT = SUM, MIN <= MAX, ..

Solve above constraints using CVC3 to find value for COUNT

Gives number of input tuples n required to satisfy aggregation constraint

If input to the aggregation is a join, we decide number of tuples of each base relation

Heuristic based on group by attributes, the joining attributes, and unique/p.k. constraints on joining attributes
Again using CVC3 to generate values
Details in paper

Create final set of constraints for each dataset based on above solution.

Nested subqueries in WHERE clause

Nested subqueries in WHERE clause

Some details in paper

Set Operations:

Generate dataset to kill mutations between UNION (ALL), INTERSECT (ALL), EXCEPT (ALL)

Handling Parameterized Queries

assign a variable to each parameter; solution given by CVC3 contains the value for the parameter

Date and Time

converted to integers

Handling Insert/Delete/Update Queries:

convert into SELECT queries and data is generated to catch mutations of the resultant SELECT queries

Used XData to grade SQL query assignments in an undergrad database course at IITB

Used XData to grade SQL query assignments in an undergrad database course at IITB

University schema from textbook (Database System Concepts, 6th ed, Silberschatz et al)
15 assignment queries chosen, each with about 70 submissions

Ubuntu system with Intel i5, 3.30 GHz CPU and 8 GB memory
Time to generate datasets ranged from 6.7 to 49 seconds with on

Most datasets had <= 2 tuples per relation, those with aggregates had at most 5.
Average of approx. 6 datasets per query

Compared with two sample University datasets provided with textbook, and with result of manual correction by TAs

Query 8: SELECT DISTINCT course_id, title FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ AND course_id NOT IN (SELECT course_id FROM prereq)

Query 8: SELECT DISTINCT course_id, title FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ AND course_id NOT IN (SELECT course_id FROM prereq)
# Queries : 79

Query 14: SELECT DISTINCT * FROM takes T WHERE T.grade IS NOT NULL AND (T.grade <> ‘F” OR NOT EXISTS (SELECT id, course_id FROM takes S WHERE S.grade <> ‘F’ AND T.id = S.id AND T.course_id = S.course_id))

Query 14: SELECT DISTINCT * FROM takes T WHERE T.grade IS NOT NULL AND (T.grade <> ‘F” OR NOT EXISTS (SELECT id, course_id FROM takes S WHERE S.grade <> ‘F’ AND T.id = S.id AND T.course_id = S.course_id))
# Queries : 67

Extended XData to handle a number of widely used constructs

Extended XData to handle a number of widely used constructs

These were important for grading student SQL assignments
Performance study shows benefits over using fixed datasets, and manual correction

Currently working on further extensions

disjunctions, extra/missing group by attribute, unintended equating of attributes in natural join
handling multiple queries in an application

Also working on a tool for grading SQL assignments

teaches instructor is equivalent to teaches instructor if there is a foreign key from teaches.ID to instructor.ID

teaches instructor is equivalent to teaches instructor if there is a foreign key from teaches.ID to instructor.ID
BUT: teaches σ dept=CS(instructor) is not equivalent to teaches σ dept=CS(instructor)
Key idea: have a teaches tuple with an instructor not from CS
Selections and joins can be used to kill mutations

Space of join-type mutants: includes mutations of join operator of a single node for all trees equivalent to given query tree

Space of join-type mutants: includes mutations of join operator of a single node for all trees equivalent to given query tree
Datasets should kill mutants across all such trees

Whether query conditions written as

Whether query conditions written as

A.x = B.x AND B.x = C.x or as
A.x = B.x AND A.x = C.x

should not affect set of mutants generated
Solution: Equivalence classes of attributes

For a class of mutations:

For a class of mutations:

Join to outerjoin
selection condition, unconstrained aggregation

Algorithm to generate test data to kill all non-equivalent mutants in above class, for a (fairly large) subset of SQL.

Under some simplifying assumptions

Only primary and foreign key constraints
Single block SQL queries; no nested subqueries
Expr/functions: Only arithmetic exprs
Join/selection predictates : conjunctions of {expr relop expr}

Schema: r(A,B), s(C,D), t(E)

Schema: r(A,B), s(C,D), t(E)
To kill this mutant we must ensure that for an r tuple there is no matching s tuple, but there is a matching t tuple
Generated test case: r(A,B)={(1,2)}; s(C,D)={}; t(E)={(2)}

Query 5: SELECT DISTINCT course.dept_name FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’

Query 5: SELECT DISTINCT course.dept_name FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’
# Queries : 72

Nested subqueries in the WHERE clause connected by IN, NOT IN, EXISTS, NOT EXISTS, ALL or ANY clause

Nested subqueries in the WHERE clause connected by IN, NOT IN, EXISTS, NOT EXISTS, ALL or ANY clause
We make some assumptions

Correlation conditions involve a single relation in the outer query
IN connectives involve attributes from only one outer relation
Nesting depth is at most 1

We first generate constraints to create tuples for the outer query result ignoring the subquery and then generate constraints for the subquery

Dataset generated for original query will kill mutations between IN and NOT IN, EXISTS and NOT EXISTS

Dataset generated for original query will kill mutations between IN and NOT IN, EXISTS and NOT EXISTS
For conditions of the form r.A relop scalar subquery, we generate data to kill mutations between different relops
To kill mutation of ALL versus ANY, we generate two tuples for the inner query such that one satisfies the comparison operator and other does not
Subquery Mutations : generate constraints to kill mutations of the subquery and then add constraints of the outer query

Yüklə 491 b.

Dostları ilə paylaş:

Queries validated by running on test data and manually checking results

Queries validated by running on test data and manually checking results

Queries validated by running on test data and manually checking results

The XData system (Shah et al. ICDE 2011) provides a principled approach to automatically generating test data

Mutation: single change to a given query

Mutation: single change to a given query

Traditional use of mutation testing is to check coverage of dataset

Traditional use of mutation testing is to check coverage of dataset

E.g. To kill below mutation, need a department without any high-salary instructors

Goal of XData system: generate datasets for testing SQL queries

Goal of XData system: generate datasets for testing SQL queries

XData [ICDE 2011] addresses data generation

Extends XData [ICDE 2011] to handle many commonly used SQL features

Extends XData [ICDE 2011] to handle many commonly used SQL features

Studies effectiveness of techniques for grading real SQL assignments

Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05]

Tuya and Suarez-Cabal [IST07], Chan et al. [QSIC05]

Olston et al (SIGMOD09], Qex [LPAR10]

de la Riva et al [AST10]

Create template datasets with constraints

Create template datasets with constraints

Generate data for each template dataset

Datatype declarations for each attribute

Datatype declarations for each attribute

Array of tuples per relation

Constraints for primary and foreign keys

Constraints for primary and foreign keys

Constraints from join conditions

Constraints from other query selection conditions

Extensions to handle several SQL constructs/features

Extensions to handle several SQL constructs/features

CVC3 does not understand NULL values

CVC3 does not understand NULL values

Define values which act as placeholder NULL values, for e.g.,

Add CVC3 constraints which distinguish between NULL and non-NULL values

CVC3 constraint to enforce foreign keys modified to allow null values

Generate constraints assigning null values to some variables, to kill mutations

Can now handle

We consider following string constraints

We consider following string constraints

CVC3 does not support string data type

We built our own string constraint solver

For a query containing string and non-string constraints, we

Collect all string constraints; reduce the number of constraints by propagating constants

Collect all string constraints; reduce the number of constraints by propagating constants

Construct a graph

Traverse vertices Vi in increasing order of < constraint

For killing mutations between string comparisons {=, <>, <, <=, >, >=}

For killing mutations between string comparisons {=, <>, <, <=, >, >=}

For killing mutations between {LIKE, ILIKE, NOT LIKE, NOT ILIKE}

Example:

Example:

Major Issue: CVC3 does not support relation types.

To specify aggregation constraints like SUM(s.C) > 20 in CVC3, we have to pre-decide the number of tuples in the input to the aggregation operation.

Before generating CVC3 constraints we need to estimate

For each aggregated attribute A, we collect

For each aggregated attribute A, we collect

Solve above constraints using CVC3 to find value for COUNT

If input to the aggregation is a join, we decide number of tuples of each base relation

Create final set of constraints for each dataset based on above solution.

Nested subqueries in WHERE clause

Nested subqueries in WHERE clause

Set Operations:

Handling Parameterized Queries

Date and Time

Handling Insert/Delete/Update Queries:

Used XData to grade SQL query assignments in an undergrad database course at IITB

Used XData to grade SQL query assignments in an undergrad database course at IITB

Ubuntu system with Intel i5, 3.30 GHz CPU and 8 GB memory

Time to generate datasets ranged from 6.7 to 49 seconds with on

Compared with two sample University datasets provided with textbook, and with result of manual correction by TAs

Query 8: SELECT DISTINCT course_id, title FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ AND course_id NOT IN (SELECT course_id FROM prereq)

Query 8: SELECT DISTINCT course_id, title FROM course NATURAL JOIN section WHERE section.semester =‘Spring’ AND section.year =‘2010’ AND course_id NOT IN (SELECT course_id FROM prereq)

# Queries : 79

Query 14: SELECT DISTINCT * FROM takes T WHERE T.grade IS NOT NULL AND (T.grade <> ‘F” OR NOT EXISTS (SELECT id, course_id FROM takes S WHERE S.grade <> ‘F’ AND T.id = S.id AND T.course_id = S.course_id))

Query 14: SELECT DISTINCT * FROM takes T WHERE T.grade IS NOT NULL AND (T.grade <> ‘F” OR NOT EXISTS (SELECT id, course_id FROM takes S WHERE S.grade <> ‘F’ AND T.id = S.id AND T.course_id = S.course_id))

# Queries : 67

Extended XData to handle a number of widely used constructs

Extended XData to handle a number of widely used constructs

Currently working on further extensions