Commit graph

37 commits

Author SHA1 Message Date
Klaus Post 0987069e37
select: Fix integer conversion overflow (#10437)
Do not convert float value to integer if it will over/underflow.

The comparison cannot be `<=` since rounding may overflow it.

Fixes #10436
2020-09-08 15:56:11 -07:00
Harshavardhana caad314faa
add ruleguard support, fix all the reported issues (#10335) 2020-08-24 12:11:20 -07:00
Bruce Wang e464a5bfbc
Fix bug with fields that contain trimming spaces (#10079)
String x might contain trimming spaces. And it needs to be trimmed. For
example, in csv files, there might be trimming spaces in a field that
ought to meet a query condition that contains the value without
trimming spaces. This applies to both intCast and floatCast functions.
2020-07-21 12:57:09 -07:00
Harshavardhana 1bc32215b9
enable full linter across the codebase (#9620)
enable linter using golangci-lint across
codebase to run a bunch of linters together,
we shall enable new linters as we fix more
things the codebase.

This PR fixes the first stage of this
cleanup.
2020-05-18 09:59:45 -07:00
Klaus Post e4900b99d7
s3 select: Infer types for comparison (#9438) 2020-04-24 13:02:59 -07:00
Anis Elleuch 9902c9baaa
sql: Add support of escape quote in CSV (#9231)
This commit modifies csv parser, a fork of golang csv
parser to support a custom quote escape character.

The quote escape character is used to escape the quote
character when a csv field contains a quote character
as part of data.
2020-04-01 15:39:34 -07:00
Klaus Post 8d98662633
re-implement data usage crawler to be more efficient (#9075)
Implementation overview: 

https://gist.github.com/klauspost/1801c858d5e0df391114436fdad6987b
2020-03-18 16:19:29 -07:00
Anis Elleuch 35ecc04223
Support configurable quote character parameter in Select (#8955) 2020-03-13 22:09:34 -07:00
Aditya Manthramurthy cec8cdb35e
S3Select: Handle array selection in from clause (#9076) 2020-03-10 22:34:58 -07:00
Klaus Post e4020fb41f
SIMDJSON S3 select input (#8401) 2020-02-13 14:03:52 -08:00
Klaus Post f1e2e1cc9e S3 Select: Mismatched types don't match (#8608)
When comparing for equality, if types cannot be matched, they don't match.
2019-12-06 07:24:41 -08:00
Harshavardhana 5d3d57c12a
Start using error wrapping with fmt.Errorf (#8588)
Use fatih/errwrap to fix all the code to use
error wrapping with fmt.Errorf()
2019-12-02 09:28:01 -08:00
Klaus Post 1c90a6bd49 S3 Select: Convert CSV data to JSON (#8464) 2019-11-09 09:10:35 -08:00
Klaus Post 38e6d911ea S3 Select: Detect full object (#8456)
Check if select is `SELECT s.* from S3Object s` and forward it to All

Fixes #8371 and makes this case run significantly faster.
2019-10-30 13:46:55 +05:30
Klaus Post 51456e6adc Select: Support Square Bracket Lists (#8457)
Allows for S3 compatible `SELECT * from s3object s WHERE id IN [3,2]`

Fixes #8422
2019-10-30 11:34:40 +05:30
Klaus Post 002ac82631 S3 Select: Add parser support for lists. (#8329) 2019-10-06 07:52:45 -07:00
Klaus Post c1a17c2561 S3 Select: Aggregate AVG/SUM as float (#8326)
Force sum/average to be calculated as a float.

As noted in #8221

> run SELECT AVG(CAST (Score as int)) FROM S3Object on

```
Name,Score
alice,80
bob,81
```

> AWS S3 gives 80.5 and MinIO gives 80.

This also makes overflows much more unlikely.
2019-09-27 16:12:03 -07:00
Klaus Post 1c5b05c130 S3 select: Fix output conversion on select * (#8303)
Fixes #8268
2019-09-27 12:33:14 -07:00
Klaus Post dac1cf5a9a S3 Select: Parsing tweaks (#8261)
* Don't output empty lines.
* Trim whitespace from byte to int/float/bool conversions.
2019-09-17 17:21:23 -07:00
Klaus Post c9b8bd8de2 S3 Select: optimize output (#8238)
Queue output items and reuse them.
Remove the unneeded type system in sql and just use the Go type system.

In best case this is more than an order of magnitude speedup:

```
BenchmarkSelectAll_1M-12    	       1	1841049400 ns/op	274299728 B/op	 4198522 allocs/op
BenchmarkSelectAll_1M-12    	      14	  84833400 ns/op	169228346 B/op	 3146541 allocs/op
```
2019-09-17 05:56:27 +05:30
Yao Zongyou ec9bfd3aef speed up the performance of s3select on csv (#7945) 2019-08-31 00:07:40 -07:00
Yao Zongyou 60831e3299 aggregation functions' argument may already has been cast to numeric (#7876) 2019-07-05 10:38:38 -07:00
Yao Zongyou 037319066f fix unicode support related bugs in s3select (#7877) 2019-07-05 09:43:10 -07:00
Ryan Tam bd56f80250 Fix ignored alias for aggregate result in S3 Select (#7849)
The SQL parser as it stands right now ignores alias for aggregate
result, e.g. `SELECT COUNT(*) AS thing FROM s3object` doesn't actually
return record like `{"thing": 42}`, it returns a record like `{"_1": 42}`.
Column alias for aggregate result is supported in AWS's S3 Select, so
this commit fixes that by respecting the `expr.As` in the expression.

Also improve test for S3 select

On top of testing a simple `SELECT` query, we want to test a few more
"advanced" queries (e.g. aggregation).

Convert existing tests into table driven tests[1], and add the new test
cases with "advanced" queries into them.

[1] - https://github.com/golang/go/wiki/TableDrivenTests
2019-07-03 16:34:54 -07:00
Yao Zongyou 55092bede1 add timestamp compare support (#7832) 2019-06-25 11:05:37 -07:00
Yao Zongyou 90a3b830f4 fix typo and the string representation of the time.Time value (#7831) 2019-06-25 09:54:14 -07:00
Yao Zongyou 23b9df0694 Fix s3select TRIM function's nil pointer dereference bug (#7817) 2019-06-24 16:59:33 -07:00
kannappanr 5ecac91a55
Replace Minio refs in docs with MinIO and links (#7494) 2019-04-09 11:39:42 -07:00
Kirill Motkov 3d29ab4059 Rewrite if-else chains to switch statements (#7382) 2019-03-18 07:46:20 -07:00
Aditya Manthramurthy e463386921 Add JSON Path expression evaluation support (#7315)
- Includes support for FROM clause JSON path
2019-03-09 08:13:37 -08:00
Aditya Manthramurthy 8a405cab2f COUNT() function in select should return an int (#7243) 2019-02-13 16:32:59 -08:00
Harshavardhana df35d7db9d Introduce staticcheck for stricter builds (#7035) 2019-02-13 18:29:36 +05:30
Aditya Manthramurthy ee5b3622a5 Evaluate where clause in aggregation queries (#7235) 2019-02-12 13:54:26 -08:00
Aditya Manthramurthy f04f8bbc78 Add support for Timestamp data type in SQL Select (#7185)
This change adds support for casting strings to Timestamp via CAST:
`CAST('2010T' AS TIMESTAMP)`

It also implements the following date-time functions:
  - UTCNOW()
  - DATE_ADD()
  - DATE_DIFF()
  - EXTRACT()

For values passed to these functions, date-types are automatically
inferred.
2019-02-04 20:54:45 -08:00
Aditya Manthramurthy 2786055df4 Add new SQL parser to support S3 Select syntax (#7102)
- New parser written from scratch, allows easier and complete parsing
  of the full S3 Select SQL syntax. Parser definition is directly
  provided by the AST defined for the SQL grammar.

- Bring support to parse and interpret SQL involving JSON path
  expressions; evaluation of JSON path expressions will be
  subsequently added.

- Bring automatic type inference and conversion for untyped
  values (e.g. CSV data).
2019-01-28 17:59:48 -08:00
Bala FA e23a42305c Rebase minio/parquet-go and fix null handling. (#7067) 2019-01-16 21:52:04 +05:30
Bala FA b0deea27df Refactor s3select to support parquet. (#7023)
Also handle pretty formatted JSON documents.
2019-01-08 16:53:04 -08:00