duckdb / duckdb-web

DuckDB website and documentation

Home Page: https://duckdb.org

License: MIT License

HTML 8.65% CSS 0.17% JavaScript 62.54% Python 6.87% Shell 1.05% SCSS 14.10% Ruby 0.83% Dockerfile 0.05% TeX 5.54% Makefile 0.02% Java 0.10% R 0.08%

duckdb-web's Introduction

DuckDB Website

This repository hosts the source code for the DuckDB website. Please file any questions or issues relating to the website or documentation here.

The DuckDB codebase is hosted in the DuckDB repository.

Building the site

To build the site with Jekyll (installed locally or run via Docker), check out our site build guide.

Contributing

Please consult the contributor's guide for instructions on how to contribute to the documentation.

duckdb-web's People

Contributors

alex-monahan, ankoh, bjornasm, carlopi, dependabot[bot], domoritz, douenergy, eitsupi, example123, franz-kafka, hannes, hawkfish, imgbotapp, jonathanauch, lnkuiper, maiadegraaf, mause, maxxen, michaeljohnalbers, mytherin, papparapa, pdet, quentingodeau, samansmink, soerenwolfers, szarnyasg, taniabogatsch, tiagokepe, tishj, tmonster

duckdb-web's Issues

PLEASE RE-CLONE / RE-FORK THIS REPO

In the context of #123 we have rewritten the history of this repository to remove (many) redundant large files. This has reduced the size of this repo from 1.3GB to 75MB. However, if you have an existing clone or fork, the old history will still be present there. We recommend that you re-clone or re-fork this repository to avoid problems.

Clarify use and differences of composite and nested types

I had some questions which I don't think are answered by the docs. The "Data Types Overview" briefly mentions ROW, MAP, and ARRAY, but not LIST and STRUCT, which are only mentioned in the "Nested Types" page.

  1. What is the difference (if any) between ARRAY, INT[], and LIST?
  2. Can the exact size of an array be used in a DDL statement (INT[3])? Are these limits respected?
  3. How is the type of a LIST or ARRAY specified in a DDL statement?
  4. Can LIST expressions ([1, 2, 3]) be used to insert into ARRAYs?
  5. What is the difference (if any) between STRUCT, ROW, and MAP?
  6. Can STRUCT expressions ({'foo': ...}) be used to insert into MAPs or ROWs?

Presumably they work like PG or standard SQL, but the docs could be expanded.

How to nest query?

I want to test nested queries with DuckDB. First I use CREATE and INSERT to create a table with a map field.
Then I want to query the map data, but I could not find an example in the documentation. I tried many times without success. Can someone help me?

import duckdb
import time

if __name__ == "__main__":
    con = duckdb.connect()
    start = time.time()

    # Create a table with a MAP column and insert a single row.
    con.execute("create table mcule(id INTEGER, map_col MAP(VARCHAR, VARCHAR))")
    con.execute("insert into mcule VALUES (1, map(['asia'], ['asdfa']))")

    # Earlier attempts, left commented out:
    # con.execute("select * from mcule where map(map_col)(['asia'],['asdfa']"))
    # con.execute("select element_at(map_col,['asia'])")
    # print(con.fetchall())
    # con.execute("copy (select * from mcule) to 'nesttest.csv' (FORMAT 'CSV')")
    # con.execute("select people.name from 'nesttest.parquet'")

    # Plain select of the whole table.
    con.execute("select * from mcule")
    print(con.fetchall())
    end = time.time()
    print("test time: " + str(end - start))

error

Traceback (most recent call last):
  File "createNesTTest.py", line 12, in <module>
    con.execute("select * from mcule where map(map_col)['asia']='asdfa'")
RuntimeError: We need exactly two lists for a map


Traceback (most recent call last):
  File "createNesTTest.py", line 12, in <module>
    con.execute("select * from mcule where map(['asia'],['asdfa'])")
RuntimeError: Conversion Error: Unimplemented type for cast (MAP<VARCHAR, VARCHAR> -> BOOLEAN)

DuckDB Live Demo sometimes fails on "SELECT distinct l_shipinstruct FROM lineitem"

I tried running "SELECT distinct l_shipinstruct FROM lineitem" and "SELECT distinct l_shipinstruct FROM lineitem order by 1 desc" on the https://duckdb.org/demo/ website and I either get an incorrectly rendered result or HTTP 503 errors.

But the query does work occasionally.

Request URL: https://duckdbdemo.project.cwi.nl//fetch?callback=jQuery3510355319707602344_1629874848617&ref=53knwZQ5mH&_=1629874848637
Request Method: GET
Status Code: 502 Proxy Error
Request URL: https://duckdbdemo.project.cwi.nl//query?callback=jQuery351010110534406317129_1629874584547&q=SELECT%20distinct%20l_shipinstruct%20FROM%20lineitem%20order%20by%201%20desc&_=1629874584573
Request Method: GET
Status Code: 503 Service Temporarily Unavailable

Documentation Roadmap Discussion

I wanted to build a central list of documentation tasks in order to prioritize them. I've tied in existing issues where applicable. Feedback is welcome! Please let me know what I forgot or if the order of importance should be changed.

Tactical items:

  • Add list arguments to parquet documentation
  • Enhance nested types documentation (#110)
  • Document the filter clause
  • Interval Data Type (#137)
  • Revisit benchmarks (#78)
  • Document the new extension installation feature
  • Generate a single PDF for each commit (#964)
  • Document remaining undocumented functions that DuckDB already supports (duckdb/duckdb#2729 and related to #108 )
  • Document transferring to/from different RDBMS systems (#113)
  • Enhance compilation documentation (I've got some help needed here as I've only used Windows and haven't compiled DuckDB just yet!)
  • Document using configuration options in Python connection
  • Document compression options
  • Link to existing blog posts in relevant locations in the docs

Walkthroughs / tutorials (These could be blog posts maybe?)

  • Using various IDEs with DuckDB (DBeaver, Jupyter Lab SQL?, SQLite IDEs?)
  • Integrate with additional Python data tools (Dask, Modin, Vaex)
  • Integrate with Python orchestrators (Prefect, Dagster, Airflow)
    This can be a small or large effort. We could just do a demo, or integrate directly (Prefect has the ability to build custom connectors and has Postgres and SQLite already.)
  • Integrate with visualization engines (Redash, Metabase, Apache Superset). These likely require building small connectors for each library
  • Getting started with DuckDB for folks coming from a purely SQL background (Ex: PostgreSQL or SQL Server)

Larger items:

  • Integrate DuckDB WASM documentation in some way (Basic installation steps / simple example? Or just link to the other repo?)
    duckdb/duckdb-wasm#438
    duckdb/duckdb-wasm#375
  • Document the node.js Client API (#138) (This may be easier for me than the others since I've used it a little more, so I put it first)
  • Document the CLI (#125)
  • Document the Python Relational API (Is this likely to change much or is it ok to document now?)
  • Add examples for trickier functions, especially aggregates, etc.
  • Build user-editable examples using DuckDB WASM
    Ex: Interactive SQL IDE that can be pre-populated with examples from docs

the map function document is out of order

https://duckdb.org/docs/sql/functions/nested

Map Functions
| Function | Description | Example | Result |
|:---|:---|:---|:---|
| map[entry] | Alias for element_at | map([100, 5], ['a', 'b'])[100] | 42 |
| element_at(map, key) | Return a list containing the value for a given key or an empty list if the key is not contained in the map. The type of the key provided in the second parameter must match the type of the map’s keys else an error is returned. | SELECT element_at(map([100, 5], [42, 43]),100); | 42 |
| cardinality(map) | Return the size of the map (or the number of entries in the map). | cardinality( map([4, 2], ['a', 'b']) ); | 2 |
| map() | Returns an empty map. | map() | {} |

CodeMirror SQL highlighting shows {}'s as errors

We are also missing some of the newer keywords like Qualify and Having. Is there a good way to pull a list of keywords with a DuckDB script where we could build a DuckDB-specific dialect for CodeMirror?

Thanks!

document the `INTERVAL` data type

Hi folks! Thank you for taking documentation seriously. Reading through the docs has helped onboard me to duckdb very quickly.

One thing I noticed: INTERVAL functions are documented but the INTERVAL data type isn't. It could be that the function definitions are enough to get started, but it took me a GitHub issue discussion and a look at the source to understand the data type.

Best way to use Python Threads

Hey folks!

Before I write up a how-to-guide, would you mind taking a look at this approach to using Python threads? Is this the best practice? I couldn't get it to work with cursors, so if that is a better method than not checking the same thread, I'm open to changing this!

Thanks!

import duckdb
from threading import Thread, current_thread
import pandas as pd

def insert_from_thread(duckdb_con, results_df_dict):
    # Insert a row with the name of the thread
    thread_name = str(current_thread().name)
    results_df_dict[thread_name] = duckdb_con.execute("""INSERT INTO my_inserts VALUES (?)""", (thread_name,)).df()

duckdb_con = duckdb.connect(check_same_thread=False) # In Memory DuckDB
duckdb_con.execute("""CREATE OR REPLACE TABLE my_inserts (thread_name varchar)""")

thread_count = 10
threads = []
results_df_dict = {}

# Kick off multiple threads (in the same process) 
# Pass in the same connection as an argument, and an object to store the results
for i in range(thread_count):
    threads.append(Thread(target=insert_from_thread,
                          args=(duckdb_con, results_df_dict,),
                          name='my_thread_'+str(i)))

for i in range(thread_count):
    threads[i].start()

for i in range(thread_count):
    threads[i].join()

print(results_df_dict)
print(duckdb_con.execute("""SELECT * FROM my_inserts""").df())

Add UnNest to Nested Types Page

I just wanted to document this here before I forgot! I'm happy to make this change.

I think that the UnNest function should be mentioned in the Nested types section in addition to the SELECT overview.

In general, I think there might be a few other Postgres functions that aren't in the docs just yet, but we can track those in separate issues.

Document CLI commands

Feedback to #117

The DuckDB CLI has quite a few features (e.g. .tables) which are not yet documented.

I started writing this earlier this year but realized it's a larger task and abandoned it. I got this far:

## Installation

DuckDB can be installed as a binary. Please see the [installation page](/docs/installation?environment=cli) for details.

Other than SQL commands, SQLite-like instructions can be used:

    .help

* `.tables`: Print a list of tables

## History

The command history is saved in the home directory in `.duckdb_history`.

Interval functions documentation

Interval functions

Should be like

to_milliseconds(integer) | Construct a millisecond interval | to_milliseconds(5) | INTERVAL 5 MILLISECOND
to_microseconds(integer) | Construct a microsecond interval | to_microseconds(5) | INTERVAL 5 MICROSECOND

Benchmarks pages blank

The benchmark pages are showing as blank for me. I am getting a 503 (Service Temporarily Unavailable) from the duckdbdemo.project.cwi.nl calls.

cli alternative: sqlline

Just a tip: there is no real need to build a CLI; you can reuse sqlline if required.
I understand the integration will not be as tight as possible, but it works, and it enables one to use the same CLI for all kinds of databases:

Combination of the DuckDB JDBC driver and sqlline
see: https://duckdb.org/docs/data/parquet, and sqlline https://github.com/julianhyde/sqlline
 
sqlline -u "jdbc:duckdb:" -d "org.duckdb.DuckDBDriver" -n '' -p '' -e "select * from 'userdata1.parquet';"

Document `CREATE TEMP TABLE...`

I cannot find any description of what a TEMP or TEMPORARY table is in CREATE TEMP TABLE... statements. A brief description on this page would be helpful: https://duckdb.org/docs/sql/statements/create_table

One use case I have in mind is "caching" aggregates:
create temp table tmp as select user, count(*) v ... and then run queries like this: select city, sum(v) v ... from tmp.

Is it materialised in memory? What if it does not fit in memory?
Is it removed automatically? When? How?
What is the intended use case for temporary tables?

Tutorials and How-To Guides

I found this guide on writing good documentation. It essentially advocates for splitting documentation into four distinct groups:

  • Tutorials
  • How-To Guides
  • References
  • Explanations

Most of the documentation we have is in the form of Reference, which, while useful, is perhaps not sufficient particularly for beginners.

What do you guys think about splitting the documentation into three separate sections -- Tutorials, How-To Guides and Reference? Most of the current material would go under the Reference section, and we could write up a few language-specific tutorials for getting started and how-to guides for accomplishing common tasks in each of these.

Thoughts?

create a node.js page in the client API section

The node module isn't mentioned in the documentation. I totally get that it's not as much a "data friendly" language as Python and R but there is a large class of data apps that could leverage node in the future. At the minimum a link to the GH readme would suffice.

Docs show 0.3.3 as latest release

Hey Folks!

It looks like the docs are still showing 0.3.3 as the latest release. How can I upgrade that to 0.3.4?

I searched through this repo for a way to increment the docs to the next version, but I couldn't find any past PRs that showed it.

Thanks!
-Alex

Question: Can I use the theme skeleton for my website?

Hello DuckDB team,

I really like the theme and I would like to use it to build a website for my open source project. I tried to find license notes but I couldn't find any in this repository.

Do you think I could use this project as a boilerplate for my website, where I would use only the layout but not the content (text and images)?

If yes, then I can perhaps add a link to the original theme.

Document the `storage_info` pragma

Hey folks!

On Discord, @hannes mentioned the storage_info pragma as a way to get the storage footprint of a table:

PRAGMA storage_info('my_table');

The columns provide a ton of info:

row_group_id,
column_name,
column_id,
column_path, 
segment_id,
segment_type,
start,
count, 
compression,
stats,
has_updates,
persistent,
block_id,
block_offset

I'm looking to calculate the storage size of a table, so I assume that means summing count here.

This would be an awesome pragma to document if possible! Very useful for those of us telling the machine to do a lot of big calculations in DuckDB.

Document DESCRIBE clause

Feedback to #117

The DESCRIBE clause is mentioned in the documentation but the command itself is not documented.

Example:

D create table t(x int primary key, y varchar);
D describe t;
┌───────┬─────────┬──────┬─────┬─────────┬───────┐
│ Field │  Type   │ Null │ Key │ Default │ Extra │
├───────┼─────────┼──────┼─────┼─────────┼───────┤
│ x     │ INTEGER │ NO   │     │         │       │
│ y     │ VARCHAR │ YES  │     │         │       │
└───────┴─────────┴──────┴─────┴─────────┴───────┘

(I believe the output above is not correct: x should be a Key)

Revise and extend WITH RECURSIVE examples

A while ago, I contributed the examples for WITH RECURSIVE (#158 #188)
https://duckdb.org/docs/sql/query_syntax/with

I have since found that these queries are quite inefficient, especially the bidirectional search if there is no path between the start and end nodes.

I would also like to add a proper bidirectional search algorithm, where the two BFS frontiers are advanced simultaneously from the start and the end node.

I don't have time to tackle this now, so I'm opening this issue as a reminder and will get back to it later this spring.
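
For reference, the bidirectional search described above can be sketched outside SQL. This is a minimal plain-Python sketch, not the WITH RECURSIVE query the issue proposes: the function name `bidirectional_bfs` and the adjacency-dict layout are hypothetical, and the graph is assumed to be undirected. The two frontiers take turns advancing one full level, and the search stops as soon as one side reaches a node already seen by the other, or when one frontier runs empty (no path).

```python
def bidirectional_bfs(graph, start, goal):
    """Return the number of edges on a shortest path, or None if there is no path.

    `graph` is a hypothetical undirected adjacency mapping: node -> list of neighbours.
    """
    if start == goal:
        return 0
    # One distance map and one frontier per direction (0 = from start, 1 = from goal).
    dist = [{start: 0}, {goal: 0}]
    frontiers = [[start], [goal]]
    side = 0
    while frontiers[0] and frontiers[1]:
        this_dist, other_dist = dist[side], dist[1 - side]
        best = None
        next_frontier = []
        # Expand one complete level of the current side.
        for node in frontiers[side]:
            for neighbour in graph.get(node, ()):
                if neighbour in other_dist:
                    # The two searches meet on this edge.
                    candidate = this_dist[node] + 1 + other_dist[neighbour]
                    if best is None or candidate < best:
                        best = candidate
                elif neighbour not in this_dist:
                    this_dist[neighbour] = this_dist[node] + 1
                    next_frontier.append(neighbour)
        if best is not None:
            return best
        frontiers[side] = next_frontier
        side = 1 - side  # let the other frontier advance next
    # One frontier is exhausted, so start and goal are not connected.
    return None


graph = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: [6], 6: [5]}
print(bidirectional_bfs(graph, 1, 4))
print(bidirectional_bfs(graph, 1, 5))
```

When no path exists, the search ends once the smaller connected component is exhausted, which is the case the issue singles out as inefficient in the current examples.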

More Documentation on Rules for Identifying Parquet Files

Perhaps this is nothing more than my mis-reading of the website documentation, but I have a collection of parquet files that are saved with a '.parq' extension. It was only after significantly banging my head against the wall that I realized that Duck would not identify the extension as valid and was failing to interrogate the files.

con.execute(
"""
select *
from '/abs_path/2018_01_02.parq'
limit 10
""").fetch_df()

Returns RuntimeError: Catalog Error: Table with name /abs_path/2018_01_02.parq does not exist! LINE 3: from '/abs_path/2018_01_02.parq'

If I rename a file to .parquet extension, everything works as intended. This failure happened with both specific filenames and the glob syntax.

I'm not sure how common .parq is in the wild or whether you need to support it, but it would be nice if the full rules for identifying queryable file extensions were clearer.
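
A possible workaround, sketched under the assumption that the installed DuckDB version provides the `read_parquet` table function: naming the reader explicitly avoids relying on the file extension. The helper `parquet_query` below is hypothetical and only builds the SQL text, so it runs without DuckDB installed.

```python
def parquet_query(path, limit=10):
    """Build a query that reads `path` as Parquet regardless of its extension."""
    # Double any single quotes so the path stays a valid SQL string literal.
    escaped = path.replace("'", "''")
    return f"SELECT * FROM read_parquet('{escaped}') LIMIT {int(limit)}"


query = parquet_query("/abs_path/2018_01_02.parq")
print(query)
```

The resulting string could then be passed to `con.execute(query).fetch_df()` in place of the query shown above.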

Document list functions

List functions such as unnest and string_split_regex are currently undocumented. An example for their use:

D insert into emails values (1, 'a@b.com;c@d.com'), (2, 'e@f.com'), (3, '');
Error: Catalog Error: Table with name emails does not exist!
Did you mean "x"?
D create table emails(id int, addresses varchar);
D insert into emails values (1, 'a@b.com;c@d.com'), (2, 'e@f.com'), (3, '');
D select id, unnest(string_split_regex(addresses, ';')) from emails;
D select id, unnest(string_split_regex(addresses, ';')) as email from emails;
┌────┬─────────┐
│ id │  email  │
├────┼─────────┤
│ 1  │ a@b.com │
│ 1  │ c@d.com │
│ 2  │ e@f.com │
│ 3  │         │
└────┴─────────┘
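
To make the semantics of the query concrete, here is a rough plain-Python analogue of splitting and unnesting. It is only an illustration of the row-per-element behaviour seen in the output above, not DuckDB's implementation; the function name `split_and_unnest` is hypothetical and the sample rows are taken from the result table.

```python
import re

emails = [(1, "a@b.com;c@d.com"), (2, "e@f.com"), (3, "")]


def split_and_unnest(rows, pattern=";"):
    """Yield one (id, part) pair per element produced by the regex split."""
    for row_id, addresses in rows:
        # re.split on an empty string yields [''], so id 3 still produces one row.
        for part in re.split(pattern, addresses):
            yield row_id, part


result = list(split_and_unnest(emails))
for row in result:
    print(row)
```

As in the DuckDB output, the empty address string for id 3 yields a single row with an empty value rather than no row at all.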

Repository size is very large

@Alex-Monahan Currently, the size of this repository is massive, approx. 1.2 GB. Even when cloning just the last commit, the resulting directory is 81 MB:

$ git clone --depth 1 git@github.com:duckdb/duckdb-web.git
$ du -hd0 duckdb-web
81M	duckdb-web

Here's a list of the top 50k largest files committed, generated with this script.

To nuke old commits, the BFG repository cleaner can be used. This will still require a force push and thus break the commit tree but there are not too many forks of this repository yet.

Improve documentation on CTEs

See duckdb/duckdb#2551
I'm "self-assigning" this since I'm working with recursive CTEs quite a lot.

  • Add hierarchical CTE example
  • Add generic graph query example
  • Check whether nested CTEs work (if not, document it)
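
As a starting point for the hierarchical example in the list above, here is a minimal sketch of a recursive CTE that builds a path for every node in a tree. The `tag` table and its rows are hypothetical. The query is run through Python's built-in sqlite3 module purely so the sketch needs no extra install; it uses standard WITH RECURSIVE syntax, but it has not been checked against DuckDB here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical hierarchy: each tag points at its parent, the root has none.
con.execute("CREATE TABLE tag (id INTEGER, name TEXT, parent_id INTEGER)")
con.executemany(
    "INSERT INTO tag VALUES (?, ?, ?)",
    [(1, "root", None), (2, "docs", 1), (3, "sql", 2), (4, "blog", 1)],
)

rows = con.execute(
    """
    WITH RECURSIVE tag_path(id, path) AS (
        -- Base case: start from the root of the hierarchy.
        SELECT id, name FROM tag WHERE parent_id IS NULL
        UNION ALL
        -- Recursive step: append each child to the path of its parent.
        SELECT tag.id, tag_path.path || '/' || tag.name
        FROM tag JOIN tag_path ON tag.parent_id = tag_path.id
    )
    SELECT path FROM tag_path ORDER BY path
    """
).fetchall()

paths = [row[0] for row in rows]
print(paths)
```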

The website does not have a link to the blog posts

I found the 'efficient SQL on pandas' blog post when someone posted it on Reddit. When I wanted to go back to it I could not find any links on the website that will take me to the blogs. If you want people to read the blog posts then I suggest you put a link to them on the home page.

"PREDECING" typo

Hi,

Thanks for an awesome package. Just a small typo that confused me more than it should as I was copy+pasting from examples:

https://duckdb.org/docs/sql/window_functions

SELECT points,
    SUM(points) OVER (
        ROWS BETWEEN 1 PREDECING
                 AND 1 FOLLOWING) we
FROM results
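
For readers unsure what the frame in this query computes once it is spelled ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING, here is a small plain-Python sketch of the same idea. The function name `moving_sum` and the sample values are hypothetical, and the list order stands in for the row order, which the SQL query does not fix without an ORDER BY.

```python
def moving_sum(points):
    """Sum each value with its immediate neighbours (1 preceding to 1 following)."""
    # The slice is clipped at both ends, like a window frame at the partition edges.
    return [sum(points[max(i - 1, 0): i + 2]) for i in range(len(points))]


sums = moving_sum([1, 2, 3, 4])
print(sums)
```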

"PREDECING" should be "PRECEDING"
