ferrerojeremy / ln2sql Goto Github PK
View Code? Open in Web Editor NEWA tool to query a database in natural language
License: GNU General Public License v3.0
A tool to query a database in natural language
License: GNU General Public License v3.0
If you've just arrived and don't want to spend too much time setting up and testing this project for your specific use case, take a look at the following example cases where it didn't work as expected.
Question | SQL Query |
---|---|
What is the id of the client with the username jdoe? | SELECT client.id FROM client WHERE client.username = 'OOV'; |
How many clients have made a reservation with a rating of 5? | SELECT COUNT(*) FROM client; |
What is the name of the client who made the most reservations? | SELECT client.name FROM client; |
How many clients have made a reservation at more than 1 hotel? | SELECT COUNT(*) FROM client; |
How many reservations were made by clients whose mobile number contains the digits 555? | SELECT COUNT(*) FROM reservation; |
What is the highest-rated hotel in the database? | SELECT * FROM hotel; |
What is the lowest-rated hotel in the database? | SELECT * FROM hotel; |
How many clients are over the age of 30? | SELECT COUNT(*) FROM client WHERE client.age = 'the'; |
What is the username of the oldest client in the database? | SELECT client.username FROM client; |
What is the username of the youngest client in the database? | SELECT client.username FROM client; |
How many reservations were made by clients under the age of 25? | SELECT COUNT(*) FROM reservation INNER JOIN client ON reservation.clientId = client.id WHERE client.age = OOV; |
Few more with relations:
Question | SQL Query |
---|---|
What are the name of all the hotel where the client with id is 123 made reservation? | SELECT hotel.name FROM hotel WHERE hotel.id = ‘123’; |
How many client have the same name as their username? | SELECT COUNT(*) FROM client WHERE client.name = OOV AND client.username = OOV; |
What is the username of the client with the most reservation? | SELECT client.username FROM client; |
What is the name of the hotel with the most reservation? | SELECT hotel.name FROM hotel; |
How many client have made reservation on the same date as another client? | SELECT COUNT() FROM client; SELECT COUNT() FROM client; |
What is the total number of reservation made at hotel with id is 777? | SELECT COUNT(*) FROM reservation WHERE reservation.id = ‘777’; |
What are the names of all the hotel where the client with id is 999 made reservation with a rating of 4 or higher? | SELECT * FROM hotel WHERE hotel.id = ‘999’; |
What is the age distribution of client who made reservation at hotel with id is 333? | SELECT client.age FROM client WHERE client.id = ‘333’; |
What is the name of the client who made the highest-rated reservation? | SELECT client.name FROM client; |
How many client have a username that contains the word “hotel”? | SELECT COUNT(*) FROM client WHERE client.username = OOV; |
What is the id of the client with the highest-rated reservation at hotel with id is 222? | SELECT client.id FROM client WHERE MAX(client.id) = ‘222’; |
How many reservation were made by client whose usernames contain the letter “a”? | SELECT COUNT(*) FROM reservation; |
What is the average rating for reservation made by client over the age of 40? | SELECT AVG(reservation.rating) FROM reservation INNER JOIN client ON reservation.clientId = client.id WHERE client.age = ‘the’; |
Credits: @harshadk-sourcefuse for testing these out.
Hello everyone,
Is there a way to fix a default column for a table when no column is specified.
For example, if I have a table "Bus", its default colum its "plate", so I can ask "Number of routes of bus 'BR1234'" instead of "Number of routes of bus with plate 'BR1234'".
Thanks to all people that worked on this amazin project.
Currently it works for MySQL dbs. It would be nice to make it work for postgres / Redshift / BigQuery.
Hi everyone,
I'm facing difficulty in understanding how exactly key word parsing to map Natural language(NL) query words and database keywords is done.It would be greate if you can breifly explain me how exactly In2sql modue searches and maps words from NL query to database values like column names and cell values. Meanwhile I wish to know whether we can use a data structure like hash table for this problem as defined below
To create a map between columns and table
Columns | Table_names
Column1 | Table_name1
Column2 | Table_name2
So to check if any word from NLP querry is a column name just check whether word is a key of hash table defined above
And for column value or cell value create a hash table like
Values | Column_names
Value1 | Column1
Value2 | Column1
.
.
Valuen | Column_m
where there will be m columns and n values in total in database. So to check whether a word from NLP query is a cell value, check if the key exists in above hash table. Furthermore, in case of cell value repeating in more than one columns then use chaining concept for hash table's key.
I am sorry, I am not getting how to use this tool.
Can anybody tell me the uses or steps from scratch so that I can understand and use this feature.
Unable to use timestamp, for example:
"what is minimum monoxide in pollution_data when time_stamp equals 2017-03-23"
my timestamp has both time and date.
Also tried using:
"what is minimum monoxide in pollution_data when extract(month from time_stamp) equals 3"
datatypes are:-
time_stamp
timestamp with time zone,
monoxide
double precision,
P.S.: not using real names of columns and table for privacy reasons
There is an error while starting the project in python3
In tables.sql.zip, the following lines are ignored during the parsing.
ALTER TABLE `city`
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`);
ALTER TABLE `emp`
ADD PRIMARY KEY (`id`),
ADD KEY `cityId` (`cityId`);
ALTER TABLE `emp`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT, AUTO_INCREMENT=31;
ALTER TABLE `emp`
ADD CONSTRAINT `emp_ibfk_1` FOREIGN KEY (`cityId`) REFERENCES `city` (`id`);
I am having trouble getting the thesaurus support working. Here's a command I'm running in order to test the pupil -> student rule in th_english.dat:
python2.7 ln2sql.py -d database/school.sql -l lang/english.csv -j output.json -t thesaurus/th_english.dat -i "Count how many pupil are there with firstname is John"
Although it works for the word "student", it outputs this error with the word "pupil" (as shown above):
ParsingException.ParsingException: Error: No table name found in sentence!
Am I invoking it incorrectly? I'm guessing this is simply a configuration mistake on my part, but I can't figure out what's wrong with the above command.
Hi, I tried to use thesaurus, but I'm getting an issue saying "argument of type 'NoneType' is not iterable".
Could anyone help on this?
Thanks!
Hi Team,
While running sample, "What is the average age of students whose name is Doe or age over 25?" it gives error "No table name found in sentence".
python -m ln2sql.main -d database_store/school.sql -l lang_store/english.csv -j output.json -i "What is the average age of students whose name is Doe or age over 25?"
It seems keyword "students" is not matching in database "school.sql".
Regards,
Dhaval
When running gui this error came.
Traceback (most recent call last):
File "ln2sql/ln2sql_gui.py", line 6, in
from .ln2sql import Ln2sql
SystemError: Parent module '' not loaded, cannot perform relative import
Any Solution?
Unable to form query containing multiple Group by. Is it supported?
If yes, what should be the input sentence syntax?
I think a huge sign should be placed in the readme file to inform users that this doesn't work with Postgres databases.
I spent hours trying to figure out why I was getting this traceback:
"AttributeError: 'NoneType' object has no attribute 'group'"
before going through the issues to discover that Postgres isn't supported.
The following entry python ln2sql.py -d database/tal.sql -l lang/french.csv -i "combien y a t'il d'élève dont le nom est EMMAR ?"
returns :
SELECT COUNT(*)
FROM eleve
WHERE eleve.nom = OOV;
but python ln2sql.py -d database/tal.sql -l lang/french.csv -i "Combien y a t'il d'élève dont le nom est EMMAR ?"
returns :
SELECT *
FROM eleve
WHERE eleve.nom = OOV;
Hi Team,
In normal world, we usually use business names while writing.
These names may or may not be same as technical name of columns in databases.
For e.g. "idStudent" is usually written as "student id"
Need to have feature to use alias for such colums.
Regards,
Dhaval
When installing the package for python3 it installs successfully
$ pip3 install ln2sql
Collecting ln2sql
Installing collected packages: ln2sql
Successfully installed ln2sql-0.1
but module can't be loaded
$ python -m ln2sql
venv/bin/python: No module named ln2sq
and the package is empty
$ ls -a venv/lib/python3.7/site-packages/ln2sql-0.1.dist-info/
. INSTALLER RECORD top_level.txt
.. METADATA WHEEL
Is this expected to be working? If so how is it meant to be used?
could you please put the school.sql in datasetbase ?
Is there an English version of the french scientific paper? I don't speak french but I want to read that paper.
thank you
As mentioned in the documentation joins are supported but I can't seem to generate a SQL with JOIN. I have tried with a sql dump with 2 tables with proper foreign key and primary keys.
SQL dump : tables.sql.zip
Please provide an example natural language query for JOINs
I have problems dealing with the between command. And also there is a problem when you use spaces in the equivalences of a word can some one help me
Currently the only way to run is to clone and execute. Will convert to a python package
https://python-packaging.readthedocs.io/en/latest/
A punctuation sign in end of sentence (like in english) can distort the detection of the last value of the WHERE.
This sentence:
Count how many city there are where the score is greater than 2?
give:
SELECT COUNT(*)
FROM city
INNER JOIN emp
ON emp.cityId = city.id
WHERE emp.score > '2?';
#58 i am facing some problem to solve this issue when i run this code it accepts only exact table name for example we have table whose name is 'class' and when i write table name like (Class,CLASS,classes) it is not giving me the output although i am using [flags=re.IGNORRECASE] how to solve this issue please help me.
def create_table(self, table_string):
lines = table_string.split("\n")
table = Table()
for line in lines:
if 'TABLE' in line:
table_name = re.search("`(\w+)`", line,flags=re.IGNORECASE)
table.name = table_name.group(1)
if self.thesaurus_object is not None:
table.equivalences = self.thesaurus_object.get_synonyms_of_a_word(table.name)
elif 'PRIMARY KEY' in line:
primary_key_columns = re.findall("`(\w+)`", line,flags=re.IGNORECASE)
for primary_key_column in primary_key_columns:
table.add_primary_key(primary_key_column)
else:
column_name = re.search("`(\w+)`", line,flags=re.IGNORECASE)
if column_name is not None:
column_type = self.predict_type(line)
if self.thesaurus_object is not None:
equivalences = self.thesaurus_object.get_synonyms_of_a_word(column_name.group(1))
else:
equivalences = []
table.add_column(column_name.group(1), column_type, equivalences)
return table
def alter_table(self, alter_string):
lines = alter_string.replace('\n', ' ').split(';')
for line in lines:
if 'PRIMARY KEY' in line:
table_name = re.search("TABLE `(\w+)`", line,flags=re.IGNORECASE).group(1)
table = self.get_table_by_name(table_name)
primary_key_columns = re.findall("PRIMARY KEY \(`(\w+)`\)", line,flags=re.IGNORECASE)
for primary_key_column in primary_key_columns:
table.add_primary_key(primary_key_column)
elif 'FOREIGN KEY' in line:
table_name = re.search("TABLE `(\w+)`", line,flags=re.IGNORECASE).group(1)
table = self.get_table_by_name(table_name)
foreign_keys_list = re.findall("FOREIGN KEY \(`(\w+)`\) REFERENCES `(\w+)` \(`(\w+)`\)", line,flags=re.IGNORECASE)
for column, foreign_table, foreign_column in foreign_keys_list:
table.add_foreign_key(column, foreign_table, **foreign_column)**
It's not a good practice to loop all different test cases in one tests. Grouping into categories like:
test_select()
test_select_where()
gives more context and allows adding tests more easily and detecting issues more easier. As I have already moved to pytest, just need to break it to chunks
It's not a good practice to loop all different test cases in one tests. Grouping into categories like:
test_select()
test_select_where()
gives more context and allows adding tests more easily and detecting issues more easier. As I have already moved to pytest, just need to break it to chunks
This has multiple parts:
a) It should be easy to pip install to use.
i.e I just want to setup the sql file to use once (generally you don't have to change that once setup) and just query as:
query = Ln2sql.to_query(sentence)
b) Thinking about reasonable defaults. eg. english as default, specifying the db file once when initialising the class and then doing as in a)
Hi there
I did not understand where and how you are using Thesaurus in ln2sql . Plz suggest
Thanks in advance
I am unable to replicate the example from README but with -t thesaurus_store/th_english.dat
flag (works fine without it):
python -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -t thesaurus_store/th_english.dat -i "Count how many city there are with the name blob?"
Traceback (most recent call last):
File "c:\users\user\appdata\local\continuum\anaconda3\Lib\runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "c:\users\user\appdata\local\continuum\anaconda3\Lib\runpy.py", line 85, in _run_code
exec(code, run_globals)
File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\main.py", line 26, in <module>
main()
File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\main.py", line 23, in main
).get_query(args.sentence)
File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\ln2sql.py", line 44, in get_query
queries = self.parser.parse_sentence(input_sentence, self.stopwordsFilter)
File "c:\users\user\desktop\projects\natural_language_sql\ln2sql-master\ln2sql\parser.py", line 711, in parse_sentence
input_word_list[i] in self.database_object.get_table_by_name(table_name).equivalences):
TypeError: argument of type 'NoneType' is not iterable
Python 3.6.1
Windows 10
ln2sql version: 0.2
My question was : "what is name in emp with the highest score?"
and the answer was :
SELECT emp.name
FROM emp
WHERE MAX(emp.score) = OOV;
Seems that subqueries are not taking into account. Am I wrong ?
When I moved the project to python3, I missed adding some backwards compatibility checks. Need to convert so it supports both python2.7+ and python3.5+
Exemple of correct declarations of pk and fk but unsupported by ln2sql :
ALTER TABLE `city`
ADD PRIMARY KEY (`id`),
ADD KEY `id` (`id`);
ALTER TABLE `emp`
ADD CONSTRAINT `emp_ibfk_1` FOREIGN KEY (`cityId`) REFERENCES `city` (`id`);
Hi everyone,
I am facing difficulty in forming a question based on joins. How to write a question to access multiple tables using joins?Is there any chance?
Can you please provide an example with school.sql file available in the database_store
The english statement 'What is the name of the professor who is teaching the field Algorithm ?' does not work. There comes an error saying 'there is at least one column 'field' which is not reachable from 'professor'.
The out put should be a simple query which should not be using a join. The example uses the city data model. Please help me out with this.
Command:
python3 -m ln2sql.main -d database_store/city.sql -l lang_store/english.csv -j output.json -i "Count how many city there are with the name blob?"
Generated sql:
SELECT COUNT(*)
FROM city
INNER JOIN emp
ON city.id = emp.cityId
WHERE emp.name = 'with';
All the values in NLP is detected as OOV in sql query. Is there any technique to identify the values in SQL generation
The following issue raises when I try to load the sql dump file.
if 'TABLE' in line:
111 table_name = re.search("(\w+)
", line)
--> 112 table.name = table_name.group(1)
113 if self.thesaurus_object is not None:
114 table.equivalences = self.thesaurus_object.get_synonyms_of_a_word(table.name)
AttributeError: 'NoneType' object has no attribute 'group'
Kindly suggest a fix to this issue.
@rupinder1133 if you feel like it 😊
The query given in the Readme works fine, but if stopwords path is provided the query output is wrong?
The English stopword file has words like 'name' and 'eighty' which will definitely remove essential words from the sentence.
But using the nltk.corpus.stopwords also makes no difference and gives the same wrong query output.
Currently the classes (eg. column.py) are quite Java like with setters and getters. Need to check if all values are mutable and if not we can just use:
@Property
and
@.setter instead.
setup.py install
python ln2sql_gui.py
Error:-
D:\work_parts\j\ln2sql\ln2sql>python ln2sql_gui.py
Traceback (most recent call last):
File "ln2sql_gui.py", line 6, in <module>
from .ln2sql import Ln2sql
ModuleNotFoundError: No module named '__main__.ln2sql'; '__main__' is not a package
Hi,
I have read the paper and checked out the library, and I am considering using a similar method to the one used here to enable users to query a database with a natural language.
I saw in the other issue that you are no longer continuing this project, could you tell me what is the reason?
Is it due to performance or accuracy reason?
Or maybe you have find some specific limitations that make this "non machine learning" approach not useable in real life?
One of the limitation I could see would be that some words that are not a noun actually dictate the table/colmun values should be selected from (for example, "who takes the biology class?" -> "SELECT * FROM student WHERE ...").
Based on your experience with this approach, would you advice for or against using a similar technique in production?
Thank you in advance for the reply!
I have not completely inspected this library, it looks very good and interesting, I would like to know some very simple instructions on how to add the Spanish language.
I guess it's the same way you add other languages, how could you do it.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.