joestelmach / natty
Java natural language date parser
Home Page: http://natty.joestelmach.com/
License: MIT License
When parsing a day of the week such as "Sunday at 10am", Natty will return this coming Sunday until that day begins at which point it rolls forward to the next Sunday. However, if I'm running a site that uses Natty on the U.S. east coast, Sunday begins 3 hours before it does on the west coast. Therefore, if at 10pm on a Saturday night a west coast customer enters "Sunday 10am" (expecting to get back the next day), Natty will actually return Sunday the following week.
Another problem is when "tomorrow" is used. If my local timezone is set to UTC and I have a parser configured for pacific time, on June 4th at 9pm pacific I would expect it to parse "tomorrow" as June 5th at 9pm PST (or June 6th at 4am UTC). Instead, it parses to June 6th at 9pm PST (or June 7th at 4am UTC).
I think that this could be solved if the parser considered the time in the customer's time zone (as passed to the parser's constructor) rather than its default time zone. Then, if a parser was instantiated using the Pacific time zone, it would continue to return this coming Sunday right up until it became Sunday on the west coast.
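The behaviour requested above can be illustrated with plain java.time, independent of Natty. This is only a sketch of the idea, not Natty's implementation: the class and method names are hypothetical, and it assumes "this coming Sunday" means the first Sunday strictly after today's date in the customer's zone.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of Natty.
class ZoneAwareWeekday {
    // Resolve a weekday to the next matching date after "today",
    // where "today" is evaluated in the customer's zone rather than the JVM default.
    static LocalDate comingWeekday(Instant now, ZoneId customerZone, DayOfWeek day) {
        LocalDate today = now.atZone(customerZone).toLocalDate();
        return today.with(TemporalAdjusters.next(day));
    }

    public static void main(String[] args) {
        // 10pm Saturday in Los Angeles is already 1am Sunday in New York.
        Instant now = Instant.parse("2012-06-03T05:00:00Z");
        System.out.println(comingWeekday(now, ZoneId.of("America/Los_Angeles"), DayOfWeek.SUNDAY));
        System.out.println(comingWeekday(now, ZoneId.of("America/New_York"), DayOfWeek.SUNDAY));
    }
}
```

For the same instant, the west coast zone yields the next day while the east coast zone yields the Sunday a week later, which is the discrepancy described in the report.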
"Fried Chicken" and "Wedding Dinner" are misparsed: the leading "Fri" and "Wed" are read as the weekdays FRI and WED, leaving "ed Chicken" and "ding Dinner".
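One common way to avoid this kind of false match is to require word boundaries around weekday tokens. The sketch below uses a plain regular expression rather than Natty's ANTLR grammar, so it only illustrates the idea; the class name and the list of accepted abbreviations are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Natty.
class WeekdayTokens {
    // \b on both sides means a weekday only matches as a whole word.
    private static final Pattern WEEKDAY = Pattern.compile(
        "\\b(mon(day)?|tue(s|sday)?|wed(nesday)?|thu(r|rs|rsday)?|fri(day)?|sat(urday)?|sun(day)?)\\b",
        Pattern.CASE_INSENSITIVE);

    // Return every whole-word weekday token found in the text, in order.
    static List<String> findWeekdays(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = WEEKDAY.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findWeekdays("Fried Chicken, Wedding Dinner"));
        System.out.println(findWeekdays("Wedding dinner on Fri at 7pm"));
    }
}
```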
Would be good to internationalize this to accept non-US date formats. A UK formatted date, like 22/10/1986, does not parse. DateJS (a similar library for Javascript) can handle dates like that, so there might be something you can re-use.
examples:
FAILS: Watch School Spirits on June 20 on syfy channel
FAILS: Watch School Spirits on June 20 on
MATCHES: Watch School Spirits on June 20
for (Object component : _holidayCalendar.getComponents(VEVENT)) {
  Component vevent = (Component) component;
  String summary = vevent.getProperty(SUMMARY).getValue();
  if(summary.equals(eventSummary)) {
    PeriodList list = vevent.calculateRecurrenceSet(period);
    for(Object p : list) {
      DateTime date = ((Period) p).getStart();
      // this date is at the date of the holiday at 12 AM UTC
      Calendar utcCal = CalendarSource.getCurrentCalendar();
      utcCal.setTimeZone(TimeZone.getTimeZone(GMT));
      utcCal.setTime(date);
      // use the year, month and day components of our UTC date to form a new local date
      Calendar localCal = CalendarSource.getCurrentCalendar();
      localCal.setTimeZone(_timeZone);
      localCal.set(Calendar.YEAR, utcCal.get(Calendar.YEAR));
      localCal.set(Calendar.MONTH, utcCal.get(Calendar.MONTH));
      localCal.set(Calendar.DAY_OF_MONTH, utcCal.get(Calendar.DAY_OF_MONTH));
      holidays.put(localCal.get(Calendar.YEAR), localCal.getTime());
    }
  }
}
_holidayCalendar.getComponents does not take parameters,
and the same goes for vevent.getProperty(SUMMARY).
Also, the calculateRecurrenceSet() function does not exist.
recognize dates such as 'in the end of June', or 'the beginning of april'
Tried a date like '10th of next month' and it didn't parse. Great work!
Hey,
the problem is that date strings have different meanings in different countries. I'm referring e.g. to issues #3, #20 and #21, but there are of course many more cases.
The time zone cannot and should not be used to initialize the locale, because (1) different countries share the same time zone and, more importantly, (2) there are German speakers in the US, English speakers in Germany, and so on. So a new locale attribute needs to be introduced for the Parser class, since a different locale often requires a completely different parsing schema.
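As a rough illustration of how a locale, rather than a time zone, could drive the interpretation of a purely numeric date, the sketch below asks the JDK for the locale's short date pattern and checks whether the day comes before the month. It uses only java.time and is not Natty code: the class and method names are hypothetical, and the split on "-", "/" and "." is a simplification that only handles three-part dates with the year last.

```java
import java.time.LocalDate;
import java.time.chrono.IsoChronology;
import java.time.format.DateTimeFormatterBuilder;
import java.time.format.FormatStyle;
import java.util.Locale;

// Hypothetical helper, not part of Natty.
class LocaleAwareNumericDate {
    // True when the locale's short date pattern puts the day before the month.
    static boolean isDayFirst(Locale locale) {
        String pattern = DateTimeFormatterBuilder.getLocalizedDateTimePattern(
            FormatStyle.SHORT, null, IsoChronology.INSTANCE, locale);
        return pattern.indexOf('d') < pattern.indexOf('M');
    }

    // Parse "a-b-yyyy", "a/b/yyyy" or "a.b.yyyy", choosing the field order from the locale.
    static LocalDate parse(String text, Locale locale) {
        String[] parts = text.trim().split("[-/.]");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected three numeric fields: " + text);
        }
        int first = Integer.parseInt(parts[0]);
        int second = Integer.parseInt(parts[1]);
        int year = Integer.parseInt(parts[2]);
        return isDayFirst(locale)
            ? LocalDate.of(year, second, first)
            : LocalDate.of(year, first, second);
    }

    public static void main(String[] args) {
        System.out.println(parse("07-10-2011", Locale.UK));
        System.out.println(parse("07-10-2011", Locale.US));
    }
}
```

With this approach the same input string resolves to 7 October for a UK locale and 10 July for a US locale, independent of the time zone in use.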
First, thanks for sharing the code, very nice lib!
I tried to parse "in october 2006" on http://natty.joestelmach.com/try.jsp, but Natty did not match the 2006, am I missing something?
Thanks, Renaud
Adding the time to any date other than the last one in an "and" concatenated list of dates, causes dates to be lost. Examples are shown below.
"June 25th and July 2nd and August 16th" works properly returning:
Mon Jun 25 00:23:58 UTC 2012
Mon Jul 02 00:23:58 UTC 2012
Thu Aug 16 00:23:58 UTC 2012
"June 25th at 9am and July 2nd at 10am and August 16th at 11am" is broken returning:
Mon Jun 25 09:00:00 UTC 2012
Mon Jul 02 10:00:00 UTC 2012
"June 25th at 10am and July 2nd and August 16th" as well as "June 25th and July 2nd at 10am and August 16th" is broken returning:
Mon Jun 25 10:00:00 UTC 2012
Mon Jul 02 10:00:00 UTC 2012
"June 25th and July 2nd and August 16th at 10am" works properly returning:
Mon Jun 25 10:00:00 UTC 2012
Mon Jul 02 10:00:00 UTC 2012
Thu Aug 16 10:00:00 UTC 2012
It would be helpful to support a setting that indicates all dates refer to today or later. For example, if today is June 15th and I parse a date of "January 5th", Natty will return a date in the past (January of this year). However, if I'm creating an app to make hotel reservations, past dates don't make any sense. If a customer tries to reserve January 5th, they obviously mean the following rather than the previous January. Therefore, I'd like to suggest adding a constructor to Parser with this signature:
Parser(TimeZone timeZone, boolean futureDatesOnly)
Or another parse() method signature like this:
List parse(String value, boolean futureDatesOnly)
This would allow us to give Natty a hint if we know that all parsed dates should be in the future.
Thanks!
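The futureDatesOnly hint suggested above amounts to rolling a month/day forward one year whenever it would otherwise land before the reference date. A minimal java.time sketch of that rule follows; it is not Natty code, and the class name, method name and parameters are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.time.MonthDay;

// Hypothetical helper, not part of Natty.
class FutureDatesOnly {
    // Resolve a month/day against "today". With futureDatesOnly set, a date that
    // has already passed this year is moved to the same month/day next year.
    static LocalDate resolve(MonthDay monthDay, LocalDate today, boolean futureDatesOnly) {
        LocalDate candidate = monthDay.atYear(today.getYear());
        if (futureDatesOnly && candidate.isBefore(today)) {
            candidate = monthDay.atYear(today.getYear() + 1);
        }
        return candidate;
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2012, 6, 15);
        System.out.println(resolve(MonthDay.of(1, 5), today, false));
        System.out.println(resolve(MonthDay.of(1, 5), today, true));
    }
}
```

Today itself is kept as-is, matching the "today or later" wording of the request.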
"something september 7th" parses correctly
"september 7th something" cannot be parsed
"something happend here september 7th" parses correctly
"september 7th something happened here" cannot be parsed
If I wanted to use natty as a natural language interface for event creation on a calendar this is a deal breaker.
Following the example of Chronic gem for Ruby, it would be great for the library to have an option to return a date range based on the precision of entry. Here are some examples of what inputs/outputs would look like:
"Jan 20, 2010 9:10pm" => "Jan 20, 2010 9:10:00pm" to "Jan 20, 2010 9:10:59pm"
"Jan 20, 2010 9pm" => "Jan 20, 2010 9:00:00pm" to "Jan 20, 2010 9:59:59pm"
"Jan 20, 2010" => "Jan 20, 2010 00:00:00am" to "Jan 20, 2010 11:59:59pm"
"Jan 2010" => "Jan 1, 2010 00:00:00am" to "Jan 31, 2010 11:59:59pm"
"2010" => "Jan 1, 2010 00:00:00am" to "Dec 31, 2010 11:59:59pm"
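The mapping from input precision to a span can be sketched with java.time: given the start of the period and the smallest unit the user specified, the end is one unit later minus one second. This is only an illustration of the requested output, not an existing Natty or Chronic API; the class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;

// Hypothetical helper, not part of Natty.
class PrecisionSpan {
    // Return {start, end} where end is the last whole second of the period
    // that begins at start and lasts one unit of the given precision.
    static LocalDateTime[] span(LocalDateTime start, ChronoUnit precision) {
        LocalDateTime end = start.plus(1, precision).minusSeconds(1);
        return new LocalDateTime[] { start, end };
    }

    public static void main(String[] args) {
        // "Jan 2010": month precision, starting at the first instant of the month.
        LocalDateTime[] month = span(LocalDateTime.of(2010, 1, 1, 0, 0), ChronoUnit.MONTHS);
        System.out.println(month[0] + " to " + month[1]);
        // "Jan 20, 2010 9:10pm": minute precision.
        LocalDateTime[] minute = span(LocalDateTime.of(2010, 1, 20, 21, 10), ChronoUnit.MINUTES);
        System.out.println(minute[0] + " to " + minute[1]);
    }
}
```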
I'm trying to understand the collection returns in the API and tested "each sunday in march", which did not parse. Should it have?
Basically, I want to get the time 12 hours and 30 minutes from now.
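Whether Natty accepts a compound phrase like this is not stated here. If the offset is already known as numbers, the same result can be computed directly with java.time, as in this small sketch (the class and method names are hypothetical):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper, not part of Natty.
class OffsetFromNow {
    // Add a fixed number of hours and minutes to a reference instant.
    static Instant after(Instant reference, long hours, long minutes) {
        return reference.plus(Duration.ofHours(hours).plusMinutes(minutes));
    }

    public static void main(String[] args) {
        System.out.println(after(Instant.now(), 12, 30));
    }
}
```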
Hello,
I've been looking at frameworks for natural language date parsing, and natty seems to be the most promising one I've found. I've also looked closely at DateJS, Chronic, and Wolfram|Alpha.
My thoughts on those:
So it'd be awesome to see these features implemented in natty:
Please let me know if these feature requests are practical/doable.
Thanks for the great framework!
Oz
example: the friday after next
in early 2002
early in 2002
first nine months of the financial year
first quarter of this year
first quarter of 2012
nineteen ten
nineteen twenty three
between 1980 and 1981
from 1980 to 1981
1980/81
the early 90s
the late 80s
1900 AD
5-20 Jan
Jan 5-20
July, 2000
10-July 00
10-July
Wed 10-July-00
July/99
10/July
Wed, 10/July/00
seven thirty tomorrow
twenty six minutes to twelve
ten to twelve
half past ten
ten o'clock
Parser.java is using a constructor of the generated DateParser.java that does not seem to exist.
Line 131 in Parser.java: DateParser parser = new DateParser(stream, listener);
The error is that there is no constructor which takes a ParseListener object
If I parse "next week" or "next month" with Natty, it returns a single absolute date-time in the next week or month.
"Next week" or "next month" can refer to any date-time within that week or month, so it would be more meaningful to return a "from" date-time and a "to" date-time that together describe the range.
What about adding a span feature for week, month, quarter, half-year and year expressions when they cannot literally be mapped to a single absolute date-time?
This would be similar to "monday to friday", which outputs two dates.
E.g. "8th month last year" points to the 8th month of last year, ranging from the first day of that month to the last.
Similar relative dates with a range:
2012
2 weeks ago
2 weeks before now
2 years before now
4 months from now
4 weeks from now
4 years from now
8th month last year
8th month next year
8th month this year
first half of next year
early October
for 4 days
for 4 hours
for 4 minutes
for 4 months
for 4 seconds
for 4 weeks
for 4 years
jan 1 to 2
last month
last October
last june
last week
last year
march
next 10 years
next month
next october
next week
next year
this week
this year
this month
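A few of the phrases above can be turned into from/to pairs with java.time's TemporalAdjusters. The sketch below is an illustration only, not a proposed Natty API: the class and method names are hypothetical, and it assumes weeks start on Monday.

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;

// Hypothetical helper, not part of Natty. Each method returns {firstDay, lastDay}, inclusive.
class RelativeRanges {
    // "next week": the Monday after today through the following Sunday.
    static LocalDate[] nextWeek(LocalDate today) {
        LocalDate start = today.with(TemporalAdjusters.next(DayOfWeek.MONDAY));
        return new LocalDate[] { start, start.plusDays(6) };
    }

    // "next month": first through last day of the month after today's.
    static LocalDate[] nextMonth(LocalDate today) {
        LocalDate start = today.plusMonths(1).withDayOfMonth(1);
        return new LocalDate[] { start, start.with(TemporalAdjusters.lastDayOfMonth()) };
    }

    // "8th month last year": first through last day of the given month and year.
    static LocalDate[] monthOfYear(int month, int year) {
        LocalDate start = LocalDate.of(year, month, 1);
        return new LocalDate[] { start, start.with(TemporalAdjusters.lastDayOfMonth()) };
    }

    public static void main(String[] args) {
        LocalDate today = LocalDate.of(2012, 12, 31);
        LocalDate[] week = nextWeek(today);
        System.out.println(week[0] + " to " + week[1]);
        LocalDate[] august = monthOfYear(8, today.getYear() - 1);
        System.out.println(august[0] + " to " + august[1]);
    }
}
```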
I'm working on a project where access to the parse locations is necessary, but we'd like the logs not to be cluttered when an invalid date is entered.
Perhaps it should be possible to access the parse location even when debug is disabled?
Hi,
it would be great if natty could parse German dates too.
A German date has the form dd.mm.yyyy or dd.mm.yy, or sometimes d.mm.yy(yy).
Over 20 countries use this date format too, please refer to
http://en.wikipedia.org/wiki/Date_format_by_country
Thanks in advance,
Sebastian
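For reference, the dotted forms listed above can be handled in plain java.time by trying a four-digit-year pattern first and falling back to a two-digit-year pattern. This is a sketch outside of Natty's grammar; the class name is hypothetical, and two-digit years are assumed to fall in 2000 to 2099, which is what the "uu" pattern does by default.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Natty.
class GermanDate {
    private static final DateTimeFormatter FOUR_DIGIT_YEAR = DateTimeFormatter.ofPattern("d.M.uuuu");
    private static final DateTimeFormatter TWO_DIGIT_YEAR = DateTimeFormatter.ofPattern("d.M.uu");

    // Accepts d.M.yyyy and d.M.yy with one- or two-digit day and month.
    static LocalDate parse(String text) {
        String trimmed = text.trim();
        try {
            return LocalDate.parse(trimmed, FOUR_DIGIT_YEAR);
        } catch (DateTimeParseException e) {
            // Fewer than four year digits: retry with the two-digit pattern.
            return LocalDate.parse(trimmed, TWO_DIGIT_YEAR);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("24.12.2011"));
        System.out.println(parse("1.03.11"));
    }
}
```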
I guess the milliseconds cause the problem, because without the milliseconds part it parses perfectly.
I would be very glad if you could fix this, because some Google projects (Blogspot) use this format frequently.
Thanks in advance,
Akos
Hey, why is this a static method? Wouldn't it simplify tests etc if there is a third public constructor?
Parser p = new Parser(timeZone, referenceTime);
I tried to parse some natural language strings such as:
Fri., Sept. 23, 1-9 p.m., Sat., Sept. 24, 10:30 a.m.-9 p.m. and Sun., Sept. 25, 10:30 a.m.-5 p.m.
Sat., Sept. 24, 10:30 a.m.-9 p.m.
Sun., Sept. 25, 10:30 a.m.-5 p.m.
Fridays, 8:30 p.m. Continues through Sept. 30
Through Nov. 29
Just "trying it out" on December 31, 2012. I typed in "next friday" and got back "January 11, 2013". Then, I typed in "this friday" and got back "January 6, 2012".
Today is Monday, by the way.
It would be nice, for my application, to be able to tell the difference between the user specifying a date, and the user specifying a date-time. WalkerState is already calculating the information I want, as _timeGivenInGroup, it's just not exposing it anywhere I can get at it.
If I parse '3pm' I get a date with the time of 1pm. All parsed times are two hours behind in the returned parsed date. Probably timezone problem somewhere but not sure where.
Version 0.2.1
It worked when I preselect from a larger text input the actual date string, "Fri Mar 11 2011 9:38" for example.
But it fails when the date is surrounded by some text noise, like:
"Posted on Fri Mar 11 2011 | 9:38"
A brute-force approach would be to try every combination until one succeeds. The relevant tokens might not be consecutive, as in the example. I was wondering whether there are reliable techniques for selecting potential candidates. Given such a preselection, together with the Natty parser, my problem would be pretty much solved.
BTW, in Ruby built-in date library, it manages to parse the date, but not the time:
irb(main):001:0> require 'date'
=> true
irb(main):002:0> format = "%m/%d/%Y %I:%M%p"
=> "%m/%d/%Y %I:%M%p"
irb(main):003:0> date = Date.parse("Posted on Fri Mar 11 2011 | 9:38")
=> #
irb(main):004:0> date.strftime(format)
=> "03/11/2011 12:00AM"
but it finds me dates out of non date formats:
irb(main):005:0> date = Date.parse("090 is a non date format.")
=> #
irb(main):006:0> date.strftime(format)
=> "03/31/2011 12:00AM"
and Natty would give me, for "090", a "time" token:
DATE_TIME_ALTERNATIVE ----DATE_TIME -------- --------[@0,0:1='09',<56>,1:0], --------resync=090>
"now" is not a recognized word.
natty 0.2.1
Great work, by the way. However, I'm trying to parse the date:
March 9-13, 2012
but it only recognises March 9th.
If you change it to
March 9 to 13, 2012
then it will pick out March 9th and March 13th, but it will use the year as a time, so you get
March 9th 2012 20:12 and March 13th 2012 20:12.
Also, is there any way to restrict the types of dates/times it looks for? For example, I don't want to look for relative dates. Does this require completely re-creating the parser from the grammar (without the relative rules), or is there some way to turn rules on and off?
Thanks
Alistair
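Until the grammar handles a day range inside one month, an input such as "March 9-13, 2012" can be pre-processed with a regular expression before handing anything to a parser. The sketch below is a workaround idea, not Natty behaviour; the class name is hypothetical, and it assumes a full English month name followed by "d-d, yyyy" or "d to d, yyyy".

```java
import java.time.LocalDate;
import java.time.Month;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Natty.
class DayRange {
    // Month name, first day, "-" or "to", last day, comma, four-digit year.
    private static final Pattern RANGE = Pattern.compile(
        "([A-Za-z]+)\\s+(\\d{1,2})\\s*(?:-|to)\\s*(\\d{1,2}),\\s*(\\d{4})");

    // Return {firstDay, lastDay}; the year applies to both ends of the range.
    static LocalDate[] parse(String text) {
        Matcher m = RANGE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a day range: " + text);
        }
        Month month = Month.valueOf(m.group(1).toUpperCase(Locale.ROOT));
        int year = Integer.parseInt(m.group(4));
        return new LocalDate[] {
            LocalDate.of(year, month, Integer.parseInt(m.group(2))),
            LocalDate.of(year, month, Integer.parseInt(m.group(3)))
        };
    }

    public static void main(String[] args) {
        LocalDate[] range = parse("March 9-13, 2012");
        System.out.println(range[0] + " to " + range[1]);
    }
}
```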
A great feature that Chronic has but Natty lacks is the ability to search for a date within a string. So, "blah blah blah at 5th of December blah blah" would evaluate to "5th of December".
The input "bla bla bla 2 and 4 month" causes the parser to hang and never complete.
Sometimes it is more helpful for an unspecified time to be treated as midnight, not now.
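The choice between "now" and midnight for an unspecified time can be expressed as a single flag. The following java.time sketch shows one possible shape for such an option; the class, method and parameter names are hypothetical and do not correspond to an existing Natty setting.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.Optional;

// Hypothetical helper, not part of Natty.
class DefaultTime {
    // Combine a parsed date with its time. When the input carried no time,
    // fall back either to midnight or to the current time of day.
    static LocalDateTime resolve(LocalDate date, Optional<LocalTime> parsedTime,
                                 LocalTime currentTime, boolean midnightWhenMissing) {
        LocalTime fallback = midnightWhenMissing ? LocalTime.MIDNIGHT : currentTime;
        return LocalDateTime.of(date, parsedTime.orElse(fallback));
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2012, 6, 25);
        System.out.println(resolve(date, Optional.empty(), LocalTime.now(), true));
    }
}
```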
First off, this is great. Thank you.
A few requests/thoughts on features:
3 days before
2 weeks before
6 months before
6 years before
All of these expressions result in an absolute date computed as if "after" had been used.
Sentences containing holiday or season names such as Easter, Summer, or Christmas appear to throw this exception in natty-0.5 and above.
The files it claims it can't access are available and on the classpath
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
at net.fortuna.ical4j.util.Configurator.<clinit>(Configurator.java:51)
at net.fortuna.ical4j.data.CalendarParserFactory.<clinit>(CalendarParserFactory.java:62)
at net.fortuna.ical4j.data.CalendarBuilder.<init>(CalendarBuilder.java:122)
at com.joestelmach.natty.WalkerState.getDatesForHoliday(WalkerState.java:535)
at com.joestelmach.natty.WalkerState.seekToHoliday(WalkerState.java:412)
at com.joestelmach.natty.generated.DateWalker.seek(DateWalker.java:1288)
at com.joestelmach.natty.generated.DateWalker.relative_date(DateWalker.java:694)
at com.joestelmach.natty.generated.DateWalker.date(DateWalker.java:636)
at com.joestelmach.natty.generated.DateWalker.date_time(DateWalker.java:550)
at com.joestelmach.natty.generated.DateWalker.date_time_alternative(DateWalker.java:489)
at com.joestelmach.natty.generated.DateWalker.parse(DateWalker.java:358)
at com.joestelmach.natty.Parser.singleParse(Parser.java:150)
at com.joestelmach.natty.Parser.parse(Parser.java:75)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.logging.LogFactory
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 17 more
Any relative date greater than "31 years ago" generates a parser exception:
line 1:2 no viable alternative at input ' '
com/joestelmach/natty/generated/DateWalker.g: node from line 0:0 required (...)+ loop did not match anything at input '32 years ago'
RecurrenceTest fails to call initCalendarAndParser(), and thus throws NullPointerExceptions.
Here's some code to demonstrate the problem.
TimeZone.setDefault(TimeZone.getTimeZone("Etc/UTC")); //set default time zone to UTC so all dates are stored in UTC
Parser parser = new Parser(TimeZone.getTimeZone("GMT-06:00")); //set parser to use Mountain Time
parser.parse("July 1st at 10am");
This returns a date with this value:
2012-07-01 16:00:00 UTC
This is the correct time. Converting UTC to Mountain Time results in July 1st at 10am as requested. However, removing the time like this:
parser.parse("July 1st");
Results in the following date (when run at 7:03 Mountain Time):
2012-07-01 01:03:49 UTC
This is not correct. When this date is converted to mountain time, it becomes a time on June 30th instead of July 1st as requested. Since the current time is used when no time is specified, please fix it so that the parser uses the current time in the specified time zone (the time zone passed to its constructor rather than the default one).
Thanks!
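The fix requested above boils down to taking the current time of day in the parser's zone and attaching it to the parsed date in that same zone. A java.time sketch of that computation follows; it is not Natty's code, the class and method names are hypothetical, and the sample instant assumes the report was run at 7:03 in the evening, Mountain Time.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical helper, not part of Natty.
class DateOnlyInZone {
    // Attach the current time of day, as seen in the parser's zone, to a date-only input.
    static Instant resolve(LocalDate date, Instant now, ZoneId parserZone) {
        LocalTime timeInZone = now.atZone(parserZone).toLocalTime();
        return ZonedDateTime.of(date, timeInZone, parserZone).toInstant();
    }

    public static void main(String[] args) {
        ZoneId mountain = ZoneId.of("GMT-06:00");
        Instant now = Instant.parse("2012-06-16T01:03:49Z"); // 19:03:49 at GMT-06:00
        Instant result = resolve(LocalDate.of(2012, 7, 1), now, mountain);
        System.out.println(result);
        System.out.println(result.atZone(mountain).toLocalDate());
    }
}
```

Converted back to the parser's zone, the result stays on July 1st, whereas combining the date with the UTC time of day moves it to June 30th.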
"wednesday at noon" works great.
" wednesday at noon" fails to parse.
the parser is tolerant of trailing spaces, but doesn't seem tolerant of leading spaces.
Hi,
natty parses a date like 07-10-2011 as 10th July 2011 which is wrong. It is the 7th of October. The format DD-MM-YYYY is used in France, India, Ireland or Slovakia. The format MM-DD-YYYY does not exist, not even in the US. Please refer to
http://en.wikipedia.org/wiki/Date_format_by_country and
http://en.wikipedia.org/wiki/Date_and_time_notation_in_the_United_States
please keep in mind that only the US date starts with the month, followed by day and year. All other countries use a date starting with the day, followed by month and year. The format mm/dd/yyyy should therefore be an exception, not the default in natty.
Thanks in advance,
Sebastian
Implement relative times such as '5 minutes ago', or '2 hours from now'.
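To make the request concrete, here is a small regex-based sketch that resolves simple relative expressions against a reference time. It is independent of Natty's grammar and far less general. The class name, the accepted unit words and the treatment of "before" as a synonym for "ago" are assumptions made for illustration.

```java
import java.time.LocalDateTime;
import java.time.temporal.ChronoUnit;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Natty.
class RelativeTime {
    // "<number> <unit>[s] ago", "<number> <unit>[s] before" or "<number> <unit>[s] from now".
    private static final Pattern RELATIVE = Pattern.compile(
        "(\\d+)\\s+(second|minute|hour|day|week|month|year)s?\\s+(ago|before|from now)",
        Pattern.CASE_INSENSITIVE);

    static LocalDateTime resolve(String text, LocalDateTime reference) {
        Matcher m = RELATIVE.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a relative time: " + text);
        }
        long amount = Long.parseLong(m.group(1));
        // ChronoUnit constants are plural: MINUTES, HOURS, DAYS, ...
        ChronoUnit unit = ChronoUnit.valueOf(m.group(2).toUpperCase(Locale.ROOT) + "S");
        boolean past = !m.group(3).equalsIgnoreCase("from now");
        return past ? reference.minus(amount, unit) : reference.plus(amount, unit);
    }

    public static void main(String[] args) {
        LocalDateTime reference = LocalDateTime.of(2012, 6, 15, 12, 0);
        System.out.println(resolve("5 minutes ago", reference));
        System.out.println(resolve("2 hours from now", reference));
    }
}
```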
Hi,
I am not sure how complex it would be, but would it be possible to recognize season keywords (like fall, spring) as well as holidays and important days like New Year and Thanksgiving?
Thanks
Hello.
It would be useful if I could override the current time / Calendar that Natty uses when parsing dates. As an example of why this would be useful:
I create a document with date text "tomorrow" at 11:59PM today, but don't get around to parsing it until 12:01AM tomorrow. I'd like to be able to tell Natty that this date should be parsed as if the current time were 11:59PM yesterday.
Thank you for this library!
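In java.time, this kind of override is usually done by passing a Clock instead of reading the system time. The sketch below shows the pattern for the "tomorrow" example; it is not a Natty API, and the class and method names are hypothetical.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

// Hypothetical helper, not part of Natty.
class ReferenceClock {
    // Resolve "tomorrow" relative to whatever the supplied clock says "now" is.
    static LocalDate tomorrow(Clock clock) {
        return LocalDate.now(clock).plusDays(1);
    }

    public static void main(String[] args) {
        // Pretend it is still 11:59PM on the day the document was written.
        Clock writtenAt = Clock.fixed(Instant.parse("2012-06-04T23:59:00Z"), ZoneOffset.UTC);
        System.out.println(tomorrow(writtenAt));
        // Default behaviour: use the real system clock.
        System.out.println(tomorrow(Clock.systemUTC()));
    }
}
```

Parsing two minutes later with the fixed clock still yields the day after the document was written, which is the behaviour asked for above.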