Comments (4)
Sorry about that. I agree the documentation is pretty shoddy. What would you like to be able to do?
Did you look at https://github.com/dlwh/epic-demo ?
from epic.
My biggest problem I guess is to know what I can combine, what goes where and how things integrate with each other.
I'd like to know how I implement a simple feature to use when going through a text?
I'd like to know how to use multiple custom ones?
I've seen the Epic demos, and they all work
What are these representing?
preprocess?
- Do something with the data before running something on it, but what can be achieved here?
slab?
- A data source that you can do something with?
models?
- Reference to a set of features that can pick out certain things in a text? (pre-built ones are language feature detectors?)
parser?
- Something that goes through the text to work out what is necessary?
trees?
- A representation of what words are, like noun and after that there's a verb etc?
sequences?
- Segment data to pick up if it is a set of two words or one?
I think it would be much easier to get started if some of these concepts were explained: why they are there, and what I can do with them. If I'm looking for a certain feature, where should I look?
Might be a lot to answer, but I do think you've got something useful here and I'd like to see it being developed further!
/Marcus
from epic.
Thanks. That is helpful.
At the moment, the internals of Epic (making features, etc.) are kind of
targeted at people with a good bit of NLP/ML expertise. Really, some of the
external bits are too. I would like to make it more friendly, but it's a
long way from that, obviously.
On Sun, Dec 21, 2014 at 3:16 PM, Marcus Sjölin wrote:
My biggest problem I guess is to know what I can combine, what goes where
and how things integrate with each other.
I'd like to know how I implement a simple feature to use when going
through a text?
I'm not sure what you mean here?
I'd like to know how to use multiple custom ones?
Featurizers in Epic can be added together with the "+" operator to create
composite featurizers.
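To picture what "+" composition means, here is a toy sketch in plain Scala. The `Featurizer` trait and the three featurizers below are hypothetical names made up for illustration; this is not Epic's featurizer API, only the general idea of combining feature extractors.

```scala
// Toy illustration only: this trait is hypothetical and is NOT Epic's featurizer API.
object FeaturizerSketch {
  // A featurizer maps a word to the set of features (properties) it exhibits.
  trait Featurizer { self =>
    def apply(word: String): Set[String]

    // "+" builds a composite featurizer that emits the features of both operands.
    def +(other: Featurizer): Featurizer = new Featurizer {
      def apply(word: String): Set[String] = self(word) ++ other(word)
    }
  }

  // Feature: the lower-cased identity of the word.
  val wordIdentity: Featurizer = new Featurizer {
    def apply(word: String): Set[String] = Set("word=" + word.toLowerCase)
  }

  // Feature: the last (up to) three characters of the word.
  val suffix: Featurizer = new Featurizer {
    def apply(word: String): Set[String] = Set("suffix=" + word.takeRight(3))
  }

  // Feature: fires only when the word starts with an upper-case letter.
  val capitalized: Featurizer = new Featurizer {
    def apply(word: String): Set[String] =
      if (word.headOption.exists(_.isUpper)) Set("capitalized") else Set.empty
  }

  // Composite featurizer: the union of the features of all three.
  val combined: Featurizer = wordIdentity + suffix + capitalized

  def main(args: Array[String]): Unit =
    println(combined("Running"))
}
```

A learning algorithm would then attach a weight to each feature string; the featurizers themselves only describe the input.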
"Featurizers" turn a sentence into a set of features. I think you might
have a misconception about what I mean by features. I mean it in what I
believe is the standard ML sense: a feature is a property of (part of) an
input data point (like a sentence) that can be used to predict the
appropriate output.
I've seen the Epic demos, and they all work
What are these representing?
preprocess?
- Do something with the data before running something on it, but what
can be achieved here?
preprocess can:
- Segment sentences:
  val segmenter = MLSentenceSegmenter.bundled().get
  segmenter.segment(text)
- Tokenize sentences into words and punctuation:
  epic.preprocess.tokenize(sentence)
- Do both at once (epic.preprocess.preprocess) as demonstrated in the demo.
- Extract content from arbitrary files or URLs using Apache Tika
  (epic.extractText(url))
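If it helps to see what segmenting and tokenizing produce, here is a deliberately naive stand-in in plain Scala. It does not call Epic at all: the regular expressions and the `PreprocessSketch` object are assumptions made for illustration, and real segmenters handle abbreviations, quotes, and so on far more carefully.

```scala
// Naive stand-ins for illustration; this is NOT how Epic segments or tokenizes.
object PreprocessSketch {
  // Split after '.', '!' or '?' when followed by whitespace.
  def segment(text: String): IndexedSeq[String] =
    text.split("(?<=[.!?])\\s+").map(_.trim).filter(_.nonEmpty).toIndexedSeq

  // Runs of letters/digits become word tokens;
  // every other non-space character becomes its own token.
  private val Token = "[\\p{L}\\p{N}]+|[^\\s\\p{L}\\p{N}]".r

  def tokenize(sentence: String): IndexedSeq[String] =
    Token.findAllIn(sentence).toIndexedSeq

  // Both steps at once: one token sequence per sentence.
  def preprocess(text: String): IndexedSeq[IndexedSeq[String]] =
    segment(text).map(tokenize)

  def main(args: Array[String]): Unit =
    preprocess("Epic parses text. Does it tag words, too?").foreach(println)
}
```

The shape of the result, a sequence of sentences where each sentence is a sequence of tokens, is what the later stages (taggers, parsers) consume.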
slab?
- A data source that you can do something with?
Slabs hold annotations (parse trees, named entities, etc.) for a text in a
uniform way. We're actually reworking them, so don't put a lot of effort
into learning them.
models?
- Reference to a set of features that can pick out certain things in a
text? (pre-built ones are language feature detectors?)
Something like that. Models refer to the result of a machine learning
algorithm, with a featurizer, some weights, and a dynamic program which can
build structures over a text. (I overload terminology and sometimes
use "model" to mean everything except the weights.)
parser?
- Something that goes through the text to work out what is necessary?
Parsers produce parse trees, as below.
trees?
- A representation of what words are, like noun and after that there's
a verb etc?
That and how the words are related to one another: what are the noun
phrases in a sentence, what verb has what object, etc.
http://en.wikipedia.org/wiki/Parse_tree
If you didn't know what these were going in, they will probably not be
useful to you. I'm working in the background on a format that's more
useful to laymen, but it will be some time.
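For a rough picture of what a parse tree holds, here is a minimal sketch in plain Scala. The `Tree` case class, the `leaf` helper, and the hand-built example sentence are all hypothetical; this is not Epic's tree class and the tree is not parser output.

```scala
// Hypothetical minimal tree type for illustration; NOT Epic's tree class.
object TreeSketch {
  // A node has a label (e.g. "NP", "VBD") and children; a leaf also carries its word.
  case class Tree(label: String, children: List[Tree] = Nil, word: Option[String] = None) {
    // The words covered by this node, left to right.
    def words: List[String] = word.toList ++ children.flatMap(_.words)

    // All subtrees (including this one) whose label matches `target`.
    def find(target: String): List[Tree] =
      (if (label == target) List(this) else Nil) ++ children.flatMap(_.find(target))
  }

  def leaf(tag: String, w: String): Tree = Tree(tag, Nil, Some(w))

  // Hand-built tree for: (S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat))))
  val subject: Tree = Tree("NP", List(leaf("DT", "the"), leaf("NN", "dog")))
  val obj: Tree = Tree("NP", List(leaf("DT", "a"), leaf("NN", "cat")))
  val predicate: Tree = Tree("VP", List(leaf("VBD", "chased"), obj))
  val sentence: Tree = Tree("S", List(subject, predicate))

  // The noun phrases of the sentence, as strings.
  val nounPhrases: List[String] = sentence.find("NP").map(_.words.mkString(" "))

  def main(args: Array[String]): Unit =
    nounPhrases.foreach(println)
}
```

The leaf labels are the part-of-speech tags, and the inner nodes group words into phrases, which is the "how the words are related" part.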
sequences?
- Segment data to pick up if it is a set of two words or one?
There are two kinds of predictions we have under sequences: those that
assign a label to every word (e.g. part of speech tags like noun, verb,
etc.), and those that assign a label to disjoint contiguous sequences of
words (e.g. which phrases are people, places, or things).
I think it would be much easier to get started if some of these concepts
were explained: why they are there, and what I can do with them. If
I'm looking for a certain feature, where should I look?
Might be a lot to answer, but I do think you've got something useful here and
I'd like to see it being developed further!
/Marcus
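The two kinds of sequence prediction described in the reply above can be pictured as two data shapes. This is a sketch in plain Scala with hypothetical types (`Segment`, `SequenceSketch`) and hand-written labels; it is not Epic's API and not model output.

```scala
// Hypothetical types for illustration; NOT Epic's classes. Labels are hand-written.
object SequenceSketch {
  val words: Vector[String] = Vector("Ada", "Lovelace", "visited", "London")

  // Kind 1: one label per word (here, coarse part-of-speech tags).
  val tags: Vector[String] = Vector("NOUN", "NOUN", "VERB", "NOUN")
  val tagged: Vector[(String, String)] = words.zip(tags)

  // Kind 2: labels on disjoint, contiguous runs of words. `end` is exclusive.
  case class Segment(label: String, begin: Int, end: Int)
  val segments: Vector[Segment] = Vector(
    Segment("PERSON", 0, 2),
    Segment("PLACE", 3, 4))

  // The words covered by a segment.
  def text(s: Segment): String = words.slice(s.begin, s.end).mkString(" ")

  // Segments are valid when each is non-empty and none overlaps the next.
  def disjoint(segs: Seq[Segment]): Boolean = {
    val sorted = segs.sortBy(_.begin)
    sorted.forall(s => s.begin < s.end) &&
      sorted.zip(sorted.drop(1)).forall { case (a, b) => a.end <= b.begin }
  }

  def main(args: Array[String]): Unit = {
    tagged.foreach { case (w, t) => println(w + "/" + t) }
    segments.foreach(s => println(s.label + ": " + text(s)))
  }
}
```

Note that in the second kind not every word has to belong to a segment ("visited" is left unlabeled here), whereas in the first kind every word gets exactly one tag.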
from epic.
Thanks! That was really helpful, I think these answers were what I needed to grasp how things are connected. I now see more clearly how the process from input to output should be formed and what I can use in between. Thanks a lot!
Good going with the library as well, there seems to be a lot of work put into this.
/Marcus
from epic.