
arnekt-iecsil's Introduction

ARNEKT-IECSIL

Information Extractor for Conversational Systems in Indian Languages (IECSIL)

Natural Language Processing (NLP) emerged as a subfield of Artificial Intelligence and Linguistics to deal with text, one of the dominant forms of data generated in today's digital era, and it has evolved steadily since its origin. Information Extraction (IE) is one area of NLP under constant research, with a place in many applications that use text data. Applications such as information search, question answering and document summarization, all of which extract hidden information from text, have driven tremendous growth in this area. Polysemy, ambiguity and unpredictability in natural text make this extraction difficult, which prompted the move from rule-based approaches to statistical approaches.

Named Entity Recognition (NER), a subtask of Information Extraction (IE) that is commonly approached with statistical methods, was formalized as a shared task at the Message Understanding Conferences (MUC), notably MUC-6 in the mid-1990s. Its purpose is to identify and classify the key words in natural language digital text. Key words are the contextual terms that convey the meaning of the sentence. In NER these key words are referred to as entities (e.g. names of persons, places and organizations, temporal expressions and numeric expressions), which first need to be identified and then classified into their respective categories. These entities in turn hold relationships with each other, such as Person and Organization with the relation 'works at' or 'owns', or Person and Location with the relation 'lives at'. Relation Extraction (RE) is the next step in IE after NER. Note that the way these relations are expressed changes from language to language, so RE is a language-dependent process.
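To make the entity and relation terminology concrete, here is a minimal sketch in Python. The example sentence, the tag names (`PERSON`, `ORGANIZATION`, `LOCATION`), the `Entity` and `Relation` classes, the `group_entities` helper and the `works_at` label are all illustrative assumptions, not the track's actual tag set or data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Entity:
    text: str   # surface form of the entity
    label: str  # hypothetical category, e.g. PERSON


@dataclass(frozen=True)
class Relation:
    head: Entity
    tail: Entity
    label: str  # hypothetical relation name, e.g. works_at


def group_entities(tagged: List[Tuple[str, str]]) -> List[Entity]:
    """Merge consecutive tokens that share a non-"O" tag into one entity.

    Limitation of this sketch: two adjacent entities with the same tag
    are merged into a single span.
    """
    entities: List[Entity] = []
    current_tokens: List[str] = []
    current_label = "O"
    # The trailing ("", "O") sentinel flushes a span that ends the sentence.
    for token, label in tagged + [("", "O")]:
        if label != current_label:
            if current_label != "O":
                entities.append(Entity(" ".join(current_tokens), current_label))
            current_tokens = []
            current_label = label
        if label != "O":
            current_tokens.append(token)
    return entities


# Made-up sentence: step 1 (identify and classify) yields token-level tags.
tagged_sentence = [
    ("Anita", "PERSON"), ("Rao", "PERSON"), ("works", "O"), ("at", "O"),
    ("Acme", "ORGANIZATION"), ("Labs", "ORGANIZATION"),
    ("in", "O"), ("Pune", "LOCATION"),
]
entities = group_entities(tagged_sentence)

# Step 2 (relation extraction) links two entities with a relation label.
relation = Relation(head=entities[0], tail=entities[1], label="works_at")
```

Keeping entities and relations as separate records mirrors the two-stage view in the text: entities are found and classified first, and relations are then expressed between pairs of them.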

Having covered the basics of IE, with NER and RE as its key components, it is worth asking how well it works across languages. IE is known to work considerably well for English, as seen in applications like Google Search and frameworks like Stanford CoreNLP and OpenNLP. The same does not hold for Indian languages, owing to their morphologically rich structure and agglutinative nature. Inflections are fused into the contextual words or phrases themselves, which makes it difficult to extract the entity of interest. A single word or phrase may encode several pieces of information, such as gender, tense and the actual key word. Such complexities make Information Extraction even more laborious for Indian languages.

These considerations led us to conduct a track, ARNEKT - Information Extractor for Conversational Systems in Indian Languages (ARNEKT-IECSIL), at the Forum for Information Retrieval Evaluation (FIRE 2018). The aim of this track is to arrive at a language-independent model or framework that supports information extraction in all Indian languages. This would benefit not only IE but would also serve as a building block for other applications such as chatbots, personal assistant systems, coreference resolution and other text classification tasks.

Motivated by the need for an information extractor described above, the track has the following two tasks:

Task A: Named Entity Recognition (NER)

Corpora for five Indian languages (Hindi, Tamil, Malayalam, Telugu and Kannada) will be provided. Task A is to identify the named entities and classify each into one of several classes.

Task B: Relation Extraction (RE)

As a continuation of Task A, corpora with named entities for the same five Indian languages (Hindi, Tamil, Malayalam, Telugu and Kannada) will be provided. Task B is to extract the relations among the entities provided.

Note: Participants need not take part in all the languages or in both sub-tasks, but the final ranking will be based on the average system performance across all five languages. For more details, kindly refer to Evaluation.
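The ranking rule in the note can be sketched as follows. This is only an illustration: the scores are made up, the metric is left unspecified, and counting a language with no submission as 0.0 is an assumption about how "average on all five languages" is applied; the track's Evaluation page remains the authority.

```python
from typing import Dict

LANGUAGES = ("Hindi", "Tamil", "Malayalam", "Telugu", "Kannada")


def average_over_all_languages(scores: Dict[str, float]) -> float:
    """Average a team's scores over all five track languages.

    Assumption: a language with no submission contributes 0.0.
    """
    unknown = set(scores) - set(LANGUAGES)
    if unknown:
        raise ValueError(f"unexpected languages: {sorted(unknown)}")
    return sum(scores.get(lang, 0.0) for lang in LANGUAGES) / len(LANGUAGES)


# Made-up scores for two hypothetical teams.
team_all = {"Hindi": 0.90, "Tamil": 0.80, "Malayalam": 0.85,
            "Telugu": 0.75, "Kannada": 0.70}
team_two = {"Hindi": 0.95, "Tamil": 0.95}

avg_all = average_over_all_languages(team_all)
avg_two = average_over_all_languages(team_two)
```

Under this assumption, a team with strong results in only two languages ranks below a team with moderate results in all five, which is consistent with the note's emphasis on covering every language.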

Participate in the ARNEKT-IECSIL 2018 shared task to claim your chance of glory: the best participants will be shortlisted for an interview process with ARNEKT Solutions Pvt. Ltd, Pune. On top of that, the top three teams will take home exciting prizes, and all participants will be awarded certificates.

arnekt-iecsil's People

Contributors

barathiganesh-hb


arnekt-iecsil's Issues

Password for the zip files

@BarathiGanesh-HB Some of the zip files (arnekt-iecsil-ie-corpus_test_2.zip and arnekt-iecsil-ie-corpus_train.zip) require a password. Can you upload the zip files without password protection, or share the password for unzipping them?

Regarding participation

Hi Barathi Ganesh,

I know I'm late to ask this question; I just came to know about this competition. I would like to participate in it, though I'm not looking for any prizes or an interview. Is there any way I can get access to the data (the password)?

password of zip files

Hello, when I download the train corpus, it asks for a password. The following files ask for a password to unzip:

  • arnekt-iecsil-ie-corpus_train.zip
  • arnekt-iecsil-ie-corpus_test_2.zip

Can you tell me the password?
