lapalme / arpi_eccc

This project forked from rali-udem/arpi_eccc

Weather bulletin generator using data from the Industrial Problem Solving Workshop 2021, Environment and Climate Change Canada

License: GNU General Public License v3.0

Python 99.67% JavaScript 0.33%

arpi_eccc's Introduction

A prototype text generator for Weather Bulletins

This project was prompted by an exercise suggested by Environment and Climate Change Canada (ECCC) in the context of the Industrial Problem Solving Workshop 2021. Its goal was to develop a weather bulletin generator using a neural approach from the information contained in the Meteocode, a specialized data format developed by ECCC.

Before the workshop, Guy Lapalme and Fabrizio Gotti created a JSONL version of more than 200K bulletins made available in Meteocode by ECCC. They also developed an API for managing time zones and for evaluation using the BLEU metric. Both are described in this document.

During the workshop, a neural text generator was developed only for temperature.

Before the workshop, Guy Lapalme intended to develop an alternative rule-based approach to the problem to serve as a baseline, but time constraints in the workshop did not allow such comparisons.

After the workshop, Guy Lapalme decided to pursue the development in Python of a generator of complete bulletins in both French and English. He also made the display of its output more easily comparable with the original bulletin. For the generator, he also wanted to use this exercise as a use case of jsRealB, a bilingual text realizer that he has been maintaining over the last few years.

But, as a first step, Guy Lapalme decided to simply rely on Python string manipulation and formatting. This explains why some sentences are somewhat awkward.

Program call

pubpro [-b int] [-c float] [-e] jsonl_file

where

-b int  : number of full bulletins that are generated
-c float: proportion of bulletins that are generated
-e      : perform BLEU evaluation
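
The options above could be handled with Python's standard argparse module. The following is only a minimal sketch and not the actual code of pubpro.py: the function name build_parser and the help strings are assumptions made for illustration, and no default values are assumed for -b and -c.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented pubpro options."""
    parser = argparse.ArgumentParser(prog="pubpro")
    # -b int: number of full bulletins that are generated
    parser.add_argument("-b", type=int, metavar="int",
                        help="number of full bulletins that are generated")
    # -c float: proportion of bulletins that are generated
    parser.add_argument("-c", type=float, metavar="float",
                        help="proportion of bulletins that are generated")
    # -e: flag without a value, switches on BLEU evaluation
    parser.add_argument("-e", action="store_true",
                        help="perform BLEU evaluation")
    # positional argument: the JSONL file holding the bulletins
    parser.add_argument("jsonl_file", help="JSONL file of bulletins")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere; the file is only named here, never opened.
args = build_parser().parse_args(["-b", "3", "-e", "data/arpi-2021-train-10.jsonl"])
print(args.b, args.c, args.e, args.jsonl_file)
```

With this sketch, an option that is not given on the command line is left as None (or False for -e), so the calling code would have to decide what "no -b" or "no -c" means.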

Source Programs

  • pubpro.py : main program that drives the whole system

  • ECdata.py : linguistic information about weather systems

  • forecast.py : generate text for different aspects of the Meteocode

  • jsRealBclass.py : Python interface to jsRealB (currently not used)

  • Meteocode.py : class that interfaces with the JSON API of the Meteocode

  • ppJson.py : compact pretty-print of a json structure (useful for debugging)

  • in the arpi_eccc directory

    • nlg_evaluation.py : BLEU evaluation of the results
    • timeranges.txt : data indicating the time zone shift for a given type of bulletin
    • utils.py : utility functions for dealing with time zones
  • in the jsRealB directory : tools for interfacing with the jsRealB text realizer in JavaScript
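
To illustrate what a BLEU evaluation measures, here is a minimal sketch of a single-reference BLEU score with uniform weights over 1-grams to 4-grams and no smoothing. It is not the code of nlg_evaluation.py, whose details are not described here; the function names and the example sentences are made up for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Single-reference BLEU with uniform weights and no smoothing."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        total = sum(cand.values())
        # Clipped counts: an n-gram is credited at most as often
        # as it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        if total == 0 or overlap == 0:
            return 0.0  # without smoothing, one empty precision gives 0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


# Made-up example sentences, tokenized on whitespace.
reference = "sunny with cloudy periods this afternoon".split()
candidate = "sunny with cloudy periods this evening".split()
print(round(bleu(candidate, reference), 3))
```

Because no smoothing is applied, a short generated sentence that shares no 4-gram with the reference scores 0, which is one reason practical evaluations usually compute BLEU over a whole corpus rather than sentence by sentence.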

Bulletins in json (in the data directory)

  • arpi-2021-train-1.jsonl : pretty-printed version of a json entry
  • arpi-2021-train-10.jsonl : first 10 lines of the train corpus
  • arpi-2021-train-1000.jsonl : first 1000 lines of the train corpus
  • Fields.md : description of the fields of Meteocode
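
A JSONL file stores one JSON value per line, so the sample files listed above can be read line by line with the standard json module. The sketch below assumes that each line is a complete JSON object; the function name read_bulletins and its limit parameter are hypothetical and not part of the project.

```python
import json
from itertools import islice


def read_bulletins(path, limit=None):
    """Yield one parsed JSON value per non-empty line of a JSONL file.

    limit=None reads the whole file; otherwise at most `limit` entries.
    """
    with open(path, encoding="utf-8") as f:
        # Skip blank lines, which json.loads would reject.
        lines = (line for line in f if line.strip())
        for line in islice(lines, limit):
            yield json.loads(line)
```

For example, `list(read_bulletins("data/arpi-2021-train-10.jsonl", limit=3))` would return the first three entries of that file. Since arpi-2021-train-1.jsonl is described as pretty-printed, it presumably spans several lines and would need `json.load` on the whole file instead of this line-based reader.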
