medical-bert's Introduction

UK-based Software engineer with domain experience in automated control systems, medical informatics, and financial services.

medical-bert's People

Contributors

andrewpatterson2018, officialpatterson

medical-bert's Issues

Allow experiment config to be read from a JSON file

Currently, we save the experiment config as a JSON file so that we know what has been run. To replicate an experiment from such a file, we must either pass each setting on the application's argument list or modify the default JSON file.

This issue adds a new flag so that, when a JSON file is specified, the config is read from that file instead.
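One way the flag could work is sketched below, using only the standard library. The flag name `--config` and the settings `learning_rate` and `epochs` are hypothetical placeholders, not the project's actual arguments. The sketch treats the JSON file as a set of defaults, so any flag given explicitly on the command line still wins.

```python
import argparse
import json


def build_parser():
    # The argument names below are illustrative assumptions, not the real ones.
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", default=None,
                        help="path to a saved experiment JSON file")
    parser.add_argument("--learning_rate", type=float, default=1e-5)
    parser.add_argument("--epochs", type=int, default=3)
    return parser


def load_config(argv=None):
    """Return the experiment config as a dict.

    Precedence: explicit command-line flags, then the JSON file,
    then the parser's built-in defaults.
    """
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.config is not None:
        with open(args.config) as f:
            saved = json.load(f)
        # Reject keys the parser does not know, so typos fail loudly.
        unknown = set(saved) - set(vars(args))
        if unknown:
            raise ValueError(f"unknown config keys: {sorted(unknown)}")
        # Use the saved values as defaults, then parse again so that
        # flags given on the command line override the file.
        parser.set_defaults(**saved)
        args = parser.parse_args(argv)
    return vars(args)
```

Calling `load_config(["--config", "experiment.json"])` would then replay a saved run, and appending `--epochs 2` would change just that one setting.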

LSTM of tokens in BertClassifier

Currently, the official version of BERT, which is also the one we work with now, applies a tanh activation to the hidden state of the CLS token, a vector of size 768. The output of this layer is what the classification head uses as its features.

The next step, therefore, is to replace this BertPooler with a layer that runs an LSTM over the hidden states of all tokens apart from the CLS and SEP tokens.

For the sake of comparison, a tanh activation can then be applied to the output of this layer before passing it to the FC head.
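A minimal PyTorch sketch of such a layer follows. The class name `LSTMPooler`, the `token_mask` argument, and the choice of a single unidirectional LSTM whose final hidden state is the pooled feature are all assumptions made for illustration. It also assumes single-sentence inputs laid out as `[CLS] tokens [SEP] [PAD]...`, so that the content tokens are contiguous and start at position 1.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class LSTMPooler(nn.Module):
    """Pools token hidden states with an LSTM instead of using CLS only."""

    def __init__(self, hidden_size=768, apply_tanh=True):
        super().__init__()
        self.lstm = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.apply_tanh = apply_tanh

    def forward(self, hidden_states, token_mask):
        # hidden_states: (batch, seq_len, hidden_size)
        # token_mask: (batch, seq_len) bool, True for content tokens only
        # (False for CLS, SEP and padding). Hypothetical input.
        lengths = token_mask.sum(dim=1).clamp(min=1).cpu()
        # Drop the CLS position; content tokens now start at index 0.
        content = hidden_states[:, 1:, :]
        # Packing makes the LSTM stop after each sequence's last content
        # token, so SEP and padding never enter the recurrence.
        packed = pack_padded_sequence(
            content, lengths, batch_first=True, enforce_sorted=False
        )
        _, (h_n, _) = self.lstm(packed)
        pooled = h_n[-1]  # final hidden state, shape (batch, hidden_size)
        return torch.tanh(pooled) if self.apply_tanh else pooled
```

The output has the same shape as the BertPooler output, so the FC head would not need to change. A bidirectional LSTM, or pooling over all LSTM outputs, would be equally plausible readings of the issue.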

Mean pooling of tokens classifier

Currently, the official version of BERT, which is also the one we work with now, applies a tanh activation to the hidden state of the CLS token, a vector of size 768. The output of this layer is what the classification head uses as its features.

The next step, therefore, is to replace this BertPooler with a layer that performs mean pooling of the hidden states of all tokens apart from the CLS and SEP tokens.

For the sake of comparison, a tanh activation can then be applied to the output of this layer before passing it to the FC head.
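The mean-pooling layer could look roughly like the PyTorch sketch below. The names `MeanPooler` and `content_token_mask` are hypothetical, and the default token IDs (0 for PAD, 101 for CLS, 102 for SEP) are those of the `bert-base-uncased` vocabulary, which is an assumption about the model in use. Padding is excluded along with CLS and SEP so that it does not dilute the mean.

```python
import torch
import torch.nn as nn


def content_token_mask(input_ids, cls_id=101, sep_id=102, pad_id=0):
    # True for every token that is not CLS, SEP or padding.
    # Default IDs assume the bert-base-uncased vocabulary.
    return (input_ids != cls_id) & (input_ids != sep_id) & (input_ids != pad_id)


class MeanPooler(nn.Module):
    """Averages the hidden states of content tokens."""

    def __init__(self, apply_tanh=True):
        super().__init__()
        self.apply_tanh = apply_tanh

    def forward(self, hidden_states, token_mask):
        # hidden_states: (batch, seq_len, hidden_size)
        # token_mask: (batch, seq_len) bool, True for tokens to average
        mask = token_mask.unsqueeze(-1).to(hidden_states.dtype)
        summed = (hidden_states * mask).sum(dim=1)
        # Guard against division by zero for sequences with no content.
        counts = mask.sum(dim=1).clamp(min=1.0)
        pooled = summed / counts
        return torch.tanh(pooled) if self.apply_tanh else pooled
```

Setting `apply_tanh=False` gives the raw mean, which makes it possible to measure the effect of the activation separately from the effect of the pooling.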

Allow GCP bucket to be used as output dir

Currently, all output data is stored locally on the server on which the code is run. The problem is that, as the number of experiments increases, we use more and more disk space. This is expensive and makes the results difficult to share.

Therefore, we need to be able to choose a GCP bucket as an output directory.

Additionally, given that the software is used by others for evaluation, we must keep the existing option of storing output on the local filesystem.
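One possible shape for this is a small store abstraction chosen from the output directory string, sketched below. The class and function names are hypothetical, and treating a `gs://bucket/prefix` value as the signal for a bucket is an assumption. The GCS branch relies on the `google-cloud-storage` package, which is imported lazily so that local-only users do not need it installed.

```python
import os
import shutil


class LocalStore:
    """Keeps the existing behaviour: copy outputs into a local directory."""

    def __init__(self, root):
        self.root = root

    def save(self, local_path, name):
        dest = os.path.join(self.root, name)
        os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
        shutil.copyfile(local_path, dest)
        return dest


class GCSStore:
    """Uploads outputs to a GCP bucket."""

    def __init__(self, bucket_name, prefix=""):
        # Imported here so the dependency is only needed for gs:// outputs.
        from google.cloud import storage
        self.bucket = storage.Client().bucket(bucket_name)
        self.prefix = prefix.strip("/")

    def save(self, local_path, name):
        blob_name = f"{self.prefix}/{name}" if self.prefix else name
        self.bucket.blob(blob_name).upload_from_filename(local_path)
        return f"gs://{self.bucket.name}/{blob_name}"


def parse_output_dir(output_dir):
    # "gs://bucket/some/prefix" -> ("gcs", "bucket", "some/prefix")
    if output_dir.startswith("gs://"):
        bucket, _, prefix = output_dir[len("gs://"):].partition("/")
        return "gcs", bucket, prefix
    return "local", output_dir, ""


def make_store(output_dir):
    kind, location, prefix = parse_output_dir(output_dir)
    if kind == "gcs":
        return GCSStore(location, prefix)
    return LocalStore(location)
```

With this layout the training code would write each artefact to a temporary local file and then call `store.save(path, name)`, so the same code path serves both a bucket and the local filesystem.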

Why does L3C code release perform better?

The L3C code released at the start of December performs better than the most recent release. Attempt to replicate the results, then report back the results, the difference in results, and the difference in code.

Clean up requirements.txt

This code may have outdated dependencies.

Start a fresh venv on a development server and run the code to see what is missing or needed.
