garvincasimir / elasticsearch-azure-paas

Visual Studio Project which creates an Elasticsearch cluster on Microsoft Azure using worker roles

License: MIT License

C# 85.32% PowerShell 14.68%

azure c-sharp elasticsearch worker-role

elasticsearch-azure-paas's People

tormodu rvukovic yonglehou kneeclass kunlqt cata cdoru lgsonic

elasticsearch-azure-paas's Issues

Elasticsearch binding 'problem' when deployed to Azure

This is not really an issue with the project, but something I learned that might save others time. Elasticsearch 2.4.1 (I don't know about other versions) binds to the loopback address by default if you don't specify an address in the configuration file. This is fine when running in the emulator, but it is a problem once you deploy to Azure: even if you define the correct endpoint in the role settings, you won't be able to connect to Elasticsearch. Binding to the address '0' allows the Elasticsearch instance to accept requests on the endpoint you have defined.
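As a minimal sketch of the workaround described above, the helper below rewrites a config text so that it contains exactly one `network.host` line. The class and method names are hypothetical and not part of this project. The default value `0.0.0.0` is an assumption: it is the explicit spelling of the wildcard address, whereas the report above used '0'.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the project.
public static class ElasticsearchConfigSketch
{
    // Returns the config text with any existing network.host line removed
    // and a single "network.host: <host>" line appended at the end.
    public static string WithNetworkHost(string yaml, string host = "0.0.0.0")
    {
        List<string> lines = (yaml ?? string.Empty)
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(l => !l.TrimStart().StartsWith("network.host:", StringComparison.Ordinal))
            .ToList();

        // Drop trailing empty lines so the new setting follows the last real line.
        while (lines.Count > 0 && lines[lines.Count - 1].Length == 0)
        {
            lines.RemoveAt(lines.Count - 1);
        }

        lines.Add("network.host: " + host);
        return string.Join("\n", lines) + "\n";
    }
}
```

Commented-out lines such as `#network.host: ...` are left untouched, because only lines that start with the setting name are replaced.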

Azure files share

I have been testing the performance of Elasticsearch with the Azure Files share used as persistent storage for the indexes, and the performance is not good.
I ran two tests, one using local storage and one using the file share. Both tests indexed 13 million documents. With local storage, indexing took about 45 minutes; with the share, it took over 2 hours.

There should be a config value specifying whether to use local storage or Azure Files.
For systems that populate the index from an external data store, there may be no need for a persistent disk for the index.

[feature request] file based discovery

With file-based discovery, I think it will be easier to upgrade to the latest ES, because we can get rid of the azure-runtime Java plugin.
https://www.elastic.co/guide/en/elasticsearch/plugins/current/discovery-file.html
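A rough sketch of what the worker role might generate for file-based discovery. As I understand the linked documentation, the plugin reads a plain-text hosts file (named `unicast_hosts.txt` there) with one `host:port` entry per line, so please verify the file name and location against the docs for your ES version. The class name is hypothetical, and 9300 is assumed as the transport port because it is the Elasticsearch default.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the project.
public static class UnicastHostsSketch
{
    // Builds the content of a hosts file: one "address:port" entry per line.
    public static string Build(IEnumerable<string> addresses, int port = 9300)
    {
        IEnumerable<string> lines = addresses
            .Where(a => !string.IsNullOrWhiteSpace(a))
            .Select(a => a.Trim() + ":" + port);

        return string.Join("\n", lines) + "\n";
    }
}
```

The addresses would come from the role instance endpoints, which this sketch deliberately leaves out.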

Bootstrap Bulk Data

It would be nice if there were a simple process for adding bulk data to the cluster when it is first created. A good sample dataset can be found at courtlistener.com. The bulk processing PowerShell script I created for processing the data is a good place to start.

Error using Azure worker role temp folder for download

In softwareManager.cs line 40 download is called using:

_artifact.Download(_binaryArchive);

This causes the download method to use the Azure worker role temp folder. This folder has a size limit of 100 MB, which is too small to download both Elasticsearch and Java.
Changing the download call to the following fixes the problem:

 _artifact.Download(_binaryArchive, false);

JAVA_HOME not set

I added some more logging and found the reason why Elasticsearch does not start on new instances during scaling.
When Elasticsearch is started, it complains about a missing JAVA_HOME variable. This variable has been set, but is somehow not picked up.
After restarting the worker role, everything works as expected.

By starting the Elasticsearch process like this:

 _process = new Process();
 _process.StartInfo = new ProcessStartInfo
 {
     FileName = startupScript,
     UseShellExecute = false,
     RedirectStandardOutput = true
 };
 _process.Start();
 _process.BeginOutputReadLine();

and then capturing the standard output like this:

 _process.OutputDataReceived +=
     delegate(object sender, DataReceivedEventArgs args)
     {
         var output = args.Data;
         if (output == "JAVA_HOME environment variable must be set!")
         {
             Trace.TraceInformation("JAVA_HOME not set restarting");
             throw new Exception("JAVA_HOME not set restarting");
         }
     };

it is possible to listen for the text "JAVA_HOME environment variable must be set!", which the elasticsearch.bat file outputs, and restart the worker role by throwing an exception. Not very elegant, but it works as a temporary solution.
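A possible alternative to restarting, sketched here as an untested assumption: pass JAVA_HOME to the child process explicitly through `ProcessStartInfo.EnvironmentVariables`, so the script does not depend on the role process having picked up the variable. The class and method names and the `javaHome` parameter are hypothetical. Whether this avoids the problem on a freshly scaled instance has not been verified.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not taken from the project.
public static class ElasticsearchProcessSketch
{
    // Builds start info for the startup script with JAVA_HOME set explicitly
    // in the environment of the child process.
    public static ProcessStartInfo CreateStartInfo(string startupScript, string javaHome)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = startupScript,
            // Must be false to redirect output and to edit the child's environment.
            UseShellExecute = false,
            RedirectStandardOutput = true
        };

        // Only affects the process that will be started, not the current one.
        startInfo.EnvironmentVariables["JAVA_HOME"] = javaHome;
        return startInfo;
    }
}
```

The result would be assigned to `_process.StartInfo` in place of the initializer shown above.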

RoleRoot missing separator character when deployed to Azure

When deployed to Azure, RoleRoot points to a drive, not a folder as in the emulator.
This leads to an invalid path when constructing PackagePluginPath. The path will look like this:
E:approot\plugins. The code should append a valid separator character to the drive, like this:
roleRoot = roleRoot + @"\";
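A slightly more defensive version of the same idea, as a sketch: only append the separator when the root does not already end with one, so the emulator case (a folder path) is not affected. The class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper, not taken from the project.
public static class RoleRootSketch
{
    // Returns the root with a trailing backslash, adding one only if it is missing.
    public static string EnsureTrailingSeparator(string root)
    {
        if (string.IsNullOrEmpty(root))
        {
            return root;
        }

        char last = root[root.Length - 1];
        if (last == '\\' || last == '/')
        {
            return root;
        }

        return root + "\\";
    }
}
```

With this, `EnsureTrailingSeparator("E:") + @"approot\plugins"` yields `E:\approot\plugins`.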

Implement internal load balancing

Configure worker roles (Elasticsearch Nodes) to have a single internally load balanced endpoint on the default Elasticsearch port.

Store additional plugins in storage

It would be nice if we moved marvel and any additional plugins to a configurable storage container. The only plugin required by this solution is the discovery plugin so we can make an exception and keep it in the solution. We don't want to be dictating what plugins should be used. We also don't want to make it difficult for people to add their own plugins for their purposes.

Elasticsearch command line plugin install

This project currently supports installing plugins by placing the zip files in a special storage container. However, using a local zip file is only one of the ways an Elasticsearch plugin can be installed. It would be nice if the project supported the following pattern:

plugin --install <org>/<user/component>/<version>

I am not sure how this would work, but we would need to store a list of these org/user/component/version combinations somewhere and then feed it into the plugin installer for processing. It would be nice if the service configuration supported arbitrary lists of strings in a setting.

<ListSettings>
    <ListSetting name="ElasticsearchPlugins">
        <Setting>elasticsearch/marvel</Setting>
        <Setting>elasticsearch/shield</Setting>
    </ListSetting>
</ListSettings>
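Until something like the list setting above exists, one possible workaround is to keep the plugin names in a single string setting and split it at startup. This is only a sketch: the class name and the choice of `;` as delimiter are assumptions, not something the project or the service configuration defines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not taken from the project.
public static class PluginListSketch
{
    // Splits a value such as "elasticsearch/marvel;elasticsearch/shield"
    // into individual plugin identifiers, ignoring blanks and whitespace.
    public static IList<string> Parse(string settingValue)
    {
        return (settingValue ?? string.Empty)
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Where(p => p.Length > 0)
            .ToList();
    }
}
```

Each entry could then be passed to the plugin installer in turn.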

es_heap_size

We need a way to set ES_HEAP_SIZE. I think this value should be set automatically based on the available memory on the worker role. The recommended setting is 50% of the available memory. It should also be possible to override this setting through a config value.

Elasticsearch recommends setting this value using the environment variable ES_HEAP_SIZE, but I think a better approach would be to set the -Xms and -Xmx parameters when starting Elasticsearch, to minimize dependencies.
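The calculation could look roughly like this. It is a sketch under assumptions: the class and method names are hypothetical, the total memory is passed in by the caller rather than detected, and how the resulting flags are handed to the startup script is left open.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not taken from the project.
public static class HeapSizeSketch
{
    // Uses the override when one is configured, otherwise 50% of total memory.
    public static long ResolveHeapMb(long totalMemoryMb, long? overrideMb = null)
    {
        if (overrideMb.HasValue && overrideMb.Value > 0)
        {
            return overrideMb.Value;
        }

        return Math.Max(1, totalMemoryMb / 2);
    }

    // Formats identical minimum and maximum heap flags for the JVM.
    public static string ToJvmArguments(long heapMb)
    {
        return string.Format(CultureInfo.InvariantCulture, "-Xms{0}m -Xmx{0}m", heapMb);
    }
}
```

For a role with 7168 MB of memory and no override, this resolves to 3584 MB and the flags `-Xms3584m -Xmx3584m`.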

Can this be run in web jobs?

I'm pretty unfamiliar with how this actually works. I have a really low traffic site but I need ES and was hoping I could just add it via a web job.

Elastic Search shutdown

I have been debugging the part of the code that shuts down Elasticsearch:

    public virtual void Stop()
    {
        if (_process != null)
        {
            return;
        }

        if (_process.HasExited)
        {
            return;
        }

        _process.CloseMainWindow();
    }

To me it seems that _process is never null, which means the method always returns early and _process.CloseMainWindow() is never called.

Should the correct implementation be this?

    public virtual void Stop()
    {
        if (_process == null)
        {
            return;
        }

        if (_process.HasExited)
        {
            return;
        }

        _process.CloseMainWindow();
    }