The following are some issues I've found that aren't simple text changes. I've tried to make it clear which file each issue comes from.
[Preface.asciidoc]
The Preface says:
"[The Elasticsearch documentation] assumes that you are intimately familiar with information retrieval concepts, distributed systems, the query DSL and a host of other topics. This book makes no such assumptions. It has been written so that a complete beginner — to both search and distributed systems — can pick it up and start building a prototype within a few chapters."
and
"We explain concepts from first principles, helping novices gain sure footing in the complex world of search."
I don't think novices will know what a distributed, scalable, real-time search and analytics engine is, or what text search, structured search, analytics, structured and unstructured data, and the like are. I'm also not sure what "first principles" means in this case, but it would be great if you could provide a general description of these concepts somewhere early in the book, maybe in a pre-Preface Preface. As is, there are too many terms that novices, or even more experienced readers, might not know.
[05_What_is_it.asciidoc]
I'm bothered by the quick mention of Apache Lucene. It's described as a search
engine library. Is this enough information for new users?
"Document store" is used for the first time. A new user won't know how this
compares to a traditional relational database.
[10_Installing_ES.asciidoc]
"When installing Elasticsearch in production, you can use the method described above, "
What does installing in production have to do with anything? Installing from source
or packages can be done in any tier.
Say whether to install the JRE or the JDK.
Does it make sense to include "View in Sense" links in the text? Ideally these would
only appear in the online version. I'm not sure how you're planning on handling differences where something will appear online but not on paper.
You mention an Elasticsearch cluster in several places, but you don't
define it until after you've used it, e.g.
"You probably don’t want Marvel to monitor your local cluster, so you can disable data collection with this command:"
"This means that your Elasticsearch cluster is up and running, and we can start experimenting with it."
A new reader won't know what the difference is between a single Elasticsearch instance, such as the one they just installed, and an Elasticsearch cluster.
It's little issues like this that cause initial confusion. I know you explain this
later, but it's important to bring up terms in the right order.
"echo 'marvel.agent.enabled: false' >> ./config/elasticsearch.yml"
What is "data collection"? If this is really necessary, could you explain why? Otherwise this won't make sense to a new user.
"A cluster is a group of nodes with the same cluster.name "
How would a new user know what a cluster name is, or how to set one?
[15_API.asciidoc]
"If you are using Java, then Elasticsearch comes with two built-in clients which you can use in your code:
Node client
The node client joins a local cluster as a non-data node. In other words, it doesn’t hold any data itself, but it knows what data lives on which node in the cluster, and can forward requests directly to the correct node. "
This will be very confusing. A reader will probably think of a client as something
that talks to a server. Yet, in the description of the "node client", you say the client
"joins" a cluster.
Is it correct to say "Elasticsearch provides official clients for several languages, and there are numerous community-provided clients and integrations, all of which can be found in the Guide."? Is Elasticsearch really providing clients, or rather, libraries?
[20_Document.asciidoc]
The term "objects" was being used in too many ways so I changed the wording of the first couple of paragraphs.
Would a novice know what a "serialization format" is?
"Although the original user object was complex"
Where is the original user object shown? This is a confusing reference.
"Converting an object to JSON for indexing in Elasticsearch is much simpler than the equivalent process for a flat table structure."
If you're going to mention this, you should give an example of why this is true.
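Something along these lines would do. This is only a minimal sketch in Python: the user object, its field names, and the `flatten` helper are hypothetical illustrations of mine, not anything taken from the book.

```python
import json

# Hypothetical user object; the field names are illustrative only.
user = {
    "name": "John Smith",
    "emails": ["john@example.com", "jsmith@example.org"],
    "address": {"city": "London", "country": "UK"},
}

# JSON: the nested structure serializes in one call and round-trips unchanged.
as_json = json.dumps(user)
assert json.loads(as_json) == user

# Flat tables: the same object has to be split into rows for several tables,
# linked by a key, and reassembled with joins when it is read back.
def flatten(user_id, obj):
    users = [(user_id, obj["name"])]
    emails = [(user_id, email) for email in obj["emails"]]
    addresses = [(user_id, obj["address"]["city"], obj["address"]["country"])]
    return {"users": users, "emails": emails, "addresses": addresses}

tables = flatten(1, user)
```

Even a short comparison like this would let a reader see why the claim holds, instead of having to take it on trust.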
[25_Tutorial_Indexing.asciidoc]
What does it mean to build "dashboards over the data"? The "over" is what I don't understand.
"Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose."
Since you said that Lucene is the search engine, does it make any sense to include Elasticsearch in this sentence?
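For reference, the structure under discussion can be sketched in a few lines of Python. This only illustrates the general idea (a map from each term to the documents that contain it); the sample documents and the `build_inverted_index` name are made up, and it says nothing about how Lucene actually implements the structure.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each lowercase term to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Made-up sample documents, keyed by document ID.
docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
print(index["quick"])  # [1, 3]
```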
"The request body — the JSON document — contains all the information about this employee. "
Is it correct to call the request body a JSON document rather than a JSON object?
"which allows us to build much more complicated, robust queries"
What's a "robust" query?
"Our query will change a little to accommodate a filter, which allows us to execute structured searches efficiently:"
Saying this adds no value because a new reader won't see how this search is more
or less efficient than what was shown before.