gistlabs / mechanize
mechanize for java
Home Page: http://gistlabs.com/software/mechanize-for-java/
License: Mozilla Public License 2.0
To simulate different users, different sequences of actions should exist. These sequences can be managed in a single collection (SequenceCollection). Sequences can be added, removed, retrieved, and drawn at random.
During load testing a user needs to be simulated. A user runs a sequence drawn from a collection; when the sequence ends, another one starts, until the user is stopped.
Stopping signals the agent performing the requests to stop running the sequence, so a way to halt a sequence is needed.
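A minimal sketch of the collection described above, assuming a hypothetical Sequence interface (mechanize does not define one here) and an injected Random so that draws can be made reproducible in tests. It is not thread-safe; a shared collection in a load test would need synchronization.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical stand-in for a sequence of user actions; not part of mechanize. */
interface Sequence {
    String name();
}

/** Sketch of the proposed SequenceCollection: add, remove, get, and random draw. */
class SequenceCollection {
    private final List<Sequence> sequences = new ArrayList<>();
    private final Random random;

    SequenceCollection(Random random) {
        this.random = random;
    }

    void add(Sequence sequence) {
        sequences.add(sequence);
    }

    boolean remove(Sequence sequence) {
        return sequences.remove(sequence);
    }

    Sequence get(int index) {
        return sequences.get(index);
    }

    int size() {
        return sequences.size();
    }

    /** Draws one sequence uniformly at random; fails if the collection is empty. */
    Sequence draw() {
        if (sequences.isEmpty()) {
            throw new IllegalStateException("no sequences to draw from");
        }
        return sequences.get(random.nextInt(sequences.size()));
    }
}
```

A simulated user would call draw() each time its current sequence ends, until it is told to stop.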
Scenario:
This is my link!
page.links().get(byInnerHtml("This is my link!"));
should work. But it does not since it is compared against the inner HTML.
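The difference between the two comparisons can be illustrated with a small sketch. The markup of the original link is not shown above, so the `<b>` wrapper used in the usage note is an assumption, and the regex tag stripper is only a crude stand-in for the text extraction an HTML parser would do.

```java
/** Illustration only: compares a link's inner HTML versus its visible text. */
class LinkTextMatch {
    /** Crude tag stripper; a real implementation would ask the HTML parser for the text. */
    static String visibleText(String innerHtml) {
        return innerHtml.replaceAll("<[^>]*>", "").trim();
    }

    /** Matches only when the raw inner HTML is exactly the expected string. */
    static boolean matchesByInnerHtml(String innerHtml, String expected) {
        return innerHtml.equals(expected);
    }

    /** Matches when the text left after removing tags is the expected string. */
    static boolean matchesByText(String innerHtml, String expected) {
        return visibleText(innerHtml).equals(expected);
    }
}
```

With an assumed inner HTML of `<b>This is my link!</b>`, matchesByInnerHtml fails for "This is my link!" while matchesByText succeeds.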
For load testing, a collection of concurrent users needs to be set up and controlled. A user pool manages the users, each of which randomly draws sequences to perform from a sequence collection. The user pool uses one agent per simulated user.
This way one can create a load testing / integration testing environment for static and partly dynamic web pages, and also for REST services once REST / JSON support is added.
I'm trying to use mechanize to sign in to an amazon page, but running into an issue. For some reason it doesn't find the email input field that is on that page, yet does find the password, and a bunch others too. Here is the code I'm using, and I'm trying against master (3ddf8f8):
String amazonUrl = "http://www.amazon.com/gp/digital/fiona/manage/ref=gno_yam_myk";
MechanizeAgent agent = new MechanizeAgent();
Page signinPage = agent.get(amazonUrl);
Form form = signinPage.forms().get(0);
form.get("email"); // This is null
form.get("ap_email"); // This is null
form.get("password"); // This is not null
Looking at the form field in a debugger, the email input element doesn't seem to be there. The signinPage.asString contains:
<input id="ap_email" name="email" value="" type="email" size="30" maxlength="128" tabindex="1" autocorrect="off" autocapitalize="off" />
See http://jsonselect.org/ for an example application to JSON.
This maps special abilities of a node element (HtmlElement, HtmlTextNode) to a special attribute that can be reached using getAttribute and is also listed among the attribute names of the element.
This test,
@Test(expected=IllegalArgumentException.class)
public void testExpectPost() throws Exception {
agent.addPageRequest("POST", "http://test.com/form", newHtml("OK", ""));
Page result = agent.get("http://test.com/form");
assertEquals("OK", result.getTitle());
}
will succeed then fail: the test method correctly gets the expected exception, but the MechanizeTestCase.afterTest() method then fails the test with
junit.framework.AssertionFailedError: Unexecuted page request: com.gistlabs.mechanize.MechanizeMock$PageRequest@1055e55f
at junit.framework.Assert.fail(Assert.java:50)
at com.gistlabs.mechanize.MechanizeTestCase.afterTest(MechanizeTestCase.java:33)
It would be nice to be able to have afterTest() detect if failure has already occurred, and then do nothing.
Picking up from hijack of #31
Running some simple code to login to amazon under an android 4.1 emulator. The code works when run as plain java locally.
For some reason, the cookies in the response object aren't what they should be. May have something to do with the inclusion of httpclient as part of the sdk, but not sure where to start looking.
public void testSignIn() throws IOException {
String username = "";
String password = "";
MechanizeAgent agent = new MechanizeAgent();
String manageKindleUrl = "http://www.amazon.com/gp/digital/fiona/manage/ref=gno_yam_myk";
Page signinPage = agent.get(manageKindleUrl);
debug(signinPage);
Form form = signinPage.forms().get(0);
form.get("email").setValue(username);
((Checkable) form.get("ap_signin_existing_radio")).setChecked(true);
form.get("password").setValue(password);
Page managePage = form.submit();
debug(managePage);
}
private void debug(Page page) {
System.out.println("\n\n\n");
System.out.println("**** Page Headers ****");
System.out.println(page.getResponse().toString());
System.out.println("**** Page Cookies ****");
for (Cookie cookie : page.getAgent().cookies()) {
System.out.println(cookie.toString());
}
System.out.println("**** Page Body ****");
System.out.println(page.asString());
}
Add HtmlElement, HtmlNode, HtmlTextNode and HtmlPage.htmlElements() to hide Jsoup when dealing with normal HTML document inspection tasks.
The current MechanizeAgent.post() API allows <String,String> arguments. This prevents sending a file through this api.
(Perhaps the post(String, Parameters) method supports this now... but not clear).
Suggest changing signature to:
public Page post(String uri, Map<String, Object> params)
where we document Object can be one of: String, String[], or ContentBody (otherwise runtime exception thrown)
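A minimal sketch of the runtime type check this signature would imply. BodyPart is a hypothetical stand-in for the ContentBody type mentioned above, and describe() only lists the parts that would be sent; it does not perform a request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a file-like body part such as ContentBody. */
interface BodyPart {
    String filename();
}

class PostParams {
    /** Lists each parameter as text or file, rejecting unsupported value types. */
    static List<String> describe(Map<String, Object> params) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Object> entry : params.entrySet()) {
            String name = entry.getKey();
            Object value = entry.getValue();
            if (value instanceof String) {
                parts.add(name + "=" + value);
            } else if (value instanceof String[]) {
                // A String[] contributes one part per element under the same name.
                for (String element : (String[]) value) {
                    parts.add(name + "=" + element);
                }
            } else if (value instanceof BodyPart) {
                parts.add(name + "=@" + ((BodyPart) value).filename());
            } else {
                String type = value == null ? "null" : value.getClass().getName();
                throw new IllegalArgumentException(
                        "unsupported value type for '" + name + "': " + type);
            }
        }
        return parts;
    }
}
```

The trade-off of accepting Object is that mistakes surface only at runtime, which is why the allowed types would need to be documented.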
Allowing multi-part support for the doRequest() builder mechanism would simplify the complete submit(form,..) handling. Also it will make doRequest more feature complete.
To mimic browser behaviour (especially when load testing) it is necessary to load all images of a page. Since every browser has an image cache, only images newly encountered by the agent must be fetched.
So something like page.images().getMissing(ImageCollection). A subclass of the image collection can state whether the images should be written to a buffer, saved to file, or just marked as fetched.
Using the same ImageCollection across multiple pages, one can mimic the browser's image loading behaviour for an entire sequence of actions.
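The cache behaviour can be sketched as follows. This is an assumption-laden illustration: images are represented by URL strings, and getMissing lives on the collection itself rather than on page.images() as proposed above. It only marks images as fetched; buffering or saving to file would be subclass behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch of an image cache shared across pages; images are modelled as URLs. */
class ImageCollection {
    private final Set<String> fetched = new LinkedHashSet<>();

    /** Returns the URLs not seen before, in page order, and marks them as fetched. */
    List<String> getMissing(List<String> pageImageUrls) {
        List<String> missing = new ArrayList<>();
        for (String url : pageImageUrls) {
            // Set.add returns false when the URL is already cached.
            if (fetched.add(url)) {
                missing.add(url);
            }
        }
        return missing;
    }

    boolean isFetched(String url) {
        return fetched.contains(url);
    }
}
```

Passing the same instance to every page in a sequence means an image that appears on several pages is requested only once.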
I think this:
https://github.com/GistLabs/mechanize/blob/master/src/main/java/com/gistlabs/mechanize/parameters/Parameters.java#L41
should be:
add(name, value.toString())
since the following code ends up not setting any params in the post:
Map params = new HashMap();
params.put("contentType", "All");
params.put("count", Integer.toString(count));
Page ownershipData = agent.post(ownershipUrl, params);
See branch for failing test case.
By supporting and, brackets, and not, the query will be powerful enough to express any logical statement about a single element.
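Why and plus not is sufficient can be sketched with java.util.function.Predicate. Elements are modelled here as plain attribute maps, and attr() and or() are hypothetical helpers, not part of the mechanize query API. Method chaining plays the role of brackets.

```java
import java.util.Map;
import java.util.function.Predicate;

/** Sketch: composing element queries from "and" and "not" only. */
class ElementQuery {
    /** Matches an element (an attribute map here) whose attribute equals the value. */
    static Predicate<Map<String, String>> attr(String name, String value) {
        return attributes -> value.equals(attributes.get(name));
    }

    /** "or" derived via De Morgan: a or b == not(not a and not b). */
    static <T> Predicate<T> or(Predicate<T> a, Predicate<T> b) {
        return a.negate().and(b.negate()).negate();
    }
}
```

For example, attr("type", "text").and(attr("name", "x").negate()) selects text inputs not named x, and or() shows that a disjunction needs no extra primitive.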
Introduce AgentPool as a collection of agents that can be controlled as a whole.
new AgentPool(numberOfAgents)
The AgentPool should offer a listener service and interceptors:
Interceptors: before and after (agent, sequence), init(agent)
Listener: begin, end(agent, sequence)
This allows measuring the duration of a sequence for a single user.
AgentPool.process(SequenceCollection, duration) runs a load test for the given amount of time.
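A rough sketch of the pool with a begin/end listener, under several assumptions: agents are reduced to numbered worker threads, the sequence is a plain Runnable with a name rather than something drawn from a SequenceCollection, and interceptors are left out. PoolListener and AgentPoolSketch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical listener; begin/end follow the proposal above. Must be thread-safe. */
interface PoolListener {
    void begin(int agentId, String sequenceName);
    void end(int agentId, String sequenceName, long elapsedNanos);
}

class AgentPoolSketch {
    private final int numberOfAgents;
    private final List<PoolListener> listeners = new ArrayList<>();

    AgentPoolSketch(int numberOfAgents) {
        this.numberOfAgents = numberOfAgents;
    }

    /** Listeners must be registered before process() is called. */
    void addListener(PoolListener listener) {
        listeners.add(listener);
    }

    /** Every agent runs the sequence at least once, then repeats until the duration elapses. */
    void process(String sequenceName, Runnable sequence, long durationMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(durationMillis);
        ExecutorService executor = Executors.newFixedThreadPool(numberOfAgents);
        for (int i = 0; i < numberOfAgents; i++) {
            final int agentId = i;
            executor.submit(() -> {
                do {
                    for (PoolListener l : listeners) l.begin(agentId, sequenceName);
                    long start = System.nanoTime();
                    sequence.run();
                    long elapsed = System.nanoTime() - start;
                    for (PoolListener l : listeners) l.end(agentId, sequenceName, elapsed);
                } while (System.nanoTime() - deadline < 0);
            });
        }
        executor.shutdown();
        try {
            // Allow a running sequence some extra time to finish after the deadline.
            executor.awaitTermination(durationMillis + 60_000, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

The elapsed time passed to end() is the per-user sequence duration the listener is meant to measure.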
Expose a method to directly post from mechanize agent.
For example:
Page result = agent.post(urlString, mapParams);
Requested from mat in comments http://gistlabs.com/software/mechanize-for-java/
To simplify load testing and integration testing, a user needs to be simulated. The user should have a behaviour described by a sequence of actions. The sequence will have (random) idle times applied, simulating the time a user waits between actions.
agent.run(sequence) - plays a sequence using an agent
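A minimal sketch of playing a sequence with random idle times, assuming actions are plain Runnables and the idle time is drawn uniformly from an inclusive millisecond range. IdleSequenceRunner is a hypothetical name; the sleeper is injected so that tests can record pauses instead of actually waiting.

```java
import java.util.List;
import java.util.Random;
import java.util.function.LongConsumer;

/** Sketch: runs actions in order with a random pause after each one. */
class IdleSequenceRunner {
    private final Random random;
    private final LongConsumer sleeper;

    IdleSequenceRunner(Random random, LongConsumer sleeper) {
        this.random = random;
        this.sleeper = sleeper;
    }

    /** Pauses minIdleMillis..maxIdleMillis (inclusive) after every action. */
    void run(List<Runnable> actions, int minIdleMillis, int maxIdleMillis) {
        if (minIdleMillis < 0 || maxIdleMillis < minIdleMillis) {
            throw new IllegalArgumentException("invalid idle range");
        }
        for (Runnable action : actions) {
            action.run();
            int idle = minIdleMillis + random.nextInt(maxIdleMillis - minIdleMillis + 1);
            sleeper.accept(idle);
        }
    }

    /** Sleeper that really blocks the current thread. */
    static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a real run the runner would be created as new IdleSequenceRunner(new Random(), IdleSequenceRunner::sleep).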
Use Node.getAttributeNames for the matching and *.
Evolvable, versionable APIs can take advantage of default values.
Reconsider:
com.gistlabs.mechanize.form.FormTest
// TODO JDH: confirm that we should fail in this case... see versioning API for counter example
@Test(expected = UnsupportedOperationException.class)
public void testSettingValueOfHiddenInputFails() {
There only seems to be an older version on http://mvnrepository.com/artifact/com.gistlabs/mechanize
Can you update to 0.9.1?
Use agent.do(url).set(param).add(param).post/get() instead of simply using agent.get(url) or agent.post(url, new Parameters().set(x,x).add(x,y)).
Using the do method, it is also possible to expose a GET query builder that easily composes GET query parameters and can also parse the parameters already present in the URL.
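The query-builder half of this idea can be sketched without any HTTP code. QueryBuilder below is a hypothetical class, not the mechanize RequestBuilder: it parses parameters already present in the URL, lets set() replace and add() append values, and rebuilds the URL. Fragments are not handled. Note that do is a reserved word in Java, so the entry point on the agent would need a different name.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a GET query builder that understands parameters already in the URL. */
class QueryBuilder {
    private final String base;
    private final Map<String, List<String>> params = new LinkedHashMap<>();

    QueryBuilder(String url) {
        int q = url.indexOf('?');
        base = q < 0 ? url : url.substring(0, q);
        if (q >= 0 && q + 1 < url.length()) {
            for (String pair : url.substring(q + 1).split("&")) {
                if (pair.isEmpty()) continue;
                int eq = pair.indexOf('=');
                String name = decode(eq < 0 ? pair : pair.substring(0, eq));
                String value = eq < 0 ? "" : decode(pair.substring(eq + 1));
                add(name, value);
            }
        }
    }

    /** Replaces all values of the parameter. */
    QueryBuilder set(String name, String value) {
        List<String> values = new ArrayList<>();
        values.add(value);
        params.put(name, values);
        return this;
    }

    /** Appends one more value for the parameter. */
    QueryBuilder add(String name, String value) {
        params.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
        return this;
    }

    String toUrl() {
        StringBuilder sb = new StringBuilder(base);
        char separator = '?';
        for (Map.Entry<String, List<String>> entry : params.entrySet()) {
            for (String value : entry.getValue()) {
                sb.append(separator).append(encode(entry.getKey())).append('=').append(encode(value));
                separator = '&';
            }
        }
        return sb.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because values are decoded on parsing and re-encoded on output, a space written as %20 in the input comes back as + in the rebuilt URL; both are valid form encodings of a space in a query string.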
Currently multi-part submit and doRequest are not tested and should be tested using a real web site.
I get a trace when running with jsoup 1.6.4 which goes away with 1.7.1 (from the code posted in #34)
java.lang.StringIndexOutOfBoundsException: String index out of range: 0
at java.lang.String.charAt(String.java:686)
at org.jsoup.helper.DataUtil.parseByteData(DataUtil.java:98)
at org.jsoup.helper.DataUtil.load(DataUtil.java:54)
at org.jsoup.Jsoup.parse(Jsoup.java:118)
at com.gistlabs.mechanize.html.HtmlPage.loadPage(HtmlPage.java:44)
at com.gistlabs.mechanize.Page.<init>(Page.java:60)
at com.gistlabs.mechanize.html.HtmlPage.<init>(HtmlPage.java:39)
at com.gistlabs.mechanize.html.HtmlPageFactory.buildPage(HtmlPageFactory.java:28)
at com.gistlabs.mechanize.MechanizeAgent.toPage(MechanizeAgent.java:151)
at com.gistlabs.mechanize.MechanizeAgent.request(MechanizeAgent.java:90)
at com.gistlabs.mechanize.RequestBuilder.post(RequestBuilder.java:122)
at com.gistlabs.mechanize.form.Form.submit(Form.java:296)
at com.gistlabs.mechanize.form.Form.submit(Form.java:276)
at com.gistlabs.mechanize.integration.test.AmazonSignInIT.testSignIn(AmazonSignInIT.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:45)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:42)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:263)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:68)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:47)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:231)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:60)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:229)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:50)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:222)
at org.junit.runners.ParentRunner.run(ParentRunner.java:300)
at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Consider https://github.com/jayway/JsonPath (replacing JSoup when json is returned).
Use a filter for identifying links and forms, so those aspects of Page will still function.
See http://testutils.org/multi-mechanize/
@MartinKersten, this may have ideas similar to the issues you've defined in this milestone.
by(...) allows matching against an attributeName or an array of attributeNames
--> Util/Builder.array(T...values) would be great, too.
Some type documentation has @Version 1.0 in some source files. It has to be removed.
Put a comment in with this copyright heading. We'll add an open source license for rights before making public.
// Copyright (C) 2012 Gist Labs, LLC. All Rights Reserved.
The generalized glue code between css-selectors and JSON Elements is something that I'd like to see maintained in the css-selectors project.
If/when this pull request is accepted we should remove our own com.gistlabs.mechanize.json.query package entirely and use the css-selectors code.
A single Page type is too limiting for a full RESTful multi-content client. The name Page is common and easily understood, but perhaps too ambiguous.
Suggested hierarchy of types:
-Resource // the root of all returned types from the web, has bytestream
|
|- ImageResource // also has getImage()
|
@MartinKersten, I assume this is what you were suggesting. Please comment if this isn't what you were thinking about. I've been convinced :)
The mapping is using Special_Attributes
Take advantage of the following HTTP Headers:
For some requests an accept header must be set for accepting JSON or XML rather than HTML. This can be done using the do(url) method.
agent.do(url).accept("application/xml") for instance. Default is accepting html.
We need to offer a non-multipart send. (for POST/PUT).
We currently have these post() signatures:
public Page post(String uri, Map<String, String> params) throws ... {
public Page post(String uri, Parameters params) {
I suggest adding the following as well:
public Page post(String uri, Parameters params, byte[] body) {
public Page post(String uri, Parameters params, InputStream body) {
public Page post(String uri, Parameters params, File body) {
I anticipate the complexity will be in handling parameters (encoding and content type issues).
See for example this code, particularly the reset multipart data calls:
https://github.com/sonatype/async-http-client/blob/master/api/src/main/java/com/ning/http/client/RequestBuilderBase.java#L479
We should remove package cycles.
This will increase readability and remove a lot of duplicated logic
See http://stackoverflow.com/questions/2618573/what-version-of-apache-http-client-is-bundled-in-android-1-6 for one opinion on the actual version of HttpClient that is bundled in Android.
We definitely want to support Android and this review would be good to confirm it.
@MartinKersten and @wr0ngway, any feedback and opinions from you both is welcome on making sure that we support Android well.