sortable's Introduction
The Input consists of two files with JSON lines format 1) Product containing product_name,manufacturer,model,announced-date Sample {"product_name":"Samsung_TL100","manufacturer":"Samsung","model":"TL100","announced-date":"2009-01-07T19:00:00.000-05:00"} 2)Listing containing title,manufacturer,currency,price Sample {"title":"Samsung TL100 12.2 Megapixel Digital Camera - Black","manufacturer":"Samsung","currency":"USD","price":"50.00"} On Analysing we can note that product_name key of products file has entry in listings file with "_" replaced by space in the title key. So on the first note we can read the each line of prodcuts file and check with listings file like 1)Read the product_name from products file 2)Replace _ from product_name with space 3)find the replaced word occurence in listings file title key 4)If title contains the replaced word ,fetch that line and add in JSON object. Doing so is time consuming since products file conatins 743 records (lines) and listing s file contains 20196 records (lines) So to do the word search,for each operations ,we have to do 700 * 20196 operations. Instead if we make a datastructure of listings like map of the listings file containing title value,for every read from product fiel we can fetch the value map passing in the key. But the issue is Map dosent allows duplicate key,but there is goggle api guava which allows duplicate key called multimap. Process involved 1)Make Datastructure of listings file (mapListings()) a) Read the first word of the title and map the title value to it b) Read the title and map the rest of the line to it. while ((lines = br.readLine()) != null) { JSONObject jsonObj = (JSONObject) jsonParser.parse(lines); title = jsonObj.get("title").toString(); titleFirstWord = title.substring(0, (title.contains(" ") ? title.indexOf(" ") : title.length())); keyWord = titleFirstWord.replace(titleFirstWord.charAt(0), Character.toUpperCase(titleFirstWord.charAt(0))); titleKeyMap.put(keyWord, "@" + title); titleValueMap.put(title, lines); } So now titleKeyMap contains name like Sony,Casio,Samsung and each one has multiple values and titleValueMap conatins title key and rest of the lines as value. 2) Read Products file (readProductsFromFile()) a) Read each line of the file products.txt b) For each line check the occurence in the mapped datastructure (titleKeyMap and titleValueMap) try { BufferedReader br = new BufferedReader(new FileReader("./products.txt")); while ((lines = br.readLine()) != null) { JSONObject jsonObj = (JSONObject) jsonParser.parse(lines); productName = (jsonObj.get("product_name").toString()); searchProductName = productName.substring(0, (productName.contains("_") ? productName.indexOf("_") : productName.indexOf("-"))); replacedProductName = productName.replace((productName.contains("_") ? "_" : "-"), " "); titleValue = new StringTokenizer(titleKeyMap.get(searchProductName).toString(), "@"); this.printValues(titleValue, productName, replacedProductName, titleValueMap, ps); } 3) Checking the occurence from the passed values readProductsFromFile() public void printValues(StringTokenizer listingsValue, String productName, String replacedProductName, Multimap<String, String> titleValueMap, PrintWriter ps) a)listingsValue contains the title of the listings. b)Pass this listings value to the titleValueMap as key so we get the lines c)Add productName into JSON object as product_name and sorted values as JSON Array. d)As the output requires the format in { "product_name": String "listings": Array[Listing] } e) And JSON dosen't guarantee the order,we will add into Map and pass that map odject to JSON JSONArray listings = new JSONArray(); Map obj = new LinkedHashMap(); obj.put("product_name", productName); while (listingsValue.hasMoreElements()) { tokenVal = listingsValue.nextElement().toString(); tokenValue = tokenVal.substring(0, ((tokenVal.length() > 2) ? (tokenVal.length() - 2) : tokenVal.length())); if (tokenVal.contains(replacedProductName)) { arrayValue = titleValueMap.get(tokenValue).toString(); arrayValue = arrayValue.replaceAll((arrayValue.contains("[") ? "\\[" : "\\]"), ""); System.out.println("Check"+listings.contains(arrayValue.toString().replaceAll("\"", ""))); if(!listings.contains(arrayValue.toString().replaceAll("\"", ""))) listings.add(arrayValue.toString().replaceAll("\"", "")); } } obj.put("listings", listings); ps.println(JSONObject.toJSONString(obj)); So we get the desired output as Sample: {"product_name":"Sony_Cyber-shot_DSC-W310","listings":["{title:Sony Cyber-shot DSC-W310 - Digital camera - compact - 12.1 Mpix - optical zoom: 4 x - supported memory: MS Duo, SD, MS PRO Duo, MS PRO Duo Mark2, SDHC, MS PRO-HG Duo - silver,manufacturer:Sony,currency:USD,price:108.75}, {title:Sony Cyber-shot DSC-W310 - Digital camera - compact - 12.1 Mpix - optical zoom: 4 x - supported memory: MS Duo, SD, MS PRO Duo, MS PRO Duo Mark2, SDHC, MS PRO-HG Duo - silver,manufacturer:Sony,currency:USD,price:113.55}]"]}
sortable's People
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. ๐๐๐
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google โค๏ธ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.