April's Code Share: Faster than standard?

April's Code Share: Faster than standard?

The standard libraries that come with modern languages provide most of the data structures and many of the common algorithms that are typically taught in computer science courses. How often do you really think of the characteristics of the collections you are using or do you always default to java.util.ArrayList and java.util.HashMap? Do you ever wonder what is the most appropriate sorting algorithm for the data you are processing?

This months challenge is to implement a sorted map (note you don’t need to implement the SortedMap interface though, only Map) without using any of the standard java collections or algorithms. You may only use arrays, scalar types and classes you have implemented yourself. Once you have your map implementation you need to use it to implement the following three operations as efficiently as you can without looking at the source code of the java standard library.

Download the English Open Word List (EOWL) http://dreamsteep.com/component/docman/doc_download/3-the-english-open-word-list-eowl.html?Itemid=30
In the zip file you will find a subdirectory EOWL-v1.1.1/LF Delimited Format you will find a set of text files named “? Words.txt”

The first part of the challenge is to read the contents of the files “A Words.txt” to “Z Words.txt) in order into your Map implementation with the words being he keys to the map and the initial value being zero.

In order to avoid timing file operations you should first read all the words into one of the standard collections provided by your language (eg ArrayList) and then time how long it takes to copy the words into your Map<String,Integer> implementation.

The second part of the challenge is to read the entire contents of the text file http://www.gutenberg.org/cache/epub/30255/pg30255.txt (including Gutenberg prefix and suffix) and for each word in the text file retrieve the count from your map and increment it by one. The end result should be your map of English words containing the count of each word in the file (and zero for all words not encountered in the file). Any words in the text file that were not in the initial word list (ie are not in map already) should be ignored. Again to avoid including the time for file operations in your times you should load the text file into a string first and then time the process of iterating through the words in the string and incrementing the count.

Finally, you should produce a list of the word counts sorted (alphabetically, ascending) by word. If using Java the result should be contained in a java.util.ArrayList<WordCount> where WordCount is the following class:

public class WordCount {
   public final String word;
   public final int count;

   public WordCount(String w, int c) {
       this.word = w;
       this.count = c;
   }
}

We aim  to provide a common test harness in Java which will provide the word list and text file so you should implement the operations described above by implementing the interface Benchmark (below) and providing a no-argument constructor:

package org.ljc.april2012;

import java.util.ArrayList;
import java.util.Map;

public interface Benchmark {

   Map<String,Integer> loadWords(ArrayList<String> words);
   
   void addWordCounts(Map<String,Integer> words, String text);
   
   ArrayList<WordCount> getCounts(Map<String,Integer> words);
}

If using a language other than java you are free to implement your own equivalents of WordCount and Benchmark.

If we have the time we’ll stage a contest comparing the performance of each implementation.

You might like to try Caliper to time your code http://code.google.com/p/caliper/

You can download the code for the above classes from:
WordCount: http://www.davesnowdon.com/resources/ljc/codeshareapril2012/WordCount.java
Benchmark: http://www.davesnowdon.com/resources/ljc/codeshareapril2012/Benchmark.java
When the test harness is ready we’ll update the event with a link.

 

Please Note -

At previous events GDC members have found this venue/particular room hard to find.

This event is being held in the Francis Bancroft Building - Room Number - FB 1.15. 

Please see this map for further information (the building is numbered 26 on this map) http://www.qmul.ac.uk/docs/about/26065.pdf 

The actual signs at the building will say 'Bancroft' building only rather than Francis Bancroft.

Directions -

From Mile End Station you will need to turn left out of the station and then continue for 5 mins approx. up the Mile End Road and then cross over the road to the QM campus and go into the East Gate.  This is the first entrance to QM you come to when coming from the direction of Mile End Tube.  Continue straight on up Westfield Way for a couple of minutes and then bear left at the Curve cafe.  The Bancroft Building is on your right (number 26 on the map).


This event is being hosted in conjunction with the LJC.

Join or login to comment.

  • Yuki W.

    It's really a good time spending with those excellent developers. Actually, I know little about that, but I do learn a lot. And I like the way people discuss together, learn from each other, and share their thoughts.

    April 13, 2012

3 went

Our Sponsors

  • Our Blog

    Here is a link to our blog

  • RecWorks Ltd

    Fixing Tech Recruitment using the Power of Community

  • jClarity

    Mentoring and a hands on Student Programme for learning industry skills

People in this
Meetup are also in:

Sometimes the best Meetup Group is the one you start

Get started Learn more
Rafaël

We just grab a coffee and speak French. Some people have been coming every week for months... it creates a kind of warmth to the group.

Rafaël, started French Conversation Group

Sign up

Meetup members, Log in

By clicking "Sign up" or "Sign up using Facebook", you confirm that you accept our Terms of Service & Privacy Policy