From: Samir T.
Sent on: Monday, February 4, 2013, 12:43 PM
I tried it and see your point: it made no sense to me either. So I ported it to Ruby, which has a much more advanced regex engine than Java:

  COMMENT = "Comment"
  DOMAIN = "Domain"
  EXPIRES = "Expires"
  MAX_AGE = "Max-Age"
  PATH = "Path"
  VERSION = "Version"
  SECURE = "Secure"

  COOKIE_REGEX = /(?:^|\s)([^\s\(\)\[\]\{\}=,\"\"\/?@:;]+)=([^\s\(\)\[\]\{\}=,\"\"\/?@:;]+)
                   (?:
                     ;\s+(#{COMMENT}=[^\(\)\[\]\{\}=,\"\"\/?@:;]+)|
                     ;\s+(#{DOMAIN}=[^\s\(\)\[\]\{\}=,\"\"\\@:;]+)|
                     ;\s+(#{EXPIRES}=[^\(\)\[\]\{\}=\"\"\/?@;]+)|
                     ;\s+(#{MAX_AGE}=\d+)|
                     ;\s+(#{PATH}=[^\s\(\)\[\]\{\}=,\"\"\\?@:;]+)|
                     ;\s+(#{VERSION}=[\d]+)|
                     ;\s+(#{SECURE})
                   )*/ix

  COOKIE = "cookie_name=cookie_value; " +
           DOMAIN + "=www.cookie.com; " +
           COMMENT + "=cookie_comment; " +
           MAX_AGE + "=86399; " +
           PATH + "=/cookie; " +
           EXPIRES + "=Sun, 03 Feb[masked]:31:57 GMT; " +
           SECURE + "; " +
           VERSION + "=1"

  puts COOKIE_REGEX.match(COOKIE).captures

This works perfectly fine. I can only conclude it's Java's fault.

— Samir.


On Sun, Feb 3, 2013 at 7:57 PM, Karl Bennett <[address removed]> wrote:
So,

I'm trying to write an über regex to capture all the parts of a Set-Cookie header value.

This is what I've come up with:

private static final Pattern COOKIE_REGEX = Pattern.compile(
    "(?i)(?:^|\\s)([^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)=([^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)" +
    "(?:" +
        ";\\s+(Comment=[^\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)|" +
        ";\\s+(Domain=[^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\@:;]+)|" +
        ";\\s+(Expires=[^\\(\\)\\[\\]\\{\\}=\"\"\\\\/?@;]+)|" +
        ";\\s+(Max-Age=\\d+)|" +
        ";\\s+(Path=[^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\?@:;]+)|" +
        ";\\s+(Version=[\\d]+)|" +
        ";\\s+(Secure)|" +
        "(this is here to force the Secure capture. Don't know why it's needed.)" +
    ")*");

// Raw regex.
// (?i)(?:^|\s)([^\s\(\)\[\]\{\}=,""\\/?@:;]+)=([^\s\(\)\[\]\{\}=,""\\/?@:;]+)(?:;\s+(Comment=[^\(\)\[\]\{\}=,""\\/?@:;]+)|;\s+(Domain=[^\s\(\)\[\]\{\}=,""\\@:;]+)|;\s+(Expires=[^\(\)\[\]\{\}=""\\/?@;]+)|;\s+(Max-Age=\d+)|;\s+(Path=[^\s\(\)\[\]\{\}=,""\\?@:;]+)|;\s+(Version=[\d]+)|;\s+(Secure)|(this is here to force the Secure capture. Don't know why it's needed.))*

Now if you run this regex over a sample cookie value such as:

cookie_name=cookie_value; Domain=www.cookie.com; Comment=cookie_comment; Max-Age=86399; Path=/cookie; Expires=Sun, 03 Feb[masked]:56:51 GMT; Secure; Version=1

It works fine and we get the following captures:

[
    "cookie_name=cookie_value; Domain=www.cookie.com; Comment=cookie_comment; Max-Age=86399; Path=/cookie; Expires=Sun, 03 Feb[masked]:56:51 GMT; Secure; Version=1",
    "cookie_name",
    "cookie_value",
    "Comment=cookie_comment",
    "Domain=www.cookie.com",
    "Expires=Sun, 03 Feb[masked]:57:52 GMT",
    "Max-Age=86399",
    "Path=/cookie",
    "Version=1",
    "Secure"
]

As you might have already noticed, there is a rather odd capture group at the end of that regex. The problem is that, if it's not there, the "Secure" fragment isn't captured. Could someone please tell me why this is? Is my regex wrong? It is a little bonkers.

Here is some starter code to help with debugging:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CookieRegexTest {

    public static final String COMMENT = "Comment";
    public static final String DOMAIN = "Domain";
    public static final String EXPIRES = "Expires";
    public static final String MAX_AGE = "Max-Age";
    public static final String PATH = "Path";
    public static final String VERSION = "Version";
    public static final String SECURE = "Secure";

    private static final Pattern COOKIE_REGEX = Pattern.compile(
            "(?i)(?:^|\\s)([^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)=([^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)" +
                    "(?:" +
                    /**/";\\s+(" + COMMENT + "=[^\\(\\)\\[\\]\\{\\}=,\"\"\\\\/?@:;]+)|" +
                    /**/";\\s+(" + DOMAIN + "=[^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\@:;]+)|" +
                    /**/";\\s+(" + EXPIRES + "=[^\\(\\)\\[\\]\\{\\}=\"\"\\\\/?@;]+)|" +
                    /**/";\\s+(" + MAX_AGE + "=\\d+)|" +
                    /**/";\\s+(" + PATH + "=[^\\s\\(\\)\\[\\]\\{\\}=,\"\"\\\\?@:;]+)|" +
                    /**/";\\s+(" + VERSION + "=[\\d]+)|" +
                    /**/";\\s+(" + SECURE + ")|" +
                    /**/"(this is here to force the Secure capture. Don't know why it's needed.)" +
                    ")*");

    private static final String COOKIE = "cookie_name=cookie_value; " +
            DOMAIN + "=www.cookie.com; " +
            COMMENT + "=cookie_comment; " +
            MAX_AGE + "=86399; " +
            PATH + "=/cookie; " +
            EXPIRES + "=Sun, 03 Feb[masked]:31:57 GMT; " +
            SECURE + "; " +
            VERSION + "=1";

    private static String printGroups(Matcher matcher) {

        StringBuilder builder = new StringBuilder();

        for (int i = 1; i < matcher.groupCount(); i++) builder.append(matcher.group(i)).append('\n');

        return builder.toString();
    }

    public static void main(String[] args) {

        Matcher matcher = COOKIE_REGEX.matcher(COOKIE);
        matcher.matches();

        System.out.println(printGroups(matcher));
    }
}
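
One thing worth checking in the starter code above, offered as a sketch rather than a confirmed diagnosis: Matcher.groupCount() does not include group 0 in its count, so a loop written as "i < matcher.groupCount()" stops one short of the last capture group. That would explain why an extra trailing group appears to "force" the Secure capture, since Secure is otherwise the last group. The class name GroupCountSketch, the helper groups(), and the cut-down SIMPLE pattern below are hypothetical stand-ins for illustration, not the full cookie regex:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupCountSketch {

    // Simplified stand-in for the cookie regex (an assumption for illustration):
    // a name=value pair followed by optional Version and Secure attributes.
    static final Pattern SIMPLE = Pattern.compile(
            "(\\w+)=(\\w+)(?:;\\s+(Version=\\d+)|;\\s+(Secure))*");

    // Collects groups 1..groupCount() inclusive. groupCount() excludes group 0,
    // so the upper bound has to be "<=" to reach the last capture group.
    static List<String> groups(Matcher matcher) {
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= matcher.groupCount(); i++) {
            result.add(matcher.group(i));
        }
        return result;
    }

    public static void main(String[] args) {
        Matcher matcher = SIMPLE.matcher("name=value; Secure; Version=1");
        if (matcher.matches()) {
            // prints [name, value, Version=1, Secure]
            System.out.println(groups(matcher));
        }
    }
}
```

With "<" instead of "<=" in the loop, the same input would list only name, value and Version=1, which matches the symptom described above.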


Cheers,
Karl




--
This message was sent by Karl Bennett ([address removed]) from LJC - London Java Community.