Subject: Decision to ACCEPT WITH MANDATORY MINOR REVISIONS (AQ) - SPL-13377-2013, Joint action segmentation and classification by an extended hidden Markov model
From: "jarenas@tsc.uc3m.es"
Date: 12/07/2013 9:29 AM
To: Ehsan Zare Borzeshi
CC: "jarenas@tsc.uc3m.es", "jeronimo.arenas@gmail.com", Ehsan Zare Borzeshi, "o.perezconcha@unsw.edu.au", Richard Xu, "xuyida@hotmail.com", Massimo Piccardi

11-Jul-2013

Mr. Ehsan Zare Borzeshi
University of Technology, Sydney
Sydney, New South Wales
Australia

** {This applies to SUBMITTING AUTHOR accounts ONLY: you can find any ATTACHMENTS FROM THE REVIEWERS by going to the "Manuscript with Decisions" status link in your Author Center and clicking on "view decision letter". They are located at the bottom of the decision letter under the "Files attached" heading.}

Dear Mr. Zare Borzeshi,

I am writing to you concerning the above-referenced manuscript, which you submitted to the IEEE Signal Processing Letters. Comments from the reviewers are attached at the end of this email (**see the note above about attachments).

In summary, all reviewers agree that the paper has clearly improved with respect to the first version, but there are still some minor issues that need to be addressed. I therefore conclude that the paper can be accepted, but that certain minor revisions are necessary (giving it an "AQ" status, ACCEPTED WITH MANDATORY CHANGES). Please update your manuscript to address all remaining issues. In particular, please pay special attention to the comments of Reviewer #2 regarding the experimental work. The need to include some measure of statistical significance has been pointed out both by this reviewer and by Reviewer #3. I hope that you will carefully consider these comments and take the necessary actions.

Your revised manuscript must be uploaded within 2 (TWO) weeks to your account in Manuscript Central (http://mc.manuscriptcentral.com/spl-ieee).
If it is not possible for you to submit your revision within two weeks, you must communicate this directly to the EIC, Prof. Anna Scaglione (a.scaglione@ieee.org, ascaglione@ucdavis.edu), and seek approval for an extension. Be sure to copy the managing AE in your message to the EIC. If you do not submit your paper within the given two weeks, the ScholarOne system will lock you out from submitting your revised .R paper; this is why an actual extension needs to be granted, so that the due date for your revised work can be adjusted in the system. However, author extensions are typically not granted, and you are expected to submit your revised manuscript to the system in a timely fashion.

The account will have the number SPL-13377-2013.R1, where `R1' indicates the revision. Please use this number and do not create a new submission. If you do not see this number listed, please send an email to the IEEE Signal Processing Society Publications Office, Lisa Jess, at l.jess@ieee.org, and she will assist you in finding it.

* If you have any questions regarding the reviews, please contact the Associate Editor who managed your paper. Inquiries regarding the submission of your final electronic materials should be directed to the Production Department at spl@ieee.org. All other inquiries should be directed to the Admin., Lisa Jess.

Best regards,
Dr. Jerónimo Arenas-García
Associate Editor
jarenas@tsc.uc3m.es, jeronimo.arenas@gmail.com

Reviewer Comments:

Reviewer: 1

Recommendation: AQ - Publish In Minor, Required Changes

Comments:
The authors have addressed the major concerns I had before and have improved the manuscript considerably. I suggest the following minor corrections/improvements for the final version (note: page and line numbers below refer to the single-column version):

a. P15, L3: Due -> Due to
b. P15, L24: (number of points per frame) -> (i.e. number of points…)
c. P16, L40: The Conclusion -> Section IV… or The conclusion section…
d. P23, L9: Z. Z, W. Li and Z. Liu -> W. Li, Z. Zhang and Z. Liu…
e. In Table I, did N=256 give the best accuracy for the "Bag-of-features" method? If so, please state it in the text.
f. In Table II, did N=512 give the best accuracy for the "Bag-of-features" method? If so, please state it in the text.
g. In Table II, the accuracy of HMM-MIO (Gaussian) is close to zero, which is much worse than random guessing. This seems unexpected. Please give an explanation or discussion of this result.
h. P20, L42-43: "As features, we have used STIPs as for the previous dataset, but sub-sampling them one in ten to limit the overall data size". How was the sub-sampling done? Please give details.

Additional Questions:
1. Is the topic appropriate for publication in this transaction?: Perhaps
2. Is the topic important to colleagues working in the field?: Moderately So
3. How would you rate the technical novelty of the paper?: Somewhat Novel
4. How would you rate the English usage?: Satisfactory
6. Rate the references: Satisfactory

Reviewer: 2

Recommendation: R - Reject (Paper Is Not Of Sufficient Quality Or Novelty To Be Published In This Transactions)

Comments:
The new version of the manuscript deals satisfactorily with most of the suggestions made in the review. However, I still have doubts about whether the results achieved by HMM-MIO in the experiments are significant enough to recommend acceptance of the article. The method is evaluated on two joint action segmentation and recognition datasets, one synthetic and one real. The proposed method seems to clearly outperform the baseline on the synthetic dataset, but the performances seem to be the same on the real dataset. The following comments would give a clearer insight into the real advantages of the proposed method over the baseline.

1) I would like to see a slightly more detailed statistical analysis of the results.
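As a purely illustrative sketch of the kind of analysis meant here — reporting mean ± standard deviation of accuracy over repeated train/test partitions, together with a paired significance test between the two methods — the following Python fragment may help; all accuracy values in it are hypothetical placeholders, not results from the paper under review.

```python
# Sketch of the evaluation protocol requested by the reviewers: mean and
# standard deviation of accuracy over repeated train/test partitions, plus a
# paired t statistic between two methods. All numbers are made-up placeholders.
import math
import statistics

# Hypothetical per-partition accuracies for each method (ten matched partitions).
acc_hmm_mio = [0.61, 0.58, 0.63, 0.60, 0.59, 0.62, 0.57, 0.64, 0.60, 0.61]
acc_bof     = [0.59, 0.57, 0.60, 0.58, 0.60, 0.59, 0.56, 0.61, 0.58, 0.59]

def summarize(scores):
    """Mean and sample standard deviation, as suggested for the tables."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t(a, b):
    """Paired t statistic over matched partitions.

    Compare |t| against the t_{n-1} critical value (e.g. 2.262 for n=10
    partitions at the 5% level, two-sided).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

m1, s1 = summarize(acc_hmm_mio)
m2, s2 = summarize(acc_bof)
t = paired_t(acc_hmm_mio, acc_bof)
print(f"HMM-MIO: {m1:.3f} +/- {s1:.3f}")
print(f"BoF:     {m2:.3f} +/- {s2:.3f}")
print(f"paired t over {len(acc_hmm_mio)} partitions: {t:.2f} "
      f"(significant at 5% if |t| > 2.262 for 9 d.o.f.)")
```

In practice, both methods would be run on the same train/test partitions, as the paired test above assumes.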
I think the tables should show averages and standard deviations over at least ten different train/test partitions of the two datasets. In the CMU case this would help resolve the apparent tie in performance.

2) The proposed method relies on four parameters, D, M, \nu and S, which are explored over somewhat reasonable ranges of values. The baseline (bag-of-features) relies on only two parameters: the number of clusters and the window size, W. Only the number of clusters is treated evenly with respect to the HMM-MIO parameter set, since W is fixed to 32 in both tasks, despite the fact that the durations of the actions seem very different in the two datasets. The KTH data sequences include 24 actions in 2000 frames (roughly 83 frames per action), while the CMU data sequences include 14 actions in 15000 frames (over 1000 frames per action on average). I think that W can be a key parameter for the performance of the bag-of-features approach and deserves the same treatment in the experimental setup. In my view, most practitioners would prefer the BoF approach, since its performance on the real data is the same and they only need to care about one parameter, the number of clusters in k-means, which is generally easy to tune. Perhaps some discussion of other advantages of HMM-MIO over BoF (such as computational cost) is needed to incline potential readers towards the proposed method.

Moreover, the usual way of addressing the effect of these untrained parameters is to select them by cross-validation. The manuscript shows a range of test-set performances, which only illustrates the best and worst performances. In the BoF case, where only one parameter has been tuned, I would also add the mean or median value of the range. In the HMM-MIO case, where four parameters have been explored, the extremes of the range of values do not give enough detail about the expected performance of the algorithm.

3) There is a very significant drop in performance from the synthetic to the real data.
I think that the paper would benefit from a more detailed discussion of the differences between the two datasets. With respect to prior information about the datasets, the article could include the average action length for each class (which would also help guess their prior probabilities). Perhaps reporting the error on the training data could give some insight into why one dataset is more difficult to learn than the other. Displaying the confusion matrices of both experiments for HMM-MIO and BoF would also help identify the differences in performance.

I know there is a constraint on the length of the paper. I suggest removing Fig. 3, merging Tables I and II, and removing Section III.C to make space for these suggestions. In fact, I think the discussion about CRFs should not be in the experimental section, since CRFs are not used in the experiments. If needed, this discussion should be shortened and moved to the introduction.

Finally, I have found the following typos in the one-column version:
- page 1, line 39: joint segmentation
- page 2, line 19: reclining head lead
- page 5: the first line of equation (2) has many errors.

Additional Questions:
1. Is the topic appropriate for publication in this transaction?: Yes
2. Is the topic important to colleagues working in the field?: Yes
3. How would you rate the technical novelty of the paper?: Somewhat Novel
4. How would you rate the English usage?: Satisfactory
6. Rate the references: Satisfactory

Reviewer: 3

Recommendation: AQ - Publish In Minor, Required Changes

Comments:
I feel that the authors have addressed the significant issues I had with the first submission of this paper, and I commend them for working with a challenging dataset instead of just stitched KTH. I suggest a few additional (minor) changes prior to publication.

1. There are some misspellings and grammatical mistakes. I recommend that another pass of copy editing be made.

2.
When comparing the improvement between methods, take care to distinguish between stating that something is X percent better and X percentage points better. In your manuscript you state the former when you mean the latter.

3. In Section II-A, please change the following sentence to increase the clarity of the method: "...we add dealing with space irregularity by partitioning the area of the frame depicting the actor..." to "...we add dealing with space irregularity by partitioning the bounding box containing the actor...".

4. For the CMU-MMAC data, was there a reason for the fixed subject partitioning of the video samples? Would it be better to perform an N-fold cross-validation instead?

5. In Table II, I would prefer to see results based on repeated trials (over N-fold cross-validation) with an explicit measure of the statistical significance between the HMM-MIO and Bag-of-Features methods.

Additional Questions:
1. Is the topic appropriate for publication in this transaction?: Yes
2. Is the topic important to colleagues working in the field?: Moderately So
3. How would you rate the technical novelty of the paper?: Somewhat Novel
4. How would you rate the English usage?: Needs improvement
6. Rate the references: Satisfactory

-----------------------------------------------------------
http://mc.manuscriptcentral.com/spl-ieee
-----------------------------------------------------------
IEEE SPS Homepage
http://www.signalprocessingsociety.org/
-----------------------------------------------------------
SPS Publication Information
http://www.ieee.org/organizations/society/sp/pub.html
-----------------------------------------------------------