PHP - Regex for words connected by hyphen and underscore while keeping punctuation


I have been reading, searching, and trialling different ways to write the regex, such as \p{L}, [a-z], and \w, but I can't seem to get the results I am after.

problem

I have an array made up of full sentences with punctuation. I am parsing through the array using the following preg_match_all call, which works in keeping the words and the punctuation.

preg_match_all('/(\w+|[.;?!,:])/', $match, $matches)

However, I have words like these:

  • word-another-word
  • more_words_like_these

and I would like to be able to retain the integrity of these words (keep them connected), but my current preg_match_all breaks them down into individual words.
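A minimal sketch of that behaviour, assuming a made-up sentence (the real input array is not shown, and $sentence is a name chosen for this example). Since \w already includes the underscore, only the hyphenated word should be affected:

```php
<?php
// Hypothetical sample sentence; the real input array is not shown in the post.
$sentence = 'a word like_this, with-hyphen;';

// Pattern from the question: runs of word characters, or one punctuation mark.
preg_match_all('/(\w+|[.;?!,:])/', $sentence, $matches);

// \w covers letters, digits and the underscore, so like_this stays whole.
// The hyphen is neither \w nor in the punctuation class, so it is skipped
// and with-hyphen comes back as two separate words.
print_r($matches[0]); // a, word, like_this, ",", with, hyphen, ";"
```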

what I tried

preg_match_all('/(\p{L}-\p{L}+|[.;?!,:])/', $match, $matches)

and;

preg_match_all('/((?i)^[\p{L}0-9_-]+|[.;?!,:])/', $match, $matches)

which I found here

but I cannot achieve the desired outcome:

Array ( [0] => a, [1] => word, [2] => like_this, [3] => connected, [4] => ;, [5] => with-relevant-punctuation )

Ideally, I would also like to be able to account for special characters, as some of these words have accents.

Just insert the hyphen into the character class. Note that the hyphen needs to appear at the beginning or end of the set of characters; otherwise it will be considered a range symbol.

(\w+|[-.;?!,:]) 
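The point about hyphen placement can be checked with a tiny sketch. The input string and the variable names here are illustrative only, not taken from the post:

```php
<?php
// Illustrative input only.
$input = 'a-b-c';

// Hyphen between two characters: read as the range a..c, so "-" never matches.
preg_match_all('/[a-c]/', $input, $range);

// Hyphen first in the class: taken literally, so only "a", "c" and "-" match.
preg_match_all('/[-ac]/', $input, $literal);

print_r($range[0]);   // a, b, c
print_r($literal[0]); // a, -, -, c
```

Escaping the hyphen as \- inside the class is another way to keep it literal, regardless of its position.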

examples

live demo

https://regex101.com/r/yi3tm4/2

sample text

However, I have words like these:  word-another-word more_words_like_these  , and I would like to be able to retain the integrity of these words (connected) but my current preg_match breaks them down into individual words.

sample matches

The other words are captured as before, and the words with hyphens are also captured.

[omitted matches 1-9 for brevity]
match 10
1.  [39-56]  `word-another-word`
match 11
1.  [57-78]  `more_words_like_these`
[omitted matches 12+ for brevity]
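A caveat worth checking against the demo: \w does not include the hyphen, so a pattern that only adds "-" to the punctuation class returns each hyphen as its own match. For the compounds to come back whole, as in the matches listed above, the hyphen has to be in the same class as the word characters. The sketch below contrasts the two. The second pattern is an assumed variant, not necessarily the exact expression saved at the regex101 link, and $text is a name chosen for this example.

```php
<?php
// Shortened from the sample text above; $text is a hypothetical variable name.
$text = 'word-another-word more_words_like_these , able';

// Pattern as printed in the answer: "-" is only in the punctuation class,
// so each hyphen is returned as a separate one-character match.
preg_match_all('/(\w+|[-.;?!,:])/', $text, $asPrinted);

// Assumed variant: "-" sits first in the class that also holds \w, so
// letters, digits, underscores and hyphens are consumed as one token.
preg_match_all('/([-\w]+|[.;?!,:])/', $text, $joined);

print_r($asPrinted[0]); // word, -, another, -, word, more_words_like_these, ",", able
print_r($joined[0]);    // word-another-word, more_words_like_these, ",", able
```

One trade-off of the variant: a lone dash surrounded by spaces is also returned as a token, because the class accepts a hyphen on its own.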

explanation

node                     explanation
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [-.;?!,:]                any character of: '-', '.', ';', '?',
                             '!', ',', ':'
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
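The question also asks about accented letters. One possible approach, sketched here as an assumption rather than as part of the answer, is to swap \w for Unicode property classes and add the u modifier. The input string is made up for illustration.

```php
<?php
// Made-up accented input; the file is assumed to be saved as UTF-8.
$text = 'café-crème, déjà_vu;';

// \p{L} = any Unicode letter, \p{N} = any Unicode number. The trailing "u"
// modifier makes PCRE treat pattern and subject as UTF-8, so multi-byte
// letters are seen as single characters.
$ok = preg_match_all('/([-\p{L}\p{N}_]+|[.;?!,:])/u', $text, $matches);

if ($ok === false) {
    // With the "u" modifier, invalid UTF-8 in the subject makes the call fail.
    echo 'preg error code: ' . preg_last_error() . PHP_EOL;
} else {
    print_r($matches[0]); // café-crème, ",", déjà_vu, ";"
}
```

Checking for false matters here because a subject that is not valid UTF-8 produces an error rather than an empty result.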