bulk rename (or correctly display) files with special characters

RobbieV

I have a bunch of directories and subdirectories that contain files with special characters, like this file:

robbie@phil:~$ ls test�sktest.txt 
test?sktest.txt

Find reveals an escape sequence:

robbie@phil:~$ find test�sktest.txt -ls 
424512 4000 -rwxr--r-x   1 robbie   robbie    4091743 Jan 26 00:34 test\323sktest.txt

The only reason I can even type their names on the console is tab completion. This also means I can rename them manually (and strip the special character).

I've set LC_ALL to en_US.UTF-8, which does not seem to help, even in a new shell:

robbie@phil:~$ echo $LC_ALL
en_US.UTF-8

I'm connecting to the machine using ssh from my mac. It's an Ubuntu install:

robbie@phil:~$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=7.10
DISTRIB_CODENAME=gutsy
DISTRIB_DESCRIPTION="Ubuntu 7.10"

Shell is Bash, TERM is set to xterm-color.

These files have been there for quite a while, and they were not created with this install of Ubuntu, so I don't know what the system encoding settings used to be.

I've tried things along the lines of:

find . -type f -ls | sed 's/[^a-zA-Z0-9]//g'

But I can't find a solution that does everything I want:

  1. Identify all files that have undisplayable characters (the above ignores way too much)
  2. For all those files in a directory tree (recursively), execute mv oldname newname
  3. Optionally, the ability to transliterate special characters such as ä to a (not required, but would be awesome)

OR

  1. Correctly display all these files (and no errors in applications when trying to open them)

I have bits and pieces, like iterating over all files and moving them, but identifying the files and formatting them correctly for the mv command seems to be the hard part.

Any extra information as to why they do not display correctly, or how to "guess" the correct encoding, is also welcome. (I've tried convmv, but it doesn't seem to do exactly what I want: http://j3e.de/linux/convmv/)

Gilles 'SO- stop being evil'

I guess you see this invalid character because the name contains a byte sequence that isn't valid UTF-8. File names on typical unix filesystems (including yours) are byte strings, and it's up to applications to decide on what encoding to use. Nowadays, there is a trend to use UTF-8, but it's not universal, especially in locales that could never live with plain ASCII and have been using other encodings since before UTF-8 even existed.

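Try LC_CTYPE=en_US.iso88591 ls to see if the file name makes sense in ISO-8859-1 (latin-1). If it doesn't, try other locales. Note that only the LC_CTYPE locale setting matters here, but LC_ALL overrides every LC_* variable, and you have it set. So either unset LC_ALL first or run LC_ALL=en_US.iso88591 ls instead.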
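The byte that find -ls printed as \323 (octal) can also be previewed without switching locales, by feeding the raw name through iconv. The following is only a sketch: sample_name_as is a made-up helper, and the list of encodings is a guess at likely candidates, not something known about these files.

```shell
# Hypothetical helper: print the sample name from the question, decoded
# from the encoding given as $1 and re-encoded as UTF-8.
sample_name_as() {
  printf 'test\323sktest.txt' | iconv -f "$1" -t UTF-8
}

# Candidate encodings are assumptions; extend the list as needed.
for enc in ISO-8859-1 ISO-8859-15 CP1252; do
  if decoded=$(sample_name_as "$enc" 2>/dev/null); then
    printf '%-12s %s\n' "$enc" "$decoded"
  else
    printf '%-12s (not supported by this iconv)\n' "$enc"
  fi
done
```

In ISO-8859-1 that byte is the letter Ó, so the name would read testÓsktest.txt there. Whether that is the intended name is something only the owner of the files can judge.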

In a UTF-8 locale, the following command will show you all files whose name is not valid UTF-8:

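# Reads path names on stdin (one per line) and prints those that are not
# structurally valid UTF-8. The pattern is permissive: it still accepts
# overlong forms and the obsolete 5- and 6-byte sequences.
grep-invalid-utf8 () {
  perl -l -ne '/^([\000-\177]|[\300-\337][\200-\277]|[\340-\357][\200-\277]{2}|[\360-\367][\200-\277]{3}|[\370-\373][\200-\277]{4}|[\374-\375][\200-\277]{5})*$/ or print'
}
find | grep-invalid-utf8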
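As a variation on the same check, the validity test can be delegated to iconv, which also avoids reading names line by line. This is a rough sketch, assuming an iconv that exits non-zero on invalid input (glibc's does); list_invalid_utf8 is a hypothetical name, not an existing tool.

```shell
# Hypothetical helper: print every path under $1 (default: .) whose name
# does not survive a UTF-8 to UTF-8 round trip through iconv.
list_invalid_utf8() {
  find "${1:-.}" -exec sh -c '
    for f do
      # iconv fails on byte sequences that are not valid UTF-8.
      printf "%s" "$f" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
        printf "%s\n" "$f"
    done' sh {} +
}
```

Running list_invalid_utf8 . from the top of the tree prints one offending path per line. It starts one iconv process per file, so it will be noticeably slower than the perl filter on large trees.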

You can check if they make more sense in another locale with recode or iconv:

find | grep-invalid-utf8 | recode latin1..utf8
find | grep-invalid-utf8 | iconv -f latin1 -t utf8

Once you've determined that a bunch of file names are in a certain encoding (e.g. latin1), one way to rename them is

find | grep-invalid-utf8 |
rename 'BEGIN {binmode STDIN, ":encoding(latin1)"; use Encode;}
        $_=encode("utf8", $_)'

This uses the perl rename command available on Debian and Ubuntu. You can pass it -n to show what it would be doing without actually renaming the files.
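Where the perl rename command is not available, the same idea can be sketched with find, iconv and mv. This is an illustration rather than a tested tool: rename_tree, SRC_ENC and DRY_RUN are names invented here, and it assumes that no file name ends in a newline and that iconv rejects invalid UTF-8 with a non-zero exit status.

```shell
# Hypothetical helper, not part of any package.
# Usage: rename_tree DIR ENCODING   (set DRY_RUN=1 to only print the plan)
rename_tree() {
  # -depth visits children before their parent directories, so a renamed
  # directory never invalidates a path that is still waiting to be handled.
  SRC_ENC="$2" DRY_RUN="${DRY_RUN:-}" find "$1" -depth -exec sh -c '
    for path do
      case $path in */*) ;; *) continue ;; esac   # need a parent directory
      dir=${path%/*}
      base=${path##*/}
      # Leave names alone if they already are valid UTF-8.
      if printf "%s" "$base" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
        continue
      fi
      # Reinterpret the raw bytes as SRC_ENC and re-encode them as UTF-8.
      new=$(printf "%s" "$base" | iconv -f "$SRC_ENC" -t UTF-8) || continue
      if [ -e "$dir/$new" ]; then
        printf "skipping %s: target exists\n" "$path" >&2
      elif [ -n "$DRY_RUN" ]; then
        printf "would rename %s -> %s\n" "$path" "$dir/$new"
      else
        mv -- "$path" "$dir/$new"
      fi
    done' sh {} +
}
```

A cautious first run would be DRY_RUN=1 rename_tree . ISO-8859-1, followed by the same call without DRY_RUN once the plan looks right. For the optional transliteration asked about in the question, GNU iconv accepts a target such as ASCII//TRANSLIT, but what it produces depends on the locale and on the iconv implementation.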
