Discuss Scratch

EncloTester
Scratcher
15 posts

Does anyone know where I can find a .txt file with a list of every utf-8 character

Title says it.

Last edited by EncloTester (Aug. 29, 2016 21:56:33)

MathWizz
Scratcher
100+ posts

https://github.com/bits/UTF-8-Unicode-Test-Documents

running Chromium 42.0.2311.90 with Flash Player 15.0.0.189 on Arch Linux 3.19.5-1-ck
MathWizzJsScratch && sb.js & Amber (coming soon! maybe)
jokebookservice1
Scratcher
1000+ posts

Alternatively, you could write a script to do it for you in another programming language; that way you could learn more.
EncloCreations
Scratcher
500+ posts

Yeah, but in Python it prints only the ASCII characters (string.printable).

Last edited by EncloCreations (Aug. 30, 2016 14:25:46)

jokebookservice1
Scratcher
1000+ posts

EncloCreations wrote:

yeah but on python, it prints only the ascii characters (string.printable)
I don't think so:
print(chr(1000))
Works as expected. Iterating from 0 to the last Unicode code point (0x10FFFF) and writing chr(i) for each one to your output file should do the trick.
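A small sketch of the point above, using only the standard library: `string.printable` is a fixed set of 100 ASCII characters, while `chr()` accepts any code point up to `sys.maxunicode`, which is 0x10FFFF on Python 3.3 and later.

```python
import string
import sys

# string.printable is ASCII-only: digits, letters, punctuation and whitespace.
print(len(string.printable))                        # 100
print(all(ord(c) < 128 for c in string.printable))  # True

# chr() is not limited to ASCII: 1000 is code point U+03E8.
print(f"U+{ord(chr(1000)):04X}")                    # U+03E8

# sys.maxunicode is the last code point, so range(sys.maxunicode + 1)
# is the same range as range(0x110000) used in this thread.
print(sys.maxunicode + 1 == 0x110000)               # True
```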
BookOwl
Scratcher
1000+ posts

This script should work:
OUT_NAME = "file.txt" # Change this to whatever filename you want
with open(OUT_NAME, "w", encoding="utf8") as f:
    for i in range(0x110000):
        f.write(chr(i))
        f.write("\n") # Put each char on its own line

who needs signatures
kittyhacker101
Scratcher
100+ posts

You can find Unicode characters at http://unicode-table.com/en/

A Gamer / Computer Programmer with 6 cats and a server room.

● Website : https://kittyhacker101.tk/
Firedrake969
Scratcher
1000+ posts

$ PYTHONIOENCODING=utf-8 python3 -c "print('\n'.join(chr(i) for i in range(0x110000) if not 0xD800 <= i <= 0xDFFF))" > utf8.txt

probably would work, haven't tried it

'17 rickoid

bf97b44a7fbd33db070f6ade2b7dc549
WooHooBoy
Scratcher
1000+ posts

Hey, I bet this will generate the RTL character and mess up Notepad's viewing, that would be ‮pretty neat

Last edited by WooHooBoy (Sept. 1, 2016 01:00:59)


considered harmful
Firedrake969
Scratcher
1000+ posts

WooHooBoy wrote:

Hey, I bet this will generate the RTL character and mess up Notepad's viewing, that would be ‮pretty neat
xD

Last edited by Firedrake969 (Sept. 1, 2016 01:01:55)


lugga
Scratcher
500+ posts

BookOwl wrote:

This script should work:
OUT_NAME = "file.txt" # Change this to whatever filename you want
with open(OUT_NAME, "w", encoding="utf8") as f:
    for i in range(0x110000):
        f.write(chr(i))
        f.write("\n") # Put each char on its own line
Traceback (most recent call last):
  File "everyutf8char.py", line 4, in <module>
    f.write(chr(i))
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

It doesn't seem to work
lugga
Scratcher
500+ posts

WooHooBoy wrote:

Hey, I bet this will generate the RTL character and mess up Notepad's viewing, that would be ‮pretty neat
What RTL character?
Right to left?
WooHooBoy
Scratcher
1000+ posts

lugga wrote:

WooHooBoy wrote:

Hey, I bet this will generate the RTL character and mess up Notepad's viewing, that would be ‮pretty neat
What RTL character?
Right to left?
Yep

jokebookservice1
Scratcher
1000+ posts

lugga wrote:

WooHooBoy wrote:

Hey, I bet this will generate the RTL character and mess up Notepad's viewing, that would be ‮pretty neat
What RTL character?
Right to left?
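Yes, a control character (U+202E RIGHT-TO-LEFT OVERRIDE) that tells the text viewer to display the text after it right-to-left. It messes stuff up because of that, and it is pretty dangerous in file names: a file that shows up as
exe.rcise.music.mp4
is actually named
4pm.cisum.esicr.exe
with the override character in front, so it is really an .exe in disguise. That is really dangerous.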
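Names like that can be flagged mechanically. A minimal sketch, assuming Python's standard `unicodedata` module: `unicodedata.bidirectional()` reports each character's bidi class, and the explicit embedding, override and isolate controls (RLO is U+202E) each have their own class. The helper `find_bidi_controls` and the sample name are hypothetical, for illustration only.

```python
import unicodedata

# Bidi classes of the explicit embedding/override/isolate controls,
# including RLO (U+202E RIGHT-TO-LEFT OVERRIDE).
BIDI_CONTROL_CLASSES = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def find_bidi_controls(name: str) -> list[tuple[int, str]]:
    """Return (index, "U+XXXX") for every bidi control character in name."""
    return [
        (i, f"U+{ord(c):04X}")
        for i, c in enumerate(name)
        if unicodedata.bidirectional(c) in BIDI_CONTROL_CLASSES
    ]


# Hypothetical name: the real extension is .exe, but a viewer that honors
# the leading RLO lays the rest out reversed, as "exe.rcise.music.mp4".
suspicious = "\u202e4pm.cisum.esicr.exe"
print(find_bidi_controls(suspicious))   # [(0, 'U+202E')]
print(find_bidi_controls("music.mp4"))  # []
```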

lugga wrote:

BookOwl wrote:

This script should work:
OUT_NAME = "file.txt" # Change this to whatever filename you want
with open(OUT_NAME, "w", encoding="utf8") as f:
    for i in range(0x110000):
        f.write(chr(i))
        f.write("\n") # Put each char on its own line
Traceback (most recent call last):
  File "everyutf8char.py", line 4, in <module>
    f.write(chr(i))
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

It doesn't seem to work
Based on a quick scan of Wikipedia, some code points (the surrogates, U+D800 to U+DFFF) are only meant to be used in pairs in UTF-16, so they can't be encoded on their own in UTF-8.
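To make that concrete, a small sketch: the 2048 surrogate code points U+D800 to U+DFFF cannot be encoded alone, so Python's UTF-8 codec (with the default "strict" error handling) raises the same error seen in the traceback. The helper name `is_surrogate` is just for illustration.

```python
SURROGATE_FIRST = 0xD800
SURROGATE_LAST = 0xDFFF


def is_surrogate(cp: int) -> bool:
    """True for the code points reserved for UTF-16 surrogate pairs."""
    return SURROGATE_FIRST <= cp <= SURROGATE_LAST


# A lone surrogate cannot be encoded as UTF-8 with the default "strict" errors.
try:
    chr(0xD800).encode("utf-8")
except UnicodeEncodeError as exc:
    print(exc.reason)  # surrogates not allowed

# Every other code point in range(0x110000) is encodable.
count = sum(1 for cp in range(0x110000) if not is_surrogate(cp))
print(count)  # 1112064
```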
EncloTester
Scratcher
15 posts

BookOwl's method raises an encoding error, as stated above.
EncloTester
Scratcher
15 posts

OUT_NAME = "file.txt" # Change this to whatever filename you want
with open(OUT_NAME, "w", encoding="utf8") as f:
    for i in range(0x110000):
        try:
            f.write(chr(i))
            f.write("\n") # Put each char on its own line
Try doesn't work here either :P
EncloTester
Scratcher
15 posts

OUT_NAME = "file.txt" # Change this to whatever filename you want
with open(OUT_NAME, "w", encoding="utf8") as f:
    for i in range(0x110000):
        try:
            f.write(chr(i))
            f.write("\n") # Put each char on its own line
        except UnicodeEncodeError:
            pass
        finally:
            print('Written' + str(i))

That works… kinda…

Last edited by EncloTester (Sept. 3, 2016 05:07:37)
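Building on the working script above, here is one possible tidier version, offered as a sketch rather than a definitive fix. It skips the surrogate range explicitly instead of catching the exception. It opens the file with `newline="\n"` so Python does not translate line endings, which means each entry is exactly one character plus "\n". It also returns a single total instead of printing a line per code point. The function name `write_all_code_points` and the `assigned_only` option are illustrative assumptions. When enabled, `assigned_only` drops code points that `unicodedata` places in category `Cn` (unassigned, which also covers noncharacters). The exact set depends on the Unicode version bundled with your Python.

```python
import unicodedata


def write_all_code_points(path: str, assigned_only: bool = False) -> int:
    """Write one code point per line as UTF-8 and return how many were written."""
    written = 0
    # newline="\n" disables newline translation, so every entry is char + "\n".
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        for cp in range(0x110000):
            if 0xD800 <= cp <= 0xDFFF:
                continue  # lone surrogates cannot be encoded in UTF-8
            ch = chr(cp)
            if assigned_only and unicodedata.category(ch) == "Cn":
                continue  # unassigned in this Python's Unicode database
            f.write(ch + "\n")
            written += 1
    return written
```

Usage: `write_all_code_points("file.txt")` returns 1112064 (every code point except the 2048 surrogates). Note that many lines will still show as boxes or blanks, because control characters and unassigned code points have no visible glyph.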
